We’re migrating our code base to Java 8. We’ll replace everything by functions.
Throw out design patterns. Remove object orientation. Right! Let’s go!
Performance – you will lose on it
In many cases, performance is not critical, and you shouldn’t do any premature
optimisation, so you may claim that this argument is not really an argument
per se. But I will counter this attitude in this case, saying that the overhead
of Stream.forEach() compared to an ordinary for loop is so significant in
general that using it by default will just pile up a lot of useless CPU cycles
across all of your application. If we’re talking about 10%-20% more CPU
consumption just based on the choice of loop style, then we did something
fundamentally wrong. Yes, individual loops don’t matter, but the load on the
overall system could have been avoided.
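
To make the two loop styles concrete, here is a minimal sketch that does the
same work both ways. The class name LoopStyles and its method names are made up
for illustration. It only shows what is being compared; it is not a benchmark
and says nothing about the overhead on your system.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; the names are illustrative only.
final class LoopStyles {

    // External iteration: an ordinary for loop over the list.
    static long sumWithLoop(List<Integer> values) {
        long total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    // The same work through Stream.forEach(). The lambda cannot assign to a
    // local variable, so a one-element array serves as the accumulator.
    static long sumWithForEach(List<Integer> values) {
        long[] total = {0};
        values.stream().forEach(value -> total[0] += value);
        return total[0];
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println("for loop: " + sumWithLoop(values));
        System.out.println("forEach:  " + sumWithForEach(values));
    }
}
```

Both methods return the same value. The forEach variant goes through the
stream pipeline and a lambda call per element, which is where the extra cost
discussed above would come from.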
The designers of the Java programming language did not extend Java and its JDK
to allow functional programming in Java or to turn Java into a hybrid
“object-oriented & functional” programming language. The actual motivation for
inventing streams for Java was performance or, more precisely, making
parallelism more accessible to software developers (see Brian Goetz, State of
the Lambda).
This goal makes a lot of sense to me, considering the way in which hardware
evolves. Our hardware has dozens of CPU cores today and will probably have
hundreds some time in the future.
In order to effectively utilize the hardware
capabilities and thereby achieve state-of-the-art execution performance we must
parallelize. After all – what is the point of running a single thread on a
multicore platform? At the same time, multithreaded programming is considered
hard and error-prone, and rightly so. Streams, which come in two flavours (as
sequential and parallel streams), are designed to hide the complexity of
running multiple threads. Parallel streams make it extremely easy to execute
bulk operations in parallel – magically, effortlessly, and in a way that is
accessible to every Java developer.
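
As a rough illustration of the two flavours, the sketch below runs the same
pipeline sequentially and in parallel. The class name StreamFlavours and its
method names are hypothetical. The only difference between the two methods is
the call to parallel().

```java
import java.util.stream.IntStream;

// Hypothetical example class; the names are illustrative only.
final class StreamFlavours {

    // Sequential stream: the whole pipeline runs on the calling thread.
    static long countEvensSequential(int n) {
        return IntStream.range(0, n)
                .filter(i -> i % 2 == 0)
                .count();
    }

    // Parallel stream: parallel() lets the library split the range and
    // process the parts on worker threads (by default those of the common
    // ForkJoinPool). No threads are created or joined in user code.
    static long countEvensParallel(int n) {
        return IntStream.range(0, n)
                .parallel()
                .filter(i -> i % 2 == 0)
                .count();
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println("sequential: " + countEvensSequential(n));
        System.out.println("parallel:   " + countEvensParallel(n));
    }
}
```

Both calls return the same count. Whether the parallel version is actually
faster is a separate question that only a measurement can answer.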
So, let’s talk about performance. How fast are the Java 8 streams? A common
expectation is that parallel execution of stream operations is faster than
sequential execution with only a single thread. Is it true? Do streams improve
performance?

In order to answer questions regarding performance we must measure, that is,
run a micro-benchmark. Benchmarking is hard and error-prone, too. You need to
perform a proper warm-up and watch out for all kinds of distorting effects,
from optimizations applied by the virtual machine’s JIT compiler (dead code
elimination being a notorious one) up to hardware optimizations (such as
increasing one core’s CPU frequency if the other cores are idle). In general,
benchmark results must be taken with a grain of salt. Every benchmark is an
experiment. Its results are context-dependent. Never trust benchmark figures
that you haven’t produced yourself in your context on your hardware. This said,
let us experiment.
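
The benchmark code itself is not reproduced here, so the following is only a
naive, hand-rolled sketch of the two precautions mentioned above: a warm-up
phase before measuring, and consuming every result so that the JIT compiler
cannot easily remove the measured work as dead code. The class NaiveBenchmark,
its methods, and the iteration counts are assumptions made for illustration.
For serious measurements a dedicated harness such as JMH is the safer choice.

```java
import java.util.Arrays;
import java.util.function.IntSupplier;

// Hypothetical hand-rolled harness; all names and counts are illustrative.
final class NaiveBenchmark {

    // Every result is written to this volatile field so the JIT compiler
    // cannot easily discard the measured work as dead code.
    static volatile int sink;

    // Runs the task `warmups` times without measuring, then `runs` times
    // while measuring, and returns the average nanoseconds per measured run.
    static double averageNanos(IntSupplier task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            sink = task.getAsInt();
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            sink = task.getAsInt();
        }
        return (System.nanoTime() - start) / (double) runs;
    }

    // Example workload: sum an int array with a plain for loop.
    static int sum(int[] ints) {
        int total = 0;
        for (int i = 0; i < ints.length; i++) {
            total += ints[i];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = new int[500_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i % 1000;
        }
        double loop = averageNanos(() -> sum(data), 2_000, 2_000);
        double stream = averageNanos(() -> Arrays.stream(data).sum(), 2_000, 2_000);
        System.out.printf("for loop: %.0f ns per run%n", loop);
        System.out.printf("stream:   %.0f ns per run%n", stream);
    }
}
```

The printed figures will differ between machines, JVM versions, and runs,
which is exactly why they should be reproduced in your own context.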
First, access to array elements is very fast. It is an index-based memory
access with no overhead whatsoever. In other words, it is a plain
down-to-the-metal memory access. Elements in a collection such as ArrayList, on
the other hand, are accessed via an iterator, and the iterator inevitably adds
overhead. Plus, there is the overhead of boxing and unboxing collection
elements, whereas int-arrays use plain primitive type ints. Essentially, the
measurements for the ArrayList are dominated by the iteration and boxing
overhead, whereas the figures for the int-array illustrate the advantage of
for-loops.

Secondly, had we seriously expected that streams would be faster than plain
for-loops? Compilers have 40+ years of experience optimizing loops, and the
virtual machine’s JIT compiler is especially apt to optimize for-loops over
arrays with an equal stride like the one in our benchmark. Streams, on the
other hand, are a very recent addition to Java and the JIT compiler does not
(yet) perform any particularly sophisticated optimizations to them.

Thirdly, we must keep in mind that we are not doing much with the sequence
elements once we have got hold of them. We spend a lot of effort trying to get
access to an element and then we don’t do much with it. We just compare two
integers, which after JIT compilation is barely more than one assembly
instruction. For this reason, our benchmarks illustrate the cost of element
access, which need not necessarily be a typical situation. The performance
figures change substantially if the functionality applied to each element in
the sequence is CPU intensive. You will find that there is no measurable
difference any more between for-loop and sequential stream if the functionality
is heavily CPU bound.
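
The four access patterns discussed above can be sketched as follows. This is a
reconstruction under assumptions, not the original benchmark code: the class
MaxVariants and its method names are invented, and finding the maximum is
chosen because the text describes the per-element work as a comparison of two
integers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; the names are illustrative only.
final class MaxVariants {

    // for loop over an int array: index-based access, no boxing.
    static int maxArrayLoop(int[] ints) {
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < ints.length; i++) {
            if (ints[i] > max) {
                max = ints[i];
            }
        }
        return max;
    }

    // Sequential primitive stream over the same array, still without boxing.
    static int maxArrayStream(int[] ints) {
        return Arrays.stream(ints).reduce(Integer.MIN_VALUE, Math::max);
    }

    // for-each loop over a List<Integer>: goes through an iterator and
    // unboxes every element.
    static int maxListLoop(List<Integer> list) {
        int max = Integer.MIN_VALUE;
        for (int value : list) {
            if (value > max) {
                max = value;
            }
        }
        return max;
    }

    // Sequential stream over the list: elements are unboxed for Math.max
    // and the result is boxed again on every step.
    static int maxListStream(List<Integer> list) {
        return list.stream().reduce(Integer.MIN_VALUE, Math::max);
    }

    public static void main(String[] args) {
        int[] ints = {3, 9, 4, 1};
        List<Integer> list = new ArrayList<>();
        for (int value : ints) {
            list.add(value);
        }
        System.out.println(maxArrayLoop(ints) + " " + maxArrayStream(ints)
                + " " + maxListLoop(list) + " " + maxListStream(list));
    }
}
```

All four methods compute the same result. They differ only in how each element
is reached, which is the cost the benchmark figures reflect.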