To measure the overhead of the data pipeline itself, we keep the work done in the lambda expressions small.

As the source is a simple range and the result is a sum, any observed memory pressure should come from the data pipeline itself.

In addition, I vary the collection size from 1 to 10,000,000 but keep the overall useful work constant so that the timings are comparable.
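
The idea of keeping the total useful work constant can be sketched like this (a hypothetical sketch of the methodology; the names and structure are mine, not the actual benchmark code):

```fsharp
// Sketch: smaller collections are run more times, so every measurement
// processes the same total number of elements regardless of collection size.
let totalWork = 10000000

let measure (collectionSize : int) (runPipeline : int -> int64) =
    let outerRuns = totalWork / collectionSize
    let sw       = System.Diagnostics.Stopwatch.StartNew ()
    let mutable result = 0L
    for _ in 1 .. outerRuns do
        result <- runPipeline collectionSize
    sw.Stop ()
    sw.ElapsedMilliseconds, result
```

With this setup, per-run overhead (such as constructing the pipeline) is amplified for small collection sizes because there are many more outer runs.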

So without further ado:

Performance in Milliseconds - F# 4, .NET 4.6.2, x64

Collection Count - F# 4, .NET 4.6.2, x64

Interpreting the results

Note that the axes use a logarithmic scale.

Imperative

The imperative code is generally the fastest, because it is implemented as an efficient for loop.

The other pipelines create intermediate objects to construct the pipeline, which creates memory pressure. In addition, values are passed down the pipeline using virtual dispatch, which is hard for the JITter to inline.
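
To illustrate the difference, here is roughly the shape of the work being measured (a sketch, not the exact benchmark code): the imperative variant is a plain loop, while the Seq variant allocates enumerator objects and routes every value through virtual calls.

```fsharp
// Imperative: a plain loop, no intermediate objects, easy for the JITter.
let imperativeSum n =
    let mutable acc = 0L
    for i = 1 to n do
        acc <- acc + int64 (i * i)
    acc

// Pipeline: each stage allocates an object, and values flow through
// virtual calls (IEnumerator.MoveNext/Current) that are hard to inline.
let seqSum n =
    seq { 1 .. n }
    |> Seq.map (fun i -> int64 (i * i))
    |> Seq.sum
```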

I have implemented a few data pipelines in C++, and modern C++ compilers are able to eliminate the intermediate objects. What typically remains is an optimized loop with almost identical performance to imperative code. That's pretty cool.

Push is faster than Pull

The push pipelines (like Nessos.Streams) seem to perform better in general than the pull pipelines.

One of the reasons is that in a pull pipeline, each "pipe" has to perform a conditional check to see whether there is more data. In push pipelines these checks are largely eliminated.

Push is also easier to implement, which is nice.
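
The contrast can be sketched in a few lines (a simplified sketch with my own names; the real libraries are more elaborate and, among other things, support early termination, which this push stream does not):

```fsharp
// Pull: every stage must ask "is there more data?" for every element.
let pullMapSum (f : int -> int64) (xs : seq<int>) =
    use e = xs.GetEnumerator ()
    let mutable acc = 0L
    while e.MoveNext () do          // conditional check per element
        acc <- acc + f e.Current
    acc

// Push: the source drives the loop once; downstream stages are plain
// function calls with no per-stage "more data?" checks.
type PushStream<'T> = ('T -> unit) -> unit

let ofRange n : PushStream<int> =
    fun receiver -> for i = 1 to n do receiver i

let map (f : 'T -> 'U) (s : PushStream<'T>) : PushStream<'U> =
    fun receiver -> s (fun v -> receiver (f v))

let sum (s : PushStream<int64>) =
    let mutable acc = 0L
    s (fun v -> acc <- acc + v)
    acc
```

Usage: `ofRange 100 |> map (fun i -> int64 (i * i)) |> sum`. Note how the only loop and the only termination check live in the source; this is why push is both faster and easier to implement.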

Memory overhead of data pipelines

The reason that the Collection Count is higher for small collections is that I create more pipelines to keep the useful work constant. Any memory overhead from creating a pipeline therefore shows up at smaller collection sizes.

This inspired me to create PushPipe, which allows reusing the pipeline over several runs. I think the design can be improved a lot, but we can see that PushPipe is missing from the Collection Count chart because it doesn't have to recreate the pipeline. This also lowers the CPU overhead for smaller collection sizes.
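
A minimal sketch of the reuse idea (this is not the actual PushPipe implementation; the names and the record-of-functions design are mine, only to illustrate why reuse removes per-run allocation):

```fsharp
// Sketch: construct the pipeline once, then drive it many times.
type SumPipe =
    { Push   : int -> unit
      Result : unit -> int64
      Reset  : unit -> unit }

let makeSumPipe (f : int -> int64) =
    let mutable acc = 0L
    { Push   = fun v  -> acc <- acc + f v
      Result = fun () -> acc
      Reset  = fun () -> acc <- 0L }

// No pipeline objects are allocated inside the measurement loop,
// so small collection sizes no longer pay a per-run construction cost.
let runMany runs n =
    let pipe = makeSumPipe (fun i -> int64 (i * i))
    for _ in 1 .. runs do
        pipe.Reset ()
        for i = 1 to n do pipe.Push i
    pipe.Result ()
```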

Nessos.LinqOptimizer

Nessos.LinqOptimizer is both among the worst and among the best performers. The reason is that there is a huge overhead in creating the data pipeline: Nessos.LinqOptimizer analyzes the expression trees, optimizes them, and compiles them into IL code, which is then jitted.

While this gives almost "imperative" performance for large collections, it gives poor performance for small ones. I think Nessos.LinqOptimizer could benefit from being able to cache and reuse the pipeline; I looked for such a feature in the API but didn't find it.

Seq performance

The careful reader will notice two Seq measurements. Seq2 contains a modified version of Seq.upto that doesn't allocate objects needlessly. This improves performance by approximately 4× and reduces memory pressure. It's a bit of a challenge to keep the semantics of upto exactly as before, but if I succeed this should become an F# PR.

SeqComposer brings significant performance improvements

Finally, SeqComposer is taken from the PR that seeks to replace Seq. It improves Seq performance by roughly an order of magnitude which is a welcome improvement. The improvement is even greater if one uses the Composer directly.

Hope you found this interesting,

Mårten

Appendix

Imperative vs PushPipe

I was asked why the imperative code performs better than, for example, PushPipe. For larger collections the construction of the intermediate objects shouldn't affect performance, so where does the difference come from?

Stream<'T> is used to build up a chain of Receiver<'T>. Each value in the stream is passed to the next receiver using virtual dispatch. This is why we see virtual calls in the jitted code for PushPipe.

In principle the F# compiler could eliminate the virtual tail calls, but currently it doesn't. Unfortunately, the JITter has neither enough information nor enough time to inline the virtual calls.
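
In sketch form (the type names here are mine, not the actual Nessos types), the receiver chain looks something like this; every element passes through one virtual `Receive` call per stage:

```fsharp
// Sketch of a push pipeline built from chained receivers. Each stage
// forwards to the next via an abstract method, i.e. a virtual call
// per element per stage, which the JITter typically can't inline.
[<AbstractClass>]
type Receiver<'T> () =
    abstract Receive : 'T -> unit

type MapReceiver<'T, 'U> (f : 'T -> 'U, next : Receiver<'U>) =
    inherit Receiver<'T> ()
    override x.Receive v = next.Receive (f v)   // virtual dispatch

type SumReceiver () =
    inherit Receiver<int64> ()
    let mutable acc = 0L
    member x.Sum = acc
    override x.Receive v = acc <- acc + v
```

A pipeline like `range |> map |> sum` then becomes `MapReceiver (f, SumReceiver ())` driven by a loop calling `Receive`, whereas the imperative loop does the same work with no calls at all.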

SeqComposer has a lot of similarities with PushStream<'T>, so even though we haven't looked closely at the SeqComposer code, we can see that it does support push. SeqComposer is more advanced than PushStream<'T> in that it supports pull as well.

The difference in performance between PushStream and SeqComposer2 seems to come mostly from a more generic top-loop:

It seems the difference mainly comes from SeqComposer performing 2 extra checks per value plus an extra virtual call (isSkipping () is not called in the pipeline above). SeqComposer also uses checked addition, which adds a small overhead (a conditional branch and stack adjustment).


LinqOptimizer offers a cache API called compile that can be used to precompile a query/expression tree. This way in subsequent invocations you only have to call a delegate. While this API works in C#, in F# it's problematic/unusable due to the way the F# compiler constructs expression trees. Just mentioning this for the sake of completeness.


Composer works by extending IEnumerable with 2 functions (the derived interface is called ISeq). For compatibility the Seq module interface is unmodified, but all it now contains are forwarding calls to the Composer module. The Composer module can also be used directly via Seq.toComposer (or Composer.ofSeq). You can then use Composer.map/Composer.filter rather than Seq.map/Seq.filter etc. This has the advantage of inlining the lambdas, which will give you a bit of a performance increase. The performance using this should be more similar to Nessos Streams.
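
Based on the description above, direct Composer usage would look roughly like this. Note that I haven't verified the exact signatures against the PR; in particular `Composer.sum` is my assumption for the terminal operation, and the `Seq.toComposer`/`Composer.map`/`Composer.filter` names are taken from the comment above.

```fsharp
// Via the unchanged Seq module (which now forwards to Composer internally):
let viaSeq =
    seq { 1 .. 100 } |> Seq.map (fun i -> i * 2) |> Seq.sum

// Via Composer directly, which allows the lambdas to be inlined:
let viaComposer =
    seq { 1 .. 100 }
    |> Seq.toComposer                  // or Composer.ofSeq
    |> Composer.map (fun i -> i * 2)
    |> Composer.sum                    // assumed name of the terminal fold
```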

I would like to cache the full query but leave count to be passed as an input parameter to the delegate. If the source was an array, I would like the input of the delegate to be an array. Perhaps there's something obvious I am missing.