I want to thank Leon P. Smith for showing me the idea of producing the spans of odds directly, for version IV. Before that I had a combination of span and an infinite list of odds, as in span (< p*p) [3,5..] etc. Producing the spans directly sped it up some 20% more, when GHC-compiled.

The mark-and-comb version that I put under Simple Sieve of Eratosthenes seems to me very "faithful" to the original (IYKWIM). Strangely, it shows exactly the same asymptotic behavior when GHC-compiled (tested inside GHCi) as IV. Does this prove that priority queue-based code is better than the original? :)

BTW "unzip" is somehow screwed up inside "haskell" block, I don't know how to fix that.

I've also added the postponed-filters version to the first sieve code to show that the squares optimization does matter and gives a huge efficiency advantage just by itself. The odds-only trick gives it a dozen or two percent improvement, but that's nothing compared to this massive 20x speedup!
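To make the squares optimization concrete, here is a minimal sketch of the two variants being compared. It is not the exact code from the main page; the names primesNaive and primesPostponed are made up for this illustration. The only difference is that the second one does not start filtering by p until p*p has been reached.

```haskell
-- Illustrative sketch only; names are hypothetical, not taken from the article.

-- Naive version: the filter for each prime starts working immediately.
primesNaive :: [Int]
primesNaive = sieve [2 ..]
  where
    sieve (p:xs) = p : sieve [x | x <- xs, x `rem` p /= 0]

-- Postponed filters: numbers below p*p are passed through untouched,
-- and the filter for p is only created once p*p is reached.
primesPostponed :: [Int]
primesPostponed = 2 : sieve primesPostponed [3 ..]
  where
    sieve (p:ps) xs =
      let (h, t) = span (< p * p) xs
      in  h ++ sieve ps [x | x <- t, x `rem` p /= 0]

main :: IO ()
main = do
  print (take 10 primesNaive)
  print (take 10 primesPostponed)
```

Both lists start with the same primes; the difference is only in how many filters each candidate has to pass through.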

I've added the code for Euler's sieve which is just the postponed filters with minimal modification, substituting (t `minus` multiples p) for (filter (nodivs p) t).

As it later turned out, it was not Euler's sieve but rather an approximation of it. WillNess 13:27, 10 February 2011 (UTC)

Now it is obvious that (...(((s - a) - b) - c) - ...) is the same as (s - (a + b + c + ...)) and this is the next code, the "merged multiples" variation of Euler's sieve.

It is very much like a streamlined and further optimized version of Richard Bird's famous code (appearing in Melissa O'Neill's JFP article), whose copyright status is unknown to me, so I couldn't reference it in the main article body. The code as written in the article has the wrong clause order in merge.

when using compare, clause order doesn't matter. WillNess 15:32, 26 January 2010 (UTC)

I've also changed the span pattern-binding to the more correct, lazy pattern, (h,~(_:t)).
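The identity used earlier in this thread, that (...((s - a) - b) - c ...) equals (s - (a + b + c + ...)), can be checked on a small case. This is a sketch under my own assumptions: minus and union here are ordinary ordered-list difference and union written with compare (so the clause order doesn't matter), and the lists s, a, b, c are just the odds and the odd multiples of 3, 5 and 7 starting from their squares.

```haskell
-- Illustrative sketch only; assumes increasing lists, infinite or finite.

-- Ordered list difference: elements of the first list not in the second.
minus :: Ord a => [a] -> [a] -> [a]
minus xs@(x:xt) ys@(y:yt) = case compare x y of
  LT -> x : minus xt ys
  EQ ->     minus xt yt
  GT ->     minus xs yt
minus xs _ = xs

-- Ordered list union without duplicates.
union :: Ord a => [a] -> [a] -> [a]
union xs@(x:xt) ys@(y:yt) = case compare x y of
  LT -> x : union xt ys
  EQ -> x : union xt yt
  GT -> y : union xs yt
union xs [] = xs
union [] ys = ys

main :: IO ()
main = do
  let s = [3, 5 ..] :: [Int]
      a = [9, 15 ..]     -- odd multiples of 3, from 3*3
      b = [25, 35 ..]    -- odd multiples of 5, from 5*5
      c = [49, 63 ..]    -- odd multiples of 7, from 7*7
      oneByOne = ((s `minus` a) `minus` b) `minus` c
      merged   = s `minus` (a `union` (b `union` c))
  print (take 20 oneByOne == take 20 merged)
```

The first form subtracts each stream of multiples separately; the second merges the multiples first and subtracts once, which is the "merged multiples" idea.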

The new tree-folding merge is inspired by apfelmus's VIP code from Implicit Heap, but it uses a different structure, better suited to generating multiples of primes: instead of his 1+(2+(4+(8+...))) it's (2+4)+( (4+8) + ( (8+16) + ...)). The reason I put my version here is to show the natural progression of development from the postponed filters to Euler's sieve to merged multiples to treefold-merged multiples. I.e. it's not some ad-hoc invention; it's a logical, step-by-step derivation.
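For readers who want to see the general shape of a tree-folded merge, here is a minimal sketch. Note that it uses a plain repeated-pairing fold, which is an assumption made for brevity and is not the (2+4)+((4+8)+...) structure described above; the names primesTF, foldt and pairs are illustrative. It relies on each list of multiples starting at p*p, so the head of the first list is always the smallest remaining element and can be emitted without comparisons.

```haskell
-- Illustrative sketch only: a simple pairwise tree fold, not the exact
-- structure discussed in this thread.

minus :: Ord a => [a] -> [a] -> [a]
minus xs@(x:xt) ys@(y:yt) = case compare x y of
  LT -> x : minus xt ys
  EQ ->     minus xt yt
  GT ->     minus xs yt
minus xs _ = xs

union :: Ord a => [a] -> [a] -> [a]
union xs@(x:xt) ys@(y:yt) = case compare x y of
  LT -> x : union xt ys
  EQ -> x : union xt yt
  GT -> y : union xs yt
union xs [] = xs
union [] ys = ys

primesTF :: [Int]
primesTF = 2 : oddPrimes
  where
    -- Odd primes: odds minus the tree-merged odd multiples of odd primes.
    oddPrimes = 3 : minus [5, 7 ..]
                          (foldt [[p * p, p * p + 2 * p ..] | p <- oddPrimes])
    -- Emit the head of the first list, then merge its tail with the
    -- fold of the remaining lists, which are first joined in pairs.
    foldt ((x:xs):t)    = x : union xs (foldt (pairs t))
    pairs ((x:xs):ys:t) = (x : union xs ys) : pairs t

main :: IO ()
main = print (take 20 primesTF)
```

Compared with a linear right-nested merge, multiples of later primes pass through logarithmically many union nodes rather than linearly many.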

I estimate the total cost of producing the multiples of primes as Sum (1/p)*d, where d is the leaf's depth, i.e. the number of merge nodes that a number produced at that leaf must pass through on its way up to the top. The values of this cost function correspond very well with the actual time performance of the respective algorithms: the estimated cost is lower by 10%-12%, and the measured performance boost is 10%-12% too.
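The estimate Sum (1/p)*d can be computed directly once a depth is assigned to each leaf. The sketch below is hedged heavily: the depth assignments are my own assumed readings (depth i for the i-th leaf of a linear right-nested merge, and depth 2k+1 for a leaf in the k-th group of a 1+(2+(4+...)) structure with balanced groups), and all names are hypothetical. It does not reproduce the 10%-12% figure above; it only shows how such a comparison could be set up.

```haskell
-- Illustrative sketch only; depth formulas are assumptions, not from the thread.

-- A few primes by plain trial division; good enough for an illustration.
primes :: [Int]
primes = 2 : filter isPrime [3, 5 ..]
  where
    isPrime n = all (\p -> n `rem` p /= 0)
                    (takeWhile (\p -> p * p <= n) primes)

-- Estimated cost: each prime p sitting at depth d contributes d / p.
cost :: [(Int, Int)] -> Double
cost leaves = sum [fromIntegral d / fromIntegral p | (p, d) <- leaves]

-- Linear right-nested merge a+(b+(c+...)): the i-th leaf is at depth i.
linearDepths :: [Int]
linearDepths = [1 ..]

-- ASSUMED reading of 1+(2+(4+(8+...))): group k holds 2^k leaves in a
-- balanced subtree, k+1 nodes down the spine plus k inside the group.
doublingDepths :: [Int]
doublingDepths = concat [replicate (2 ^ k) (2 * k + 1) | k <- [0 :: Int ..]]

main :: IO ()
main = do
  let ps = take 1000 (tail primes)   -- odd primes only
  print (cost (zip ps linearDepths))
  print (cost (zip ps doublingDepths))
```

Other tree shapes can be compared by supplying a different list of depths.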

I will also add this code further improved with the Wheel optimization here. That one beats the PQ-based code from Melissa O'Neill's ZIP file by a constant margin of around 20%, its asymptotic behaviour being *exactly* the same.

that was with the incomplete code which only rolled the wheel on the numbers supply, and not on the multiples. It had e.g. [11*11,11*13,11*15,11*17...] but of course 11*15 could've been eliminated in advance too (and 11*25, 11*35, 11*45, 11*49, etc...). Fixing that made it run twice as fast as before. WillNess 08:33, 29 December 2009 (UTC)

these tests were most probably flawed: run either in GHCi or without the -O2 switch. WillNess 13:27, 10 February 2011 (UTC)

I measure local asymptotics by taking the logarithm of the run-time ratio in the base of the problem-size ratio, i.e. logBase (n2/n1) (t2/t1). I've settled on testing code performance as interpreted, inside GHCi. Running compiled code feels more like testing the compiler itself. Too many times I saw two operationally equivalent expressions running in wildly different times. It can't be anything other than the compiler's quirks, and we're not interested in those, here. :)
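The local-asymptotics measurement described above is a one-liner. In this sketch the function name localOrder is made up, and the timings in main are hypothetical numbers chosen only to show the call, not measurements of any code discussed here.

```haskell
-- Illustrative sketch only; the name and the sample timings are hypothetical.

-- Empirical local order of growth between two measurements (n, t):
-- the k for which t ~ n^k holds locally between the two problem sizes.
localOrder :: (Double, Double) -> (Double, Double) -> Double
localOrder (n1, t1) (n2, t2) = logBase (n2 / n1) (t2 / t1)

main :: IO ()
main =
  -- Hypothetical run: 100,000 primes in 1.0s, 200,000 primes in 2.3s.
  print (localOrder (1e5, 1.0) (2e5, 2.3))
```

A result near 1 suggests roughly linear behaviour in that range, and a result near 2 roughly quadratic.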

AND his other idea: making `tfold' strict - which really brings down the memory consumption. The only caveat: use at least 6 primes to bootstrap the tree-folding. At each tree expansion it needs additional 3*2^n, n=1,... primes, but is producing PI( (PRIMES !! SUM(i=1..n)(3*2^i)) ^ 2) which is way more than that. WillNess 10:02, 25 January 2010 (UTC)

It is twice as fast, but more obscure; so I thought I'd keep the previous version on the main page for didactic reasons. WillNess 08:43, 18 July 2010 (UTC)

I've added it now to the main page and restyled the treefold code a bit. The test entries on Ideone.com are here. WillNess 11:00, 6 August 2010 (UTC)

ST Array code also becomes much faster (3x for 1 mln-th prime in fact) when translated into working on odds only, like the immutable array version - but its memory footprint is also large. WillNess 10:34, 13 August 2010 (UTC)

Augmenting the latest "Treefold Merged Multiples, with Wheel" version to work with VIPs does nothing except slow it down a little bit. The lazy pattern in join/tfold also starts causing a space leak then; when the tilde is removed, primes' needs to be more defined upfront to prevent a loop. WillNess 18:31, 6 February 2011 (UTC)