A squarefree number is one which is not divisible by any perfect squares. Put another way, the prime factorization of a squarefree number includes at most one copy of any given prime. So, the first few squarefree numbers are

1,2,3,5,6,7,10,11,13,14,15,…

How can we generate these in Haskell? In particular, we want to define the infinite sorted list of squarefree numbers, to be lazily computed on demand. We could implement a brute-force method where we simply factor every positive integer and only keep those which don’t have repeated factors. But that would be awfully slow. This is Haskell we’re talking about, there must be a better way!

There are better ways, of course; here’s one. Suppose that (somehow) we have computed the list , which contains all the squarefree numbers with prime factors up to and including , the nth prime number. For example, ; ; and so on. Given , we can compute as follows: multiply every element of by to produce a new list ; this list contains all those squarefree numbers with prime factors up to , which are divisible by . Then we just merge with to produce .

Let’s try it: we start with , the list of all squarefree numbers with no prime factors. Then we compute . Multiplying by 3 gives , and merging yields . Multiplying by 5 gives ; merging again gives us . And so on.

So, how do we translate this into a definition of the infinite list of squarefree numbers? It’s not quite as straightforward as it seems. First of all, we can’t just say the equivalent of “compute “; nothing would ever get generated that way, since everything in depends on everything in . There’s nothing inherent in the above method that indicates which part of the list isn’t going to change on the next iteration. And we can’t do something like lazily merge all the lists in ; the problem is that every squarefree number occurs infinitely often in such a list.

The key is to note that the “intermediate” lists are more important than we might have thought. The infinite sequence of lists in fact contains every squarefree number exactly once (except 1); moreover, they are ordered by their first elements in addition to being ordered themselves, which gives us just enough information to implement an infinite merge that will actually get around to producing something in a finite amount of time!

The primeStep function below takes the prime and the list , and produces the pair . mapAccumL (one of those higher-order functions which isn’t used much but comes in very handy every once in a while) is used to carry along the current list in an accumulator while simultaneously producing the list . Finally mergeAll performs a lazy infinite merge, giving us the infinite list of squarefree numbers.

At the beginning, I said there are better ways; here’s another! It makes use of the beautiful theory of Dirichlet generating functions, and in particular, the fact that is the DGF for the squarefree numbers. Using David Amos’ fantastic mathematics library, this is a piece of cake:

That isn’t quite the same thing; you’ve defined the list of all integers which aren’t perfect squares, whereas I’m talking about numbers which aren’t divisible by any perfect squares. For example, 12 is in your list but not mine.

Aha! The idea of taking the diagonal is neat. It does mean that producing the first n squarefree numbers is O(n^2), since before producing each one it has to first throw away all the previous ones; but then again, I’m not sure what the running time of my version is, either. (Reasoning about time complexity when laziness is involved isn’t obvious…)

Here is a (IMHO nice) way to enumerate all squarefree numbers, although not in sorted order. The key idea is that all sorted numbers can be written as p^a + q^b + r^c + … where p,q,r,… are the prime numbers and a,b,c are drawn from { 0, 1 }. Without further ado, here’s the code:

Well, not only shorter, but more efficient: it no longer has to throw away any of the generated sequences. Note that ([1]:) . map (++[1]) isn’t doing the same thing as filter (not . (==0) . last), it’s just appending 1 to all the sequences and then manually adding in the sequence [1] again.

It occurs to me that you could also replace

concatMap (sequence . flip replicate [0,1]) [1..]

with

concatMap sequence . tail . iterate ([0,1]:) $ []

which isn’t really all that much shorter but is (IMO) clearer. Also, concatMap is just (=<<) in the list monad, so you could also write

Wait, if you take out the call to tail, you get the empty list at the beginning, which then gives [1] when (++[1]) is applied to it, so [1] doesn’t have to be added separately. Here’s the complete (now much shorter) version:

I don’t understand why this is faster than factoring every number: to generate L_inf up to p_n, you have to generate the primes up to p_n, which is probably not much faster than factoring every number up to p_n.

b_jonas: Well, in order to factor every number up to p_n we also have to generate the primes up to p_n, so let’s take calculating a list of primes as given. (I should also note that there are better ways to calculate the primes than what I have here.) The difference is that with this method we are starting with the primes, and using them to generate only squarefree numbers. So we never even have to check non-squarefree numbers, and for squarefree numbers we are implicitly generating their prime factorizations straight away, with no extra work; in order to start with the numbers and factor them, we’d have to do a lot of wasted work checking for divisors.

Another way to think about it: if you forget about the merging, and have primeStep involve (map (p:) xs) instead of (map (p*) xs), we get a list of the prime factorizations of all squarefree numbers. But we do no wasted work to get it (checking for divisors which aren’t divisors, factoring non-squarefree numbers) so it’s clearly faster.

I should also point out that generating the primes up to p_n certainly is faster than factoring every number up to p_n, even if we imagine doing both by trial division: in order to discard a non-prime, we need only find a single factor, whereas to fully factor that number may require much more work.