2010 April 4

On powersets and folds

A while ago, somebody on reddit brought up a particularly cute and fairly well known definition of powerset:

powerset :: [a] -> [[a]]
powerset = filterM (const [False,True])

Unfortunately, it is also fairly well known that this is a very space-inefficient way of producing the powerset, for example evaluating length (powerset [1..32]) will allocate gigabytes of memory within seconds on modern computers, and is an effective denial-of-service attack against most default configurations.

Fortunately, there are solutions that avoid this space leak while maintaining the same style of definition. One solution is to use a revised definition of Control.Monad.filterM. Here is the standard definition:

These orderings are analogous to counting in binary, with the original definition having the least significant bit at the end of the list, whereas the revised versions have the least signficant bit at the beginning:

Both solutions have maximal sharing among the tails of the sublists, but the second solution arranges the sharing so that shared sublists appear adjacent to each other in the overall order, as illustrated by the asterisks. This allows length (powerset' [1..32]) to be evaluated in a very modest amount of space.

Note that this space leak is inherent to any solution that produces the first ordering, and has maximal sharing! If the first ordering is needed, it is better to recompute the sublists than to share them. As Cale Gibbard pointed out, one way to eliminate the sharing is to use another monad, such as the Logic monad.

Making powerset lazier

The definition of powerset can be generalized to produce all the finite sublists of a potentially infinite list. However, it is not obvious how to express this via tweaking filterM without breaking the abstraction, so let’s start with another efficient formulation based on concatMap.

On an infinite list, this definition gets stuck in an non-productive loop because powerset' must traverse the entire list before it returns anything. Note that on finite cases, the first sublist returned is always []. Formally, we can state this as powerset' xs == [] : tail (powerset' xs). It is sensible to extend this equality to the infinite case, due to the analogy to counting in binary. By making this substitution, we produce a lazier definition that can handle infinite lists, from which we can calculate a definition that’s more aesthetically pleasing and slightly more efficient:

Next, we can choose a new name for tail . powerset', such as nonempties, and substitute it through the definition. Finally, we can recover something equivalent to the original definition by appending the empty list to nonempties

For example, we can calculate the sum of every finite combination of the powers of 2:

powerfoldr (+) 0 (iterate (2*) 1) == [0,1,2,3,4..]

An equivalent “higher-level” definition of powerfoldr in terms of map, foldr, and lazyPowerset is as follows:

powerfoldr' f b = map (foldr f b) . lazyPowerset

This definition, however, recomputes the right fold over common sublists; the sharing provided by lazyPowerset is not preserved by this latter definition. The first definition has the same level of sharing as the original, reducing the number of applications of f by half. I found the first definition of powerfoldr to be quite helpful in solving Project Euler Problem 268; it certainly was not optimal for the purpose, but it was more than adequate.