
Constructing Combinations using LISP

Source code accompanies this article.

The main idea behind LISP is that any data structure can be constructed from lists. Consequently, John turns to this language when in need of a simple solution to complex combination problems.

November 1996: Algorithm Alley

John is a programmer/analyst at the Texas Department of Human Services. He can be reached at alexander.swartz@polaris.dhs.state.tx.us

Every Statistics 101 course begins with a description of how to count the number of permutations and combinations of a set. However, if you want to actually list these combinations, you may not find the answer in your old statistics book.

I've been interested in ways of efficiently finding only the combinations that satisfy certain constraints, as in this example: How many distinct selections of size nine can be made from twelve elements, three each of the colors blue, red, white, and green?

In 1980, I was testing a Fortran-based command/control system that continually overflowed a variety of internal arrays. Not surprisingly, I became quite interested when a LISP vendor advertised that their product never overflowed arrays. As a result, I developed an interest in LISP, and decided to share this solution to the combinations problem, which I developed in LISP.

A LISP Background

Despite its awkward syntax, LISP (which was designed by John McCarthy in the late 1950s) is a remarkably simple language. The main idea behind LISP is that any computer data structure can be constructed from lists. Intuitively, a list is just an ordered sequence of items. In LISP, those items can either be fundamental "atoms" (numbers or string-like symbols) or other lists.

The advantage of this approach is that list processing is very simple. In LISP, there are ultimately only two things you can do with any list: Look at the first element, or look at the rest of the list. Through heavy use of recursion, this minimalist machinery allows you to easily process complex data structures.

Lists are written using parentheses, with individual items separated by spaces. For example, (a (b c)) is a list with two items. The second item, (b c), is itself a list with two items. The empty list is written (), and usually printed "NIL."

If a is a list, then (car a) is the first item in the list. (The name car stands for "contents of the address part of the register," a throwback to the IBM 704, the machine on which LISP was first implemented.) Similarly, (cdr a) is the entire list except for the first item. Combining these, (car (cdr a)) is the second item in the list.
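As a small illustration of car and cdr, the expressions below can be typed at an interactive prompt. This is only a sketch: the list is the one from the earlier example, the leading single quote keeps the interpreter from trying to evaluate the list as a function call, and the results in the comments assume a LISP that prints symbols in uppercase.

```lisp
;; The list (a (b c)) is illustrative; a, b, and c are placeholder symbols.
(car '(a (b c)))          ; first item:        A
(cdr '(a (b c)))          ; rest of the list:  ((B C))
(car (cdr '(a (b c))))    ; second item:       (B C)
(cdr (cdr '(a (b c))))    ; nothing left:      NIL
```

Note that cdr always returns a list, which is why the second result carries an extra pair of parentheses.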

As you can see, programs are, themselves, written as lists. The first item in the list is the function to execute, and the rest are arguments to the function. (Both car and cdr take a single argument.)

Listing One uses the conditional form cond to control recursion. cond examines each of its arguments in turn. Each argument is a list whose first part is a condition to test and whose second part is what to evaluate if that test is true; cond returns the result for the first test that succeeds. Schematically, a cond statement looks like: (cond (<test 1> <result 1>) (<test 2> <result 2>) ... (<test n> <result n>)). Typically, the first line of the cond statement tests whether a parameter is empty. The last line usually has a test value of t (which is always true); this is the equivalent of "else" in many other languages.

Another often-used construct is (defun <name> (<args>) <body>), which defines a function. Here's a simple example of its use: the function square, which squares the value passed to it, is defined as (defun square (x) (* x x)). In the interactive mode, the statement > (square 7) would result in the immediate answer > 49.
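To show how cond and defun work together with car and cdr, here is a minimal sketch of a recursive function. The name count-items is hypothetical and does not come from Listing One; the function simply counts the top-level items in a list.

```lisp
;; count-items is a hypothetical name used only for this illustration.
(defun count-items (lst)
  (cond ((null lst) 0)                        ; empty list: nothing to count
        (t (+ 1 (count-items (cdr lst))))))   ; else: one, plus the count of the rest

(count-items '(a (b c) d))   ; => 3, because (b c) counts as a single item
```

The shape is the one described above: the first line of the cond tests for the empty list, and the final line, guarded by t, does the recursive work.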

Executing LISP is very memory intensive. Everyday applications like the one presented here have become practical only because of recent increases in the power of hardware. John McCarthy would probably not have been able to run this example using the hardware on which he developed LISP.

Xlisp (available electronically) is the particular LISP implementation I used here. Developed by DDJ contributing editor David Betz, Xlisp was designed to execute on the original Intel 8080-based PCs. Examination of the included source code reveals a very elegant style of C coding. The best reference I have found about the language is LISP, by Patrick Winston and Berthold Horn (Addison-Wesley, 1984).

Approach

The solution to the combinations problem I'm using here is conceptually quite simple. Rather than use complex bookkeeping, I simply construct all combinations without regard to whether those combinations already exist, then eliminate duplicates to produce the final list of solutions.

The combinations function is the heart of the algorithm. It recursively describes each combination of size elements from set as either:

The first element from set together with size-1 elements from the rest of set, or

size elements from the rest of set.

Put slightly differently, either the first element of set is in a particular combination or it isn't.
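The same two-way split can be used just to count combinations, which makes the recursion easy to check by hand. The sketch below is not taken from Listing One; count-combinations is a hypothetical name, and the single-letter symbols stand in for the colors.

```lisp
;; count-combinations is a hypothetical helper, not part of Listing One.
(defun count-combinations (size set)
  (cond ((= size 0) 1)       ; exactly one way to choose nothing
        ((null set) 0)       ; items still wanted, but none left to choose from
        (t (+ (count-combinations (- size 1) (cdr set))   ; first element is in
              (count-combinations size (cdr set))))))     ; first element is out

(count-combinations 2 '(a b c d))                  ; => 6
(count-combinations 9 '(b b b r r r w w w g g g))  ; => 220
```

The second result, 220, counts every selection of nine positions out of twelve, so selections that differ only in which blue (or red, white, or green) element was picked are counted separately.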

Implementation

Listing One shows three functions. The first, makeset, eliminates duplicate items from a list. It recursively examines the list, comparing the first item at each step with the rest of the list. The first line of makeset's cond statement uses the built-in LISP function null to test whether the parameter lat is the empty list; if it is, there is nothing left to reduce and the recursion stops. If it isn't, cond examines the second line, which uses another built-in function, member, to see if the first element of the parameter list is repeated in the remainder of the list. If the first element is repeated, makeset discards it and calls itself to reduce the remainder of the list.

The final line of the cond is evaluated whenever an element is found that is not repeated later in the list. In that case, this unique element is kept and attached, using cons, to the result of reducing the remainder of the list.
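A function matching this description might look like the sketch below. It is a reconstruction from the prose, not the code in Listing One. It also assumes the default behavior of member, which is adequate for comparing symbols; comparing items that are themselves lists would need an equality test such as equal.

```lisp
;; Reconstructed from the description above; not the author's Listing One.
(defun makeset (lat)
  (cond ((null lat) nil)                                    ; empty list: done
        ((member (car lat) (cdr lat)) (makeset (cdr lat)))  ; repeated later: drop it
        (t (cons (car lat) (makeset (cdr lat))))))          ; unique: keep it

(makeset '(blue red blue green red))   ; => (BLUE GREEN RED)
```

Because an item is dropped whenever it appears again later, the last occurrence of each item is the one that survives.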

The combinations function takes two arguments: the size of the selection and the literal list of the colors. Note that the first two lines of the cond statement are tests for termination. Two tests are needed because the final line of the cond statement is doubly recursive: both recursive calls shorten the list and one of them also reduces the size, so either can run out first.

The essential idea behind the combinations function is that each combination falls into one of two categories--either it contains the first element of the parameter list or it doesn't. If it does, you need to look at combinations of size-1 elements from the rest of the parameter list. If it doesn't, you need to look at combinations of size elements from the rest of the list.

The third function, distrib, is a "helper function" to the combinations function. It builds the combinations that contain the first element of the parameter list. Helper functions encapsulate much of the complexity in a large LISP function construct.

These functions are actually invoked from the line beginning with mapcar. Reading from the inside out (as is typical with LISP), you first look at all combinations of nine items from the specified list. The leading single quote makes the argument list a literal; otherwise the interpreter would try to evaluate it. The result of this is a list with all possible combinations, including duplicates. This list is reduced by makeset to produce a list of all unique combinations. Finally, mapcar applies the print function to each item in the list, generating the printout.
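Put together, the whole pipeline can be sketched as follows. This is a reconstruction from the descriptions in this article, not the code in Listing One. The argument order of distrib, the use of append, and the :test #'equal passed to member (needed because the items being compared are lists) are all assumptions, and the syntax is Common Lisp-style, so it may need small adjustments for a particular Xlisp version.

```lisp
;; Reconstruction for illustration; not the author's Listing One.

(defun makeset (lat)
  ;; Remove duplicates; equal is assumed so that whole lists can be compared.
  (cond ((null lat) nil)
        ((member (car lat) (cdr lat) :test #'equal) (makeset (cdr lat)))
        (t (cons (car lat) (makeset (cdr lat))))))

(defun distrib (item lists)
  ;; Helper: attach item to the front of every list in lists.
  (cond ((null lists) nil)
        (t (cons (cons item (car lists))
                 (distrib item (cdr lists))))))

(defun combinations (size set)
  (cond ((= size 0) (list nil))   ; one combination: the empty one
        ((null set) nil)          ; elements still wanted, none left
        (t (append
            ;; combinations that contain the first element
            (distrib (car set) (combinations (- size 1) (cdr set)))
            ;; combinations that do not
            (combinations size (cdr set))))))

;; Print each distinct selection of nine items on its own line.
(mapcar #'print
        (makeset (combinations 9 '(blue blue blue red red red
                                   white white white green green green))))
```

With these definitions, combinations returns 220 lists and makeset reduces them to 20 distinct selections. Duplicate removal works here because equal colors are adjacent in the input list, so two selections with the same color counts always come out as identical lists.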

This powerful yet easy telescoping of functions characterizes LISP as a functional language.

I have seen prior implementations of the combinations algorithm in other languages, but they have been restricted to numeric data. This LISP implementation can process symbolic data just as easily.

At the top of the execution output in Listing Two is a time check and information concerning the initial state of the system. The middle of the output is the displayed solution to the problem. The final few output lines are the system state following execution. Of particular interest is the collections line. It says that five times during execution, the system ran out of memory and was forced to do garbage collection. Garbage collection is another defining LISP characteristic. A second time check is at the end of the execution. As you can see, the execution time was three minutes.
