Selection

December 11, 2009

In statistics, the median is the smallest item that is greater than half the values in a set. Statisticians also define statistics such as quartiles, deciles and percentiles that show the value of an item relative to the other items in a set. A function known as select allows us to find the kth smallest item in a set; that is, the smallest item for which k of the items in the set are smaller.

A simple algorithm for select has us first partition the set into two subsets that are respectively smaller and larger than some randomly-chosen item from the set. If the smaller-than set has more than k elements, it must contain the desired element, so search recursively for the desired element in the smaller-than set; otherwise, subtract the size of the smaller-than set k, then search recursively for the desired element in the larger-than set.

Your task is to write a function that implements the selection algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.

Both select and partition have to change to fix the bug. Here is partition, where the (cdr xs) in the initialization of the loop ensures that at least one element is removed from the list at each iteration, thus forcing progress toward termination:

The variable name gt is changed to ge to reflect that equal elements are possible, and are (arbitrarily) placed in the "greater-than" partition. The order of the return elements from partition is also changed, so that when reading debug output from partition the elements are physically displayed in their proper order, a small point that nonetheless improves the program.