Tuesday, August 24, 2010

Reduce: Not Scary

;; Three very fundamental operators in any sort of programming are map, filter
;; and reduce.
;; They represent the common programming tasks of transforming a collection,
;; selecting from it, and summing over it.
;; Most people think that map and filter are fairly obvious, but there seems to
;; be a certain amount of confusion about reduce.
;; But it's actually very simple in concept, and represents an easy idea.
;; Often one needs to loop over a collection, and store results in an
;; accumulator.
;; The simplest example of a reduction would be adding the numbers in a list.
;; Suppose the numbers are 1,2,3,4,5,6,7,8,9,10 and we want to find their sum.
;; In C, or another naturally imperative language, we'd say:
;; int list[]={1,2,3,4,5,6,7,8,9,10};
;; int len=10;
;; int a=0;
;; int i;
;; for (i=0; i<len; i++){
;;   a += list[i];
;; }
;; Using atoms to provide mutable state, we can do something similar in Clojure:
(def lst '(1 2 3 4 5 6 7 8 9 10))
(def len 10)
(def a (atom 0))
(dotimes [i len]
  (swap! a + (nth lst i)))
;; The value ends up in the atom a, just as in the C version.
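;; To see the result, the atom has to be dereferenced. A self-contained sketch
;; of the same loop, using a local atom (the names numbers and total are made
;; up here) so that it can be run repeatedly without the sum piling up in a:
(let [numbers '(1 2 3 4 5 6 7 8 9 10)
      total   (atom 0)]
  (dotimes [i (count numbers)]
    (swap! total + (nth numbers i)))
  @total) ; -> 55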
;; In Clojure, this looks slightly more complicated.
;; Partly that's because mutation in Clojure is intrinsically more complicated:
;; Clojure is extremely concerned with thread safety, and so we need to allocate
;; and dereference atoms rather than mutating local variables.
;; And partly it's because C has very good notations for its fundamental operations.
;; But logically they're the same algorithm.
;; Still, I'd feel dirty writing this code in Clojure, even though it would have
;; been a perfectly good piece of LISP in the sixties. It's just a feeling that
;; I have that it is better to avoid mutation unless it's actually necessary.
;; Even though the mutation-less algorithms are often harder to write, they're
;; often easier to debug and test.
;; A more natural way to accumulate over a list in Clojure is the
;; loop-as-function-call, with accumulator and iterator as parameters:
(loop [a 0 i 0]
  (if (= i len) a
      (recur (+ a (nth lst i)) (inc i))))
;; This is much more idiomatic code in Clojure, and it doesn't mutate any values,
;; even though the variable-rebinding in the recur call produces a very similar
;; effect.
;; And here the final value is the value of the expression, which is nicer.
;; Of course, Clojure's lists know when they are empty, so we don't need an
;; explicit loop counter.
;; So how about:
(loop [a 0 l lst]
  (if (empty? l) a
      (recur (+ a (first l)) (rest l))))
;; l moves along the list, while a accumulates the values.
;; It still looks a bit long-winded, but we can easily imagine that this is a
;; common pattern:
(loop [a _ l _]
  (if (empty? l) a
      (recur (_ a (first l)) (rest l))))
;; Where the blanks represent holes in the boilerplate we have to fill in.
;; It should be almost as common as the equivalent pattern:
;; a= _
;; for(i=0; i<_; i++)
;; {
;; a _= _ [i]
;; }
;; is in C.
;; Where in both cases we need to fill in the _ with the initial value of the
;; accumulator, the list to be accumulated over, and the operation to be
;; performed.
;; Pretty much the first law of programming is:
;; If you see a common pattern, you should name it and abstract it so it goes away.
;; The pattern is called reduce.
;; We need to fill in the blanks with the function to do the accumulating,
;; the initial value of the accumulator, and the list.
;; Since we're reducing the list lst, using the operation +, and starting
;; with the value zero, we write:
(reduce + 0 lst)
;; reduce is Clojure's natural way of expressing accumulation over a list,
;; in the same way as the for-loop over += and ++ is C's.
;; Here are some other examples:
(reduce * 1 lst)
;; We use * instead of +, and start with 1 instead of 0.
;; This produces the product of the numbers in the list.
;; In these cases, where the order of the arguments doesn't matter,
;; we can think of reduce as 'put the function between the values':
(reduce + 0 '(1 2 3 4 5 6 7 8 9 10)) ; (0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10)
(reduce * 1 '(1 2 3 4 5 6 7 8 9 10)) ; (1 * 1 * 2 * 3 * 4 * 5 * ........)
;; But this image is actually harmful when the ordering does matter.
;; What's really going on is that the operator is being used to feed values
;; into the accumulator one by one.
(reduce + 0 '(1 2 3))
;; proceeds like:
;; (+ 0 1) -> 1
;; (+ 1 2) -> 3
;; (+ 3 3) -> 6
;; If this seems confusing, even though you're happy with the C version,
;; think about the C loop.
;; They're actually just different ways of expressing the same idea,
;; and should look equally natural. Seeing how either one can always be
;; transformed into the other will help.
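;; One way to watch the accumulator change, assuming a Clojure recent enough to
;; include clojure.core/reductions (1.2 or later), is to ask for every
;; intermediate value rather than only the last one:
(reductions + 0 '(1 2 3)) ; -> (0 1 3 6)
;; The last of those values is what reduce itself returns:
(last (reductions + 0 '(1 2 3))) ; -> 6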
;; Here's an example where the order does matter:
(reduce conj '() '(1 2 3))
;; How do we think about this?
(conj '() 1) ; -> '(1)
(conj '(1) 2) ; -> '(2 1)
(conj '(2 1) 3) ; -> '(3 2 1)
;; So
(reduce conj '() '(1 2 3)) ; -> '(3 2 1)
;; Of course this simple reduction is so common that the pattern
;; (reduce conj '() _ ) already has a name:
(reverse '(1 2 3))
;; Here's the definition of reverse in the clojure.core source code!
(defn reverse
  "Returns a seq of the items in coll in reverse order. Not lazy."
  {:added "1.0"}
  [coll]
  (reduce conj () coll))
;; An acceptable definition of reduce itself would be:
(defn my-reduce [f init coll]
  (loop [acc init l (seq coll)]
    (if (empty? l) acc
        (recur (f acc (first l)) (rest l)))))
;; This works on any collection that can be made into a sequence:
(my-reduce * 1 '(1 2 3)) ;; a list
(my-reduce * 1 #{1,2,3}) ;; a set
(my-reduce * 1 [1,2,3]) ;; a vector
;; The real reduce in clojure.core is an optimised version and can deal with all
;; sorts of collections efficiently, but in spirit it is just making every
;; collection into a sequence and then doing what my little skeleton above did.
;; It also has another feature, which is that if you don't provide an initial
;; value for the accumulator, then it takes the first element of the sequence as
;; its initial value, and accumulates over the rest of the sequence.
;; For operations which produce answers of the same type as their arguments,
;; this is often what you want.
(reduce * '(1 2 3 4)) ;24
(reduce + [1 2 3 4]) ;10
(reduce bit-xor '(1 2 3 4 5)) ;1
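;; A small sketch of the edge cases of the form without an initial value, as I
;; understand the documented behaviour of clojure.core/reduce:
(reduce + [5]) ; -> 5, a single element is returned without calling + at all
(reduce + [])  ; -> 0, for an empty collection reduce calls (+) with no arguments
;; So a function that can't be called with no arguments needs an explicit
;; initial value if the collection might be empty:
(reduce max 0 []) ; -> 0, whereas (reduce max []) throws an arity exception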
;; So why has this simple operation got a scary reputation?
;; I think it's because all the common cases are so useful that they have
;; already been further abstracted away, like reverse. So in fact you don't
;; meet it that often in practice.
;; Let's see if we can construct something more interesting:
;; Suppose you had a list of strings
(def strlist '("fred" "barney" "fred" "wilma"))
;; And wanted to count how many times each string occurs in the list.
;; We want an accumulator to keep the strings and counts in, and a function
;; which will take that accumulator, and a new string, and return the updated
;; accumulator.
;; The obvious accumulator is a map. We'd want the answer to be something like
{"fred" 2, "barney" 1, "wilma" 1}
;; So what function will add strings to a map?
;; In a rather naive and long-winded way:
(defn addtomap [map string]
  (let [oldval
        (if (contains? map string)
          (map string)
          0)]
    (assoc map string (inc oldval))))
;; Here's how we'd use it to count our list, starting from the empty map {}, and
;; using addtomap to add each string into the accumulator returned by each call.
(addtomap {} "fred") ;; {"fred" 1}
(addtomap {"fred" 1} "barney") ;; {"barney" 1, "fred" 1}
(addtomap {"fred" 1, "barney" 1} "fred") ;; {"fred" 2, "barney" 1}
(addtomap {"fred" 2, "barney" 1} "wilma") ;; {"wilma" 1, "fred" 2, "barney" 1}
;; So the reduce part is obvious, once you have addtomap
(reduce addtomap {} strlist)
;; But a real Clojure programmer would look at addtomap and think:
;; We can write (map string 0) instead of
;; (if (contains? map string)
;; (map string)
;; 0)
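;; This works because a map can be called as a function of a key, with an
;; optional second argument to return when the key is missing. A quick sketch
;; (the map here is made up for illustration):
({"fred" 1} "fred" 0)  ; -> 1
({"fred" 1} "wilma" 0) ; -> 0
({"fred" 1} "wilma")   ; -> nil, with no default supplied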
;; So a better version of addtomap would be:
(defn addtomap [map string]
  (let [oldval (map string 0)]
    (assoc map string (inc oldval))))
(reduce addtomap {} strlist)
;; And now the let statement looks redundant, so let's say
(defn addtomap [map string]
  (assoc map string (inc (map string 0))))
(reduce addtomap {} strlist)
;; And then he might say
;; "since I'm only going to use this function here, why not make it anonymous?"
(fn [map string] (assoc map string (inc (map string 0))))
;; And now the reduce looks like:
(reduce (fn [map string] (assoc map string (inc (map string 0)))) {} strlist)
;; And, well, at this point, any reasonable man is going to think:
;; "Since I'm writing a one-liner, I might as well use the anonymous shorthand"
#(assoc %1 %2 (inc (%1 %2 0)))
(reduce #(assoc %1 %2 (inc (%1 %2 0))) {} strlist)
;; And if you already understand reduce and anonymous functions, and how maps
;; work, this is actually not too hard to understand.
;; In fact this is the version of the function that I originally wrote.
;; But I can see it might be a bit off-putting if you thought reduce itself was
;; scary.
;; Actually the obfuscation / pleasing terseness is all in the anonymous
;; function, and the behaviour of maps, and the reduce bit isn't scary at all.
;; Here's another deliberately obscure example, using a little structure as an
;; accumulator. See if you can figure out what it does using the above ideas to
;; unpack it. I'm using the destructuring notation to take the little structure
;; apart, and then the function modifies each part and puts them back together again.
(reduce (fn [[c s] n] [(+ c n), (str s n)]) [0,""] lst)
;; The trick is to work out what the anonymous function does to the starting
;; value of the accumulator when it gets a value from the list.
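;; As a hint, here is a sketch of that function applied once by hand to the
;; starting accumulator [0 ""] and the first element of lst, which is 1:
((fn [[c s] n] [(+ c n) (str s n)]) [0 ""] 1) ; -> [1 "1"]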
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Clojure's natural facility with abstractions and small functions allows
;; some truly terse code.
;; This little piece of code counts words in a file and orders them by popularity:
(sort #(< (%1 1) (%2 1))
      (reduce #(assoc %1 %2 (inc (%1 %2 0))) {}
              (clojure.contrib.string/split
               #"\W"
               (slurp "/home/john/hobby-code/reduce.clj"))))
;; With practice this sort of thing is actually readable. Promise!
;; But if I was actually writing it for someone else to read,
;; I'd probably split it up and give the bits names.
(let [filecontents (slurp "/home/john/hobby-code/reduce.clj")
      words        (clojure.contrib.string/split #"\W" filecontents)
      wordmap      (reduce #(assoc %1 %2 (inc (%1 %2 0))) {} words)
      sortedwords  (sort #(< (%1 1) (%2 1)) wordmap)]
  sortedwords)
;; And if I knew the library, I'd remember that two of those little operations
;; actually already have names:
;; The idiom
;; (reduce #(assoc %1 %2 (inc (%1 %2 0))) {} ....)
;; which I used to find myself writing all the time, is such a useful thing
;; that it too has made it into clojure.core as the function frequencies:
(frequencies strlist)
;; and the sort with the comparator on the second elements can be replaced by
;; sort-by and second:
(let [filecontents (slurp "/home/john/hobby-code/reduce.clj")
      words        (clojure.contrib.string/split #"\W" filecontents)
      wordmap      (frequencies words)
      sortedwords  (sort-by second wordmap)]
  sortedwords)
;; And then I'd abstract away the word counting operations from the file reading part:
(defn sorted-word-frequencies [string]
  (sort-by second (frequencies
                   (clojure.contrib.string/split #"\W+" string))))
;; So now I can ask for the 10 most popular words:
(take 10 (reverse (sorted-word-frequencies (slurp "/home/john/hobby-code/reduce.clj"))))
;; which is also pleasingly terse, but I think more readable.
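;; The monolithic clojure.contrib library is no longer maintained. Here is a
;; minimal sketch of the same function for a current Clojure, assuming
;; clojure.string/split, which takes the string first and the pattern second
;; (the reverse of the contrib version used above). The name
;; sorted-word-frequencies* is made up here so that it doesn't clash with the
;; definition above, and the sample sentence is only for illustration.
(require '[clojure.string :as string])

(defn sorted-word-frequencies* [s]
  (sort-by second (frequencies (string/split s #"\W+"))))

(take 2 (reverse (sorted-word-frequencies* "the cat saw the other cat and the dog")))
;; -> (["the" 3] ["cat" 2])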

16 comments:

1. Instead of (empty? list) you should use (seq list), because (seq list) returns nil if the collection is empty (or nil). It's standard Clojure style. It's weird at first because it turns true and false around, but it's better because you don't have to filter for nil separately.