Monday, September 30, 2013

;; Efficiency and Progress III : How Close Can We Get to C Speeds?
;; When messing about trying to make clojure quick, it's often well to have
(set! *warn-on-reflection* true)
;; A kind commenter on my previous post (Bernard) made me aware of the
;; -march=native option for gcc which allows it to optimize for the
;; particular platform it's compiled on.
;; On my little netbook, this doubles the speed of the C program to
;; 8.6 seconds, so it looks like C can add two vectors of a million
;; integers to one another, and then add all the entries in the
;; result, in around 9 milliseconds.
;; On the other hand, another commenter, Dmitry, points out that I
;; should make sure that my jvm is running in server mode, and that
;; leiningen isn't turning off the jvm's performance optimizations.
;; This is done either by adding
;; :jvm-opts ^:replace ["-server"]
;; Or as a paranoid check, starting a clean clojure by hand:
;; java -server -classpath ~/.m2/repository/org/clojure/clojure/1.5.1/clojure-1.5.1.jar clojure.main
;; I also figure that if I'm going to accuse clojure of being slow, I ought to upgrade to java version 7.
(System/getProperty "java.vm.version") ;-> "23.7-b01"
;; So:
(time (reduce + (map + (range 1000000) (range 1000000)))) ;-> 999999000000
999999000000
"Elapsed time: 3238.342039 msecs"
"Elapsed time: 2925.501909 msecs"
"Elapsed time: 2079.815112 msecs"
"Elapsed time: 2031.237985 msecs"
"Elapsed time: 2023.951652 msecs"
"Elapsed time: 2095.66391 msecs"
"Elapsed time: 2031.429136 msecs"
;; You can see the jvm optimizing as it runs this code repeatedly
;; People keep telling me to use a benchmarking library called
;; criterium, but I reckon I don't need precision instruments to
;; measure a difference of more than two orders of magnitude, and it
;; sounds like one more complication.
;; If I'm wrong, I'm sure someone will point it out.
;; Let's call Clojure's time for this operation 2023 ms, the fastest
;; that it managed in several runs. I think that's actually fair
;; enough. Other random stuff that's going on is probably only going
;; to slow it down.
(/ 2023 8.6) ;-> 235.2325581395349
;; That's a speed ratio of about 235x
;; Any further suggestions about tuning the underlying tools gratefully received!
;; As a sanity check I also translated the C program into
;; similar-looking java, and that runs in 22secs, so the jvm should be
;; able to add and then reduce two integer arrays in about 22ms.
;; Just as a sanity check:
(import efficiencyandprogress)
(time (efficiencyandprogress/microbenchmark))
"Elapsed time: 22761.461075 msecs"
499999500000000
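;; The java translation itself isn't shown in this post. Note that its
;; result, 499999500000000, is 1000 times the sum of 0..999999, which
;; suggests the java version writes the element-wise sum of two arrays
;; into a third each round (rather than accumulating into b the way the
;; C program below does). A hypothetical reconstruction along those
;; lines, with the class and method names guessed to match the
;; (efficiencyandprogress/microbenchmark) call above:

```java
// Hypothetical sketch of the java translation -- my reconstruction,
// not the author's actual code. Names are chosen to match the
// (efficiencyandprogress/microbenchmark) call in the post.
public class efficiencyandprogress {
    static final int N = 1000000;

    // General version, parameterised so it can be checked on small inputs.
    public static long microbenchmark(int n, int reps) {
        int[] a = new int[n];
        int[] b = new int[n];
        int[] c = new int[n];   // stays all zero
        long sum = 0;
        for (int i = 0; i < n; i++) a[i] = i;
        for (int count = 0; count < reps; count++) {
            for (int i = 0; i < n; i++) b[i] = a[i] + c[i]; // map +
            for (int i = 0; i < n; i++) sum += b[i];        // reduce +
        }
        return sum;
    }

    // The no-argument version called from clojure.
    public static long microbenchmark() {
        return microbenchmark(N, 1000);
    }

    public static void main(String[] args) {
        System.out.println(microbenchmark());
    }
}
```

;; Compiled with javac and dropped on the classpath, this is callable
;; from the repl as a plain static method, as above.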
;; That also allows us a comparison of clojure to its underlying jvm:
(/ 2023 22.) ;-> 91.95454545454545
;; which looks about right for a dynamic language.
;; In fact, given lazy-sequences and immutable data structures, it's very good indeed!
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; I've also had various suggestions about how this little
;; microbenchmark could be speeded up.
;; Another commenter, Mark, proved that Clojure can do the actual adding up quickly:
(time
 (loop [i 0
        a 0]
   (if (= i 1000000) a (recur (inc i) (+ a i i))))) ;-> 999999000000
"Elapsed time: 56.808898 msecs"
;; But of course that's not really fair, since it's not adding big
;; vectors but can potentially do all its calculating in the processor
;; itself. The graph algorithms I'm playing with won't be so amenable.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; After a bit of reading round, it appears that the only way to get
;; this sort of thing to run fast in clojure is to use java's
;; primitive arrays, essentially trying to recreate the java version
;; inside clojure.
;; It's quick to create these arrays. (Not that that really matters)
(time (def a (int-array 1000000)))
"Elapsed time: 5.553077 msecs"
(time (def b (int-array 1000000)))
"Elapsed time: 5.758837 msecs"
;; And you can iterate over them using dotimes and aset, C-style.
;; Do this naively and it's insanely slow
(time (dotimes [i 1000000] (aset a i i)))
;; this is where *warn-on-reflection* pays off : Reflection warning: call to aset can't be resolved.
"a long time...."
;; But what a difference a type-hint makes!
(time
 (let [a ^ints a]
   (dotimes [i 1000000] (aset a i i))))
"Elapsed time: 68.792057 msecs"
;; However, we're still in clojure, which is nice if you want to do
;; pretty much anything other than fast arithmetic on arrays. Arrays
;; can be seqs just like any other collection.
(take 10 a) ;-> (0 1 2 3 4 5 6 7 8 9)
;; dotimes is a macro, so we can have a look at what it's doing
(macroexpand '(dotimes [i 1000000] (aset a i i)))
;; And by analogy construct loops of our own
(time
 (let [a ^ints a b ^ints b]
   (loop [i 0]
     (when (< i 1000000)
       (aset a i i)
       (aset b i i)
       (recur (unchecked-inc i))))))
"Elapsed time: 92.136649 msecs"
;; Here we do a vector addition (eek! mutation!)
(time
 (let [a ^ints a b ^ints b]
   (loop [i 0]
     (when (< i 1000000)
       (aset b i (+ (aget a i) (aget b i)))
       (recur (unchecked-inc i))))))
"Elapsed time: 152.685547 msecs"
(take 10 b) ;-> (0 2 4 6 8 10 12 14 16 18)
;; and reduction
(time
 (let [b ^ints b]
   (loop [i 0 sum 0]
     (if (< i 1000000)
       (recur (unchecked-inc i) (+ sum (aget b i)))
       sum))))
"Elapsed time: 106.103927 msecs"
999999000000
;; So let's bite the bullet and see whether we can do the actual computation that C and Java did.
;; I'm estimating about 4 minutes rather than 9 seconds, but it's progress.
;; To be fair to clojure, we'll allow it to skip the overflow checks
;; that neither C nor Java are bothering with.
(set! *unchecked-math* true)
(def N 1000000)
(def a (int-array (range N)))
(def b (int-array N))
(time
 (let [a ^ints a b ^ints b N ^int N]
   (loop [count 0 sum 0]
     (if (= count 1000) sum
         (do
           (println count sum)
           (dotimes [i N] (aset b i (+ (aget a i) (aget b i))))
           (recur (inc count)
                  (+ sum (loop [i 0 ret 0]
                           (if (= i N) ret
                               (recur (unchecked-inc i)
                                      (+ ret (aget b i))))))))))))
;; There's clearly something unpleasant in some woodshed somewhere, since I get this warning:
;; recur arg for primitive local: sum is not matching primitive, had: Object, needed: long
;; Auto-boxing loop arg: sum
;; but actually, this does ok!
"Elapsed time: 174125.982988 msecs"
250249749750000000
;; The right answer, 9x slower than Java, 20x slower than C. A vast improvement.
;; At this point I'm starting to think that this might give me a way
;; of avoiding the grisly prospect of dropping into C or Java whenever
;; I have some graph algorithm to run.
;; The code is completely unreadable, and almost unwriteable, but it's
;; possible that we can do something about that. We are in a lisp
;; after all.
;; Still, currently I'm thinking, after what's now many days of
;; wrestling with this problem :
;; Difficult and Time Consuming to Write
;; Fragile (whether you get the speedups seems to depend on the detailed structure of the expressions)
;; Impossible to Understand
;; Still Pretty Slow
;; Can anyone show me a better way?
;; I have tried using the visualvm profiler, which I always found very
;; useful when trying to speed up clojure code in version 1.2.
;; Getting it running was an odyssey in itself, and it hasn't given me
;; any helpful insights yet.
;; I'm probably just not understanding what it's doing.

Wednesday, September 25, 2013

/* A guest language on this blog. A welcome, please, for C */

/* The reason that I am off on this particular one at the moment is
   because I recently waited 3 hours for a clojure program to
   terminate, after about a day trying to get it to run at all. */

/* When it became apparent that it was not going to finish any time
   soon, I hacked together an answer in C, using what should have been
   a less efficient algorithm that I had come up with while
   waiting. */

/* That program took about 15 minutes to write and 65 seconds to run,
   and got the correct answer. */

/* That comparison is entirely unfair to both clojure and C in all
   sorts of ways, but if I am going to spend time getting clojure to
   run at C-ish speeds, I need to know what I should be aiming for. */

/* This program is what I am using as a comparison for (reduce + (map + _ _ )) */

/* To make sure that clever compilers and runtimes aren't doing any
   sneaky dead-code elimination, it is actually doing some sort of
   computation. But it is mainly mapping and reducing. Lots. */

#include <stdio.h>

#define N 1000000
int a[N];
int b[N];

int main(void)
{
  int i, count;
  long long sum = 0;

  for (i = 0; i < N; i++) {
    a[i] = i;
  }

  for (count = 0; count < 1000; count++) {
    for (i = 0; i < N; i++) {
      b[i] += a[i];
    }
    for (i = 0; i < N; i++) {
      sum += b[i];
    }
  }

  printf("sum=%lli\n", sum);
}
/* gcc -std=gnu99 -Ofast efficiencyandprogress.c -o efficiencyandprogress && time ./efficiencyandprogress */

/* sum=250249749750000000 */
/* real 0m16.053s */
/* user 0m15.992s */
/* sys  0m0.032s */

/* So it looks as though adding one array to another and then adding
   up all the values in an array takes about 16ms in total. */

/* That's 16ns per array entry, which looks sort of OK on my little
   netbook, which boasts an Intel Atom CPU N455 running at 1.66GHz
   with a 512kb cache. */

/* I'm hoping there's enough complexity here that the compiler
   actually has to run the program rather than taking short cuts. */

/* But just as a check, here's the code running with gcc in 'do
   exactly what I say so I can debug it' mode. */

/* gcc -std=gnu99 -O0 efficiencyandprogress.c -o efficiencyandprogress && time ./efficiencyandprogress */

/* sum=250249749750000000 */
/* real 0m27.850s */
/* user 0m27.692s */
/* sys  0m0.060s */

/* So optimization produces only a small constant-factor speedup, as
   expected if the two versions are doing the same work. */

A warning: this post is considered stupid and harmful by a large number of people.

From the comments:

Anonymous May 28, 2014 at 7:06 AM

I love that this post shows up so highly in Clojure searches. It's a somber reminder to not write about things I know nothing about (let alone with a proclamation like the author uses).

And my reply:

Refute me then, and I will make another post saying I was wrong. I wrote this after two days trying to get a clojure program to complete while doing an algorithms course. I eventually rewrote it in C and it ran in a few seconds on the same hardware.

Once upon a time I wrote posts claiming that clojure was very fast if you wrote it cleverly. No one complained then and they did get a lot of publicity. Nowadays I've lost the knack. If anything it seems that compromises made to speed up the performance have resulted in un-optimizability and I understand that the official advice is to use Java for the tight loops.

But I have no wish to damage the public reputation of clojure, which I love and use all the time. What search makes this post show up on the front page?

;; Efficiency and Progress
;; Are ours once again
;; Now that we have the neut-ron bomb
;; It's nice and quick and clean and ge-ets things done...
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; When you program in Clojure, you get the raw speed of assembler.
;; Unfortunately, that is, assembler on a ZX81, running a Z80 processor at 4MHz in 1981.
;; If anything, that comparison is unfair to my old ZX81. Does anyone
;; remember '3D Invaders', a fast and exciting first person shooter /
;; flight simulator that ran in 1K of RAM *including memory for the
;; screen*?
;; Once upon a time, I had the knack of making clojure run at the same
;; speed as Java, which is not far off the same speed as C, which is
;; not far off the speed of the sort of hand-crafted machine code which
;; no-one in their right mind ever writes, in these degenerate latter
;; days which we must reluctantly learn to call the future.
;; But I seem to have lost the knack. Can anyone show me what I am doing wrong?
;; At any rate, it isn't too hard to get it to run at something like
;; the real speed of the machine, as long as you're prepared to write
;; code that is more like Java or C than Clojure.
;; So here are some thoughts about how to do this.
;; Which I offer up only as a basis for discussion, and not in any way
;; meaning to stir up controversy, or as flame-bait or ammunition for
;; trolls or anything of that sort.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; Clojure is very slow:
(time (reduce + (map + (range 1000000) (range 1000000))))
"Elapsed time: 5316.638869 msecs"
;-> 999999000000
;; The greater part of its slowness seems to be to do with lazy sequences
(time (def seqa (doall (range 1000000))))
"Elapsed time: 3119.468963 msecs"
(time (def seqb (doall (range 1000000))))
"Elapsed time: 2839.593429 msecs"
(time (reduce + (map + seqa seqb)))
"Elapsed time: 3558.975552 msecs"
;-> 999999000000
;; It looks as though making a new sequence is the expensive bit
(time (doall (map + seqa seqb)))
"Elapsed time: 3612.553803 msecs"
;-> (0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 ...)
;; Just adding things up is way faster
(time (reduce + seqa))
"Elapsed time: 315.717033 msecs"
499999500000
;; I wondered if there was a way of avoiding lazy-seqs
(time (def veca (vec seqa)))
"Elapsed time: 470.512696 msecs"
(time (def vecb (vec seqb)))
"Elapsed time: 374.796054 msecs"
;; After all, 'use the right data structure for the problem' is pretty
;; much lesson 1, and if vectors are not a good data structure
;; for this problem, then what is?
;; But it seems that despite the speed of making the vectors, it doesn't help much when we do our thing.
;; In fact it's a bit slower
(time (reduce + (mapv + veca vecb)))
"Elapsed time: 4329.070268 msecs"
999999000000
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; So let's say 3600ms to add together two arrays of 1000000 elements and sum the result.
;; In C on the same machine (my little netbook with its 1.66GHz Atom
;; and 512kb cache) this seems to take 16 ms, being 8ms for the map
;; and 8 ms for the reduce. I'm assuming that that time is mostly
;; spent waiting on the main memory, but I may be wrong. Who knows how
;; these things are done?
(/ 3600 16) ;-> 225
;; So shall we call this a 225x slowdown for the natural expression in
;; the two languages of mapping and reducing?
(time (reduce + seqa))
"Elapsed time: 358.152249 msecs"
499999500000
;; If we just look at the reduction, then that's
(/ 358 8.) ; 44.75
;; So around 50x
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; Union-Find I
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; In the last post, I showed Kruskal's algorithm for finding Extremal
;; Spanning Trees in a weighted graph.
;; If we were to try to scale that algorithm, we'd find that it was
;; quadratic in the number of vertices in the graph.
;; The problem is that we're repeatedly searching lists of sets to see
;; whether two things are connected or not
;; So we could speed it up a lot if we could find a data structure
;; that is good at that sort of thing
;; Specifically, we'd like to be able to ask whether a is joined to b quickly,
;; and we'd like to be able to quickly modify the relation when we decide to join a to b
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; Union-Find is a data structure specifically designed for keeping
;; track of partitions and equivalence relations.
;; A partition is a division of a large set into smaller sets
;; It can also be viewed as an equivalence relation
;; a is joined to b, if and only if a and b are in the same subset in the partition.
;; Consider our set of cities
(def cities ["London" "Birmingham" "Sheffield" "Bristol" "Leeds" "Liverpool" "Manchester"])
;; We'll make that into a hash of cities, where each city points to itself
(def initial (apply hash-map (mapcat (fn [x] [x x]) cities)))
;; We'll interpret that map by saying 'All things that point to the same thing are in the same component'.
;; So in our initial map, where everything is pointing to itself, nothing is joined.
;; Say we want to assert that Liverpool ~ London.
;; Then we want to make everything that points to Liverpool point to whatever it is London points to.
(defn join [union-find [a b]]
  (let [leader1 (union-find a)
        leader2 (union-find b)]
    (into {} (map (fn [[k v]] (if (= v leader1) [k leader2] [k v])) union-find))))
;; Let's connect Liverpool to London and London to Bristol, and Manchester to Sheffield
(reduce join initial [["Liverpool" "London"] ["London" "Bristol"] ["Manchester" "Sheffield"]])
;-> {"Liverpool" "Bristol", "Sheffield" "Sheffield", "Manchester" "Sheffield", "Birmingham" "Birmingham", "Bristol" "Bristol", "London" "Bristol", "Leeds" "Leeds"}
;; Notice that Liverpool, Bristol, and London all point to Bristol (call that the Bristol Group)
;; And that Sheffield and Manchester form a group with Sheffield as its leader (call that the Sheffield group)
;; Whilst Leeds and Birmingham stand in splendid isolation, leaders of their own groups.
;; Now we can easily and quickly check which places are connected
(defn joined? [union-find a b]
  (= (union-find a) (union-find b)))
(joined?
 (reduce join initial [["Liverpool" "London"] ["London" "Bristol"] ["Manchester" "Sheffield"]])
 "Bristol" "Liverpool") ;-> true
(joined?
 (reduce join initial [["Liverpool" "London"] ["London" "Bristol"] ["Manchester" "Sheffield"]])
 "Bristol" "Manchester") ;-> false
;; But that's only half the problem. When joining cities, we still need to scan the whole map.
;; That means that if we're mainly joining things, rather than querying them, our performance is still poor.
;; So we should make each leader keep a list of the things that point to it.
;; Let's start again!
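;; The 'each leader keeps a list of its followers' idea can be sketched
;; like this (a hypothetical java version of my own, not the
;; continuation of the post): joining re-points only the followers of
;; one leader, instead of scanning the whole map.

```java
import java.util.*;

// Sketch of a union-find where each leader keeps a list of the
// elements that point to it, so join only touches one group.
public class UnionFind {
    Map<String, String> leader = new HashMap<>();        // element -> its leader
    Map<String, List<String>> members = new HashMap<>(); // leader -> its followers

    public UnionFind(Collection<String> xs) {
        for (String x : xs) {
            leader.put(x, x);                            // everything leads itself
            members.put(x, new ArrayList<>(List.of(x)));
        }
    }

    public boolean joined(String a, String b) {
        return leader.get(a).equals(leader.get(b));
    }

    // Re-point only the followers of a's leader, not the whole map.
    public void join(String a, String b) {
        String la = leader.get(a), lb = leader.get(b);
        if (la.equals(lb)) return;
        for (String x : members.get(la)) {
            leader.put(x, lb);
            members.get(lb).add(x);
        }
        members.remove(la);
    }
}
```

;; The cost of a join is now proportional to the size of one group
;; rather than to the size of the whole map.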

Wednesday, September 18, 2013

;; The Knapsack Problem
;; Suppose you've got twelve pounds
(def budget 12)
;; And there's a thing that costs a pound, but is worth 20.
;; And another thing that costs 3, but is worth 30
;; And another thing that costs 3, but is worth 21
;; And a thing that costs 6 but is worth 40
(def things (map (fn [[c v]] {:cost c :value v}) [[1 20] [3 30] [3 21] [6 40]]))
(defn price [things] (reduce + (map :cost things)))
(defn evaluate [things] (reduce + (map :value things)))
(evaluate things) ;-> 111
(price things) ;-> 13
;; So there's 111's worth of things going for 13, but you can't buy everything.
;; What do you buy?
(defn value [sorted-things]
  (evaluate
   (let [order sorted-things
         baskets (reductions conj '() order)]
     (last (take-while #(<= (price %) budget) baskets)))))
;; Well, if you're a cynic
(value (sort-by :cost things)) ;-> 71
;; Then you come away with 71's worth
;; And if you're an idealist
(value (reverse (sort-by :value things))) ;-> 91
;; Then you do way better with 91
;; A more cunning approach is to take things in order of their price/value ratio
(value (reverse (sort-by (fn [{:keys [value cost]}] (/ value cost)) things))) ;-> 71
;; Sadly that does worse than the approach that only pays attention to the value.
;; So it seems that out of the three natural-seeming 'greedy algorithms', the best solution is 91
;; Yet another approach is to exhaustively search the space of possibilities:
(defn subsets [things]
  (if (empty? things) '(())
      (let [srt (subsets (rest things))]
        (concat (map #(cons (first things) %) srt) srt))))
(reverse (sort-by second (for [i (subsets things)] [(price i) (evaluate i)])))
;-> ([13 111] [12 91] [10 90] [10 81] [7 71] [9 70] [9 61] [7 60] [6 51] [4 50] [4 41] [6 40] [3 30] [3 21] [1 20] [0 0])
;; Which tells us that the best combination is unaffordable, so we
;; have to settle for the second best, which is paying 12 to get 91,
;; which the idealist has been trying to tell us all along.
;; But the idealistic approach is unlikely to work in the general case.
;; Consider a thing which is worth a lot, but horribly expensive, and
;; lots of other things which are worth a fair bit and dirt cheap.
;; Personally my money would have been on the 'buy things in order of
;; price/value ratio' approach, but we saw above that it fails in at
;; least one easy case.
;; So it appears that if we are faced with a problem like this (and
;; there are many such problems), then we are doomed.
;; Exhaustive search is not feasible once you've got more than a very
;; few items, and yet the various greedy algorithms above get the
;; wrong answers.
;; And yet if you write down a knapsack problem like this, you will
;; not find it appallingly difficult to pick the best arrangement.
;; There is a certain tradition at this point of exclaiming 'The HUMAN
;; BRAIN is performing a COMPUTATION INFEASIBLE for a CLASSICAL
;; COMPUTER', and then going on to derive your favourite philosophical
;; position on the nature of consciousness, which will miraculously
;; turn out to be whatever it was you thought before you contemplated
;; the problem in question.
;; But wait ...