Monday, February 25, 2013

;; Clojure's Reader is Unsafe
;; dependencies:
;; [org.clojure/clojure "1.4.0"]
;; [org.clojure/tools.reader "0.7.0"]
;; By a weird coincidence this week I wanted to read a file of data
;; coming from a web app, and I was about to use Clojure's reader to
;; do it. I remembered that there was some variable that I needed to
;; bind to stop it executing arbitrary code, so I went googling.
;; And found that there's been a fair bit of debate on just this topic recently.
;; Essentially, Clojure's reader is the thing that turns strings into
;; data structures when you're reading in a program or typing at the
;; REPL. And you have it available programmatically.
(read-string "(+ 1 2)") ; (+ 1 2)
;; Not so impressive, you say?
;; Consider:
(first "(+ 1 2)") ; \(
(second "(+ 1 2)") ; \+
(count "(+ 1 2)") ; 7
(type "(+ 1 2)") ; java.lang.String
(type (first "(+ 1 2)")) ; java.lang.Character
;; "(+ 1 2)" is a String, a flat data structure of seven Characters
(first (read-string "(+ 1 2)")) ; +
(second (read-string "(+ 1 2)")) ; 1
(count (read-string "(+ 1 2)")) ; 3
(type (read-string "(+ 1 2)")) ; clojure.lang.PersistentList
(type (first (read-string "(+ 1 2)"))) ; clojure.lang.Symbol
;; Clearly some sort of sea-change has occurred.
;; Even better, the value of a string is a similar string
(eval "(+ 1 2)") ; "(+ 1 2)"
(identical? (eval "(+ 1 2)") "(+ 1 2)") ; true
;; But the thing that read-string makes is a program which can run
(eval (read-string "(+ 1 2)")) ; 3
;; Another way to make such a program is:
(eval (list (quote +) (quote 1) (quote 2))) ; 3
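;; To make the difference between reading and evaluating concrete, here is a
;; small sketch (the names form and swapped are made up for illustration).
;; Because the reader hands back an ordinary list, it can be taken apart and
;; rebuilt like any other data before anything is run:
(def form (read-string "(+ 1 2)"))
(def swapped (cons '* (rest form))) ; replace the + symbol with *
swapped                             ; (* 1 2)
(eval form)                         ; 3
(eval swapped)                      ; 2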
;; In my innocence, I was going to use this magic to read some data out of a file that had been produced
;; by a web server.
(def file "{:username leethaxor :score 5 :alignment :chaotic-evil}")
(:username (read-string file)) ;-> leethaxor
;; IT TURNS OUT THAT YOU SHOULD NEVER DO THIS, EVEN THOUGH THE ABILITY
;; TO DO THIS IS ONE OF THE THINGS THAT IS NICE ABOUT LISPS.
;; There are various reasons why:
;; Firstly, note that programs running on a computer can affect that computer.
;; If you happen to have a file called precious.txt in the directory
;; where clojure is running, then you are strongly advised not to
;; execute any of the following code. Especially if it contains your
;; first novel and you haven't got round to backing up yet this year.
(require 'clojure.java.shell) ; needed before the fully qualified calls below will resolve
(clojure.java.shell/sh "touch" "precious.txt") ;-> {:exit 0, :out "", :err ""}
(clojure.java.shell/sh "cat" "precious.txt") ;-> {:exit 0, :out "", :err ""}
;; Secondly, note that there is special syntax to cause things to happen while
;; reading:
(read-string " #=(clojure.java.shell/sh \"rm\" \"precious.txt\") ") ;-> {:exit 0, :out "", :err ""}
;; That did not get turned into a data structure. That got executed. precious.txt is gone.
(clojure.java.shell/sh "cat" "precious.txt") ;-> {:exit 1, :out "", :err "cat: precious.txt: No such file or directory\n"}
;; Luckily we can rebuild it:
(clojure.java.shell/sh "touch" "precious.txt") ;-> {:exit 0, :out "", :err ""}
(clojure.java.shell/sh "cat" "precious.txt") ;-> {:exit 0, :out "", :err ""}
;; And thirdly note that not everyone on the internet is on your side:
(def file-of-evil " {:username #=(clojure.java.shell/sh \"rm\" \"precious.txt\") :score 5 :alignment :chaotic-evil}")
(read-string file-of-evil) ;-> {:username {:exit 0, :out "", :err ""}, :score 5, :alignment :chaotic-evil}
;; This is a thing that you're supposed to guard against when reading data structures.
(clojure.java.shell/sh "touch" "precious.txt") ;-> {:exit 0, :out "", :err ""}
;; It is traditional to do that like this:
(binding [*read-eval* false] (read-string "(+ 1 2)")) ;-> (+ 1 2)
(binding [*read-eval* false] (read-string file-of-evil))
; -> RuntimeException EvalReader not allowed when *read-eval* is false. clojure.lang.Util.runtimeException (Util.java:170)
;; And in my innocence, I was about to actually do this, which would have been a mistake:
(def string-of-evil "#java.io.FileWriter[\"precious.txt\"]")
(spit "precious.txt" "precious content") ;-> nil
(slurp "precious.txt") ;-> "precious content"
(binding [*read-eval* false] (clojure.core/read-string string-of-evil)) ;-> #<FileWriter java.io.FileWriter@17b2712>
(slurp "precious.txt") ;-> ""
;; bugger
;; It turns out that even with *read-eval* bound to false, the Clojure
;; reader can be persuaded to execute arbitrary Java constructors, and
;; some of those have side-effects.
;; I believe that that particular loophole is going to be fixed in the
;; next version of Clojure, but the general thinking seems to be that
;; the reader is just intrinsically an unsafe thing and that you
;; shouldn't use it to read strings which you aren't sure are friendly.
;; Luckily, there is a library function that does what I thought the
;; reader would do, and that is apparently Absolutely Guaranteed by
;; Act of Parliament not to do bad things:
(require 'clojure.tools.reader.edn)
(spit "precious.txt" "precious content") ;-> nil
(slurp "precious.txt") ;-> "precious content"
(clojure.tools.reader.edn/read-string string-of-evil)
;; -> ExceptionInfo No reader function for tag java.io.FileWriter clojure.core/ex-info (core.clj:4227)
(clojure.tools.reader.edn/read-string file-of-evil)
;; -> ExceptionInfo No reader function for tag = clojure.core/ex-info (core.clj:4227)
;; But which is still usable for the purposes of good:
(clojure.tools.reader.edn/read-string file) ;-> {:username leethaxor, :score 5, :alignment :chaotic-evil}
(clojure.tools.reader.edn/read-string "(+ 1 2)") ;-> (+ 1 2)
(eval (clojure.tools.reader.edn/read-string "(+ 1 2)")) ; 3
(slurp "precious.txt") ;-> "precious content"
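;; If malformed or hostile input should simply count as "no data", one option
;; is to wrap the call. This is only a sketch: safe-read is a made-up helper,
;; not part of tools.reader, and it assumes that nil is never a legitimate
;; value in the data being read.
(require 'clojure.tools.reader.edn)
(defn safe-read
  "Read one EDN value from the string s, returning nil if the reader rejects it."
  [s]
  (try
    (clojure.tools.reader.edn/read-string s)
    (catch Exception _ nil))) ; ExceptionInfo is a RuntimeException, so it is caught here
(safe-read "{:username leethaxor :score 5}") ;-> {:username leethaxor, :score 5}
(safe-read "#java.io.FileWriter[\"precious.txt\"]") ;-> nil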
;; So that's Super News!
;; We've got a function, read-string, which can turn strings into data structures while doing no harm whatsoever,
;; and another function, read-string, which can turn strings into data structures while potentially causing arbitrary side effects depending on the content of the strings.
;; Clearly nothing could possibly go wrong with this arrangement,
;; especially since read-string will be fixed in Clojure 1.5 so that
;; binding *read-eval* to false around it will make it safe, as far as
;; anyone knows, even though the advice is still never to use it to
;; read untrusted strings, for some reason.

Tuesday, February 19, 2013

;; Clojure Emacs Eval Paste Keyboard Macros /
;; Generating your Regression Tests Automatically while Writing your Functions
;; There's a phenomenally useful feature of emacs when using clojure
;; and nrepl where you can evaluate the result of an
;; expression and paste the result directly into a buffer.
;; However my attempts to use it in keyboard macros have always led to frustration.
;; I asked about it on stack overflow:
;; http://stackoverflow.com/questions/14959155/how-can-i-make-emacs-keyboard-macros-work-properly-when-pasting-the-results-of-c#14960321
;; and a nice chap called sds helped me come up with the following work-around:
;; Say I've defined the function:
(defn << [a b]
  (if (zero? b) a
      (<< (* 2 a) (dec b))))
;; And now I want to check that it works:
;; I evaluate the following two forms in emacs lisp (in the *scratch* buffer)
(defun clojure-eval-paste-test ()
  (interactive)
  (next-line)
  (move-beginning-of-line nil)
  (insert "(is= ")
  (forward-sexp)
  (insert " ")
  (nrepl-eval-last-expression 't)
  (sleep-for 0.1)
  (insert ")"))
(global-set-key [f5] #'clojure-eval-paste-test)
;; Now I type:
(<< 1 0)
(<< 1 1)
(<< 1 2)
(<< 2 0)
(<< 2 2)
(<< 2 3)
;; And then I move point to just above the first line, and press f5 six times, and I get:
(is= (<< 1 0) 1)
(is= (<< 1 1) 2)
(is= (<< 1 2) 4)
(is= (<< 2 0) 2)
(is= (<< 2 2) 8)
(is= (<< 2 3) 16)
;; Which both reassures me that it does work, and provides the skeleton of a regression test for it!
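;; The is= in the generated lines is not something clojure.test provides, so
;; it has to be defined somewhere. One plausible definition, assumed here
;; rather than taken from the setup described above, is a thin macro over
;; clojure.test/is, after which the pasted lines drop straight into a deftest
;; (the test name is also made up):
(require '[clojure.test :refer [deftest is run-tests]])
(defn << [a b]
  (if (zero? b) a
      (<< (* 2 a) (dec b))))
(defmacro is= [expr expected]
  `(is (= ~expr ~expected))) ; expands to an ordinary clojure.test assertion
(deftest shift-left-regression
  (is= (<< 1 0) 1)
  (is= (<< 1 1) 2)
  (is= (<< 1 2) 4)
  (is= (<< 2 0) 2)
  (is= (<< 2 2) 8)
  (is= (<< 2 3) 16))
(run-tests) ; returns a summary map with :pass, :fail and :error counts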
;; There are more elegant ways of doing this, presumably, one being to
;; do as sds suggests, and code up the function using lower level
;; calls rather than the (sleep-for 0.1) thing that will obviously go
;; wrong if the eval takes too long.
;; And another way would be to fix it so that nrepl-eval-last-expression actually works in keyboard macros.
;; But this will do for now, and gets rid of a papercut bug that's been annoying me for ages.

Monday, February 18, 2013

;; Mathematics as an experimental science
;; Beta-Bernoulli distribution / Machine Learning
;; I'm reading Kevin Murphy's Machine Learning: A Probabilistic Perspective.
;; As one of the exercises in chapter three, I've just proved that:
;; "For the Beta-Bernoulli model, the probability of the data is the
;; ratio of the normalizing constants of the prior distribution and
;; the posterior distribution".
;; What I think this actually means is:
;; If you start with Coins which are biased, and the bias is chosen
;; uniformly (so that for instance a fair coin is just as likely as a
;; coin which comes up heads 1/3 of the time, or a coin which always
;; comes up heads).
;; And you pick out the fair ones by tossing them all 20 times, and
;; throw away all the ones that don't come up heads exactly 10 times
;; and tails 10 times.
;; Then, because your test has not quite guaranteed you fair coins,
;; but only coins which are quite a bit fairer than you started with:
;; You are ever so slightly more likely to see three heads (or three
;; tails) in three tosses than you would be if the coins were truly
;; fair.
;; In fact the bias is now the same as if the coins had been chosen
;; from the Beta(11,11) distribution.
;; And so the chances of getting three heads in a row is not, as you
;; might have naively expected, one in eight,
;; But rather B(14,11)/B(11,11), where B is the Beta function.
;; And that works out to be G(14)G(11)G(22)/(G(25)G(11)G(11)), where G is the Gamma function.
;; And that is 13!10!21!/(24!10!10!), where ! is the factorial function.
;; And that is (11*12*13)/(22*23*24), where / and * need no introduction if you have got this far in the post.
;; And that is:
(/ (* 11 12 13) 22 23 24) ;-> 13/92
;; or
(float (/ 13 92)) ; 0.14130434
;; Which is to say, slightly more than one in eight.
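;; The same number can be checked two ways. This is a sketch with helper names
;; (factorial, beta, heads-in-a-row) invented for the purpose. The first
;; evaluates the ratio of Beta functions directly, using
;; B(a,b) = (a-1)!(b-1)!/(a+b-1)! for whole-number arguments. The second
;; tosses one coin at a time: under Beta(a,b) the chance of a head is a/(a+b),
;; and seeing a head moves us to Beta(a+1,b).
(defn factorial [n]
  (reduce *' (range 1 (inc n)))) ; *' promotes to BigInt, since 24! overflows a long
(defn beta [a b]
  (/ (*' (factorial (dec a)) (factorial (dec b)))
     (factorial (dec (+ a b)))))
(/ (beta 14 11) (beta 11 11)) ;-> 13/92
(defn heads-in-a-row [k a b]
  (reduce * (for [i (range k)] (/ (+ a i) (+ a b i)))))
(heads-in-a-row 3 11 11) ;-> 13/92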
;; And I am inordinately pleased with myself, not for having proved
;; this result, which was easy, but for having worked out what the
;; mysterious squiggles in the book might actually mean in practice.
;; And it occurs to me that Mathematics is in fact an experimental
;; science, which makes definite predictions about physical things,
;; such as the movements of electrons in my computer when I type the
;; following things:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Let us make coins with biases distributed uniformly between 0 and 1:
(defn make-coin []
  (let [theta (rand)]
    (fn [] (if (< (rand) theta) :h :t))))
;; Here is one:
(def a (make-coin))
;; Let us test it
(frequencies (for [i (range 20)] (a))) ;-> {:h 11, :t 9}
;; It is no good. To the scrapheap with it:
(ns-unmap *ns* 'a)
;; Let us instead make a large number of coins:
(def coins (for [i (range)] (make-coin)))
;; And throw away all the ones that do not satisfy our criterion
(def fair-coins
  (filter (fn [coin] (= {:t 10 :h 10} (frequencies (for [i (range 20)] (coin))))) coins))
;; Let us toss each of our fair coins three times:
(def results (for [coin fair-coins] (frequencies (for [i (range 3)] (coin)))))
;; And add up all the results
(def collected-results (drop 1 (reductions (fn [m f] (assoc m f (inc (get m f 0)))) {} results)))
;; And calculate the empirical distribution of those results:
(def empirical-distribution
  (map
   (fn [m] [(float (/ (m {:h 3} 0) (reduce + (vals m))))
            (float (/ (m {:t 3} 0) (reduce + (vals m))))])
   collected-results))
;; It does seem to me that after a while, these numbers settle down to something near 14%
(doseq [e empirical-distribution] (println e)) ; the sequence is infinite, so interrupt it once you are convinced
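;; For a single number rather than an endless printout, here is a finite
;; variant of the same experiment. It is a self-contained sketch: the helper
;; names and the sample size of 2000 are arbitrary choices, and the answer is
;; random, so expect something in the neighbourhood of 13/92 rather than
;; exactly that.
(defn make-coin []
  (let [theta (rand)]
    (fn [] (if (< (rand) theta) :h :t))))
(defn tosses [coin n]
  (frequencies (repeatedly n coin))) ; e.g. {:h 11, :t 9}
(defn passes-test? [coin]
  (= {:h 10 :t 10} (tosses coin 20)))
(defn estimate-three-heads [n]
  ;; keep making coins until n of them have passed the 10-heads-in-20 test,
  ;; then count how many of those come up heads three times running
  (let [selected (take n (filter passes-test? (repeatedly make-coin)))
        hits (count (filter #(= {:h 3} (tosses % 3)) selected))]
    (float (/ hits n))))
(estimate-three-heads 2000)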