Saturday, April 11, 2015

Playing Clojure with a simple setup

I am always trying to find a scripting language that could replace Python for data preprocessing tasks. I love Python for everything except its performance [think about implementing a dynamic programming algorithm with two nested, busy loops]. There are many choices: F#/OCaml/Scala (succinct and typed!), Go (at least typed…), etc. I also checked out Clojure several years ago. But at that time I knew little Lisp, and I felt that Clojure was bundled with too many abstractions on top of a dynamic type system, which would hurt performance. Recently, however, I happened to see a Clojure code snippet with type annotations (type hints in Clojure’s terminology)!

With type hints and other tricks, Clojure can be as fast as Java [1] (i.e., the same order of magnitude as F# and OCaml). Of course, type-hinted and highly imperative Clojure programs are even harder to read than their Java equivalents. But at least Clojure provides a way to reach this goal within Clojure itself. Once we find the performance bottleneck, we don’t need any other tool: we just add type hints, and if that does not work, we rewrite the hot spot more carefully in Clojure (e.g., sticking to unboxed primitive values and Java primitive arrays). The good thing is that we never need to leave Clojure.

Because I want to use Clojure as a quick scripting language, I don’t really want to create a project and set up a Makefile kind of thing; I just want a single file that contains the whole program. If we want to speed up Python, we can use Cython, or write C and load the compiled dynamic library from Python. This is just too much for the purpose of scripting. And the final product would be hard to deploy: think about somebody who has no Cython installed, or about transferring binaries to a new machine. Just a nightmare. For Clojure, the compiler and the library are in the same jar (e.g. clojure-1.6.0.jar), so deploying a script is really easy.
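To make the type hints mentioned above concrete, here is a minimal sketch. The function names shout and sum-longs are made up for illustration and do not come from any benchmark. Setting *warn-on-reflection* to true asks the compiler to report the interop calls that still need a hint.

```clojure
;; A hinted interop call: ^String lets the compiler pick String.toUpperCase
;; at compile time instead of looking the method up reflectively at run time.
(defn shout [^String s]
  (.toUpperCase s))

;; A hinted numeric loop: ^longs marks xs as a Java long[], and the loop
;; locals i and acc stay unboxed primitive longs.
(defn sum-longs ^long [^longs xs]
  (let [n (alength xs)]
    (loop [i 0, acc 0]
      (if (< i n)
        (recur (inc i) (+ acc (aget xs i)))
        acc))))

(println (shout "clojure"))                    ; prints CLOJURE
(println (sum-longs (long-array [1 2 3 4])))   ; prints 10
```

Whether such hints are enough to reach Java-like speed depends on the program; this only shows what the annotations look like.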

Set up Emacs with a Clojure Interpreter

No Leiningen, no CIDER, no Maven! Just download clojure.jar and two .el files, and you are done! Actually, you only need a Java compiler installed to compile Clojure itself, because the Clojure compiler is written in Java. We really need to give a thumbs up to Rich Hickey for creating a production-quality language with such a small code base (less than 3MB for Clojure 1.6). That’s amazing!

Since I intend to use Clojure for data processing (or, in a fancier phrase, data science), an interpreter is a must. I need to program and test on the go [the same routine I follow in F#, R, and Matlab]. Unlike F# or Scala, the ideal IDE for Clojure is no IDE. Clojure programs need no dot magic, while for F# and Scala it is quite nice to type “.” and then see the available functions under a module or object. So I have no plan to install CIDER. I use clojure-mode and inf-clojure, both of which are distributed as a single .el file. Alternatively, you can install these two packages with M-x list-packages (this needs Emacs 24). I would suggest installing auto-complete mode and its dictionary for Clojure, which contains a long list of common Clojure keywords. But this is optional: once we get familiar with Clojure’s functions, we need no auto-completion for keywords [Emacs’s M-/ is good enough]. For example, I never use the auto-complete minor mode when writing R scripts under ESS mode.

To start the interpreter, use M-x run-lisp. The shortcuts are the same as in the elisp modes (e.g. the *scratch* buffer). The two most useful ones:

C-x C-e: send the last expression to the interpreter. An expression is everything between a matching pair of parentheses, so we can use this shortcut to send a multi-line function definition or a single-line printf call to the interpreter.

C-c C-r: send the selected region to the interpreter.

Run Clojure scripts from command line

The REPL is great for playing with data and making sure every function works as expected. In the end, we usually put all the functions into a script file that can run as a task. If we only use the Java standard library and the Clojure core library, we can simply type the following command:

java -cp clojure.jar clojure.main script.clj

and we can create a .bat file, a Linux alias, or whatever else makes the command short. Something like:

alias runclj="java -cp /path/to/clojure.jar clojure.main"
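As an illustration of what such a single-file script might contain, here is a minimal sketch. The word-counting task and the function name word-counts are assumptions chosen only for the example. It uses nothing beyond the Clojure core library, so the command above is enough to run it.

```clojure
;; script.clj: a hypothetical example; the file name and the task are
;; illustrative only.
(require '[clojure.string :as str])

(defn word-counts
  "Return a map from lower-cased word to its number of occurrences in text."
  [text]
  (frequencies (re-seq #"[a-z']+" (str/lower-case text))))

;; *command-line-args* holds whatever follows script.clj on the command line;
;; here each argument is treated as the path of a text file.
(doseq [path *command-line-args*]
  (println path)
  (doseq [[word n] (take 5 (sort-by val > (word-counts (slurp path))))]
    (println " " word n)))
```

It would be run as java -cp clojure.jar clojure.main script.clj input.txt, where input.txt stands for any text file; with no arguments the script prints nothing.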

However, when our script depends on some libraries, managing the dependencies can be tricky. I searched online, and it seems that everyone recommends Leiningen for loading a library or managing a script that spans multiple files. For some reason, I have to avoid using Leiningen.

I found a simple solution: the good old load-file command from the Lisp world (in Emacs, I sometimes use load-file to reload my .emacs without restarting Emacs).

And it seems that the Clojure community has learnt from the practice of JavaScript: if a library is small enough, distribute it as a single file! I checked several Clojure libraries that I am interested in: data.csv, clojure-csv, data.json. All of them are distributed as one or two .clj files!

The following script first loads two libraries, numeric-tower and json, and then runs a test:

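Notice that the last line of json.clj is: (load "json_compat_0_1"). load is similar to load-file, but instead of taking a file path it looks its argument up on the classpath (as a .clj source file or a compiled .class file). Since the directory holding our downloaded files is not on the classpath, that call fails, so we need to comment out this line and load json_compat_0_1 explicitly with load-file in our script.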
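The load-file approach can be sketched as follows. To keep the example runnable anywhere, a tiny stand-in library is first written to a temporary file; the namespace mylib.stats and its mean function are hypothetical and are not part of numeric-tower or data.json. In real use the .clj file would simply be a downloaded library sitting on disk.

```clojure
;; Stand-in for a downloaded single-file library (hypothetical namespace
;; mylib.stats). In real use this file would already exist on disk.
(def lib-file (java.io.File/createTempFile "mylib_stats" ".clj"))

(spit lib-file
      "(ns mylib.stats)
       (defn mean [xs] (/ (reduce + xs) (double (count xs))))")

;; load-file takes a file-system path, then reads and evaluates every form in
;; the file, so the library needs neither a classpath entry nor a build tool.
(load-file (.getPath lib-file))

;; The namespace now exists, so it can be aliased like any other.
(alias 'stats 'mylib.stats)

(println (stats/mean [1 2 3 4]))   ; prints 2.5
```

The namespace that was current before load-file is restored afterwards, so the (ns ...) form inside the library does not leak into the calling script.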

Since I am usually dealing with large amounts of data, I don’t care about the compilation and JVM startup time, which is negligible compared to the total time spent on number crunching.

My first Clojure programs

I’d like to thank Prof. Dan Grossman’s brilliant course at Coursera first. I have never believed in books like Seven Languages in Seven Weeks. Repeatedly writing three-line programs in different languages only gives you the illusion that you’ve learned something. Prof. Grossman’s course is truly “three languages in one semester”. One of his three languages is Racket/Scheme [the other two being ML and Ruby]. I did the two Racket assignments (Assignments 4 and 5). Assignment 5 is on how to implement a simple interpreter in Racket. That’s about all my training in Lisp.

I solved some Hacker Rank problems using Clojure. The following includes my solutions to three problems. Two of them are dynamic programming problems. I focus on dynamic programming because loops and array operations are slow in Python, and I wished to test their speed in Clojure.
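For a flavor of what a loop-heavy dynamic program over a primitive array looks like in Clojure, here is a minimal sketch of the classic coin-change counting problem. It is an illustration written for this purpose, not one of the solutions referred to above, and the name count-ways is an assumption.

```clojure
(defn count-ways
  "Number of ways to make amount from an unlimited supply of the given coins."
  ^long [^long amount coins]
  ;; ways[a] = number of ways to form the amount a with the coins seen so far
  (let [^longs ways (long-array (inc amount))]
    (aset ways 0 1)                       ; one way to make 0: use no coins
    (doseq [c coins]
      (let [c (long c)]                   ; unbox once, outside the inner loop
        (loop [a c]
          (when (<= a amount)
            ;; either skip coin c (old ways[a]) or use it once more (ways[a - c])
            (aset ways a (+ (aget ways a) (aget ways (- a c))))
            (recur (inc a))))))
    (aget ways amount)))

(println (count-ways 4 [1 2 3]))      ; prints 4
(println (count-ways 10 [2 5 3 6]))   ; prints 5
```

The table is a Java long[] and the inner loop counter is a primitive long, which is the style of code where the type hints discussed earlier matter most.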

I also found a page which shows how to use Clojure to solve the first 33 Project Euler problems. The page layout is so nice: the explanation is on the left, and the code is well aligned on the right. For each problem there is also a Docs section, which lists the new Clojure core functions introduced in the solution. After following all 33 solutions, we will have seen usage examples of many of the functions listed on the Clojure cheat sheet.

Good readings on Clojure

I borrowed two Clojure books from a university library, but most of my reading has been online material. I list them below:
