Why we are choosing Clojure as our main programming language

[EDIT: It's been fun to follow the lively discussions on Hacker News. Head there for some good points regarding choosing a programming language to base your startup on]

[EDIT: Just noticed the traffic from Reddit. There's also a lively discussion there]

When we first set out to build the prototype for appvise.me I started hacking away in Python, a language I’d become familiar with over the last few years. I especially enjoy doing research programming with the REPL.

I had written a database loader to import Apple’s Enterprise Partner Feed (EPF) and a web crawler in Python, and next up was the web interface. The Google AppEngine (GAE), with its support for Java and Python, seemed to be a good match for a start-up as the cost of doing business was close to zero for a low-traffic web site. Perfect for the time being, and by not confining oneself to the AppEngine web framework, webapp, and by abstracting the AppEngine Datastore away, it’s not all too hard to port the code from GAE as traffic picks up. A forked version of Django, the popular web framework, even runs on it. What’s there not to like?

It all seemed like smooth sailing, but in the back of my head I was beginning to have doubts about my decisions. At the time I kept seeing more and more horror stories about GAE (high latency, hard-to-debug problems) and I quickly realized a more solid infrastructure was needed. Abandoning GAE led me to revisit my earlier decision to use Python as well and I went into research mode – again.

Here I plan to go briefly through the options I evaluated:

Python
While Python has always made sense to me, I’ve always been annoyed by the build tools available with Python. While there are solutions like virtualenv, easy_install, pip and others, I’m constantly reliving the scenario where you need to port your code to another machine and for some reason getting the correct libraries in the correct versions just seems impossible. Library dependencies in Python are seriously hard stuff.

Then there is the GIL. For those of you unfamiliar with it, the Global Interpreter Lock (GIL) is a locking strategy for interpreted languages that ensures that only one thread, the one holding the lock, can safely access objects. This is fine on a single-core CPU as there is really just one thread running at a given time. The problem only surfaces when you run your code on a multi-core CPU, where literally hundreds of cores can be working simultaneously and the GIL prevents all but one core from accessing the data/objects at a time. With this fact in mind, and knowing that Moore’s Law does not hold any longer and we have to start scaling horizontally (i.e. adding more cores and machines instead of just waiting for a faster machine) and developing concurrent solutions, it seems controversial to choose Python as the concurrent programming language. Fortunately not all problems require concurrent solutions, as they are either IO-bound or can be scaled by forking multiple processes.
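To make the GIL point concrete, here is a minimal sketch, assuming a standard CPython build (the newer free-threaded builds behave differently). The names count_primes and run are made up for illustration. It runs the same CPU-bound job once on a thread pool and once on a process pool. Both return identical results, but on a multi-core machine the thread pool would typically take about as long as running the jobs one after another, while the process pool can spread them across cores.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run(executor_cls, jobs):
    """Run count_primes over `jobs` on the given executor.

    Returns (results, elapsed_seconds).
    """
    start = time.perf_counter()
    with executor_cls(max_workers=len(jobs)) as pool:
        results = list(pool.map(count_primes, jobs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [30_000] * 4
    # Threads share one interpreter (and its GIL); processes each get their own.
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        results, seconds = run(cls, jobs)
        print(f"{cls.__name__}: {results} in {seconds:.2f}s")
```

The timings depend on the machine, so none are quoted here. The process pool is the "forking multiple processes" escape hatch mentioned above, and it only works this neatly because the jobs share no state.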

Python 3.0, the next major version of Python, will break backwards compatibility with Python 2.x. There is going to be a period where you want to move to Python 3.0 but one of the many libraries you use breaks under 3.0, so you postpone your migration. This in turn reduces the pressure on the library authors to port their code to the new version and creates a catch-22 scenario. We see the same thing with the migration to IPv6.

Perl
Perl is an old friend. Having used it to program the busiest website in Iceland, I know its strengths and weaknesses. Since then mbl.is is being rewritten in Python (Django). Perl was not really one of the contestants but it has its place.

PHP
While there are enough programmers that know PHP (which is clearly a plus) there’s just not enough sex going on here. As with Perl, PHP really wasn’t in the loop.

Ruby
Ruby, Ruby, Ruby. Like Python it is hampered by the GIL. I like Ruby but even still I prefer Python over it.

Java
I’ve always had massive respect for the JVM, but every time I intend to pick up some Java technology I get swamped in XML configuration files that make my eyes bleed and classes on top of classes just to do something very simple. Frustration has kept me away from Java.

Javascript
We have seen some crazy benchmarks for Node.js, the event-driven web server. It’s pretty interesting but I’m not sure I’d base my system on it because the library support is pretty poor and Node.js relies on cooperative multitasking in a single-threaded event loop. That means that while one request is in a CPU-intensive part of its processing, all incoming requests are blocked. So you really have to be careful how you write your code and split CPU-intensive regions into smaller ones, or you risk having your next visitor leave because your website is not responding. It’s neat to be able to write code for the backend and the browser in the same programming language but the ecosystem just doesn’t feel mature enough to base anything serious on it.
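The blocking behaviour is not specific to JavaScript; any single-threaded event loop has it. Here is a small sketch using Python’s asyncio as a stand-in for Node.js (an assumption made purely for illustration; the handler names are hypothetical). One handler hogs the loop for its whole CPU burst, the other splits the work and yields between chunks.

```python
import asyncio
import time


def burn_cpu(seconds):
    """Simulate CPU-intensive work with a busy loop (no awaiting, no sleeping)."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


async def cpu_heavy(log):
    """A request handler that never yields while it computes."""
    log.append("heavy: start")
    burn_cpu(0.2)
    log.append("heavy: done")


async def cpu_heavy_chunked(log):
    """The same work split into chunks, yielding to the loop after each one."""
    log.append("heavy: start")
    for _ in range(4):
        burn_cpu(0.05)
        await asyncio.sleep(0)  # hand control back so other requests can run
    log.append("heavy: done")


async def quick_request(log):
    """A cheap request that arrives right after the heavy one."""
    log.append("quick: served")


async def serve(heavy_handler):
    """Schedule the heavy handler first, then the quick one, on one event loop."""
    log = []
    await asyncio.gather(heavy_handler(log), quick_request(log))
    return log


if __name__ == "__main__":
    print(asyncio.run(serve(cpu_heavy)))
    print(asyncio.run(serve(cpu_heavy_chunked)))
```

With the non-yielding handler, the quick request is only served after the heavy one has finished. With the chunked handler, it is served as soon as the first chunk yields. That is the "split CPU-intensive regions into smaller ones" discipline described above.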

Erlang
Erlang is cool. I deeply enjoy programming simple solutions in the language and the possibility to hotload your code is awesome. The process-oriented actor model along with Erlang’s implementation of supervisor trees enables highly distributed, concurrent and fault-tolerant computing. Lately people have been using these features to create some pretty unique web frameworks and libraries in Erlang. String manipulation is one of Erlang’s weak spots. It’s slow, and working with binary streams is sometimes just too much for someone coming from Perl.

Haskell
I’m a functional language pervert and Haskell is the ultimate pure, lazily evaluated, functional programming language. It has served as my functional drug since 2007 and I deeply enjoy the mathematical aspects of Haskell. I’ve used it to create a blogging system for my personal website and a photo gallery (upload, image manipulation, categories, tag support, etc.) for my brother the photographer. Even so, I just don’t have the guts to base anything on Haskell because there’s no middle ground; it’s either Haskell’s way or the highway and sometimes you just have to compromise.

Lisp/Scheme
I’ve written a proxy server in Chicken Scheme, which compiles to C code. The library support is OK for some purposes and, as with Lisp, there are some cool continuation-based web frameworks, but general support is not great.

Clojure
And then there was Clojure. I’ll go through the rationale of using Clojure below.

Overview

Clojure is a dynamic programming language that targets the Java Virtual Machine (and the CLR). It is designed to be a general-purpose language, combining the approachability and interactive development of a scripting language with an efficient and robust infrastructure for multi-threaded programming. Clojure is a compiled language – it compiles directly to JVM bytecode, yet remains completely dynamic. Every feature supported by Clojure is supported at runtime. Clojure provides easy access to the Java frameworks, with optional type hints and type inference, to ensure that calls to Java can avoid reflection.

Clojure is a dialect of Lisp, and shares with Lisp the code-as-data philosophy and a powerful macro system. Clojure is predominantly a functional programming language, and features a rich set of immutable, persistent data structures. When mutable state is needed, Clojure offers a software transactional memory system and reactive Agent system that ensure clean, correct, multi-threaded designs.

Being a Lisp dialect, Clojure is functional and many of the functional idioms are readily available. It has great library support – the Clojure ecosystem is advancing quickly – but where it’s lacking you just find what you need in Javaland. This way enterprise-quality debuggers and profilers are made available to you. Clojure runs on a battle-tested virtual machine (the JVM) and exposes a brilliant concurrency system, free of explicit locks, that is ready for multi-core machines.

Web programming

I am not a fan of all-in-one web frameworks like Rails or Django; instead I prefer composable libraries. The HTTP request/response protocol has been abstracted in different programming languages (Rack for Ruby, WSGI for Python, Hack for Haskell) and Clojure is no exception. Ring is to Clojure what WSGI is to Python. Building on the functional foundation, Ring enforces a composable pattern which is easily extended with Ring’s middleware. Compojure is another library that builds on top of Ring and facilitates routing and manipulating Ring’s functionality. Here’s a demo application written using the above mentioned libraries:
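The snippet that follows is not a Ring/Compojure application. It is a minimal sketch of the same handler-plus-middleware idea on the Python side of the analogy, using only the standard library’s WSGI support. The names hello_app, with_header and call are hypothetical. The point it shows is that a handler is just a function, and middleware is a function that takes a handler and returns a new one, so layers compose by plain function application.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """Innermost handler: a plain WSGI application."""
    body = f"Hello from {environ['PATH_INFO']}".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body]


def with_header(app, name, value):
    """Middleware: wrap `app` so every response carries one extra header."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            return start_response(status, headers + [(name, value)], exc_info)
        return app(environ, patched_start)
    return wrapped


def call(app, path):
    """Helper for this sketch: invoke a WSGI app directly, without a server."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], dict(captured["headers"]), body.decode("utf-8")


# Compose two middleware layers around the handler.
app = with_header(with_header(hello_app, "X-Served-By", "demo"),
                  "X-Frame-Options", "DENY")

if __name__ == "__main__":
    print(call(app, "/apps"))
```

Ring follows the same shape, with requests and responses as plain maps instead of an environ dictionary and a start_response callback.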

IDE

Clojure can be integrated into NetBeans, Eclipse, IntelliJ and other IDEs and editors. But since I live in Emacs I was happy to find out that Clojure has a Swank backend for Emacs SLIME. This allows me to start the JVM up once and then connect my editor to it whenever I like. I can even hotload code, replace running functions and so forth. For those who have not seen the magic of SLIME I recommend this video by the author of ClojureQL.

Big Data

The recommendation engine is going to base its recommendation on a massive amount of data. Fortunately others have been working on scaling solutions. A lot of the work is being done in Javaland and being able to write small Clojure wrappers around libraries like Hadoop/Cascading is a major benefit.

Clojure’s weaknesses

Is my decision the right one? What are the downsides of using Clojure, you might ask. There are definitely a few. For me the usual Lisp parenthesis madness is not so much of a problem because great Emacs modules like paredit basically prevent unbalanced parentheses.

Clojure is however a very young programming language and you never know what the support is going to be in 5 years’ time. Before a language gains critical mass it’s relatively easy for it to be abandoned if the author does not want to support further development. On top of that we have seen how Oracle, the new owner of Java, has been treating the Java community.

I have yet to hire a Clojure developer so that’s probably an area where I’m restricting us to a smaller group of programmers.

Conclusion

I hope I’ve given a rationale for my technical decisions that makes sense to the reader. I’m quite happy with my decision and I hope others in a similar situation give Clojure a try – it’s well worth it.

Discussion

There may currently be a relatively small pool of people with Clojure skills, but those who get it are probably those who are experienced developers who’ve seen the pros and cons of other languages … a variation on Paul Graham’s Python Paradox http://www.paulgraham.com/pypar.html

For the time being ClojureQL is serving me very well. It’s a SQL-only tool but I have my data layer in pure Clojure (usually just 1 or 2 lines with ClojureQL) so it should be relatively easy to switch to another implementation targeting some other data container (like NoSQL).

I am a fan of Lisp and use Clojure for web development myself. It is a very enjoyable environment to work in, especially for web development.

But recently i started learning haskell web framework yesod. And i decided to use it for my next inhouse project. I must say that i enjoy it just as much as clojure. Except i have this constant feeling of safety. That when things compile they (surprisingly for me every time) just work. This is never the case with clojure. Usually you type in 2-3 lines then run it to see if it works, because you are never sure, and compiler does not help you much with dynamic types.

I like the haskell’s syntax more (even though i love s-expressions)
I would suggest you try some of the new haskell web frameworks (yesod, snap).

Btw that does not mean that i am going away from clojure. Haskell is still immature when things come to libraries, so i will be using clojure alongside with haskell (it is easy to do it for server side development) for a long time. 2 web servers interacting via http rest protocol. Poor man’s FFI

I totally understand you. I tried to grok happstack several times and each time was defeated by its overengineered complexity. What helped me break through the high wall this time were 2 web frameworks that were both simple and well documented. After some playing around I chose yesod, because it gives much more out of the box. But snap is also awesome and dead simple to understand and use.

I use emacs + slime with clojure too. And as you said it is an extremely fast and enjoyable development environment. Actually i work in clojure much faster than in haskell. And i cannot describe to you why on earth i still prefer haskell and want to switch to it. But having tried both i know the taste. And haskell draws me like a magnet.

I guess i should warn you to stay for some time with clojure and do not touch haskell. Or you’ll get a haskell sickness like me.

Zed Shaw is one of my favorite programmers and his mongrel2 led me to 0MQ, which I use as my main messaging queue. I’ve seen his Lua Tir web framework but I have never used it. Do you have any experience with it?

I’ve been programming in java since 1995. practically ever since it was invented. my graduation thesis was in java. my first startup job in java, 2nd startup job in java, 3rd job in java etc etc until i joined Sun Micro, where ofcourse it was all java, & then i worked at an IB for 5 years again all java. i’m now building a math-heavy visualization-heavy finance-heavy piece of software. again, java.
never in those 15 years did i once deal with xml configuration files. so i have no idea about your frustration with java.

It might just be that I’ve never immersed myself into the Java way but every time I try to set up a Java web server and try to deploy some application that I’ve written there seems to be a whole lot of XML-mangling going on.

@liamdevlin – maybe you should actually do some work with deployment, not only programming. Also, the article is about web development – maybe you should try that. The original post mentioned ‘classes over classes’ – that is really bad and counterproductive in Java.

As for not using cores by language: pragmatic approach would be a ruthless website speed test. And clojure would fail that test now against python. Django uses memcached to speed up serving site – and that is not only multicore, but multi-host.

Also, that SQL dialect seems overengineered, sorry

Clojure also lacks for now any decent templating engine for MVC, team development.

I agree with you on the multi-core point you made. I mentioned in the article that many concurrency problems are easily solved with process forking (or in your example multiple machines running the same code). What these problems usually have in common is that the processes (or hosts) don’t need to communicate with each other directly (web sites often use the data store to share state). When your multi-threaded CPU-bound algorithm needs to share state or communicate between threads, it’s nice to be able to lean on Clojure’s concurrency systems, which don’t need any locking primitives (well, they use them under the hood but they’re not exposed to the programmer).

web deployment doesn’t always mean html generation or servlets and tomcat.
Believe it or not, there are IBs deploying billion dollar per day ( yes that’s right billion with a B ) trading platforms entirely written in java, on the web, zero xml required. the gui is entirely swing, the protocol secure sockets, the deployment is just a jnlp file that points to a jar that contains all the bytecode. no xml anywhere.

for that matter, even retail outlets eg. etrade, has a java ui.
i do mostly math-finance stuff that works out very nicely in java. the computation is as good as native, given that floats and doubles map directly to their c compatriots. the visualization stuff…2d graphics mostly…java has gotten incredibly good at this in a crossplatform way. same code works on windows and solaris.

clojure is a “compiled” language that compiles down to bytecode. It sounds like an “interpreted” language, with the word “compiled” used in a bastardized connotation. Don’t get it twisted. Clojure, like Java, is not compiled, it is interpreted. The JVM is the interpreter. If you reduce the instruction set by removing the whitespace and adding a tokenizer, that is still not compiled. GCC is a compiler. GCJ is a compiler. Compiled means transposed into object / native executable code.

You’re spot on. I guess Rich Hickey, the creator of Clojure and author of the clause you’re referring to, is trying to convey the message that Clojure works on the same abstraction level (in hardware terms) as Java itself, i.e. gets compiled to byte code.

If you weren’t so adamant about it I wouldn’t feel compelled to correct you, but you’re just wrong. Compiling and compilers do not necessarily imply translating source code into native machine code.

javac is a compiler. It just so happens to compile Java code into executable bytecode which doesn’t happen to target a physical architecture.

From the wikipedia definition (if you need any further convincing):

“A compiler is a computer program (or set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language, often having a binary form known as object code).”
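As a small illustration of that definition in another bytecode-based runtime (CPython rather than the JVM, chosen only because it is easy to try), the sketch below compiles source text to bytecode and then lets the virtual machine execute it. The function add and the file name "<example>" are made up for the example.

```python
import dis

SOURCE = "def add(a, b):\n    return a + b\n"

# Step 1: compile source text into a code object holding CPython bytecode.
# No native machine code is produced at this point.
module_code = compile(SOURCE, "<example>", "exec")

# Step 2: the interpreter (the VM) executes that bytecode, which defines `add`.
namespace = {}
exec(module_code, namespace)
add = namespace["add"]

# The function carries its own compiled bytecode; dis lists the instructions.
opnames = [ins.opname for ins in dis.get_instructions(add)]

if __name__ == "__main__":
    print(add(2, 3))
    print(opnames)  # exact opcode names vary between CPython versions
```

Source language in, target language out: the compile step fits the quoted definition even though the target is a virtual machine’s instruction set rather than a physical CPU’s.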

how are you managing schema changes (what we call “migrations” in the ruby world) ?
there is clj-liquibase, etc. but this is one of the two things that is worrying me ?
The other is authentication (are you using Sandbar?)

To be honest I’m not managing schema changes at the moment. I’ve been eyeballing clj-liquibase, or if I’m not happy with that I might just create a small wrapper around one of the other Java-based migration tools, e.g. migrate4j. It’s really easy to write such wrappers; take a look at memfn, which is useful when you want to treat a Java method as a first-class fn.

I was unaware of sandbar until a few weeks ago so I rolled my own. It does not support authorization yet (all users are treated equally) but it’s literally 20 lines of Clojure. Add 30 lines and I got support for Facebook authentication, albeit a very simple one. When my product demands more complex authorization I might migrate to sandbar.

Significant pieces of my startup’s work are done in Clojure…I’m finding it easy to write and debug language processing and information retrieval pipelines using it. My sense is that there are enough frameworks (it is after all running in the JVM) that you could efficiently do anything in Clojure that you could in Java…I find the syntax preferable to Java. I think in a startup the critical decision is just getting things up and tested/used as quickly as possible…whatever language/framework gets you there quickly with the least amount of pain is the language to choose.

Great write-up. I had the same concerns when exploring Java and, after playing around with different frameworks, settled on Stripes (http://www.stripesframework.org): minimal configuration, almost nil XML and a fast start. I’ve never looked back.

I find your quick dismissal of Ruby because of the GIL a bit sad. JRuby for example does not have a GIL and is definitely a great interpreter. Also, Ruby is beyond the problem that you face with Python 3: 1.9.2 now is a supported option for all libraries that you should care about.

But in the end: good luck with Clojure, it’s definitely a good choice as well!

The only thing that bugs me is that Clojure has to use a lot of Java classes. I would say that the ability to use Java libraries is good, but if it is dependent on them for even fundamental things like I/O, then it is not so good. Besides, I like Emacs as well, but so far I cannot find a way to make SLIME do auto-completion for Java libraries. This is a big deal for me, because in general names for Java classes are too long to remember exactly.

I don’t feel like I’m being exposed to Java classes too much. Regarding I/O there is clojure.contrib.duck-streams. This page gives some nice examples of how to use it. Besides, it’s pretty easy to write Clojure wrapper functions for Java objects and classes.

I have code completion of Java libraries in my Emacs/SWANK setup. I’m using the ELPA version described here.

I have to say that I agree with you to some extent. Clojure is in general really well designed. Rich Hickey really has taste, IMHO, and while there are some decisions he has made that I would argue with (lack of reader macros) they generally have a rationale behind them. They’re not mistakes, but places where he has made a different set of trade-offs than I would have.

In fact almost everywhere I see something in Clojure that I really don’t like, it is something that is in some way a consequence of being on the JVM.

But it’s hard to argue with his decision to do things this way (again, IMHO). One of the reasons that Clojure has a fairly consistent design and has captured a surprising amount of mind-share for such a young Lisp is that Hickey was able to get a useful version of it out by himself, in a very short period of time. While I have a lot of respect for him as a programmer, it’s clear that he could not have done that without having all the machinery and libraries of the JVM already in place. No one could have.

So targeting the JVM, at least at first, seems like a very pragmatically sound decision to me. And it is possible to wrap things pretty easily to hide a lot of the ugliness of having to rely on Java libraries.

I do not know much about Clojure, but I’m working on a brand new project in Python and I love it. As for your objections to Python, here is what I would say. I’m currently using virtualenv with pip and I have zero problems with packages and dependencies. From my understanding, the GIL is only a problem when you are using multithreading as your solution for concurrency problems. There are multiple other possible solutions for that problem, like multiprocessing, that work perfectly. Even if I had multithreading as a solution for this, I would not use it. I do not see a problem with Python 3 as I am not using it and I am not planning on using it in the near future. The whole Python community will eventually get to 3 and when it does, I think it will be pretty easy to meet it there.

Not that I do want to propagate Java as a language – the argument about XML in Java is something people iterate when they have never used Java. JSP/JSTL, Stripes, Wicket, Play, …. none of those use XML. You might get into contact with XML if you did use Hibernate some years ago – but first you would not want to use Hibernate or if you need to, go most likely with annotations.

“Java
I’ve always had a massive respect for the JVM but every time I intend to pick up some java technology I always get swamped in XML configuration files”