Why I Program In Erlang

Erlang is a twenty-five-year-old programming language that has yet to win a popularity contest, and almost certainly will never win any medals for speed, let alone any tiaras for syntactic beauty. The language is slow, awkward, and ugly. Refactoring Erlang code is a pain.

Yet for almost five years, I have spent a large chunk of my free time programming in Erlang; at this point I've spent well over a thousand hours with the language. I've used the language to write, in rough chronological order, a CSV parser (don't laugh, I said chronological order), a template compiler, an object-relational mapper, a rich-text parser, an image resizer, a language pre-processor, a web framework, and a distributed message queue. What follows are my impressions of the language compared to other languages that I've used professionally (C, Java, Perl, PHP, Ruby, Objective-C, and JavaScript).

The good news about Erlang can be summed up like this: Erlang is the culmination of twenty-five years of correct design decisions in the language and platform. Whenever I've wondered about how something in Erlang works, I have never been disappointed in the answer. I almost always leave with the impression that the designers did the “right thing.” I suppose this is in contrast to Java, which does the pedantic thing, Perl, which does the kludgy thing, Ruby, which has two independent implementations of the wrong thing, and C, which doesn't do anything.

Take garbage collection. In many other garbage-collected languages, when it's time to collect garbage, the entire program has to pause while the garbage collector runs. This approach is perfectly fine if your computer program is supposed to run once, write some output, and then quit. But in long-running applications, such as desktop, mobile, or server programs, this strategy results in occasionally frozen UIs and slow response times. Erlang programs, on the other hand, can have thousands of independent heaps, one per Erlang process, which are garbage-collected separately; in this way, the performance penalty of garbage collection is spread out over time, and so a long-running application will not mysteriously stop responding from time to time while the garbage collector runs.
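As a rough illustration of per-process heaps, the sketch below spawns one worker, lets it create some garbage, and then asks the run-time to collect that single process. The module name `heap_sketch` and the function `demo/0` are made up for this example; `process_info/2` and `erlang:garbage_collect/1` are standard built-ins. Heap sizes are reported in machine words, and the exact numbers depend on the VM, so none are promised here.

```erlang
-module(heap_sketch).
-export([demo/0]).

%% Spawn a worker with its own heap, let it allocate and discard a list,
%% then collect only that process. Returns {HeapBefore, HeapAfter} in words.
demo() ->
    Parent = self(),
    Worker = spawn(fun() ->
        _ = lists:seq(1, 100000),      % garbage that lives on the worker's heap only
        Parent ! {self(), ready},
        receive stop -> ok end         % stay alive so the parent can inspect us
    end),
    receive {Worker, ready} -> ok end,
    {heap_size, Before} = process_info(Worker, heap_size),
    true = erlang:garbage_collect(Worker),   % collects this one process, nothing else
    {heap_size, After} = process_info(Worker, heap_size),
    Worker ! stop,
    {Before, After}.
```

Calling `heap_sketch:demo()` from the shell returns the worker's heap size before and after the forced collection; the calling process and every other process keep running while the worker is collected.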

Or take string concatenation. If you pop open the implementation of string concatenation in Perl, Ruby, or JavaScript, you are certain to find an if statement, a realloc, and a memcpy. That is, when you concatenate two strings, the first string is grown to make room for the second, and then the second is copied into the first. This approach has worked for decades and is the “obvious” thing to do. Erlang's approach is non-obvious, and, I believe, correct. In the usual case, Erlang does not use a contiguous chunk of memory to represent a sequence of bytes. Instead, it uses something called an “I/O list”: a nested list of non-contiguous chunks of memory. The result is that concatenating two strings (I/O lists) takes O(1) time in Erlang, compared to O(N) time in other languages. This is why template rendering in Ruby, Python, etc. is slow, but very fast in Erlang.
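To make the I/O list idea concrete, here is a minimal sketch. The module name `iolist_sketch` and its two functions are hypothetical; `iolist_to_binary/1` and `iolist_size/1` are standard built-ins. "Concatenation" is nothing more than wrapping the existing pieces in a new list, so none of the bytes are copied until a contiguous binary is explicitly requested.

```erlang
-module(iolist_sketch).
-export([greeting/1, flatten/1]).

%% Build a greeting as an I/O list. Each step only allocates a small list
%% that points at the existing binaries; the bytes themselves are not copied.
greeting(Name) when is_binary(Name) ->
    Header = [<<"Hello, ">>, Name],
    [Header, <<"!">>, $\n].

%% Flatten into one contiguous binary only when that is really needed.
flatten(IoList) ->
    iolist_to_binary(IoList).
```

In a server the flattening step is usually unnecessary, because functions such as `gen_tcp:send/2` and `file:write/2` accept I/O lists directly.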

No matter how blocking and concurrent your application logic is, it is impossible to make a blocking network call in Erlang, or to spawn multiple OS processes. This design decision makes it so that an Erlang server will never crash the operating system. Having lost many nights of sleep to overloaded operating systems at a previous job, I believe that Erlang's concurrency design is correct.

I mentioned that refactoring Erlang code is a pain. Fortunately, in my experience it is rarely necessary to refactor Erlang code in the same way that object-oriented code needs refactoring from time to time. In Erlang, each function is passed all the information it needs, and you get a compiler warning if it was passed any information it doesn't need. In some sense, refactoring is integrated into development; it is not a distinct activity requiring bountiful test coverage and several pots of coffee. Refactoring Java or Objective-C code usually becomes necessary because too many instance methods have been added to a class, and the developer must spend time figuring out which methods require which instance variables and how to best cut the carriage in half. This is simply not a concern in functional programming; moving a function to a different module requires very little hand-wringing and virtually no effort. “Refactoring” Erlang usually consists of breaking large functions down into smaller functions. There is not much mental effort involved; however, due to Erlang's syntactic peculiarities, it can be tedious converting anonymous functions to named functions. Perhaps a clever IDE will eliminate this tedium one day.

All data structures in Erlang are completely transparent. Knowing nothing about the library you are using, you can always inspect the contents of data structures at run-time. This feature greatly aids in debugging, and is a boon to old-fashioned hacking. It is easy to manipulate undocumented data structures in order to implement functionality that the original library author did not intend. Unlike in object-oriented programming, you never need to worry about the original author renaming variables and breaking your subclass code; as long as the underlying data structure remains the same, your modifications will continue to work in Erlang.

I find that the transparency of data structures in Erlang makes programming much easier. In object-oriented programming, I am always worrying about what to name things; in Erlang, it usually doesn't matter, as the data structure is half the interface. If you have never programmed in Erlang, you probably have no idea what I am talking about.
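For readers who have not used Erlang, a small sketch may help show what transparency means in practice. The module `transparent_sketch`, the `user` record, and the function names are invented for this example. The point is that a record is an ordinary tagged tuple at run time, so code that has never seen the record definition can still inspect it.

```erlang
-module(transparent_sketch).
-export([new_user/2, name_via_record/1, name_via_tuple/1]).

%% The record definition is private to this module.
-record(user, {name, email}).

new_user(Name, Email) ->
    #user{name = Name, email = Email}.

%% The "official" accessor, written with record syntax.
name_via_record(#user{name = Name}) ->
    Name.

%% What an outsider can do without the record definition: at run time the
%% value is just the tuple {user, Name, Email}, so plain tuple access works.
name_via_tuple(Term) when is_tuple(Term) ->
    element(2, Term).
```

Printing the result of `transparent_sketch:new_user(<<"joe">>, <<"joe@example.com">>)` in the shell shows `{user,<<"joe">>,<<"joe@example.com">>}`; nothing is hidden behind an accessor, which is also why the shape of the data becomes part of the interface.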

And so we come to the bad news about Erlang: the language's benefits are back-loaded. That is, most of the language's benefits can only be appreciated after several years with other languages followed by several years with Erlang. It is certainly not a language for beginners. The syntax is strange to programmers hailing from the C diaspora. Functional programming is tough, and Erlang doesn't put any sugar on the pill. The graphics toolkits are primitive, and there are no fill-in-the-code computer games such as are found in introductory Java courses. Reading any non-trivial Erlang code requires a firm understanding of recursion, a kind of abstract thinking that many people find difficult.

Erlang is also lacking in libraries compared to other languages; in my experience, for any given task, there are zero, one, or at most two Erlang libraries available for the job. I am perhaps alone when I say this, but I actually like the fact that there are not many Erlang libraries available. If I need something done, I have the excuse to do it myself, and I often make discoveries that I would not have made otherwise. It sounds dumb but it is true. I can feel productive because I am doing something that no one has done yet, and along the way I have the freedom to try new approaches and make real innovations. I have learned more in the course of developing Erlang libraries than I ever learned stitching together other people's Ruby or C code. I program in Erlang purely for the enjoyment of solving problems and sharing my discoveries in well-engineered applications.

To a seasoned hobbyist programmer like myself, the only truly bad news about Erlang is that it is slow. For the server applications I have written, the speed of the language has not been an issue; the extra cost in CPU was more than made up by Erlang's correct handling of garbage collection, network I/O, and string concatenation in a concurrent environment. In the language of complexity analysis, Erlang programs tend to have a large constant out front but excellent asymptotic properties.

For the programmer who wishes to write fast programs using Erlang — the sorts of programs that start, run, write some output, and exit — there is hope on several fronts. A native-code compiler is available, and according to numerical benchmarks, it makes Erlang programs faster than Ruby, Perl, and PHP, albeit slower than Java and JavaScript. There is talk of a just-in-time native-code compiler, which might provide further improvements to execution time by gleaning information from the code execution itself and making appropriate optimizations. Finally, brave souls can write computationally intensive code in C via a NIF, with the important caveats that C code will block the Erlang scheduler (potentially negating Erlang's concurrency capabilities), and that parallel C code is famously difficult to write.

My own choice for writing fast programs in Erlang is an alternative technology which affords all the benefits of C code without compromising the integrity of the Erlang run-time. It is OpenCL, a C-like language originally developed by Apple and published as an open standard by the Khronos Group in 2008. Like Erlang, OpenCL easily takes advantage of all the processor cores on a given machine; unlike Erlang, OpenCL programs are very fast. In fact, OpenCL programs are usually faster than C programs, as OpenCL programs can take advantage of a processor's vector capabilities in a way that normally requires hand-tuned assembler code. OpenCL programs can be compiled and executed directly from Erlang code; in my view, it is a perfect technology for performing computationally intensive tasks (that is, running those inner loops) inside of a larger Erlang program.

By way of disclaimer, I have not actually used OpenCL inside an Erlang program. As I said, speed has not been a problem in the Erlang programs I've written. I do have some first-hand experience with OpenCL and have been quite pleased. I wrote a map projection library in OpenCL which is about 5 times faster than the state-of-the-art Proj.4 library (written in C). I also wrote an OpenCL library for doing multivariate statistics; I haven't benchmarked it against existing libraries, but I suspect it is faster by a similar margin. There are some peculiarities in writing OpenCL code, but it is my hope that one day all the world's tight loops will be rewritten in OpenCL and invoked from large programs written in Erlang.