Hi, I'm Tony Arcieri. You may remember me from such software projects as Celluloid, Reia, and Cool.io...

Tuesday, July 26, 2011

The Trouble with Erlang (or Erlang is a ghetto)

This is a blog post I have been meaning to write for quite some time. I lament doing so because I've made a considerable time investment into the Erlang infrastructure and really love some of its ideas. Erlang has done a great and still unique job of synthesizing a number of concepts in a very interesting way. But after using the platform in various capacities for some 4 years now, there are some glaring issues I think need to be called out.

Records suck and there's no struct/map data structure

Erlang has a feature called "records" which uses the preprocessor to give you something akin to a struct or map, i.e. a way to access named fields of a particular object/term within the system. As far as I can tell, there's pretty much universal agreement within the community that this is a huge limitation. The requested replacement is typically referred to as a "frame", and proposals for implementing frames have been floating around for several years. Yet no action has been taken on the problem.
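To make the limitation concrete, here is a minimal sketch of how a record behaves. The module name record_demo and the user record are hypothetical, made up for this illustration. It shows that field names exist only at compile time and that the value is an ordinary tagged tuple at runtime:

```erlang
-module(record_demo).
-export([demo/0]).

%% Hypothetical record, defined only for this illustration.
-record(user, {name, age}).

demo() ->
    U = #user{name = "joe", age = 42},
    %% Field access is resolved at compile time to a tuple position.
    Name = U#user.name,
    %% At runtime the record is just a tagged tuple.
    IsPlainTuple = (U =:= {user, "joe", 42}),
    {Name, IsPlainTuple}.
```

Because the field layout is baked in at compile time, any module that wants to use the named fields needs the same record definition, usually via a shared header file.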

So why doesn't Erlang have frames? While Erlang is an open source project, its implementation and release cycle are managed by Ericsson, the company that created it, and Ericsson just doesn't seem to care. I'm not sure what Ericsson's priorities are when it comes to adding features to Erlang, but in my opinion they're doing a worse job of engaging the community than Oracle has been doing with Java. I hate Oracle as a company, but so far it feels like they've actually done a fairly good job managing Java development and moving Java forward. I can't say that at all with Ericsson, and frames are the quintessential example of this.

Where Azul was scaling up to 768 CPUs in 2007, Erlang was crapping out around 15 CPUs in 2009. For everything Erlang had to say about the importance of immutability and messaging in concurrent systems, and despite Joe Armstrong's promise that "your Erlang program should just run N times faster on an N core processor," it turns out that on the Erlang VM the N core processor promise had an upper bound of around 15.

Why is this? Erlang implements its own memory allocator and can't take advantage of libraries like tcmalloc to provide better multithreaded heap management. I can't fault a language VM like BEAM for doing this save for the fact that what Erlang provides is relatively crappy.

Erlang has done a fairly decent job given the constraints it was working within. Erlang wanted to provide a soft realtime system, and managed to create one that works on commodity architectures, unlike the Azul Vega appliances which require custom hardware. However, Azul has managed to port their version of the JVM to x86 hardware with their Zing Architecture, which wraps the JVM in a separate runtime container which uses software transactional memory to replace the hardware transactional memory found on the Vega appliances. It's higher overhead but provides similar guarantees. Java also provides the RTSJ specification for building realtime systems in Java.

However, I think everything I just said is moot for the majority of applications. People building messaging systems want the best performance possible but don't typically have software realtime constraints. The Erlang VM's approach to soft realtime made a design decision which hampers its messaging speed, namely the use of separate heaps, which requires messages be copied from one heap to another. This means the Erlang VM does not provide zero-copy messaging. Every time you send a message from one Erlang process to another, some amount of data must be copied.

Erlang has partly mitigated this problem by providing a separate shared heap for binaries, which are the Erlang type for arbitrary blobs of binary data. This means if you ensure the majority of data you move around doesn't contain anything of significant size except binaries, perhaps this won't be a problem. However, if you're moving large collections of numbers around (Erlang's strings-as-lists-of-integers come to mind), messaging will be slow compared to a zero-copy system.

Various solutions to this have been proposed for BEAM, such as switching from a shared-nothing heap to a shared heap or a hybrid heap (where message-passed objects are copied once). However, the Erlang garbage collector is not suitable for managing shared or hybrid heaps and would need to be rewritten for the task, and so far nobody has done that rewrite or managed to get shared/hybrid heaps working with Erlang's SMP scheduler.

Erlang has a "JIT" compiler called HiPE, which is mostly hype. I put JIT in quotes because HiPE is mostly an Erlang-to-native-code compiler with a limited set of backends which does a pretty bad job of optimizing and can't use runtime profiling information to improve the quality of the native code it generates in the way JIT compilers like HotSpot are able to. Calling HiPE a just-in-time compiler is a stretch, as it is for the most part an ahead-of-time native compiler for Erlang. The quality of native code produced by HiPE can be so poor that it's often outperformed by the userland bytecode interpreter implemented in BEAM.

HiPE can perform a very limited set of optimizations. In particular, Erlang code is factored into modules, and HiPE's inliner is unable to inline native code across modules. This is due to HiPE's lack of a deoptimizer (a.k.a. deopt), or a way to translate JITed code back into bytecode, which is necessary in general but particularly necessary in Erlang for cases like hot code swapping. Deopt support is a feature of many JIT compilers in languages more popular than Erlang, most notably the HotSpot compiler on the JVM. Google's V8 virtual machine for JavaScript added deoptimization support as part of their "Crankshaft" compilation infrastructure.

Erlang isn't general purpose

Erlang hates state. It especially hates shared state. The only facility provided by the language for dealing with shared state in Erlang is called "Erlang Term Storage" and provides a Judy array that several Erlang processes can talk to. The semantics of ETS are fairly awkward and using it directly is difficult. Erlang has a baked-in database called Mnesia which is built on ETS. Mnesia's performance characteristics aren't great but it provides a friendlier face for ETS. These are the only solutions to shared state baked into the language.
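For readers who haven't used ETS, here is a minimal sketch of two processes sharing one table. The module name ets_demo, the table name counters and the key hits are hypothetical; the ets functions used (new/2, insert/2, update_counter/3, lookup/2, delete/1) are the standard ones:

```erlang
-module(ets_demo).
-export([demo/0]).

demo() ->
    %% A public table can be read and written by any process that knows its id.
    Tab = ets:new(counters, [set, public]),
    true = ets:insert(Tab, {hits, 0}),
    Parent = self(),
    %% A second process mutates the shared table, then reports back.
    spawn(fun() ->
                  ets:update_counter(Tab, hits, 1),
                  Parent ! done
          end),
    receive done -> ok end,
    %% The first process observes the change made by the second one.
    [{hits, N}] = ets:lookup(Tab, hits),
    ets:delete(Tab),
    N.
```

Note that the table holds tuples keyed by one element and every read copies the stored term back onto the caller's heap, which is part of why working with ETS directly feels awkward.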

What should you do if you want to deal with a shared-state concurrency program in Erlang? The general advice is: don't. Erlang isn't designed for solving shared-state concurrency problems. If you encounter a shared state concurrency problem while developing your Erlang program, sorry, you picked the wrong language. Perhaps you should move along... and Clojure offers you some great ways to tackle shared state concurrency problems.

The syntax is atrocious

I think this one goes without saying. That said...

Let me come at this from a different angle than you're probably expecting: I've recently started working with Clojure, and I have to say, I really think Erlang would've been a lot better off with a Lisp-like syntax than a Prolog-inspired syntax. To date Erlang is the only popular language with a Prolog-inspired syntax, and all of the awkward tokens and grammatical constructions make me wish it just had a simple Lispy syntax. This has been implemented in Robert Virding's Lisp Flavoured Erlang, which is very cool and worth checking out.

That opinion might come as a surprise, because the main project I was developing in Erlang was Reia, a Ruby-like syntax and runtime for Erlang. I've discontinued this project, for many reasons, one of which is because it's been surpassed in features and documentation by a similar project, José Valim's Elixir. After years of working on Reia, I've really grown to believe I'd rather spend my time working on a language which incorporates Erlang's ideas, but on the JVM with mutable state.

The Erlang cargo cult would love to hang me out to dry for even saying that... so let me address it right now.

Immutable state sucks and isn't necessary for Erlang-Style Concurrency

Immutable state languages force object creation whenever anything changes. This can be partially mitigated by persistent data structures, which are able to share bits and pieces of each other because they're immutable. This works, for example, when attempting to create a sublist that consists of the last N elements of a list. But what if you want the first N elements? You have to make a new list. What if you want elements M..N? You have to make a new list.
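A rough sketch of the tail-versus-head difference follows. The module name sharing_demo is hypothetical, and erts_debug is an undocumented debugging module, so treat its use here as an assumption that holds on the usual BEAM builds rather than a supported API. erts_debug:size/1 counts shared subterms once, while erts_debug:flat_size/1 counts them as if every reference had its own copy:

```erlang
-module(sharing_demo).
-export([demo/0]).

demo() ->
    L = lists:seq(1, 1000),
    %% Last 500 elements: nthtail returns a pointer into L, nothing is copied.
    Tail = lists:nthtail(500, L),
    %% First 500 elements: sublist has to build 500 new cons cells.
    Head = lists:sublist(L, 500),
    %% If the two sizes differ, some cons cells are shared between the terms.
    SharedWithTail =
        erts_debug:size({L, Tail}) < erts_debug:flat_size({L, Tail}),
    SharedWithHead =
        erts_debug:size({L, Head}) < erts_debug:flat_size({L, Head}),
    {length(Tail), length(Head), SharedWithTail, SharedWithHead}.
```

The third element of the result should indicate sharing for the tail, and the fourth should indicate none for the head, which is the asymmetry described above.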

In mutable state languages, performance problems can often be mitigated by mutating local (i.e. non-shared) state instead of creating new objects. To give an example from the Ruby language, combining two strings with the + operator, which creates a new string from two old ones, is significantly slower than combining two strings with the concatenating << operator, which modifies the original string. Mutating state rather than creating new objects means there are fewer objects for the garbage collector to clean up and helps keep your program in-cache on inner loops. If you've seen Cliff Click's crash course on modern hardware, you're probably familiar with the idea that latency from cache misses is quickly becoming the dominating factor in today's software performance. Too much object creation blows the cache.

Cliff Click also covered Actors, the underpinning of Erlang's concurrency model, in his Concurrency Revolution from a Hardware Perspective talk at JavaOne. One takeaway from this is that actors should provide a safe system for mutable state, because all mutable state is confined to actors which only communicate using messages. Actors should facilitate a shared-nothing system where concurrent state mutations are impossible because no two actors share state and rely on messages for all synchronization and state exchange.

The Kilim library for Java provides a fast zero-copy messaging system for Java which still enables mutable state. In Kilim, when one actor sends a message, it loses visibility of the object it sends, and it becomes the responsibility of the recipient. If both actors need a copy of the message, the sender can make a copy of an object before it's sent to the recipient. Again, Erlang doesn't provide zero-copy (except for binaries) so Kilim's worst case is actually Erlang's best case.

Erlang doesn't allow destructive assignments of variables, instead variables can only be assigned once. Single assignment is often trotted out as a panacea for the woes of mistakenly rebinding a variable then using it later expecting you had the original value. However, let me show you a real-world case that has happened to me on several occasions which wouldn't be an error in a language with destructive assignment and pattern matching (e.g. Reia).

There exists a complementary case of mistaken variable usage to the aforementioned problem with destructive assignment. In single-assignment programs, it involves mistakenly using the same variable name twice, expecting the variable to be unbound the second time:
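A minimal sketch of that mistake, assuming a hypothetical module rebind_demo whose run/2 receives two {ok, Value} tuples (the names and the tuple shape are made up for illustration):

```erlang
-module(rebind_demo).
-export([run/2]).

%% Both arguments are expected to look like {ok, Value}.
run(First, Second) ->
    {ok, Foo} = First,
    %% Foo is already bound here, so this line does not rebind it.
    %% It asserts that Second carries the same value as First.
    {ok, Foo} = Second,
    Foo.
```

Calling rebind_demo:run({ok, 1}, {ok, 2}) compiles cleanly and then fails at runtime with a badmatch error on the second pattern match.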

The first pattern matching expression binds the Foo variable to something. In the second case, we've mistakenly forgotten Foo was already bound. What's the result?

exception error: no match of right hand side...

We get no compiler warning in this case. This is the type of error you only encounter at runtime. It can lie undetected in your codebase, unless you're writing tests. Know what other problem writing tests solves? Mistaken destructive assignments.

Single assignment is often trotted out by the Erlang cargo cult as having something to do with Erlang's concurrency model. This couldn't be more mistaken. Reia compiled destructive assignments into Static Single Assignment (SSA) form. This form provides versioned variables in the same manner as most Erlang programmers end up doing manually. Furthermore, SSA is functional programming. While it may not jibe with the general idealism of functional programming, the two forms (SSA and continuation passing style) have been formally proven identical.
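The manual versioning mentioned above tends to look like the following sketch. The module name versions_demo and the Acc0..Acc3 names are hypothetical, and this is hand-written Erlang for illustration, not actual Reia compiler output:

```erlang
-module(versions_demo).
-export([run/0]).

run() ->
    %% Each step needs a fresh name because Acc cannot be rebound.
    Acc0 = [],
    Acc1 = [a | Acc0],
    Acc2 = [b | Acc1],
    Acc3 = lists:reverse(Acc2),
    Acc3.
```

In a language with destructive assignment the programmer would write the same name four times, and a compiler that lowers to SSA would generate the numbered versions mechanically.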

The standard library is inconsistent, ugly, and riddled with legacy

Should module names in the standard library be plural, like "lists"? Or should they be singular, like "string"? Should we count from 1, as in most of the functions found in things like the lists module, or should we count from 0 like the functions found in the array module? How do I get the length of a list? Is it lists:length/1? No, it's erlang:length/1. How do I get the Nth element of the tuple? Should I look in the tuple module? Wait, there is no tuple module! Instead it's erlang:element/2. How about the length of a tuple? It's erlang:tuple_size/1. Why is the length of a list just "length" whereas the length of a tuple is "tuple_size"? Wouldn't "list_length" be more consistent, as it calls out it works on lists?
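Side by side, the inconsistencies look like this. The module name stdlib_demo and the result tags are hypothetical; the library calls themselves are the standard ones named above:

```erlang
-module(stdlib_demo).
-export([demo/0]).

demo() ->
    L = [a, b, c],
    T = {a, b, c},
    A = array:from_list(L),
    [
     {list_length, erlang:length(L)},         % 3, lives in the erlang module
     {tuple_length, erlang:tuple_size(T)},    % 3, different naming scheme
     {first_of_list, lists:nth(1, L)},        % a, lists count from 1
     {first_of_tuple, erlang:element(1, T)},  % a, tuples count from 1
     {first_of_array, array:get(0, A)}        % a, arrays count from 0
    ].
```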

When we call erlang:now() to get the current time, it returns {1311,657039,366306}. What the hell does that mean? It's a tuple with three elements. How could time possibly need three elements? A quick look at the documentation reveals that this tuple takes the form {Megaseconds, Seconds, Microseconds}. Separating out Microseconds makes sense... Erlang has no native decimal type so using a float would lose precision. But why split apart Megaseconds and Seconds?

Once upon a time Erlang didn't support integers large enough to store the combination of Megaseconds and Seconds, so they were split apart. The result is a meaningless jumble of three numbers, which you have to run through the confusingly named calendar:now_to_local_time/1 function to get a human meaningful result, which doesn't tell you what time it is now, but instead takes the tuple that erlang:now/0 returns as an argument and will spit back meaningful {Year, Month, Day} and {Hour, Minute, Second} tuples.
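As a worked example, here is a small sketch that decodes the exact tuple quoted above. The module name now_demo is hypothetical, and it uses calendar:now_to_universal_time/1, the UTC sibling of the local-time function, so the result doesn't depend on the machine's time zone:

```erlang
-module(now_demo).
-export([demo/0]).

demo() ->
    %% The literal tuple from the text, rather than a live erlang:now() call.
    Now = {1311, 657039, 366306},
    {Mega, Secs, _Micro} = Now,
    %% Recombine the two halves into ordinary Unix seconds.
    UnixSeconds = Mega * 1000000 + Secs,
    %% Convert to {{Year, Month, Day}, {Hour, Minute, Second}} in UTC.
    DateTime = calendar:now_to_universal_time(Now),
    {UnixSeconds, DateTime}.
```

This yields 1311657039 seconds since the epoch, which is {{2011,7,26},{5,10,39}} in UTC.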

Legacy in the grammar

Try to use "query" as an atom in Erlang, e.g. {query, "SELECT * FROM foobar"}. What happens?

syntax error before: ','

This is because 'query' is a reserved word which was reserved for Mnemosyne queries. Never heard of Mnemosyne? That's because it's an archaic way of querying Erlang's built-in database, Mnesia, and has been replaced with Query List Comprehensions (QLC). However, it remains around for backwards compatibility.

You can't use "query" as a function name. You can't tag a tuple with "query". You can't do anything with "query" except invoke a deprecated legacy API which no one uses anymore.

Strings-as-lists suck

Erlang provides two ways of representing strings. One is as lists of integers, which is the traditional way that most of the library functions support. Another is binaries. Erlang has no way of differentiating lists of integers that represent strings from lists of integers that are actually lists of integers. If you send a list of integers in a message to another process, the entire list of integers is copied every time. On 64-bit platforms, every integer takes up 64-bits.
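A short sketch of the ambiguity, using a hypothetical module string_demo with made-up result tags:

```erlang
-module(string_demo).
-export([demo/0]).

demo() ->
    S = "abc",
    B = <<"abc">>,
    [
     {same_term, S =:= [97, 98, 99]},       % true: the literal is a list of integers
     {is_list, is_list(S)},                 % true: there is no is_string test to use instead
     {binary_bytes, byte_size(B)},          % 3: one byte per character
     {round_trip, binary_to_list(B) =:= S}  % true: conversion is explicit
    ].
```

Since "abc" and [97,98,99] are the same term, any code that wants to treat strings specially can only guess from the contents of the list.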

The obvious solution here is to use binaries instead of lists of integers. Binaries are more compact and exist in a separate heap so they aren't copied each time they're sent in a message. The Erlang ecosystem seems to be gradually transitioning towards using binaries rather than strings. However, much of the tooling and string functions are designed to work with list-based strings. To leverage these functions, you have to convert a binary to a list before working with it. This just feels like unnecessary pain.

The abstract concept of lists as strings isn't inherently flawed. In many ways it does make sense to think of strings as lists of characters. Lists as strings would probably make a lot more sense if Erlang had a native character type distinct from integers which was more compact and could avoid being copied each time a string is sent in a message like a binary. Perhaps in such a system it'd be possible to avoid transcoding strings read off the wire or completely transforming them to a different representation, which is costly, inefficient, and often times unnecessary (yes, this is a problem with Java too).

There's no "let"

Want a local binding in Erlang? Perhaps you've used let for this in a Lisp. What happens when you try to do this in Erlang? Even attempting to use "let" in Erlang just yields: syntax error before: 'let'

Once upon a time Erlang was supposed to get let bindings, and the "let" keyword was set aside for this purpose. But much like frames, it never happened. Instead, let is now an unimplemented reserved word which just breaks your programs.

There's no "nil"

In Clojure, I can write the following: (if false :youll-never-know). This implicitly returns "nil" because the condition was false. What's the equivalent Erlang?
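A rough sketch of what that looks like, assuming a hypothetical module if_demo; the atoms youll_never_know and void are just illustrative return values:

```erlang
-module(if_demo).
-export([with_catch_all/1, without_catch_all/1]).

%% The "true" clause exists only to keep the if from raising.
with_catch_all(Cond) ->
    if
        Cond -> youll_never_know;
        true -> void
    end.

%% No clause matches when Cond is false, so this raises at runtime.
without_catch_all(Cond) ->
    if
        Cond -> youll_never_know
    end.
```

Here if_demo:with_catch_all(false) returns void, while if_demo:without_catch_all(false) raises an if_clause error.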

Erlang forces you to specify a clause that always matches regardless of whether you care about the result or not. If no clause matches, you get the amazingly fun "if_clause" exception (or "case_clause" for a case expression). In cases where you don't care about the result, you're still forced to add a nonsense clause which returns a void value just to prevent the runtime from raising an exception.

As you've probably guessed from the references sprinkled throughout this post, I'm learning Clojure. I'm a fan of the JVM and Clojure provides a great functional language for leveraging the JVM's features. I think the sort of things that I'd be writing in Erlang I'll try writing in Clojure instead. Clojure has elegant Lisp syntax. Clojure has maps. Clojure has powerful facilities for dealing with concurrent shared state problems. Clojure has great semantics for safely managing mutable state in a concurrent environment. Clojure has real strings. Clojure has let. Clojure has nil. Clojure runs on the JVM and can leverage the considerable facilities of the HotSpot JIT and JVM garbage collectors.

I'd also like to try my hand at creating a JVM language, especially with the impending release of Java 7 this Thursday. Java 7 brings with it InvokeDynamic, a fast way to dispatch methods in dynamic languages, and considerably eases the difficulty of implementing dynamic languages on the JVM. Stay tuned for more details on this.

52 comments:

Ericsson's community management probably isn't flawless, but I think they do a good job at integrating contributions from the community, especially since they moved to git.

I guess that, while records suck in some respects, they're good enough and nobody has cared enough to put forward a full replacement (using frames or whatever).

I mostly agree with your other comments, although to me they're not enough to give up on Erlang's good points. Many of your complaints stem from the fact that Erlang is both old and stable, which is a blessing and a curse. And like any tool, Erlang is not good at everything.

This is something that Erlang could possibly do in cond as Robert Virding suggested it, or even in cases. For ifs, the semantics of guards are kept, and non-matching guards never do anything. It would be weird to create an exception just for this.

Moreover, I do enjoy the ability to raise an error when nothing matches. Is there any way to keep this behaviour when you need it? If it doesn't, I'm not sure I'd be ready to do the switch. For the time being, I prefer to do 'SomeCond andalso Consequence' rather than adding empty clauses to manually raise an error in all cases/ifs. Personal preference though.

Tony: Besides the VM performance, syntax and legacy details, what do you think about the concurrency model of Clojure vs. the one in Erlang? I think Clojure's syntax is clearly better than Erlang's, but I'm not so sure about the concurrency model. Linked actors seem to map reality better (reality is concurrent, and each part/actor doesn't have the whole vision).

Do you have experience in large systems in Erlang vs Clojure ones? And in hot-code-swapping in Clojure vs Erl?

Nahuel: I think Clojure and Erlang's concurrency models target different problems, but Clojure's is more general purpose. What Erlang provides can be added at the library level. The only large system I've built in Erlang is Reia. I haven't built a large system in Clojure yet.

frk: this post has been a long time coming. That was just the straw that broke the camel's back.

> Even more exciting is that AMD's Fusion architecture, which they're implementing in conjunction with ARM, provides read and write barriers at the hardware level necessary to provide a system like Azul using commodity hardware.

Can you elaborate? The article you are linking to says nothing about these capabilities of the Fusion architecture.

You might consider moving on to .NET. It's recently implemented (.NET 4.0) many of Erlang's best concepts regarding concurrency and parallelism, while avoiding its (many) worst. It's also highly performant, and can be run on non-Microsoft architectures.

Here for example are the docs on the Task Parallel Library, that make concurrency almost stupidly simple to implement.

http://msdn.microsoft.com/en-us/library/dd460717.aspx

You might also take a look at the yield statement, and this broader topic on Parallel Programming in .NET: http://msdn.microsoft.com/en-us/library/dd460693.aspx

Your single-assignment complaint is more a side-effect of Erlang's binding-as-pattern-matching rather than anything that's inherent. Other languages will either complain at compile time or open a new binding with a new scope.

Your understanding of persistent data structures is naive. While singly linked lists are persistent on operations that preserve tails, there are other data structures that are persistent on the operations you mentioned -- such as the ones used by Clojure and Scala.

And the comment about String is also interesting, because if they were implemented as ropes, you'd not only have a persistent data structure, but actually gain performance for some operations (and lose for others).

Arguing about strings with the old Erlangers is akin to stabbing oneself in the eye with a butter knife. I have re-implemented a string object based on binaries, complete with encoding-aware string functions. The response on the list was basically "Why would we need that? A list of integers is fine!" Luddites, really...

I haven't tried the strLength as thoroughly as the rest – I was kind of disappointed by the lack of enthusiasm and the abundance of backlash – but a quick test just now with a UTF-8 string containing 5 Chinese characters and a tab returns the correct length: 6.

yo. thanks for your post. i'm a long-time erlang fan-boy who very much appreciates doses of reality like this. food for thought, we all need to be looking out for the 'boat anchors in stockholm syndrome' we have become cozy with.

now. the problem is that erlang really does have the nine-nine mojo. i could try to go do things like it in akka or clojure, but i'd probably hit some weird bugs, no? that's the trade-off, no? a nice new fancy language with some sane syntax + the chance of hateful bugs (scala just blows my mind with the bug lists) VS. really crappy apis that i will never understand but once i do, if i do muddle though, will be super rock solid? even if slow?

What's the problem? Use the language which fills your needs. In one project I used Erlang to handle millions of network events from the internet. But after collection and enrichment I propagated them to a Python process to do the data wrangling.

In terms of concurrency Erlang is unbeaten. Clojure, actors, etc. running on a JVM do not scale. Why? Simple. The OS decides when a context switch is done. As Erlang is interpreted, the processes run much more concurrently. There is one rule in JVM concurrency: don't block threads! In Erlang it is impossible.

But you are right, the language is ugly, sometimes verbose and old. Old and stable, simple.

Don't forget fault tolerance. Erlang shines here. And they can prove it in production.

Today I'm going with Akka. But it also has weaknesses. As with every programming language on this dumb planet, sorry for that.

"If you're looking for a language that gets multicore concurrency right, look at how Azul implemented Java on their Vega architecture"

As you said, "latency from cache misses is quickly becoming the dominating factor in today's software performance" but the lack of value types on the JVM is a fundamental design flaw that undermines that cache miss rate. The JVM is incapable of storing an array of pairs of values of different types contiguously in memory and the incidental indirections that it introduces (i.e. boxing) destroy locality and, consequently, destroy scalability on multicores. I once calculated that Cliff Click's concurrent hash table on 100 Azul cores was running at the same speed as a serial .NET hash table on one Intel core.

So Azul have done a fantastic job of multicore Java and JVM but not multicore in general because the JVM imposes such crippling limitations in this context.

"Too much object creation blows the cache."

Only if your objects survive. Provided they die in the nursery generation, object creation is cheap and stays in-cache. If they survive, it is much more expensive and can go out-of-cache.

Historically, functional programming languages have been used a lot for metaprogramming where you tend to have lots of trees and lots of small collections (e.g. maps containing a few variable bindings) and not the large collections you see in more general programming. Purely functional data structures worked well there because the nursery generation would collect most of the garbage generated by unreachable old versions of collections. With larger collections you hit that survival problem and performance can be really bad. However, purely functional data structures are still easier for a garbage collector to traverse incrementally so they should make it easier to obtain lower pause times.

Funny thing: you don't mention at all the sheer goal of Erlang, fault tolerance, which is the main reason for the share-nothing heap. If the HTM or STM gets corrupted, your 768 processes will work wrong or fall like dominoes. Erlang prioritizes fault tolerance over everything else, even performance. There is a talk from one of the engineers at Azul, who says HTM is not a dreamland: it makes memory updates probabilistic. Your program might believe it performed a change when it actually hasn't, or the value changed to what another process wrote, while the other process believes it failed when it actually succeeded. This is what Erlang/OTP and the BEAM avoid. They want to isolate the impact of disturbances; read about it.

The collection of garbage can be done quickly by killing a process.

It is sad to see people who can't understand something unfamiliar to their fixated ways, so they call it a ghetto.