Monday, October 08, 2007

I spent a while last night reading about Tim Bray's adventures with Erlang. Tim started investigating Erlang as part of his "Wide Finder Project," in which he's looking for programming languages that will help accelerate common tasks on the soon-to-be-very-popular CPUs with many cores but slower clock rates.

Tim works at Sun, and so this question and project make perfect sense in light of Sun's Niagara and T2 processors with many cores and CMT. It also makes perfect sense in light of Intel's Tera-scale computing initiative, where they have demonstrated chips with 80 cores. In short, the future is going to be very, very parallel, and we had better come to terms with that.

Unfortunately, today's multi-threaded programming paradigms are ill-equipped to take advantage of these processors. Most popular programming languages have the same simple threads+locks paradigm that was popularized with pthreads and Java. While this works, it doesn't work well. As with so many things in programming, the threads+locks paradigm forces programmers to keep track of a whole lot of crufty details; forget one, and you produce code with very subtle bugs that is very, very difficult to debug. Put another way, threads+locks is the parallel programming equivalent of manual memory management. In the same way that GC manages memory in many modern programming languages, we need an equivalent to help programmers manage parallel programming.

Erlang has many features that help it work well on multi-core systems. The language is inherently multi-threaded and concurrent and relies on threading almost down to the core (you can write non-threaded Erlang programs, but the language makes it so easy to use threading that you'd hardly want to). I talked about Erlang a few months ago (here and here).

So, here we have Tim Bray asking a perfectly sensible question: "What programming language is going to help us programmers exploit the soon-to-be-commonplace multi-core CPU?" Erlang is certainly a potential answer to that question. Unfortunately, Bray picked a problem for which Erlang is particularly unsuited and then compared Erlang to another programming language that is optimized for that problem, albeit one without any parallel programming support--he picked a simple web log analysis problem.

The web log analysis problem is one that is well suited to Perl, Python, or Ruby. One might even say that these languages were virtually created to solve problems in this exact domain. If Perl does anything really well, it processes text with regular expressions. Python is a bit more clunky than Perl or Ruby in terms of regex syntax, but still does quite well. Ruby was created to solve many of the same problems as Perl, but with a better object model and saner syntax (IMHO).
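To make the point concrete, here's a minimal sketch of the kind of log analysis these languages eat for breakfast. The log format is an illustrative Apache-style one, not necessarily the exact format Bray used:

```python
import re
from collections import Counter

# Regex for a simplified Apache-style access log line (an illustrative
# format, not Bray's exact data).
LINE_RE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+) [^"]*" (\d{3}) \S+'
)

def count_hits(lines):
    """Tally requested paths across an iterable of log lines."""
    hits = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            hits[m.group(2)] += 1
    return hits

sample = [
    '127.0.0.1 - - [08/Oct/2007:12:00:00 -0700] "GET /ongoing/x HTTP/1.1" 200 1234',
    '127.0.0.1 - - [08/Oct/2007:12:00:01 -0700] "GET /ongoing/x HTTP/1.1" 200 1234',
    '10.0.0.2 - - [08/Oct/2007:12:00:02 -0700] "GET /other HTTP/1.1" 404 512',
]
print(count_hits(sample).most_common(1))  # [('/ongoing/x', 2)]
```

A dozen lines, most of which is the regex. That's the kind of head start Erlang was up against.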

Erlang, on the other hand, was created to develop 24x7x365, long-running telecom software. This is stuff that aims to have downtimes measured in a handful of minutes per year and that must be able to be upgraded on the fly and recover from any faults or failures. In short, Erlang aims to help programmers write code that can take a bullet to the head, recover, and keep on doing its thing, later allowing the programmer to find the fault, fix it, and upgrade the system, all while staying up and doing its thing. This is a hugely complex task, I can assure you. And Erlang conquers it in fairly good fashion.

So, when Bray decides to try Erlang on this problem, he naturally finds that it blows chunks. His first attempt doesn't even use any Erlang threading features, which of course defeats the whole reason for the investigation in the first place. Knowing enough about Erlang to be dangerous, I found myself saying, "Well, duh! Why did you expect that to get great performance?"

Overall, I'm getting pretty tired of these simplistic comparisons that people do between programming languages. It always feels like they're an elaborate sort of "gotcha." Step 1: pick a task that runs particularly well on the evaluator's most familiar language (Bray picks web log analysis, which runs quite well in Ruby, his language of choice). Step 2: pick a victim language. When the victim language doesn't measure up, yell "WTF?!?! [insert language] sucks." Now, in truth, Bray didn't do that last part, but you have seen the pattern other times, I'm sure (witness the number of people who cite the Computer Language Shootout as justification for almost anything).

What would have made a better comparison is writing a multi-threaded web server in both Erlang and Ruby and seeing which server is able to deliver the best performance to 10,000 active clients with widely varying download speeds. I'd be willing to bet that Erlang does a better job. No, I wouldn't even suggest writing a 24x7x365 telecom switch in Ruby; as fine as Ruby is, Erlang would win that hands down.

So, rather than making languages do stupid tricks as the basis of comparison, let's acknowledge that there is something that we can learn from just about every language. The fact is, all languages optimize for particular problem domains, and I don't think that a universal programming language exists that would perform well on all tasks. Bray rapidly found out that Erlang isn't optimized for doing line-oriented I/O and its regex library sucks. So what? While those problems could be eliminated from Erlang, the fact that Ericsson has deployed large telecom gear without having to fix those issues means that Erlang is ideally suited to its original programming domain.

To me the Ruby vs. Erlang exercise has very little to do with SMP parallelism and very _much_ to do with disk bandwidth and whether your file read/write library is optimized properly. In a comment on Tim's post, I pointed out that the (simple laptop) disk probably can't supply even one processor fast enough, let alone more than one.

Other commenters had some good suggestions, in particular, that "manycore" might call for different sorts of disk hardware: a massively parallel RAID array, or a small solid-state disk for each core. The HPC cluster people rely in part on parallel distributed filesystems for performance on data analysis jobs; these will need to be scaled to fit the "personal RAID array" environment.

I'm TA-ing a shared-memory parallel programming class this semester. We've gotten a lot of mileage out of OpenMP, and are exploring Cilk, which is a parallel C-based language with simple parallel semantics and provable performance guarantees. OpenMP gives you medium-grained, simple but tunable data-based parallelism. Cilk gives you easy and lightweight task-based parallelism. Combine the two and I think you'll get something both powerful and practical, that hides most of the ugly details of threads and locking from non-expert coders.
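The OpenMP-style "parallel for with a reduction" pattern the class leans on can be sketched even in Python. This is a loose analogy only (CPython's GIL means threads won't actually speed up pure-Python arithmetic, and real OpenMP is compiler-directed C/C++/Fortran), but the fork/join decomposition is the same shape:

```python
from concurrent.futures import ThreadPoolExecutor

# Roughly: "#pragma omp parallel for reduction(+:total)" with a static
# schedule. Split the data into one chunk per worker, sum each chunk in
# parallel, then reduce the partial sums. Analogy only, not real OpenMP.
def chunked(seq, n):
    step = (len(seq) + n - 1) // n
    return [seq[i:i + step] for i in range(0, len(seq), step)]

def parallel_sum(data, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum, chunked(data, workers))  # fork
    return sum(partials)                                  # join/reduce

print(parallel_sum(list(range(1001))))  # 500500
```

The appeal of OpenMP (and Cilk's spawn/sync) is that the programmer only declares the decomposition; the runtime owns the threads and locks, which is exactly the "GC for parallelism" role described above.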

Of course I favor more "exotic" parallel languages, and I hope to see some more low-level runtimes on which those languages can be built.

I really don't think it's a 'ruby vs erlang' story that Bray is trying to tell. Rather, the question is: given the ubiquity of multi-core machines, what recent programming patterns and languages are available to solve the kinds of common problems a single user has (such as the 'wide finder' problem -- he's not serving web pages; he's reading and processing a bunch of data)?

One big issue with that is that it's already a known fact that Erlang's IO performance is crap. Reading/writing to disk isn't the primary goal (though that's not an excuse for its suckiness), so it's not a primary performance goal either. I think the next big step for Erlang should be working on its IO.

HilbertAstronaut wrote: To me the Ruby vs. Erlang exercise has very little to do with SMP parallelism and very _much_ to do with disk bandwidth and whether your file read/write library is optimized properly. In a comment on Tim's post, I pointed out that the (simple laptop) disk probably can't supply even one processor fast enough, let alone more than one.

Yes, I very much agree. Erlang has bad IO and regex performance, but in the limit, the problem reduces to how fast your disk can stream the log file to your processor. I think a couple of people pointed out to Tim that any half-decent regex library (which Erlang apparently doesn't have) should be able to keep up with the disk. I find that to be true in some of the Python programs I write to do log analysis, too.
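That "keeps up with the disk" claim is easy to sanity-check: scan an in-memory buffer so no disk is involved and pure matching speed is measured. The figures this prints are machine-dependent and purely illustrative, but on anything modern they comfortably beat a 2007 laptop drive's streaming rate:

```python
import re
import time

# Time a regex scan over ~12 MB of synthetic log data held in memory,
# so the measurement excludes the disk entirely. Illustrative only.
line = b'127.0.0.1 - - [08/Oct/2007] "GET /ongoing/x HTTP/1.1" 200 1234\n'
data = line * 200_000
pat = re.compile(rb'"GET (\S+) ')

t0 = time.perf_counter()
hits = pat.findall(data)
elapsed = time.perf_counter() - t0

print(f"{len(hits)} matches, {len(data) / 2**20 / elapsed:.0f} MB/s")
```

If the number printed is well above what the disk can stream, the regex library is not the bottleneck -- the disk is.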

zbrown: "One big issue with that is that its already a known fact that Erlang's IO performance is crap."

Benchmarks that test the ability to handle a large number of concurrent streams suggest otherwise: Erlang's IO system isn't crap - far from it. It is simply not optimized for the task of reading a single file quickly from disk. In non-stop systems, you typically read from disk only during system restarts, which could be years apart.

It is no big secret that to get maximum sequential performance, you should block, and grab as much CPU as possible. The Erlang IO system goes to great lengths to ensure that no single thread can do this. While it would not be impossible to significantly speed up single-thread disk IO, doing so has not been a priority until now. SMP Erlang should actually make such optimizations easier.

Isaac wrote: Of course it is not "a known fact that Erlang's IO performance is crap" - Erlang seems bad at reading text line-by-line, better at writing text, fine at binary IO.

Okay, fair enough. There may be some modes where Erlang does better with file IO. Ulf explained that there is a (perfectly understandable) penalty for doing things like IO imposed by the threading.

Thinking more about it, if Erlang does fine with binary IO, the problem could be in the conversion between binary and text. Erlang uses lists to store strings, not fixed-length arrays, so I would bet that creation of a string from a buffer full of disk data is more expensive in Erlang than in other languages. Rather than a fast memcpy, you'd have to create the list structure character by character. Storing strings as lists has the advantage that all the various list operations (map, filter, list comprehensions, etc.) can work well on lists, but it does result in a penalty when trying to move string data into and out of the machine.
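A rough sketch of that cost, in Python rather than Erlang (the cons pairs below only stand in for Erlang's list cells; this is an illustration of the per-character allocation, not Erlang's actual machinery):

```python
# Erlang stores a string as a linked list of character cells, so turning
# a disk buffer into a string means one allocation per character instead
# of one memcpy. Here each "cell" is a (head, tail) pair; note that
# iterating a Python bytes object yields integer code points.
def buffer_to_conslist(buf):
    lst = None
    for ch in reversed(buf):   # one cell allocated per character
        lst = (ch, lst)        # cons cell: (head, tail)
    return lst

buf = b"GET /ongoing/x HTTP/1.1"
node, cells = buffer_to_conslist(buf), 0
while node is not None:
    cells += 1
    node = node[1]

print(cells, len(buf))  # 23 23 -- one multi-word cell per byte of input
```

Two-plus machine words per character versus one byte in a flat buffer: that's the overhead (in both time and space) you pay at the IO boundary for getting cheap map/filter/comprehension behavior everywhere else.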

While I think Erlang got a lot of stuff right, this is one decision I would question. In fact, when I first learned this tidbit when investigating Erlang, it raised my eyebrows for similar reasons. Again, fortunately, most telecom applications for which Erlang was designed don't do heavy string processing and so I'm betting this wasn't seen as a big issue.

The reason perl (and python, ruby, etc.) excel at parsing log files is not just that perl is optimized for the task -- the task is also optimized for perl. By that I mean that a lot of server logs are dumped in a crappy format that has no advantages -- it is not well structured or easily human readable or compressed. The server is written this way because the person writing it assumed the logs would be haphazardly parsed someday with a language like perl or awk, so they left them in a sorry state requiring highly optimized regex and line-oriented parsing facilities to do anything useful with them. This is a vicious circle, and if you are caught in it already, then perl may sadly be the second best choice for you, right after finding another job. The proper format for log files is, of course, S-expressions -- I'll leave it to the reader to figure out the best language for parsing them.
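Spoiling the joke slightly: once the log is structured, even Python needs no regexes at all, just a toy recursive reader. The log format below is hypothetical, and this reader cuts corners (no spaces inside strings, no escapes), but it shows how little machinery structured logs demand:

```python
# A toy reader for S-expression log lines, e.g.:
#   (access (ip "127.0.0.1") (path "/ongoing/x") (status 200))
# Hypothetical format; simplification: quoted strings contain no spaces.
def parse_sexp(text):
    tokens = text.replace('(', ' ( ').replace(')', ' ) ').split()

    def read(pos):
        if tokens[pos] == '(':
            lst, pos = [], pos + 1
            while tokens[pos] != ')':
                item, pos = read(pos)
                lst.append(item)
            return lst, pos + 1
        tok = tokens[pos]
        if tok.startswith('"') and tok.endswith('"'):
            return tok[1:-1], pos + 1          # string literal
        return (int(tok) if tok.isdigit() else tok), pos + 1

    expr, _ = read(0)
    return expr

line = '(access (ip "127.0.0.1") (path "/ongoing/x") (status 200))'
print(parse_sexp(line))
# ['access', ['ip', '127.0.0.1'], ['path', '/ongoing/x'], ['status', 200]]
```

No "highly optimized regex facilities" required -- the structure is in the data instead of in the parser.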

Great post. This seems to be a problem that arises either when someone does the comparison ignorant of the systems they are comparing, or when they know perfectly well and want the results to go one way or another. Intel and AMD do it to "prove" chip speeds. I'm just a little sad to see it happening now, where language favoritism can really hinder the popular acceptance of a great solution.

Some other contributors to the "unstructured logs and regexes" world include:

1. Hammer/nail. When you have a powerful tool, you look for places to use it. PCRE is an incredibly powerful tool, and can provide a jackleg substitute for everything from parsers to macros to transformation languages.

2. Network effect. If you invent a whizzy new format for your logs, your users have to learn how to deal with it. But they already know how to deal with logs from Apache and Postfix.

3. Human-readability. When debugging a production problem, people really do "tail -f" their logs, and it's useful to have a format that's compact but full of information.

Once again we see the mythical 80-core Intel CPU mentioned as evidence that we will have really many cores on our desktops Real Soon Now.

According to an Intel guy doing a presentation I attended, this particular CPU was made as a demonstration, an experiment, and not something that is approaching the shrinkwrap end of the Intel mass-production line.

The number of cores will continue to grow, of course, but we have just barely seen 8 cores in real computers, and as long as applications utilise them as poorly as they do, the business case for growing that number dramatically seems rather thin to me.

You're right that the 80-core system that Intel has shown is a technology demonstration. The cores are not full x86 cores, but stripped down cores.

On the subject of business case, don't confuse need with ability to deliver. In many cases, I'm just fine with my < 2 GHz single core on which I'm typing this right now. But if I can get a quad core for almost the same price in 2 years, why do I care what the business case is?