Hi, I'm Tony Arcieri. You may remember me from such software projects as Celluloid, Reia, and Cool.io...

Saturday, January 9, 2010

Reia: now over an order of magnitude faster at loading code

Happy New Year everyone! I really hope 2010 will be a great year for Reia, and to start it off I'd like to announce some preliminary results on the new branch which I find incredibly exciting. Very recently I've imported all the compiler passes from the old branch necessary to run parts of the test suite. The new compiler and runtime are able to execute these parts of the test suite, and the performance difference between the old and new branches are night and day.

That 14.5 seconds only includes the time it takes to execute the test runner, and not the initial environment setup and loading of the standard library. If we factor all that in, the wall time of executing a relatively minimal test suite jumps up to a whopping 30 seconds of wall time.

Yowza! I removed a number of the old tests that pertain to features which no longer exist (like the concurrent object system), but even still the difference is striking. The total wall time to execute the tests is now approximately 3.6 seconds! Note that the old branch has a relatively small standard library it loads, so comparing wall time is a bit unfair, but even still, this is a massive improvement.

Lies, damn lies, and statistics are all well and good, but to really drive the point home, I'd like to show you some graphs of the CPU usage between the two branches. First the old branch:

As you can see, the CPU usage of the previous branch when running the test suite juts out like a butte in a desert landscape. This is running on a dual core machine, and the Erlang VM is sufficiently capable of taxing both CPUs almost to their maximum just running the Reia test suite. Now let's look at the new branch:

Can you tell when the tests were running? In case you can't it's over there on the far right, the part that sticks out slightly farther than the ambient noise of all the other programs running on my computer.

I'm typically not one to care too much about performance. I often bandy about the phrase "FOR SPEED" in jest, as once upon a time I used to be a C programmer overly concerned with performance while not paying enough attention to things like development time and correctness. However, that attitude can only go so far. Sometimes you must recognize that the code you've written has some absurd performance problems due to bad design decisions, and the only solution is a ground-up rewrite.

A ground-up rewrite is not something I undertake lightly, and I recognize that in doing so I've taken a lot of the wind out of Reia's sails. However, I think this rewrite was very much justified, and in the process, I've not only improved the language's performance, but corrected a lot of the mistakes I made in the original implementation. I know Erlang much better now, have made extensive use of records which makes the codebase much easier to work with and change, and have implemented some of the features I always longed for in the original branch including the long sought after "magic rebinding" which can solve some of the usability concerns that have been expressed with Erlang.

I've also restructured all of the compiler passes to use a construct similar to Erlang's erl_syntax:mapfold_subtrees function, which I hope improves the clarity of the compiler and allows others to better understand how it functions.

In my past development of both the old and new branch, I held off on starting with TDD due to my desire for a self-hosted test suite. However, from now on I vow to use TDD whenever I add features to the language, writing tests for them first then beginning the implementation. I've already begun doing this now that the test suite is working. I will need to go back and add tests for the existing language features which I developed without TDD, but once that's done, I promise I won't add new features without writing tests for them first.

Finally, there's one further optimization that can now be performed: caching the generated bytecode when Reia sources are loaded, so that unless the original source file has changed it doesn't need to be recompiled every time it is loaded. This is similar to an optimization performed by other bytecode-based VMs for scripting languages, including Python "byte-compiling" and the compiled form output by the Rubinius VM for Ruby. By caching the compiled output, the amount of time needed to load code is further reduced, at least every time after the code is initially loaded:

Finished in 0.029055 seconds33 assertions, 0 failures, 0 errors

After the test suite is executed once and "byte compiled", it runs yet another order of magnitude faster. That means the new branch can load code somewhere between 2-3 orders of magnitude faster after it has been byte compiled. That's a serious speed improvement.

I'm interested on Reia features, especially the '**'(power) operator. I saw in the code that you justimplement this feature using math:pow erlang's operator. I implement here another way to compute this. Here are the steps:

This may sound ironic given the context of this post, but I'm not really working on additional performance optimizations at this point. The goal of the rewrite was to get the language architecture on better footing, but at this point I'm not concerned about micro-optimizing.

I just made this modification because i *work* with huge numbers and combinatorial optimization, and the ** operator does not work with my algorithms, and i'm really interested to test your language with my combinatorial algorithms :D

playing with your source code, i also add a 'times'operator in numeric.re: