November 01, 2005

Ruby or Java -- A (Performance) Reality Check

Update on Nov. 5, 2005: I have a followup post on the same topic. It presents the performance numbers after incorporating suggestions that came as comments.

Bewitched by all the euphoria and endorsements in the blogosphere (on Tim's Radar, News.com, Infrastructure for Web 2.0 apps, ...), I, like most programming enthusiasts, decided to go with Ruby on Rails for my current pet project. In what follows, I have tried to document some of my initial impressions -- especially the ones related to the runtime performance of a few early programs.

Being new to both Ruby and Ruby on Rails, I made learning Ruby and Ruby on Rails my first priority. I got started with the tutorials and books available on the Web, but also ordered Agile Web Development with Rails and Programming Ruby from Amazon.com for good measure. This turned out to be a good decision, as I was reaching the limits of the online material when the books arrived.

After feeling comfortable with Ruby as a programming language, I decided to first write a part of my project -- a program to read web server log files, parse log entries and load them into database tables -- in Ruby using Active Record, one of the core innovations of Ruby on Rails. Later on, to keep the program simple, I reduced it to just printing the top 20 hosts, URLs, referrers and User Agent strings from a Combined Log Format logfile, sorted by frequency of occurrence.

This program consists of two Ruby source files: the main script webstat.rb takes the log filename as an argument, parses each line using class LogEntry (available in file logentry.rb), and stores hosts, URLs, referrers and user agent strings as keys in separate hash tables, the value being the number of times a particular entity occurs. Once the logfile is fully scanned and the hash tables are populated, the entries are sorted by value and the first 20 entries from each hash table are displayed.
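The counting and sorting steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from webstat.rb or logentry.rb: the names `CLF_PATTERN`, `tally` and `top` are hypothetical, and the regular expression is a simplified assumption about what a Combined Log Format line looks like.

```ruby
# Hypothetical sketch; names are illustrative and not taken from webstat.rb.

# Simplified pattern for one Combined Log Format line (an assumption):
# host, ident, user, [timestamp], "request", status, bytes, "referrer", "agent".
CLF_PATTERN = /\A(\S+) \S+ \S+ \[([^\]]+)\] "(?:\S+ )?(\S+)[^"]*" \d{3} \S+ "([^"]*)" "([^"]*)"/

# Count how often each host, URL, referrer and user agent occurs.
def tally(lines)
  counts = {
    host: Hash.new(0), url: Hash.new(0),
    referrer: Hash.new(0), agent: Hash.new(0)
  }
  lines.each do |line|
    m = CLF_PATTERN.match(line)
    next unless m                 # skip lines that do not parse
    counts[:host][m[1]] += 1
    counts[:url][m[3]] += 1
    counts[:referrer][m[4]] += 1
    counts[:agent][m[5]] += 1
  end
  counts
end

# Return the n most frequent keys as [key, count] pairs, highest count first.
def top(table, n = 20)
  table.sort_by { |_key, count| -count }.first(n)
end
```

With these definitions, something like `top(tally(File.foreach(path))[:host])` would give the 20 most frequent hosts for the logfile at `path`.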

I ran this program on a combined logfile for all accesses to www.pankaj-k.net for a specific period. Just to stress the Ruby Virtual Machine, I ensured that the file was more than 100 MB in size and had more than half a million log entries. Keep in mind that I actually plan to use my final program with 10 million or more log entries. (I hope MySQL can handle that!)

On my Pentium 4, 2.93 GHz CPU, 512 MB RAM, Windows XP box, it took 25m 47s to scan and parse the file and 1.6s to sort and display the results. You can also see the complete output. If you browse through the output, you will see that processing each successive batch of 4096 entries consumes more and more CPU time, but only up to a limit, after which the CPU consumption drops (reflected by a decrease in processing time). This may be due to the behavior of the Ruby garbage collector, but I don't know enough about Ruby to make a good guess.

Once this was done, I wondered how these performance numbers would compare with a program written in Java. On a whim, I literally translated it to Java -- logentry.rb translated to LogEntry.java and webstat.rb to WebStat.java -- and ran the Java version against the same input file. The Java version took 2m 3s to scan and parse the file and 0.27s to sort and display the results. Again, you can see the complete output. Notice that Java handled each chunk of 4096 entries in almost constant time.

So the Java version ran more than 12 times faster!! This is significant. If the same ratio holds true for a Ruby on Rails web application and a Java web application, then one would need to buy roughly 12 times more hardware to serve the same load (or number of users). This may negate all the gains from faster development time with Ruby on Rails.
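For reference, the ratio can be worked out from the scan-and-parse timings quoted above (25m 47s for Ruby, 2m 3s for Java). The helper names `seconds` and `slowdown` are made up for this illustration, and the sort-and-display times are left out.

```ruby
# Convert a "minutes, seconds" timing into total seconds.
def seconds(minutes, secs)
  minutes * 60 + secs
end

# How many times longer the slow run took than the fast run.
def slowdown(slow_seconds, fast_seconds)
  slow_seconds.to_f / fast_seconds
end

ruby_scan = seconds(25, 47)   # 1547 seconds
java_scan = seconds(2, 3)     # 123 seconds

puts format("Ruby/Java scan time ratio: %.1f", slowdown(ruby_scan, java_scan))
```

This comes out at about 12.6 for the scan-and-parse phase alone.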

What about other metrics -- lines of code and memory use? The Ruby version is around 90 lines whereas the Java version is 186. The Ruby program used around 20MB of RAM (as reported by Task Manager) whereas the Java version used more than 60MB.

Posted by pankaj at November 1, 2005 09:58 PM

Comments

Hi,

there are many solutions in the Java world that are getting close to RoR's touted development productivity. I suggest you take a look at RIFE, and our persistence engine. There are many similarities, except that we don't base it on the Active Record pattern and don't magically create classes from database schemas.

If you want the scaffolding features of RoR, we released RIFE/Crud a week ago and it received a lot of very positive responses. Many people are already successfully adopting it.

As far as scripting languages are concerned, take a look at Groovy. It has matured a lot and the performance is excellent (I have trouble distinguishing any difference with regular Java code).

About the memory usage, Java tends to cache a lot if the memory is available. When you reduce the allowed heap usage, you'll usually see that everything just continues to run without any problems.

Now the Ruby-fevered guys will show you C code called from Ruby to cover this issue ;).
The problem with web applications, I think, is more related to external issues: database access and, more importantly, server issues. Also, Java frameworks tend to use too much abstraction in some cases.
But in the end, using Ruby is not really logical for many types of applications (including web applications). When you put a bit of complex business logic (let's say graphics generation, mathematical operations, text processing, PDF generation, etc.) on the server side, Ruby's performance degrades horribly and you need to go C hacking.
On the other hand, Java's tool support provides enough productivity without compromising performance.

Interesting post.
I would suggest posting your code to a Ruby forum or mailing list for improvement. Maybe they would have some suggestions to make it faster. That said, I have no doubt that Java is faster. But since you're new to Ruby, it would be fair to get some feedback. I don't use Ruby myself.

I'm a Ruby developer and your code is so slow mostly because of the call to DateTime.strptime in the LogEntry constructor. This method is written in pure Ruby and is quite complicated. Based on how you use the LogEntry class in webstat.rb, it is probably not necessary to parse every date on every line of your log file. Not parsing every date should make the code more than 3 times faster, based on some simple benchmarking I have done. I also suspect your regular expressions could use some tweaking.
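One possible way to avoid parsing every date is to keep the raw timestamp text and call DateTime.strptime only when the parsed value is actually needed. The sketch below is an assumption about how that could look: `LazyLogEntry`, its constructor arguments and `TIMESTAMP_FORMAT` are hypothetical and are not taken from logentry.rb.

```ruby
require 'date'

# Hypothetical stand-in for LogEntry; the real class in logentry.rb may differ.
class LazyLogEntry
  # Assumed timestamp layout of a Combined Log Format entry,
  # e.g. "01/Nov/2005:21:58:00 -0800".
  TIMESTAMP_FORMAT = '%d/%b/%Y:%H:%M:%S %z'

  attr_reader :host, :raw_time

  def initialize(host, raw_time)
    @host = host
    @raw_time = raw_time   # keep the text as-is; no parsing in the constructor
    @time = nil
  end

  # Parse the timestamp on first use and cache the result for later calls.
  def time
    @time ||= DateTime.strptime(@raw_time, TIMESTAMP_FORMAT)
  end
end
```

A program that only counts hosts, URLs, referrers and user agents would never call `time`, so it would never pay for the parsing.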

Despite that, there is no doubt Java is faster than Ruby. In fact, other scripting languages like Perl and Python are also faster. Most of this is because Ruby does not in fact have a VM, contrary to what Mr. Kumar suggests. It actually evaluates an AST (Abstract Syntax Tree, the result of parsing) directly. There is currently work being done to create a VM for Ruby, called YARV. This will make Ruby much faster.

Still, even when Ruby gets a VM, Java will probably always be faster, for a few reasons:

Java is a statically typed language, which allows for many more optimizations that are harder or impossible in a dynamically typed language like Ruby. Things like Ruby's eval() method and the fact that classes can be reopened and modified make things harder to optimize as well.

Java has many man-years and millions of dollars invested in making it fast. Ruby does not have a huge corporation with thousands of programmers behind it. Despite that Ruby does pretty darn good, and in fact Ruby is probably faster than Java was back in the early days.

Also when it comes to web development, runtime performance of the language itself has less of an impact, as network latency and in particular, database access, are usually the bottlenecks.

Still, let's not become language zealots. I really enjoy programming in Ruby, but I still do Java in my day job, and they are both good languages. Java performs faster, but I feel Ruby is easier and more enjoyable to program in. Others may feel differently and that is fine. To each their own.