When it comes to documentation, there’s always room for improvement. I’ve been contributing to docrails for quite a while now. However, one issue that has always annoyed me is how slow RDoc is.

In particular, generating RDoc output for ActiveRecord is painfully slow. It takes 34.2 seconds on my machine. Using JRuby helps a little, seeing that it can do JIT and all, but not much: it still takes 29 seconds (including JVM startup time). I had hoped for a bigger performance improvement.

So a few days ago I started optimizing RDoc. And not without result. Here are the results of generating RDoc for the Ruby 1.8 standard library with RDoc running in MRI 1.8:Before optimizing: 1624.8 seconds (27 minutes). I had to close all applications because RDoc ate almost 1.5 GB of memory.After optimizing: 531.2 seconds (8 minutes and 51 seconds). RDoc now uses only about half the memory it used to need.Performance improvement: 206%!

Multi-threaded parsing

My laptop has a dual core CPU. It’s kind of a waste to see RDoc utilizing only a single core. RDoc can be parallelized by running 2 rdoc processes at the same time. But I’m never generating 2 different sets of RDoc documentation at the same time, so running 2 rdoc processes in parallel won’t do me any good. So I had to search for ways to parallelize a single rdoc process.

It turns out that RDoc’s code parsing phase is pretty easy to parallelize. It just creates a new parser object for every input file, and parses files sequentially. So I modified RDoc to create multiple worker threads during the parsing phase. The main thread will offer a list of filenames, and the worker threads will each consume filenames and parse the corresponding file as fast as they can.

What I did not try to do however, is making the parser itself multi-threaded. That’s just asking for problems. Those of you who are interested in multi-threaded programming, but aren’t experienced with it, should keep this in mind: try to keep your threading code as simple as possible, and make sure that your threads share as little data as possible. In my case, the only things that the threads share are the filename queue and the result aggregation array.

MRI 1.8 implements userspace threads, so having multiple threads doesn’t increase CPU core utilization. MRI 1.9 uses kernel threads, but it has a global interpreter lock, so it can’t utilize multiple CPU cores either. Luckily, JRuby doesn’t have a global interpreter lock. The parsing phase is now about 35% faster, when run on JRuby on my dual core machine.

Cache template pages

For larger projects, the slowest phase of RDoc is probably the HTML generation phase. The latest version of RDoc uses ERB templates for HTML output. RDoc generates an HTML file for every class, every module and every source file. However after some profiling with ruby-prof, I found out that RDoc recompiles the ERB template for every output file, which is slow.

I modified RDoc so that it caches compiled ERB templates. This made it about 19% faster.

Reduce the number of relative URL generations

For every output file, RDoc generates a list of URLs to this output file, relative to the output filenames of all classes, files and methods. This is done by calling the function index_to_links, which runs in O(n) time. For every output file, index_to_links is called 3 times, each time with the full list of classes, files and methods. Suppose there are K output files, L classes, N files and M methods. Then this will result in K * (L+N+M) operations, which is quadratic! Ouch!

The RDoc authors already tried to optimize this by only calling index_to_links when the current output filename’s directory is different from the last one. For example, if the last output file was classes/active_support/cache/memcache_store.html and the current output file is classes/active_support/cache/memory_store.html, then index_to_links doesn’t need to be called again, and its last result can be reused. The problem is that the output filename list isn’t sorted, and so a directory change is detected much more often that it needs to be.

I optimized this by sorting the list. This resulted in an 8% performance improvement.

Reduce garbage and redundant operations in index_to_links

But the story doesn’t end there. index_to_links accepts two arguments: a filename and an array. index_to_links sorts its input array every time it is called. Remember that I said that index_to_links is O(n)? It’s not: it’s actually O(n log(n)) because of the sorting.

The arrays that are passed to index_to_links remains the same every time. So it’s very inefficient to keep sorting them over and over. I optimized this by:

sorting the arrays only once.

passing the sorted arrays to index_to_links.

modifying index_to_links to not perform any sorting.

modifying index_to_links to use in-place array manipulation methods as much as possible, to avoid generating unnecessary garbage objects.

Failed: parser output caching

Seeing that the parser isn’t very fast, I thought about caching its output. So I modified RDoc in the following manner:

It Marshal.dumps the parser output to a cache file.

It consults the cache file whenever possible, instead of parsing the input file.

This made the parsing phase about 20% slower on a cold run, but 75% faster on a warm run.

Unfortunately, this approach totally fails on large projects. When run on the Ruby 1.8 standard library, it results in cache files that are 6 MB or larger. Loading such a file is much slower than parsing the original source file, and uses more memory too. My laptop crashed during this experiment because Ruby used 2 GB of memory. So I abandoned this effort.

Final words

Take the performance improvement numbers that I’ve given with a grain of salt. These numbers were taken by running RDoc on its own sources, under MRI 1.8. The performance gain really depends on the input. You’ve already seen the difference in performance improvement between running RDoc on the Ruby 1.8 standard library, and running it on ActiveRecord.

Latin America Rails Summit has been a very nice experience. It has been about a week since I’m back home, and the jetlag that I was experiencing is finally disappearing. I’ve talked to many interesting people, such as Chris Wanstrath from Github, Obie Fernandez from Hashrocket and Chad Fowler.

I heard a… uhm… “interesting” story from Chris. There was a time when he was reading my blog. His boss saw my blog’s banner and thought that he’s viewing non-safe-for-work-content. I heard similar things from several other people. Or, to quote Chad:

“When I met you at RailsConf, I was pleasantly surprised that your personality doesn’t match your pr0n website.” — Chad Fowler, referring to this blog

Or, as Tinco said, “the entire Ruby community wants to read your blog, but they can’t”.

I asked, why not use Adblock? “Too much of a hassle” was the answer I got.

I also pointed out that this is just my personal blog, and that it is clearly separated from the company blog. But the response was “But there’s too much interesting stuff on your blog!”

No more excuses!

Starting from today, there is no excuse anymore! I’ve added a “Censor banner” button, as you can see in this screenshot:

The state will be saved into a cookie, so even if you press Reload, the banner will stay hidden. Therefore, this blog is now officially Safe For Work(tm).

Update: it is now also possible to hide the banner by appending “?hide_banner=yes” to the URL, e.g.:

What’s with your blog’s banner anyway?

The girls you see in the banner are actually friends of mine. Yes, I know them in real life. Those who live in Japan or who watch anime are probably familiar with the term “cosplay”. These girls like cosplaying, and I like this particular photo that I shot of them, so I turned it into a blog header. With permission.

“modrails.com doesn’t eat its own dog food. They’re using nginx/0.6.32, so does that mean they think Phusion Passenger sucks?”

That’s right, modrails.com is behind Nginx. But that’s because the entire website only consists of static HTML! We didn’t use PHP, or Rails, or anything dynamic. The website is generated from a few input files with Webgen and rsync’ed regularly.