Appfolio Engineering

There are a lot of great profilers for code that runs in Ruby. There are... fewer really good profilers to work on a standard generated-by-GCC binary, like the Ruby interpreter itself. GPerfTools and Instruments are really your choices, with an honorable mention for DTrace - it isn't a profiler, but it's so flexible that it can do a lot of profiler-like things, including forRuby.

Today, let's talk about getting GPerfTools working with Ruby - specifically, with "vanilla" Ruby, often called Matz's Ruby Interpreter, or "MRI."

You'll need GPerfTools, of course. Assuming you use Homebrew, you can install a few useful dependencies like this:

brew install gperftools
brew install graphviz # you'll see why later

You'll also need to build Ruby with the GPerfTools profiling library, and with Address Space Layout Randomization turned off (Mac OS calls its ASLR implementation Position Independent Executables, or PIE.) And it wouldn't be bad to tell it to always use a frame pointer -- marginally slower, but easier to debug at times. That probably looks like this:

What Next?

How do we use it? And what if something went wrong?

First, let's try to generate some profiling data. Profilers are quite picky about knowing what program generated the data, so you're much better off running the Ruby interpreter directly. That's a problem if you have a local built Ruby, not installed, and you'd otherwise run it with a script like runruby.

With trial and error, you can figure out which environment variables to set to run Ruby, from the Ruby dir, with GPerfTools active, running optcarrot. You cloned Ruby and optcarrot into two directories next to each other, so optcarrot is at "../optcarrot" for me. Here's the command I use when I run it:

Note the "CPUPROFILE" variable toward the end - that turns on GPerfTools profiling and tells it what file to write to. In this case, I chose "/tmp/prof.out".

Troubleshooting: if Ruby can't find its libraries, you'll need to change PATH and RUBYLIB. If you're not sure what to change them to, change your runruby script near the end, line 105 for me, and "STDERR.puts ENV.inspect" to see what your environment variables should look like. You can run optcarrot with runruby to see what it uses ("./tool/runruby.rb ../optcarrot/bin/optcarrot --benchmark ../optcarrot/examples/Lan_Master.nes") Then, put in your variables instead of mine from the ones you printed to the console.

If you run optcarrot and a /tmp/prof.out file appears... You win! Now you have profiling data. You can move on to "Seeing the Results" below.

More troubleshooting: if it doesn't work, you may want to clean your Ruby directory and rebuild:

The "git clean" line means "destroy any files that git didn't put here." It cleans out everything a "make clean" would, but also all caches, logs, configurations... Even the configure script itself. It's the equivalent of blowing away your repo and re-cloning, but with less network traffic.

But it blows away the configure script, so you have to run autoconf to regenerate it, and then you'll get a very slow full rebuild. So only do this if you think you need it.

Seeing the Results

So, that's cool. It's kind of neat seeing the file appear in /tmp/prof.out. But what can we do with it?

We'll use pprof, part of GPerfTools. Fair warning: it's a little rough around the edges. If you use it much, you may find yourself reading its Perl source regularly. Pprof isn't amazing for weird corner cases like "you made a typo in the filename for the profiler output" and other "unlikely" situations. On the plus side, it's simple enough that with just a little Perl know-how, or Ruby know-how and some squinting, you can usually figure out what it's doing.

To run pprof, give it your binary's location, the profiling file, and what kind of output you want. Here's a simple example:

pprof ./ruby /tmp/prof.out --text

That should print out examples of what functions are taking the most time, like this:

The first column is how many samples landed in that function but not others that it called. The second column is what percentage of CPU time was spent on that function but not any other functions it called. The third column is a running sum - the total percentage of CPU time that's accounted for by this function and all the ones above it in this sort order (see how it always goes upward on every line?) The fourth and fifth columns are cumulative -- that is, the samples and percent of CPU time used by this function, plus everything else that it called.

The fifth-column percentages don't even vaguely add to 100% - they're cumulative, so they include that function and everything under it. Since vm_exec_core does a lot of calling rby_ary_push, for instance, they wind up with significantly more than 100% of total CPU time between them, because the shared time is double-counted.

(And obviously, last on the line comes the function that's being called, or a big hex number like "0x00007fff9b5eb273" if pprof can't figure out what was getting called there. You won't usually see static functions here, but that's because they're usually inlined -- their addresses are similar to normal function addresses and don't start with 0x000007fff. Addresses starting with that prefix seem to be on the stack. Perhaps the EIP register or ESP register are being used for something else briefly, which seems like it would confuse GPerfTools method of sampling.)

Think it might be a problem with debug symbols in your Ruby binary? Pprof just uses "nm" for that, so run "nm ./ruby". If you see a bunch of function names, your Ruby binary is fine. Here's what that looks like:

More Fun Output Formats

Pprof can output graphical call graphs. They're fun, but I personally find them a bit impractical. But if you'd like to see and you installed Graphviz via homebrew above... Go ahead and run the same thing, but with "--web" instead of "--text".

pprof ./ruby /tmp/prof.out --web

This should create a nice svg file, also in /tmp, and open Safari to view it (see below.) If instead you got a warning about not having "dot" installed, that means Graphviz wasn't installed, or isn't in your path.

In fact, go ahead and run pprof --help and look at all the output types. I'll wait.

In general, pprof can output line numbers for call sites... But mostly, it doesn't do that on Mac, which lacks the addr2line command-line utility. If I can get gaddr2line working, perhaps you'll get another blog post on it. So: if you get question marks instead of line numbers from the Ruby source code, that's expected.

Play around with pprof - this is all poorly documented, but the trial and error is pretty good.

What Does All This Tell Me?

This should nicely break down where Ruby is spending its time. It's a sampling profiler, so the accuracy is pretty good. If you remember profiling optcarrot before, the output above should make sense -- optcarrot spent its time calling array appends, and Ruby is spending its time on rb_ary_push.

From here on out, you should be able to use this and figure out what Ruby's doing.