Thursday, December 13, 2007

I reviewed ‘Troubleshooting Ruby Processes’ last week. Since I really liked it, I decided to take a little bit of time to talk to Philippe about it. Fortunately, he was gracious enough to answer my questions. If you’d like to grab a copy of his shortcut for yourself, you can buy one here. Philippe passed along a bunch of great hints and links that aren’t in his book, but you can read them all below.

You focus on three great tools, what three tools would you have included if you had more time/space/expertise?

Philippe DTrace would be at the top of my list. It is the only tool that can give you a dynamic and integrated view of the entire software stack: from external devices activity, to operating system dynamics, to the behavior of your Ruby code. This alone is a killer feature that gives you tremendous insight into the holistic behavior of your software stack. In addition, DTrace is a blessing to use in a production environment: it can always stay on, has zero overhead when not in use, and can enable or disable debugging probes in real time on a running kernel. In short, DTrace should be your troubleshooting tool of choice if you deploy Ruby applications on Mac OS X, Solaris or BSD. Unfortunately, due to publishing timing constraints, I was unable to include a chapter on DTrace in my short cut. I plan to publish a detailed article on DTrace early next year though, so stay tuned. In the meantime, Joyent’s examples provide a good starting point for Ruby and Ruby on Rails developers. Big kudos to the Joyent guys for all their work on Ruby DTrace instrumentation.

Another technique that I would love to document is how you can instrument the Ruby interpreter yourself with some minimal ad-hoc tracing code to target a specific problem. Inject your code at a well-chosen spot, recompile Ruby, rerun your program with your freshly instrumented interpreter… and get your questions answered! Obviously, this approach does require some understanding of Ruby internals but this is not as scary as it might seem. Eigenclass’s Self-study guide to the sources, Patrick Farley’s blog and the Ruby Hacking Guide are good resources to demystify the (presumed) complexity of Ruby internals.

There are tons of other powerful tools that can also be used to troubleshoot Ruby applications’ traditional ecosystem, most notably the network and the database. These tools are already fairly well-documented though, so if I had time to document yet one more thing, I would concentrate on visual tools built on top of DTrace such as Chime and Instruments.

If you were going to write a JRuby version of this book, what tools would it include?

Philippe I must admit that while I have an extensive Java experience I have limited exposure to JRuby at this point. Ola Bini graciously lent me some of his expertise on this topic, which I combined with some helpful tidbits below:

In particular, the JVM’s thread dump feature is a powerful introspection mechanism for investigating dead-locks and understanding your application dynamics. Since Ruby threads are also Java threads in JRuby, you can tell a lot from a JVM thread dump. When you start the JVM with one of the -XX flags you even get a Java histogram as part of the thread dump.

On the other end of the spectrum, the easiest way to investigate a problem with JRuby is often to directly change the JRuby code, either by adding logging or a Java exception at the right spot and then rerunning the faulty code. This is surprisingly easy and terribly efficient!

JRuby also includes some support for helping with dumping runtime information. Say that you want to keep track of how often a specific piece of code is reached during runtime, you can do something like this:

Everything in RuntimeInformation will be automatically dumped on exit. By combining this technique with instrumenting JRuby to provide additional logging or to raise exceptions, most of the time you get a fairly good grasp on what’s going on.

JConsole) is another standard Java tool that proves to be very useful in the context of JRuby development. You can use it to attach to a running process and then inspect memory usage, GC statistics, existing threads and so on. Besides, JConsole can also be used to trigger a lot of interesting JMX events.

Finally, when it comes to memory leaks, the usual Java tools also come to the rescue. A useful technique is to get a heap dump with jmap and then inspect it with jhat or any standard Java heap dump analyzer. Give the SAP Memory Analyzer a try—Ola had a great experience with it.

What kinds of troubleshooting/development tools do the Ruby community need to develop to ‘take the next step’?

Philippe I believe that the Ruby community would benefit a lot from having a robust thread dump tool for MRI. The thread-dump project got started on this, but the current implementation does not seem to work on any of the platforms that I have tried it on. At this point, I am determined to implement a robust thread dump tool myself, and I would gladly welcome help from the community. Contact me if you are interested in participating.

There is also a whole ecosystem of powerful tools that could be built on top of DTrace. Using DTrace probes as a foundation, you could build powerful visual tools each specializing in a particular task, e.g. profiling, performance analysis, memory leak detection, etc. These tools could potentially give you a holistic view of your system including your Ruby code but also your operating system, the network and the database. This is an area where the Ruby community can really shine as it can be done in pure Ruby, and no specific system or interpreter level knowledge is required.

For the most hardcore members of the community it might be worth investing some time and energy to help push SystemTap forward. Unfortunately, while many Ruby applications are deployed on Linux, DTrace is not available on this platform. Due to licensing and unresolved issues there is actually very little chance that DTrace will ever be ported to Linux in the foreseeable future. The closest alternative to DTrace on Linux is SystemTap, which shares the same objectives but is not quite as mature yet. In particular, SystemTap still does not seem to provide support for traces in user space programs. So SystemTap could use some love before bridging the gap between Ruby programs and the operating system.

Finally building troubleshooting tools for Ruby would be a lot easier if any Ruby developer could easily access the Ruby interpreter to instrument and extend it. This is why I consider the success of a project such as Rubinius to be the best way for take the platform to a whole new level in the long term. Any time and effort that the community invest in Rubinius will be well spent.