Wednesday, April 25, 2007

Is reflective heap access even useful?

Charles Nutter is writing a Ruby interpreter in Java. He views Ruby's ObjectSpace feature as superfluous, because few libraries use it:

ObjectSpace is Ruby's view into the garbage-collected heap. You can use it to iterate over all objects of a particular type, attach finalizers to any object, look an object up by its object ID, and so on. In Ruby, it's a pretty low-cost heap-walker, able to dig up objects matching particular criteria for you on a whim. It sounds like it might be pretty useful, but it's used by very few libraries...and most of those uses can be implemented in other (potentially more efficient) ways.

Well of course few libraries use this feature -- it is intended for developers. Charles goes on to list a bunch of other reasons why he views ObjectSpace as undesirable; it can be dangerous when called from library code due to thread safety, etc. However, this misses the whole point. Features such as ObjectSpace are for developers to use interactively, not for libraries.
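The kind of interactive, one-off heap query I mean looks like this in Ruby, using the standard ObjectSpace API described above:

```ruby
# Interactive heap queries with Ruby's standard ObjectSpace API.

# Count the live String instances on the heap:
strings = ObjectSpace.each_object(String).count
puts "#{strings} live strings"

# Find the largest Array currently allocated:
biggest = ObjectSpace.each_object(Array).max_by(&:length)
puts "largest array holds #{biggest.length} elements"

# Attach a finalizer that runs when the object is collected.
# (The proc must not capture obj, or it will never be collected.)
obj = Object.new
ObjectSpace.define_finalizer(obj, proc { |id| puts "object #{id} collected" })
```

Nothing here belongs in library code; it's exactly the sort of thing you type into an interactive session while hunting a leak.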

Factor has similar capabilities, and I use them regularly -- not every day, but perhaps once a week, at least. Here is just one example.

Yesterday I was debugging the compiler's interval inference code. I noticed that compiling some words, such as nth, allocated an absurd amount of memory (~30 MB) in the optimizer stage. To see where the memory was going, I used the heap-stats. word to save a heap allocation breakdown snapshot before and after calling optimize on a dataflow IR graph that triggered the problem. Here is the allocation breakdown before optimization:

The only major difference is in the space used by bignums: over 35 MB worth! To get a better handle on what was going on, I obtained a list of all bignum instances from the VM, asked the VM for the size of each, and printed the resulting array:

[ bignum? ] instances [ size ] map pprint

All bignums were 16 or 24 bytes, except one which was 35 MB! I had a hunch that this bignum resulted from shift being called with a very large left shift count. Also, since the problem only occurred with interval inference enabled, I guessed that the interval-shift word was the culprit. Setting a watch on this word with \ interval-shift confirmed my suspicions; a "data heap growing to ... bytes" message was printed right after interval-shift was called. Some more playing around in the listener revealed that the compiler was inferring a range with very large bounds for a value, then that value was used as the shift count passed to shift, so the resulting range of values was enormous.
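The underlying failure mode is easy to reproduce in any language with arbitrary-precision integers -- Ruby here for illustration, with the shift count scaled down from the actual 35 MB case:

```ruby
# n << k produces an integer of roughly k/8 bytes no matter how small
# n is, so an inferred shift-count bound in the millions means a
# multi-megabyte bignum per inference step.
x = 10 << 1_000_000   # a ~122 KB bignum from a single shift
puts x.bit_length     # 10 has 4 bits, so this prints 1000004
```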

In hindsight, the fix is obvious: if shift is applied to a shift count with a large range, the compiler should not attempt to compute the range of the result. Compiling something like 10 mod 100000000 * shift should not exhaust memory.
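The guard can be sketched as follows. This is an illustrative Ruby sketch of the idea only -- interval_shift, MAX_SHIFT, and the use of Range for intervals are my names, not Factor's actual implementation -- restricted to non-negative values and shift counts:

```ruby
# Give up on interval inference when the shift count's upper bound is
# too large; returning nil means "interval unknown".
MAX_SHIFT = 64  # illustrative threshold, not Factor's actual value

def interval_shift(value, count)        # value, count: non-negative Ranges
  return nil if count.max > MAX_SHIFT   # would produce a giant bound
  (value.min << count.min)..(value.max << count.max)
end

interval_shift(1..10, 0..4)     # => 1..160
interval_shift(1..10, 0..1000)  # => nil -- no huge bignum is allocated
```

The point is that returning "unknown" costs nothing; computing the bound would have allocated the very bignum we were trying to avoid.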

Now, there are two ways I could have arrived at this conclusion, and been able to fix the bug:

Do what I did above: use features such as heap introspection, word watches, and the listener to track down the problem in a very short amount of time

Bang my head against the wall, carefully review every line of code I changed in the last few hours, and do a "binary search" (adding and removing changed lines until I pinpoint the problem)

Because Factor has nice development tools, I was able to go for the first option. Of course even the second isn't that painful in Factor (there's no recompile/restart needed, even for most compiler changes).

I suspect the real reason Charles wants to see this feature removed from Ruby is that the JVM does not support it. Similar reasoning was used to propose that Ruby's upcoming Unicode support be crippled by using UTF-16 to represent code points, because this is how Java's designers chose to do it in 1995. (The list goes on: some people don't like Ruby's continuations and think they should be removed, because, again, they're hard to get right in Java. Is anybody else bothered by this type of thinking?)

So, what's the point of this post? I just wanted to say, I'm glad I don't have to cripple my language because some guys at Sun made some bad decisions while I was still in elementary school.

6 comments:

You are entirely incorrect in your assumptions about why I'd like ObjectSpace to go away.

First off, if it's intended for developers, why is it available all the time? I would have no issues with ObjectSpace being disabled by default and available through a runtime switch or external tool. Like I said in my post, we implement it, but it adds overhead. Overhead for all programs executing, whether they're in development mode or production mode. Bad news. A feature for use by developers shouldn't cripple a production server.

Second, Java does have this capability...but it's something you access with separate tools and separate switches that turn on tracking of this information. Java has an extremely wide range of options for inspecting the heap (live or offline), tracking object creation and collection, and just about everything you could possibly want. But you have to ask for it or hook up external tools...tools which are easy to use and shipped with Java.

The truth is that Ruby's ObjectSpace, whether intended for "developers" or not, is used to implement some rather nasty hacks in the few places where it's used...hacks that could be better implemented in other ways. It limits the evolution of Ruby's memory management and threading subsystems, so much so that Ruby 1.9 won't alter the in-memory model or garbage collector at all, and forces all threads to use a single giant lock. And because those nasty hacks are used for purposes outside of development, other implementations are forced to cripple themselves in similar ways.

I recognize the utility of ObjectSpace and heap inspection, but I'm not willing to accept the limitations imposed at runtime or on future development of Ruby in order to support them. Perhaps you should keep your ill-informed suspicions to yourself, or do a bit more research before you start putting words in other people's mouths.

Drawing a distinction between 'development' and 'production' runs of an application goes against the philosophy of languages such as Ruby, Lisp, Smalltalk, etc. We don't like recompiling and restarting our applications just to test a minor change or be able to single-step through a function or to inspect some object.

Yes, I've seen Java's remote debugging API; it is a pain to use and entirely unsuitable for writing quick one-off scripts to inspect a running application's heap. Also, as far as I can tell, Java profilers don't have good support for attaching to running applications -- again, because of the development/production split.

Good developer-oriented introspection tools and good performance are not mutually exclusive. And remember that CRuby has a lot of legacy and a lot of hacks; I doubt ObjectSpace figures in the top 10 concerns when it comes to making the CRuby VM thread-safe.

Both of you missed one truly painful problem with your idea of using external tools or setting runtime switches to allow introspection of the heap -

It has the effect of altering the runtime characteristics of the erroneous program so that the particular error being investigated may be altered and hence may in fact disappear.

Reasoning: Historical attempts at doing this over many years in both mainframe and minicomputer environments have shown (for myself and colleagues of mine) that there are some errors that will be altered by changing the runtime characteristics of the program. This arises due to compiler and/or external runtime monitoring tools altering the program in question.

If the introspection tools are always incorporated into the runtime of the program, the base system will (in most situations) not change and the error should be repeatable and findable.

As Charles (great name, no?) says, Java's got great heap introspection tools. And get informed: you can even query the heap with SQL (or something rather like it). I would suggest this makes one-offs quite ridiculously easy to write, assuming someone ever gets around to writing a command-line wrapper around it. As it is, the boilerplate needed isn't that much, so it shouldn't be difficult.

As for the third comment, yes heap introspection tools do indeed change the characteristics of the runtime, but generally speaking, resource allocation problems tend not to be Heisenbugs, so they should still manifest when the VM is being instrumented. If they don't, that's a corner case most developers are probably willing to live with.

At the risk of throwing some gasoline on the flames, I thought Factor had gotten to the point of standing on its own merits, and not on the fact that it isn't Java (or Ruby).

It is the "corner case" that gets you in the end and ends up causing so much time and effort to be wasted in finding a solution. The general run-of-the-mill errors are a piece of cake in comparison.

Personally, I think Factor stands on its own merits. Any comparison with other languages is a side issue.

My own opinion of Java is that it is a modern, poorly implemented version of COBOL (as is C++), more verbose and harder to use. I have written networking programs in COBOL in a DNA (OSI 7-layer model) environment. But that is my opinion and YMMV - so be it.