Introducing ImageVoodoo

I just pushed out the first release of ImageVoodoo, a nifty little image manipulation library conceived as a quick hack by Tom. The name is a play on words on ImageScience, of course, the quick, lightweight imaging library for Ruby. To get it:

jruby -S gem install image_voodoo

What’s cool about ImageVoodoo (other than the name) is that we were able to make it API-compatible with ImageScience. In fact, ImageVoodoo’s image_science.rb simply looks like this:

So, you can use it pretty much anywhere you might use ImageScience, and it should just work. At work, we’re using it with attachment_fu, and it works great. ImageVoodoo even steals and uses ImageScience’s unit tests (which all run successfully, too). Speed-wise, it’s about twice as slow as ImageScience running on MatzRuby, but still plenty fast enough for most cases.

But we wouldn’t be having fun unless we embraced and extended a little bit, right? So I added a couple of extra features you might find useful.

Preview

Since ImageVoodoo is just leveraging the Java Platform’s imaging libraries, image rendering can be easily tied into a simple preview frame. This code:

Command-line

As I was fixing a bug in ImageVoodoo’s file saving I whipped up a little command-line utility to aid debugging. It allows you to string along several image manipulation actions on a single command-line. For example,

This will resize image.jpg into three smaller images, t[1-3].jpg, but will pop up a preview frame at each step of the way. Simply close the preview frame to continue to the next action, or quit out of the application to abort.

Summary

And so, another functional area, image manipulation, becomes as easy on JRuby as it is on MatzRuby. Now that fancy social networking application you’ve been working on should have one less reason not to run unmodified on JRuby!

I don’t disagree with Avdi’s main point, which is that monkey-patching isn’t always the best tool for the job. But I contend that it’s still a basic part of the Ruby programming culture, like it or not. I believe the reason for this is that monkey-patching is an extremely empowering technique. It’s part and parcel of the hacker/DIY culture.

Anyone who has worked in a less-flexible language, or been caught in a hard place trying to debug a closed-source library, can appreciate how liberating it is to take matters into your own hands, fix your own problem, and do it without modifying the original library source. It’s a revelatory experience that makes you never want to go back to inflexible programming environments ever again. There’s also nothing wrong with monkey-patching a library you don’t control as long as you freeze all the code you’re using before making a working patch -- if your patch works and you don’t change or upgrade anything, you’re not likely to encounter any problems.

Of course, not all monkey-patches are created equal. Some are certainly more onerous than others. Let me give you an example I ran into myself recently, which raises some questions I can’t yet answer.

The prolific Rick Olson’s popular plugin attachment_fu uses Ruby’s tempfile library. It has a legitimate need to extend Tempfile to preserve the file extension on tempfiles, so that image conversion routines can use the file extension to help identify the format. How it accomplishes this, however, is not very pretty. Here’s the original Ruby code (as of Ruby 1.8.6 p111):

Seems like about as reasonable a place as any to hook in, right? But the method is marked private -- I’m guessing the original implementor (according to svn blame, it appears to be Akira Tanaka) probably did not intend for the method to be replaced. But, hey, it’s Ruby, right? So away attachment_fu goes:

Tempfile.class_eval do
  # overwrite so tempfiles use the extension of the basename. important for rmagick and image science
  def make_tmpname(basename, n)
    ext = nil
    sprintf("%s%d-%d%s", basename.to_s.gsub(/\.\w+$/) { |s| ext = s; '' }, $$, n, ext)
  end
end

As far as monkey-patches go, this isn’t too bad. There is no code copied from the original implementation, and it’s a fairly focused and compact method. The fact that it’s private is a bit of a smell, though. But, it works, and we forget about it happily (if we even knew it existed in the first place), and hope that tempfile.rb never changes.

Well, in my case, it did. MenTaLguY has been working on more robust, thread-safe versions of the standard libraries. And he changed the arity of make_tmpname:

So, it took me a while before it occurred to me that something in my project might be overriding make_tmpname. But even after I found it, notified MenTaLguY, and he fixed it, it still left me wondering: who’s in the wrong here? Akira-san, for not making a better way to hook into make_tmpname, Rick for monkey-patching it, or MenTaLguY for changing the method arity in his rewrite? I can’t really point the blame at any of them.
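One way to make this kind of breakage fail loudly at load time, rather than somewhere deep inside a request, is to check the arity of the method you are about to replace. What follows is only a sketch under stated assumptions: TempNamer is a hypothetical stand-in for a library class like Tempfile, and patch_if_arity is a made-up helper, not something attachment_fu or Ruby provides.

```ruby
# Hypothetical stand-in for a library class with a private naming helper.
class TempNamer
  private

  def make_name(basename, n)
    format("%s%d-%d", basename, Process.pid, n)
  end
end

# Hypothetical helper: replace klass#name only if the existing method still
# has the arity the patch was written against.
def patch_if_arity(klass, name, expected_arity, &body)
  actual = klass.instance_method(name).arity
  unless actual == expected_arity
    raise ArgumentError,
          "#{klass}##{name} has arity #{actual}, patch expects #{expected_arity}"
  end
  klass.send(:define_method, name, &body)
  klass.send(:private, name) # keep the original visibility
end

# The patch keeps the file extension at the end of the generated name.
patch_if_arity(TempNamer, :make_name, 2) do |basename, n|
  ext = File.extname(basename)
  format("%s%d-%d%s", basename.chomp(ext), Process.pid, n, ext)
end
```

This doesn't settle who is in the wrong, but if the library later changes the method's signature, the patch refuses to apply and says why.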

There are certainly more egregious and offensive monkey-patches than this example (and I include myself in that camp). In any case though, I could live with just about any monkey-patch if I had better debugging tools. For example, it would be great if you could ask Ruby to track and retain references to all methods, including those that get replaced, along with the source locations where each was defined. Another possibility might be a before_method_added hook that could let you track method replacements as they’re about to happen (and maybe even veto method redefinitions!).
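Part of this can be approximated today with Module#method_added, which fires after a method is defined (so it can record a redefinition but not veto it). Here is a minimal sketch, assuming a hypothetical RedefinitionTracker module that a class extends before its methods are defined; Widget is just a demo class.

```ruby
# Hypothetical module: extend a class with it *before* its methods are defined.
module RedefinitionTracker
  # Maps [class, method name] to every UnboundMethod seen, oldest first.
  HISTORY = Hash.new { |hash, key| hash[key] = [] }

  def method_added(name)
    super
    versions = HISTORY[[self, name]]
    versions << instance_method(name)
    return if versions.size < 2

    # source_location is [file, line] for methods defined in Ruby code.
    current = Array(versions[-1].source_location).join(":")
    previous = Array(versions[-2].source_location).join(":")
    warn "#{self}##{name} redefined at #{current} (was #{previous})"
  end
end

class Widget
  extend RedefinitionTracker

  def label
    "original"
  end
end

# Somewhere else, a monkey-patch replaces the method.
class Widget
  def label
    "patched"
  end
end

# The replaced implementation is still referenced and still callable.
old_label = RedefinitionTracker::HISTORY[[Widget, :label]].first
puts old_label.bind(Widget.new).call
```

The obvious limitation is that the hook has to be installed before the methods you care about are defined, which is exactly the sort of thing an implementation-level feature could do for you.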

These are the types of tools that an enterprising Ruby programmer could look at adding to any one of the existing Ruby implementations. Both JRuby and Rubinius are becoming test beds for bleeding edge features that could help advance the state of the art. So pitch in and help make monkey-patching less painful!

Postscript: This post coming to you from the Illinois interstate courtesy of Curt and his 3G EVDO wi-fi hub!

I’m pleased with how it turned out considering I hadn’t done this sort of thing before. Special thanks to Cindy Church for putting it all together, including all the production: setup, recording, editing, even the music!

A QuickTime movie version is available as well. Check it out and let me know what you think.

Next up in our performance series: Builder::XChar. (Another fine Sam Ruby production!) While this piece of code in the Builder library strikes me as perfectly fine, it also tends to slow down quite a bit with larger documents or chunks of text.

(Hmm, I must have accidentally swapped in some large program in the middle of that JRuby run. The perils of benchmarking on a desktop machine. I don’t claim that the numbers are scientific, just illustrative!)

Fortunately, the fix again is very simple, and has previously been acknowledged. The latest (unreleased?) Hpricot has a new native extension, fast_xs, which is an almost drop-in replacement for the pure-ruby String#to_xs. (Almost, because it creates the method String#fast_xs instead of String#to_xs. ActiveSupport 2.0.2 and later take care of aliasing it for you). Unbeknownst to me, I ported fast_xs recently as part of upgrading JRuby extensions that have Java code in them. And so it happens to come in handy at this time. The patch for that is here.

I have the latest Hpricot gems on my server, so you can install it yourself (for either Ruby or JRuby):

We’ve been here before. So here’s the scenario: You’re feeding medium-to-large chunks of XML out of one Rails app, to be consumed by another via ActiveResource. Maybe those chunks have embedded HTML, or maybe they’re an Atom feed containing several pieces of HTML with all the entities escaped. Maybe they contain entire Wikipedia pages in them. Lots of entities that need expansion when the file is parsed.

So what does ActiveResource do with this? Hash.from_xml. Which uses xml-simple. Which constructs a REXML::Document, and proceeds to navigate the entire DOM, scraping the text nodes out of it so they can be stuffed in a hash to be handed back to ActiveResource. And how does REXML expand all the entities it runs across? With this little lovely:

All this on a paltry 393K xml file. Makes me wonder how anyone ever does any serious XML processing in Ruby.

I know, this is open source, I should be whipping up a patch for this and submitting it. Well, I did cook up a solution, but it unfortunately is only available for JRuby at the moment. (I also have much more faith in Sam Ruby than myself to get the semantics of a rewritten REXML::Text::unnormalize correct.)

A while back I cooked up JREXML because Regexp processing in JRuby was slow at the time, and the guts of REXML is driven by a Regexp-based parser. JREXML swaps out that regexp parser with a Java pull parser library, and at the time it provided a modest speedup.

So, in the context of JREXML, the solution now becomes simple, by taking advantage of the fact that Java XML parsers typically expand entities for you. The just-released JREXML 0.5.3 circumvents REXML::Text::unnormalize when constructing a document from the Java-based parser. And the results again don’t disappoint:

Update: At Sam’s request, I ran the numbers again with REXML trunk, which condenses entity expansion into a single gsub. Speed is more in line for MRI, but didn’t move much for JRuby (probably more a datapoint for JRuby developers than REXML developers).
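To illustrate the difference between one scan per entity and a single gsub, here is a small sketch. It is not REXML's actual code: the ENTITIES table and both method names are assumptions made up for this example, and only the five predefined XML entities are handled.

```ruby
# The five predefined XML entities (illustrative table, not REXML's).
ENTITIES = {
  "lt" => "<", "gt" => ">", "amp" => "&", "quot" => '"', "apos" => "'"
}.freeze

# One full scan of the text per entity. "&amp;" has to go last so that
# "&amp;lt;" expands to "&lt;" rather than to "<".
def unescape_multi_pass(text)
  result = text.dup
  ENTITIES.each do |name, char|
    next if name == "amp"
    result = result.gsub("&#{name};", char)
  end
  result.gsub("&amp;", "&")
end

# A single scan: one regexp match per entity, replacement looked up in a hash.
def unescape_single_pass(text)
  text.gsub(/&(lt|gt|amp|quot|apos);/) { ENTITIES[$1] }
end

sample = "a &lt; b &amp;&amp; c &gt; d &quot;x&quot; &amp;lt;"
puts unescape_single_pass(sample)
```

On a document with many text nodes and many entity types, the multi-pass version walks every string several times, which is where the cost adds up.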

People have been asking for a while how fast JRuby runs Rails. (Of course, “fast” has always been a relative term.) We haven’t been quick to answer the question, because frankly we didn’t know. We hadn’t been building real Rails applications on JRuby ourselves yet, and there was no definitive word from the crowd either.

In the project I’m working on, we’ve committed to using and deploying on JRuby. Eventually we were going to reach the point where we’d need to find out how well our application runs. So today I began running a simple single request benchmark on a relatively busy page. The numbers turned out to be rather surprising:

Now, MRI (C Ruby) will always run about the same speed no matter how many runs you give it, but it’s well known that the JVM needs time to warm up. And indeed it does; after 250 iterations, Mongrel running on JRuby finally surpasses MRI. The JRuby/Goldspike/Glassfish combo comes close as well.

Some details about the setup:

I ran the tests on my MacBook Pro Core 2 Duo 2.4 GHz. I didn’t disable one of the cores for the tests, which means that JRuby has an advantage over MRI because it can use both (native threads at work). However, the test script ran the requests serially, which means that the advantage was minimal.

The application is indeed of the “hydra” variety; the setup is nearly identical to the second diagram on that page. So a single request is passing through not one, but two Rails applications in addition to touching the database. It rendered an HTML ERb view with data from an ActiveResource-accessed RESTful service. The applications are based on Rails 1.2.3.

JRuby on Glassfish used Glassfish 2 and Goldspike 1.4, deployed in war files via Warbler.

The two JRuby setups used JDK 1.5 and were tweaked to disable ObjectSpace and use the “server” VM (-server argument to the JVM).
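For anyone wanting to try something similar, the measurement boils down to running the same request serially and looking at the average time per batch, so the warm-up shows up as a downward trend. This is a rough sketch rather than my actual test script: batch_timings is a hypothetical helper, and the block stands in for issuing one HTTP request.

```ruby
# Hypothetical helper: run the block total_runs times, serially, and return
# the average seconds per run for each batch of batch_size runs.
def batch_timings(total_runs, batch_size)
  (1..total_runs).each_slice(batch_size).map do |batch|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    batch.each { yield }
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    elapsed / batch.size
  end
end

# Stand-in workload; a real run would fetch the page under test instead.
timings = batch_timings(250, 50) { (1..1_000).reduce(:+) }

timings.each_with_index do |seconds, index|
  puts format("runs %3d-%3d: %.6fs per run", index * 50 + 1, (index + 1) * 50, seconds)
end
```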

The main point I wish to make with these numbers is that JRuby performance is there today, and still has room to grow. There’s no longer any doubt in my mind. Yes, this is a simplistic application benchmark run on a developer’s machine, but it’s a real application. The test may not be exacting in precision, but I see enough in the numbers to believe that this will be replicable to production environments. The plot thickens!

So, it’s well known that Ruby owes a debt to its predecessor Perl, although some (maybe many) question whether we should repay that debt or even go so far as to put Perl on trial and excise those elements which somehow haphazardly survived the generation gap. It turns out the evidence is mixed.

Update: I use the word “obscure” in the title because, in my experience, they are obscure. “Ugly” is pure opinion, but this is my blog, after all.

Exhibit A: BEGIN/END

Update: Yes, yes, this is an awk-ism, not a perlism, strictly speaking. And I don’t deny its usefulness for pure scripting tasks. I just don’t see its utility in a larger application.

Why would any sane Ruby programmer do this? Have you ever seen a use for BEGIN that isn’t met by simply executing code at the top level of the main program? Geez, BEGIN even has its own node in the AST!

And how about END? If you really need to hook into interpreter shutdown, just use Kernel#at_exit. (In fact, Rubinius currently uses END simply as an alias for at_exit.)

Exhibit B: <> (ARGF)

Thank goodness we didn’t get the diamond operator in Ruby, but we did get ARGF as a replacement. Though obscure, it actually turns out to be useful. Consider this program, which prepends copyright headers in-place (thanks to another perlism, -i) to every file mentioned on the command-line. Any other creative uses of ARGF out there?

Exhibit C: The Flip-flop

This is a weird beast. I didn’t even know of its existence until Charlie was complaining about having to compile it properly. Apparently we have Perl to thank for this nonsense as well (and, indirectly, sed). With the exception of the sed-ism, I’m not convinced it adds any value -- in fact the code usually ends up looking more verbose.

This program, when run with itself as an argument, prints out everything between BEGIN and END.
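For readers who haven't met the flip-flop before, here is a small self-contained sketch that works on an array of lines instead of ARGF, next to the same logic written with an explicit state variable. Both method names are made up for this example.

```ruby
# Keep every line from the one matching /BEGIN/ through the one matching
# /END/, inclusive, using the flip-flop operator.
def between_markers(lines)
  selected = []
  lines.each do |line|
    selected << line if (line =~ /BEGIN/)..(line =~ /END/)
  end
  selected
end

# The same logic with the hidden state made explicit.
def between_markers_explicit(lines)
  inside = false
  lines.select do |line|
    inside = true if line =~ /BEGIN/
    keep = inside
    inside = false if inside && line =~ /END/
    keep
  end
end

lines = ["intro", "# BEGIN", "one", "two", "# END", "outro"]
puts between_markers(lines)
```

The range expression only behaves as a flip-flop inside a conditional; anywhere else Ruby tries to build an ordinary Range from the two operands.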

Perl at least is sane enough to return true or false for its own defined operator. But method? Looking at the source, I see also expression, local-variable(in-block), assignment, class variable, true, false, and self. But why would this output be useful? As if it isn’t already plainly obvious what is defined?

Many libraries and plugins ship custom Rake tasks. Of course, as slick as Rake is for a build and configuration language, it’s still just Ruby code, right?

Case in point: I released a version of ci_reporter with a fairly careless bug in a rake task that attempted to << a string into an existing environment variable. It escaped me at the time that Ruby sets up the ENV hash with frozen strings, because my own usage of ci_reporter did not exercise the task in that way.
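The failure is easy to reproduce outside of Rake. A minimal sketch, using a made-up variable name (DEMO_OPTS) rather than the one ci_reporter touches:

```ruby
ENV["DEMO_OPTS"] = "somevalue" # DEMO_OPTS is a made-up variable name

append_failed = false
begin
  # ENV[] hands back a frozen string, so mutating it in place raises.
  ENV["DEMO_OPTS"] << " extra"
rescue => error
  append_failed = true
  puts "append failed: #{error.class}"
end

# Building a new string and assigning it back works.
ENV["DEMO_OPTS"] = "#{ENV["DEMO_OPTS"]} extra"
puts ENV["DEMO_OPTS"]
```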

So shouldn’t that Ruby code be subjected to the rigor of automated testing just like the rest of your code? It became obvious to me that it must be so. It turns out it’s straightforward to use Rake in an embedded fashion, and invoke targeted tasks in your custom Rake recipes. The examples here use RSpec, since that’s what I use for testing ci_reporter, but you could apply this to Test::Unit as well.

The technique is to create a new instance of Rake::Application, make it the active application, and load your rake scripts into it:

Notice the use of #load rather than #require, as you want to execute your rake script each time you set up the Rake application object. When tearing down your test or example, you should clean up Rake by setting Rake.application back to nil (or save the previous application and restore it, if you prefer).
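Outside of any test framework, the setup and teardown might be sketched like this. The with_fresh_rake helper, the demo:append task and the RAKE_DEMO_OPTS variable are all invented for illustration; the task is defined inline where a real spec would load the rake script under test.

```ruby
require "rake"

# Hypothetical helper: swap in a fresh Rake application for the duration of
# the block, then restore whatever was active before.
def with_fresh_rake
  previous = Rake.application
  app = Rake::Application.new
  Rake.application = app
  yield app
ensure
  Rake.application = previous
end

result = with_fresh_rake do |rake|
  # Stand-in for `load "path/to/your_tasks.rake"`: tasks register themselves
  # with whichever application is currently active.
  Rake::Task.define_task("demo:append") do
    ENV["RAKE_DEMO_OPTS"] = "#{ENV["RAKE_DEMO_OPTS"]} extra".strip
  end

  ENV["RAKE_DEMO_OPTS"] = "somevalue".freeze
  rake["demo:append"].invoke
  ENV["RAKE_DEMO_OPTS"]
end

puts result
```

Restoring the previous application instead of assigning nil keeps the sketch safe to run from inside another Rake process.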

Now, in the body of your test or example, you invoke your rake task with @rake['target'].invoke. Here, I’m exercising the case of an existing, frozen ENV value. After the task is invoked, I check the value to make sure the variable was modified as expected.

it "should append to ENV['TESTOPTS'] if it already contains a value" do
  ENV["TESTOPTS"] = "somevalue".freeze
  @rake["ci:setup:testunit"].invoke
  ENV["TESTOPTS"].should =~ /somevalue.*test_unit_loader/
end

I was fortunate here that the tasks for which I wrote tests after the fact were simple enough to be testable on their own, which may not always be the case, especially with organic, homegrown Rake tasks that interact with the world outside of Ruby. Still, if your Rake tasks are a critical part of your application, library or plugin, they should be tested. For example, it would be nice if tests could be written for the Rake scripts in Rails’ Railties module to increase coverage there.

Perhaps someone out there will run with this idea and take up the challenge and write a Rakefile completely in a test-driven or behaviour-driven style. It’s always been a sore point for me with Make, Ant, Maven, and virtually every other build tool in existence that you have no other way of automatically verifying your build script is doing what you intended without manually running it and inspecting its output -- it just feels so dirty! I’d expect that test-driven Rake scripts would likely have the level of granularity to match the tasks that need to be done, in a way that you can combine them in the right ways to make incremental and deconstructed builds simpler.

I’d just like to take a moment to echo what Ola has to say about the JRuby 1.0 release. This one is definitely for all of you out there. It’s been incredibly gratifying to see the growth of the community, and the increased amount of positive feedback and success stories with JRuby, and I’m honored to have been part of the team that made 1.0 happen.

We really feel strongly that we’ve put out a quality piece of software, a tool that will make your work more enjoyable, easier, and allow you to inject some creativity and innovation back into the Java stack.

We’ve got a solid base to start from. Being able to run Rails is no small feat, to be sure, but the best is yet to come. You can expect more performance, a complete compiler, support for more applications, and tighter integration with long-standing Java technologies. In addition, we’d like to push the envelope of what both Ruby and Java are capable of, including implementing (even driving) Ruby 2.0 features, leading the way for dynamic language support in the JVM, easier as well as novel ways of doing application deployment, better debugging and tooling, and experiments with new ways of doing concurrent and parallel computing.

Do join up with us -- it’s never too late to hop in and enjoy the fun!