Wednesday, July 26, 2006

It all started with an email at work. Someone was passing around a bunch of prime number printers in various languages (C, Java, C#, Perl, and Python). They all used the same (ugly) algorithm, and were supposed to show just how performant the languages were. Since I'm the local Ruby evangelist, I was asked to write a Ruby version. Here's what I came up with (warning, ugly algorithm ahead):
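As a rough illustration of the kind of naive algorithm being described, here is a minimal sketch that tries every smaller number as a divisor. The method names (`smallest_factor`, `describe`), the range, and the output format are assumptions made for this example and are not taken from the programs in the email.

```ruby
# Illustrative sketch only: names, range, and output format are assumptions.
# For each candidate, try every smaller number as a divisor (deliberately naive).
def smallest_factor(num)
  (2..(num - 1)).each do |x|
    return x if num % x == 0   # found a divisor, so num is composite
  end
  1                            # no divisor found, so num is prime
end

def describe(num)
  factor = smallest_factor(num)
  if factor == 1
    "#{num} is a prime number"
  else
    "#{num} equals #{factor} * #{num / factor}"
  end
end

(2..20).each { |num| puts describe(num) }
```

The inner `each` runs up to `num - 2` times per candidate, which is why this approach gets slow quickly as the upper bound grows.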

Certainly, nothing to write home about, but not too far from Perl or Python either.

Wanting to improve it, and not being able to touch the algorithm (we want to be comparing apples to quinces after all, not apples to oranges), I know my only hope is to find the bottleneck(s) and rewrite it (them?) in C. My first step is to grab Ruby's profiler and see what it says (oh, by the way, I reduced the value of num to 100 so that this would complete in my lifetime ... the profiler is slow).
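When a full profiler run is too slow, one rough cross-check is to time the suspect code by hand. The sketch below is not the profiler used here; it assumes a hypothetical `timed` helper and a hypothetical `naive_prime?` method standing in for the divisor loop, and it relies only on `Process.clock_gettime` from Ruby core.

```ruby
# Hand-rolled timing sketch; `timed` and `naive_prime?` are illustrative names.
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  [result, elapsed]   # the block's value plus elapsed seconds
end

# Stand-in for the naive divisor loop under suspicion.
def naive_prime?(num)
  (2..(num - 1)).each { |x| return false if num % x == 0 }
  true
end

loop_seconds = 0.0
primes = []
(2..100).each do |num|
  is_prime, seconds = timed { naive_prime?(num) }
  loop_seconds += seconds
  primes << num if is_prime
end

puts "primes found up to 100: #{primes.size}"
printf("time in divisor loop: %.3f msec\n", loop_seconds * 1000)
```

Comparing that figure with the total run time gives a crude idea of whether the loop really dominates, without paying the profiler's overhead.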

Which tells me that most of my time is spent in each (well, it's actually spent in the block I sent to each). It's taking a whopping 1.39 msec per call, compared to .0X msec for everything else. What would happen if I rewrote just that block?

Enter RubyInline (a great tool written by zenspider and Eric Hodel). I'm not a C whiz by any stretch of the imagination, but this stuff is pretty easy to bang out. My new code looks like this:
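A minimal sketch of what inlining the divisor loop can look like, assuming the RubyInline gem (`require 'inline'`) and a C compiler are available. The class name `PrimeCheck` and the method name `smallest_factor` are hypothetical, and the fallback to a pure-Ruby definition is an addition made so the example still runs where the gem is missing.

```ruby
class PrimeCheck
  # Pure-Ruby version, kept when RubyInline or a C compiler is unavailable.
  def smallest_factor(num)
    (2..(num - 1)).each { |x| return x if num % x == 0 }
    1
  end

  begin
    require 'inline'   # RubyInline gem (assumed installed; needs a C compiler)
    inline do |builder|
      # Same loop in C; RubyInline converts the int argument and return value.
      builder.c <<~C_SOURCE
        int smallest_factor(int num) {
          int x;
          for (x = 2; x < num; x++) {
            if (num % x == 0) return x;
          }
          return 1;
        }
      C_SOURCE
    end
  rescue LoadError, StandardError
    # Keep the pure-Ruby definition if the gem is missing or compilation fails.
  end
end

checker = PrimeCheck.new
puts checker.smallest_factor(91)
```

Only the hot loop moves to C; the rest of the program stays ordinary Ruby and calls `smallest_factor` exactly as before.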

What's the lesson here? Optimize what you need to (and only what you need to), profile to find out what that is (it may be slow, but profiling is your friend), and use the right tools (rewriting a bit of code with RubyInline is way better than rewriting the whole app in C).

Hi Ezra, you're right, ruby-prof (and ZenProfile) are great options to profile faster. I was hoping not to stick with the standard library (other than RubyInline) -- and I didn't bother to include a link to that one.

RubyInline works great for this kind of thing, but it's not for all problems. For example, as one commenter noted, you aren't really using Ruby types anymore. You're using C primitives. If you had used a different numeric type like Bignum, it would not work correctly. If you had overridden Fixnum#+ to increment your bank account on each call, that would no longer happen. RubyInline is great for embedding C code directly into your Ruby app, but that's all it really is...C code embedded in a Ruby app. Useful for some things, but not useful for others.
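To make the point about numeric types concrete, here is a small sketch in plain Ruby. It assumes a 32-bit signed C `int` and emulates the wraparound that such a value typically shows in practice (formally, signed overflow in C is undefined behavior); `as_c_int` is an illustrative helper, not part of RubyInline.

```ruby
# Emulates two's-complement wraparound of a 32-bit signed C int.
# The 32-bit width is an assumption; `as_c_int` is an illustrative helper.
def as_c_int(n)
  ((n + 2**31) % 2**32) - 2**31
end

big = 2**31 - 1          # largest value a 32-bit signed int can hold
puts big + 1             # Ruby promotes automatically: 2147483648
puts as_c_int(big + 1)   # a wrapped 32-bit int would read: -2147483648
```

Ruby integers grow as needed, while a C primitive has a fixed width, so results that are correct in pure Ruby can go wrong once the arithmetic moves into C.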

For what it's worth, we'll probably provide some sort of RubyInline functionality in JRuby, so you can have all the joy of embedding wherever you go.

mfp, Charles sort of answered your question. In the case of the C I've written it will blow up (I took the easy way out). On the other hand, I can access Ruby objects through C and that should let me "do the right thing". I'll see what I can come up with later today.

Charles, JRubyInline will be interesting. I assume it would only be possible to inline Java. Would that be right?

Anonymous, I'll see what I can do about putting together an array example for you.

Is the "inlined" C code in the example compiled to binary and somehow linked every time the Ruby script is run? Could inlining C code in this way carry a bigger performance hit if the C code is larger?

charles: you can always call through to ruby methods and use VALUES. the real cost of ruby is method dispatch.

anonymous1: read the pickaxe or sift through ruby.h and intern.h.

anonymous2: It is compiled and linked the first time (and on subsequent changes to the C code). The cost of the gcc toolchain is negligible for plain C. I've never seen a case where inlining code took longer than the original.

anonymous3: who cares? the point is to get something done and make it fast enough for whatever requirements you have. what label you put on that is up to you. I label it pragmatic.

re C/Ruby, the original request was to write a Ruby version. This is a C version inside a Ruby wrapper. Interesting but maybe not a valid comparison against prime printers in other languages. That's assuming the point of the exercise was to see how it would be written in Ruby and/or for speed comparison.

re: C/Ruby -- Anonymous, you kind of missed the point I was trying to make (which probably means I wasn't clear enough). This is a simple case (though real) where RubyInline allowed me to take a Ruby application that was too slow and speed it up by an order of magnitude.

Imagine a much larger Ruby application (thousands of lines, lots of classes, and many methods). I could do the same thing. Profile it to find the one or two methods that are just too slow, try to optimize them algorithmically, and finally rewrite them in C with RubyInline.

At that point I might have 1500 lines of Ruby and 30 of C. I'd say that's still a Ruby application, wouldn't you?

You should mention that the C code gets even faster if you run it a second time. On my machine the second run took only a third of the time of the first. I guess the reason is that RubyInline needs time to compile the code on the first run.

One problem that I'm trying to wrap my brain around with RubyInline is this: what if your algorithms work with Ruby data structures? Is it possible to use RubyInline to access Ruby objects? How would you use RubyInline for a tree-traversal type of algorithm, for example?

So you're saying that neither you nor your coworkers knew that you only need to test up to floor(sqrt(n)) for factors?

n-1 is useless to test as a factor because the gcd of n and n-1 can only be 1. The same goes for n-2: if n-2 divides n, it must also divide 2, which means that 2 is a factor (and that is really only true of 4, anyway). But that will be found out by testing 2.

Since no divisor smaller than 2 can disprove primality, the largest number that could ever need to be tested is n/2. But if n/2 is an integer, then testing for 2 finds that out. Same thing for n/3, n/4, ....

But if 4 succeeds, then 2 should have, so you never need to test 4 or 6, or 8, or 9...just the primes.

It's just a warning about naive algorithms slowing things down as well.
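The refinements described in this comment can be sketched as follows: test only up to floor(sqrt(n)), and only against primes already found. The method name `primes_up_to` is illustrative, and `Integer.sqrt` assumes Ruby 2.5 or later.

```ruby
# Sketch of trial division limited to primes no larger than floor(sqrt(n)).
# `primes_up_to` is an illustrative name; Integer.sqrt needs Ruby 2.5+.
def primes_up_to(limit)
  primes = []
  (2..limit).each do |candidate|
    root = Integer.sqrt(candidate)   # floor(sqrt(candidate))
    # Only earlier primes no larger than the square root need testing.
    divisible = primes.take_while { |prime| prime <= root }
                      .any? { |prime| candidate % prime == 0 }
    primes << candidate unless divisible
  end
  primes
end

p primes_up_to(30)   # prints [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

For a candidate near 10,000 this tests at most the 25 primes below 100, instead of nearly 10,000 divisors in the naive version.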

anonymous, I know, and they knew, that you only have to test through the square root of the prime candidate.

I don't know where they found the initial set of programs, but that's the way they were written. Since that's the algorithm I was handed, it's the one I was stuck using. Writing a faster version in pure Ruby wouldn't have been a reasonable comparison.