Thursday, July 2, 2009

Java vs. Scala vs. Groovy - Performance Update

It's been quite a while since my post on the micro benchmark comparison between Java, Scala and Groovy. Since the Groovy developers tackled some performance issues with the Groovy 1.6 release, I thought, I'd give it another try. I used Groovy 1.6.3, Scala 2.7.5 and JDK 1.6.0 Update 14. I left the code that I used for the last comparison unchanged. I did not make much use of any specific language features, so it's quite a straightforward quicksort implementation, see the code below.

The ResultBefore showing the result, just let me make clear, that I love the Groovy ;-) But seriously, after the anouncement, that the performance was one of the primary goals for the Groovy 1.6 release, I was a bit disappointed. We'll have to keep in mind though, that this is just a (poorly written) micro benchmark, that doesn't mean anything. And after all, it's all about the groovy language features anyway.

Mr Jan, I can see this reaction every time speed of groovy comes into picture...

Going further this route, if somebody will compare speed of java and C for ray tracing, I'll answer System.exec("povray abc.pov")

Point of this benchmark is exactly to prove that you HAVE to escape from groovy to java for any performance-sensitive code. It also proves that in scala you can in many cases write your algorithms fully in scala, without having to switch language in hot spots.

Said that, benchmark is completely horrible. There is no time for jvm to warm up in case of java and scala - array is too small and there is no repetition of test.

I have done some slightly more scientific tests for array of 20000000 ints and results are (jdk 1.6.0_11 server, scala 2.8.0)

Why is the groovy code the only one not using strong typing? It's possible in groovy as well you know. Surelly that should help. After, I know scala optimize the compilation of the primitive operations, so no surprise there. All this benchmark shows is that to run programs on Integer fast, stick to primitive type, in java for example. but it's not groovy playground.

To all the people commenting about bad groovy code, lack of static types and using BigDecimal.

Do you really think it with those changes groovy will be anything close to java? Or just 300 times slower instead of 3000 times slower? (I'm taking about test after warming up).

I gave up after 1 hour of running groovy code, so lower bound is 3000 times slower with original code.

For smaller test with 1M elements, java is around 65ms, old groovy 20 seconds, groovy with explicit types (I have put them everywhere possible) 10 seconds.

Which means that groovy is still around 300 times slower - exactly the number I was getting in my previous tests few years ago.

So, now when primitive types are out of the picture, what will be the next excuse from groovy crowd? I'm still waiting for "Groovy cannot be slow because it is used in Fortune 500 companies" and "All applications are database bound anyway", which I got last time.

@Jan: You have to be careful with parsers, scala has very powerful capabilities here. For swing forms, I have to agree, groovy is currently state of the art ;)And to be honest, people ARE arguing with the fact that number crunching in pure groovy is slow. I have heard argument that as groovy is compiled to bytecode it is same fast as java, for example...

@Arouse: Groovy is superset of java from point of syntax, but same code runs considerably slower because of various extensions which have to be supported. If by superset you mean possibility of easily escaping into normal java, then it doesn't matter, because the point of the test was to see if you have to escape for performance critical code.

Sorting integers is easy to proof (and people have found few errors here). But when you will start writing graph processing algorithms and you end up with O(n^3) algorithm somewhere, you may want to know if C is 1 or 1000.

On top of that, it is better to talk about platform performance BEFORE writing 200k lines. I have seen systems written with "we will fix performance when it is ready" mindset, which were so broken in design performance-wise that they had to be rewritten from scratch.

You can avoid some mistakes if you are aware of basic facts from very start. Do not use groovy for scripting if performance matters. If you need to compute 1000 exotic option prices per second, your scripting language will be better faster than slower (and java things like Janino, thanks to the Hotspot, are way faster than any interpreted C++ stuff). And when you have to keep below 10kb of garbage for end-to-end processing, you better choose language where you can easily control amount of fluff.

I'm living a code base right now that could be 1/10th to 1/20th of the size if it was written in Groovy. Performance is a big issue - but guess what? They are a startup and the client constantly wants enhancements. The enhancements take forever because the code base is so large and complex. They nailed the premature optimization anti-pattern. They are barely into production and afraid to make changes.

If I'm looking at 200k code base, the first thing I'm thinking is how can make it smaller so its easier to manage change. I've never been on a project where changing requirements wasn't the dominant challenge for developers working a project.

There are things that you can do in Groovy and other languages that support a meta-object protocol that you simply can't do with languages like Java. Just because Strachen doesn't see the value in a MOP doesn't mean its value isn't real.

Finally, I am amazed how people who have never used Groovy on a real project (but are experts because they wrote 50 lines of benchmark code) dismiss "dropping into Java" as somehow not valid for Groovy code. It is not a knock on Groovy - it is a feature.

Those of us who have used Groovy on real projects have seen that a developer gets a great combination of features. I get a really expressive and compact syntax, dynamic behavior if I want it and its trivial to drop down into Java when necessary.

This post raised quite more of a discussion than I thought it would. I mean, it doesn't really tell anything new and it basically says "Look, I'm using the wrong tool for the job ... and it breaks!". I said that in the post! So I don't understand some of the comments, where people are getting really serious about this.

Groovy really shines in some areas, and it does not in some other areas. That's what also Guillaume Laforge says, but some people really want to use Groovy for everything. To those people the numbers above must look like a bug report.

I'm not saying that groovy is bad choice in above cases, just that ease of reaction to changing requirements is not always the highest priority.

As far as 'experience' is concerned, my commercial adventure with groovy started when I had to choose scripting language for exotic derivatives pricing. While hardcore computations were done in optimized C++, there was considerable amount of glue code around (with some less trivial stuff like modifications of volatility surfaces on the fly during pricing etc). I have considered some kind of DSL based on groovy for that - and guess what, groovy WAS slowest part of entire chain. Basically, it was too slow to be useful as a scripting language. Calling java is not going to help here - because it is exactly java/C++ which is calling 'scripting language', sometimes even inside a tight loops.

Wrong tool for the job ? Yes. We have ended up with using variation based on janino (some preprocessing done for easier usage), which gave us full java speed, with hotspot being able to really optimize the stuff.

There are cases where speed of the language ITSELF is important and saying 'jump to java in case of trouble' is not going to cut it. This is why it is nice to know how fast is the language on it's own. This is why I'm so angry when I see people comparing groovy to java speed by doing 20 seconds blocking query to database and saying it is taking 20.0001s in java and 20.01s seconds in groovy and saying that groovy is just fraction of percent slower than java in "all interesting use cases".

I AM using groovy for scripting (especially for things which are touching both filesystem and database) - and performance is for sure not any issue. I would have no problems with using it for any 'normal' web-based application or some DB-based reporting thingy. Still, when you have to process tens of thousands of updates per second on single machine and have to fight for every few bytes of garbage so gc won't become a pain, it is good to know that groovy is not a way to go... at least before Groovy++ came into picture - I'm looking at it closely. On the other hand, I know about scala now, so next similar try will be probably scala-based...

I think that is the natural way to handle logging in Scala. I will widen my trait now to incorporate the standard slf4j interface as yours does. Check out the use of by-name params and closures in my post.

I've been doing Project Euler problems in Groovy and have found that Java is around 6-8x faster than Groovy when using built-ins i.e. when Groovy is mostly used to glue Java object together, 40-60x faster when doing "general" programming and several hundred times faster when doing computationally intensive work (as Java allows native ints without all the autoboxing that goes on in Groovy). The "syntactical sugar" features of Groovy are quite expensive in runtime performance.

Obviously these are coarse generalizations, and the usage in PE problems is somewhat specialized, but you can find some more details on my blog: http://keyzero.wordpress.com

Note though that Groovy quite scores well when compared to other dynamic scripting languages such as Ruby, JRuby and Python.

What I want to know is why I should care? I mean, not to say that what youve got to say isnt important, but I mean, its so generic. Everyones talking about this man. Give us something more, something that we can get behind so we can feel as passionately about it as you do.

This is a WEAK groovy test! First no one would use 'def' and second, you aren't using groovy functions. You don't even understand how to write Groovy; you are deliberately writing your groovy test to FAIL!