Now that I've got your attention, I hope you'll stick around for the real topic.

I developed a Fibonacci heap (based on that most venerable text, "Introduction to Algorithms"), and it gets what you might normally think of as pretty decent performance: an average insert takes about 4 microseconds, and an average extractMin takes about 12 microseconds. Compared to some other (good) Java Fibonacci heaps out there, mine is about 25% faster.

The truth, though, is that this performance is actually very slow. Our algorithm performs roughly 1 million inserts and extractMins, and we expect it to finish in under a second. Obviously, that's not possible with the current timings. I did a nearly line-for-line translation of my Fibonacci heap from Java to C++ (not optimizing for C++), and the C++ version is nearly ten times faster than its Java counterpart: inserts happen in 0.26 microseconds, and extractMins take anywhere from 0.6 to 3 microseconds depending on the size of the heap. (Realistically it'll be in the 0.6 microsecond range for us, because we're going to have fairly small heaps.)

So the big question in my mind is: WHY, OH WHY? I've pored over my Java code again and again, and I just can't see anything that can be done to improve the performance. The scary thing is that the algorithm requires a significant amount of heap allocation and comparatively few array accesses, so it's actually already playing to Java's strengths.

I tested this with Sun's 1.4.1 and 1.4.2 VMs, with and without the -server option. I've tried different heap sizes and different garbage collection options, and the numbers I give above are the best I could get. I compiled my C++ code under VC++ 7.1 with default optimizations.

So, I specifically challenge Jeff or Chris to examine my heap and either a) explain exactly how it can be rewritten to get the same performance, or b) get Sun's VM team to fix the VM so that we can get the same kind of performance.

I received permission to post the Java code. Let's just say that the C++ code looks mostly the same. It goes without saying that my company reserves the copyright to this code, so you should avoid copying it.

A couple of notes: 1) I stripped the comments - the code was too long for the board. 2) Don't worry about the asserts. 3) The performance numbers come from running inserts and extractMins a million times and taking the average. The best times are with -server and sufficient heap (i.e. 256M). Tests took place on a 2.4GHz P4 with 512M of RAM.
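For what it's worth, a measurement loop in the spirit of what's described above might look like the following sketch. `java.util.PriorityQueue` stands in for the Fibonacci heap under test (the actual heap code isn't reproduced here), and the class and method names are my own placeholders:

```java
import java.util.PriorityQueue;
import java.util.Random;

public class HeapTiming {
    // Averages the cost of n inserts followed by n extractMins.
    // PriorityQueue is only a stand-in for the heap under test.
    static double[] measure(int n) {
        PriorityQueue<Integer> heap = new PriorityQueue<Integer>();
        Random rnd = new Random(42);
        int[] keys = new int[n];
        for (int i = 0; i < n; i++) keys[i] = rnd.nextInt();

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) heap.add(keys[i]);
        long t1 = System.nanoTime();
        while (!heap.isEmpty()) heap.poll();
        long t2 = System.nanoTime();

        return new double[] {
            (t1 - t0) / 1000.0 / n,   // avg insert, microseconds
            (t2 - t1) / 1000.0 / n    // avg extractMin, microseconds
        };
    }

    public static void main(String[] args) {
        double[] avg = measure(1000000);
        System.out.printf("insert: %.3f us, extractMin: %.3f us%n", avg[0], avg[1]);
    }
}
```

Note this measures boxing and PriorityQueue overhead too; the point is only the shape of the timing loop, not the absolute numbers.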

It's JDK 1.4+ code. You can compile and run it with Sun's freely available "adding generics" prototype (v1.2), which is just an add-on to the 1.4 compiler. Alternatively, you can join the CAP program and compile and run it with the 1.5 VM. Finally, you could just take the 60 seconds to strip out the parameterization. It has no effect on the running time, because the heap doesn't touch the values.

I would, but your dyndns account seems broken - reyelts.dyndns.org won't resolve.

Lol. My server just started suffering spontaneous rebooting fits, so the domain is down for a few days while I put together a new box. The project is also hosted at sourceforge - http://sf.net/projects/jace

The truth, though, is that performance is actually very slow. Our algorithm performs roughly 1 million inserts and extractMins and we expect to be able to do it in under a second. Obviously, that's not possible with the current timings. I did nearly a line for line translation of my fibonacci heap from Java to C++ (not optimizing for C++),

Hmm. It may still be code that is better suited to C++, where a reworking of the algorithm might make the Java fly.

But before we even go there, let's ask some basic Java microbenchmarking questions:

(1) Did you run this with the -Xcomp flag to force it to compile all the methods right away?

(2) How long did you run this test for? Even with -Xcomp you are likely, I think, to pay some start-up penalty. When benchmarking on the JDK team we always did the following:
(a) NEVER ran a benchmark that ran for less than about 60 seconds.
(b) ALWAYS ran the benchmark 5 times in succession from within the same Java program.
(c) THREW OUT the entire first run.

Unless you are doing these things, it's quite possible that you aren't warming up the VM, and/or that the one-time cost of compilation is having an unreasonably large effect on your final numbers.
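The repeat-and-discard pattern described above can be sketched generically; this is my own illustration, not code from the thread. The workload is passed in as a `Runnable`, the benchmark is run several times inside one JVM, and the first (warm-up) run is thrown out. Meeting the 60-second minimum is left to the caller by sizing the workload:

```java
public class WarmupBench {
    // Runs the workload reps times inside one JVM and discards the
    // first (warm-up) run, per the methodology described above.
    static double[] timedRuns(Runnable workload, int reps) {
        double[] millis = new double[reps - 1];
        for (int r = 0; r < reps; r++) {
            long t0 = System.nanoTime();
            workload.run();
            long t1 = System.nanoTime();
            if (r > 0) millis[r - 1] = (t1 - t0) / 1e6;  // throw out run 0
        }
        return millis;
    }

    public static void main(String[] args) {
        Runnable work = new Runnable() {
            public void run() {
                long sum = 0;
                for (int i = 0; i < 5000000; i++) sum += i;
                if (sum == 42) System.out.println("unreachable");
            }
        };
        // 5 runs in succession; the 4 post-warm-up timings are reported.
        double[] times = timedRuns(work, 5);
        for (double t : times) System.out.printf("%.2f ms%n", t);
    }
}
```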


Toby, how did you measure exactly? One million repetitions of insert()/extractMin() back to back, or one million repetitions of:

    for (int i = 0; i < ints.length; ++i)
        insert();
    while (heap.size() > 0)
        extract();

or something else? The results do seem pretty slow indeed. Did you try without the generics? The generics prototype may be slowing everything down. There is nothing suspect in the code AFAIK.

Princec, you are also measuring the creation of 1,000,000 node objects; you should move the creation of those nodes (and the associated strings) out of the insert() loop. Also, my understanding is that Toby's heaps remain small, but yours grows to 1M nodes. Toby, have you tried the profiler (-Xrunhprof:cpu=times) to measure the time spent in your methods? With my Athlon 1.4GHz and JVM 1.4.2-beta-b19, I get (no optimization options):

Actually, I do not know whether Toby's figures include node creation or not. But if you include the creation of the nodes, most of the time is spent there and not in the insert() method.

1) He should include node creation - it's a required part of the insert. Most data structures (like HashMap) hide this inside the insert, but with a Fibonacci heap you have to hand that Node back to the client anyway, so I just make the allocation the responsibility of the client.

2) The heap instance creation is just noise compared to a million inserts.

3) The String.valueOf really shouldn't be part of that though.
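The client-allocated-node design from point 1 can be illustrated with the one piece of a Fibonacci heap that is simple to show in isolation: the O(1) insert, which just splices the caller's node into the circular root list. This is my own minimal sketch (names are placeholders, and decreaseKey/extractMin are omitted), not the posted code:

```java
public class FibHeapInsertSketch {
    // The client allocates the Node itself and keeps it as a handle
    // for later decreaseKey/delete calls - allocation is not hidden
    // inside insert the way HashMap.put hides its entry allocation.
    static final class Node {
        final double key;
        final Object value;
        Node left = this, right = this;  // circular doubly linked root list
        Node(double key, Object value) { this.key = key; this.value = value; }
    }

    private Node min;
    private int size;

    // O(1) insert: splice the caller-supplied node next to the minimum.
    public Node insert(Node n) {
        if (min == null) {
            min = n;
        } else {
            n.right = min.right;
            n.left = min;
            min.right.left = n;
            min.right = n;
            if (n.key < min.key) min = n;
        }
        size++;
        return n;  // the handle the client holds onto
    }

    public Node min() { return min; }
    public int size() { return size; }
}
```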

I decided to break down and use an actual commercial profiler to see what it would tell me. JProfiler tells me that, on average, the inserts are taking 4 microseconds, and the extractMins are taking 70 microseconds. (I believe the maximum heap size has changed significantly since some changes we made to the algorithm.) It also says that the Fibonacci heap is responsible for 80% of the running time of the main algorithm.

When I get finished translating the algorithm to C++ (soon), I'll run some statistics on it and see what kind of results I get.

I suspect that the fibonacci heap (or at least your implementation of it) is not well suited to Java. Try implementing a basic binary heap instead. The array used in the consolidate method is probably very cheap in C++ (stack allocated), but costs quite a bit more in Java.

Microscopically, classes in Java are expensive. If you can come up with a tricky way to store your data in an array and then just manipulate integer indices into that array, you will get very high performance.
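To make the "data in a flat array, integer indices instead of object references" idea concrete, here is a sketch using a plain binary min-heap (deliberately not a Fibonacci heap - it's just the simplest structure that shows the technique). All state lives in one int[]; no per-element objects are ever allocated:

```java
public class IntArrayHeap {
    // Keys live in a flat int[]; "nodes" are just indices into it,
    // so inserts cause no per-element allocation or GC pressure.
    private int[] keys;
    private int size;

    public IntArrayHeap(int capacity) { keys = new int[capacity]; }

    public void insert(int key) {
        if (size == keys.length) {               // grow when full
            int[] bigger = new int[keys.length * 2];
            System.arraycopy(keys, 0, bigger, 0, size);
            keys = bigger;
        }
        int i = size++;
        keys[i] = key;
        while (i > 0 && keys[(i - 1) / 2] > keys[i]) {   // sift up
            int p = (i - 1) / 2;
            int tmp = keys[p]; keys[p] = keys[i]; keys[i] = tmp;
            i = p;
        }
    }

    public int extractMin() {
        int min = keys[0];
        keys[0] = keys[--size];
        int i = 0;
        while (true) {                            // sift down
            int l = 2 * i + 1, r = l + 1, s = i;
            if (l < size && keys[l] < keys[s]) s = l;
            if (r < size && keys[r] < keys[s]) s = r;
            if (s == i) break;
            int tmp = keys[s]; keys[s] = keys[i]; keys[i] = tmp;
            i = s;
        }
        return min;
    }

    public int size() { return size; }
}
```

The same index-instead-of-pointer trick can in principle be applied to the parent/child/sibling links of a Fibonacci heap node, at a considerable cost in code clarity.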

Ha! I remember the time when you also had to "avoid methods" for optimal performance, particularly methods in other classes or objects. I had a fractal generator that showed a considerable speed boost when all several thousand lines of code were placed in one class - I can't remember if I had the courage to manually inline the few recursive methods too (I don't think so; it would have made the code seriously painful to change...).

Thankfully there's normally no difference now between method calls to the current object and to other objects...

A waste of time. These days the JIT compiler will do this without the need for hints. Objects aren't slow anymore, either. In fact, some techniques you might use to avoid them are more expensive than the code they were intended to improve. You will often be able to access a field in an object faster than an arbitrary element of an array (i.e. an access where the JIT can't guarantee the index is within the array bounds and thus must emit code to check them).

Current JITs are able to optimize even virtual instance calls perfectly in most cases. Anyway, even if you don't believe the JIT will do it for you, you only need the static modifier - final is superfluous (all static methods are effectively final).

C is fastest doing everything through array references. In Java, non-linear array references are expensive. C is generally slow at recursion; Java recursion can be blindingly fast (faster than a local array-based stack, in fact).

As already mentioned, calling through a class or an interface is no different than calling a static method. This has been the case since at least JDK 1.2.

An old but still good introduction to some of the differences between how C and Java code execute can be found here. Java VMs have actually improved at running "C-like" code in some ways since this article was written, but the base lessons are still pretty accurate:

I suspect that the fibonacci heap (or at least your implementation of it) is not well suited to Java. Try implementing a basic binary heap instead. The array used in the consolidate method is probably very cheap in C++ (stack allocated), but costs quite a bit more in Java.

If you can spot "problems" in my implementation, speak up now. That's why I went to the trouble of gaining permission to post my code in the first place.

Telling me to implement a d-ary heap is just skirting the issue, and is utterly pointless since that data structure doesn't give me the algorithmic performance characteristics I need.

The array used in the consolidate method is probably very cheap in C++ (stack allocated), but costs quite a bit more in Java.

You can't allocate a variable-length array on the stack in C++ unless you use something platform-dependent like alloca. I pull the heap allocation out of the function call anyway. I have to memset the array in every single call to consolidate, so it's not exactly free.

I could do the same thing in Java, but it's actually more expensive to clear the array than it is to heap-allocate a fresh one. Of course, it's kind of hard to measure the resulting effect on garbage collection.
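The two alternatives being weighed here - reuse one long-lived array and clear it on every consolidate call, versus allocate a fresh one and let the VM hand back pre-zeroed memory - can be sketched like this. The class, method names, and the 64-slot bound are my own illustration (the degree of any root is O(log n), so a few dozen slots covers any heap that fits in memory):

```java
import java.util.Arrays;

public class DegreeArray {
    // Upper bound on root degrees for the consolidate step; ~64 slots
    // is ample for any realistic heap size.
    private static final int MAX_DEGREE = 64;

    // Alternative 1: one long-lived array, explicitly cleared per call.
    // No allocation, but Arrays.fill touches every slot every time.
    private final Object[] reused = new Object[MAX_DEGREE];

    Object[] degreeArrayReused() {
        Arrays.fill(reused, null);
        return reused;
    }

    // Alternative 2: allocate fresh per call. The VM returns the array
    // pre-zeroed, and a short-lived object like this is cheap for a
    // young-generation collector to reclaim.
    Object[] degreeArrayFresh() {
        return new Object[MAX_DEGREE];
    }
}
```

Which wins is exactly the kind of question the surrounding discussion says is hard to settle without measuring, since the fresh-allocation cost partly shows up later as GC work.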

Just thought I'd let people know that I've finished the port, and the results are dramatic. On a small problem set, the C++ code is about five times faster. The performance gap between the C++ and Java code widens rapidly (perhaps geometrically?) as the size of the data set increases. (We work with data sets ranging from hundreds of thousands of objects to about 100 million.) Unfortunately, I can't run the Java program with the largest data sets, because the virtual machine can't allocate more than 1.7G of heap space.

The algorithm used in both programs is identical. Of course, both programs try to take advantage of any naturally well performing features their respective languages carry.

Are you trying to say that the current crop of VMs perform tail-call optimization? I've heard people (those who do things like implement Scheme on top of the JVM) complain of the exact opposite - that it doesn't happen.

Aha - if the problem is scaling geometrically in Java and not in C++, it points to something else that's scaling wrongly, rather than your direct interpretation of the port. Could it be GC?

Ask yourself this though before losing more hair - would you rather have an implementation with guaranteed safe GC and null pointer detection that runs slower, or a super-fast but entirely unsafe version of the algorithm?

As this is a server-side task you're doing, IIRC, you're probably better off trading performance for reliability and investing your money in faster processors for the server. Programmer time is very expensive indeed compared to CPU upgrades!

This is, of course, no real excuse for the algorithm being so slow in Java. What parameters are you running your benchmark with?
