When I created a little 128-bit library, I did a lot of benchmarks. So here are a few results about the Java 5 beta. ~_^

The client doesn't remove array bounds checks, or does it much less efficiently than the server. The client doesn't even inline getters and setters. The server has strange behaviour with -Xcomp / -Xbatch: it looks like something is running in the background and slowing the first few instructions in the method.

Do you have more experience with the differences between client and server, in terms of code generation?

Basically, Java 6 would need a multi-level compilation scheme, and to transfer a lot of features from server to client. Namely function call inlining and array bounds check removal.

The client doesn't remove array bounds checks, or does it much less efficiently than the server. The client doesn't even inline getters and setters.

That's right - the client can only inline non-virtual methods, because otherwise it would require the ability to de-optimize code, which only the server is able to do. Furthermore, it does not remove array bounds checks, since that does not help a lot for common applications (read: Swing-based client programs).
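A minimal sketch (with invented classes) of what is at stake here: while only one subclass of a type is loaded, a virtual call site has a single possible target and can be inlined, but loading a second subclass later invalidates that assumption, so the VM must be able to deoptimize the compiled code.

```java
// Hypothetical example: while Circle is the only loaded Shape subclass, the
// JIT may devirtualize and inline sh.area() at the call site below. If a
// second subclass (Square) is loaded later, that speculative inlining is no
// longer valid and the compiled code must be thrown away (deoptimized).
abstract class Shape { abstract double area(); }

class Circle extends Shape {
    final double r;
    Circle(double r) { this.r = r; }
    double area() { return Math.PI * r * r; }
}

class Square extends Shape {
    final double s;
    Square(double s) { this.s = s; }
    double area() { return s * s; }
}

public class ChaDemo {
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape sh : shapes) {
            sum += sh.area(); // virtual call, candidate for CHA-based inlining
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(total(new Shape[] { new Circle(2), new Square(3) }));
    }
}
```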

Quote

Basically, Java 6 would need a multi-level compilation scheme, and to transfer a lot of features from server to client. Namely function call inlining and array bounds check removal.

And why should Java 6 "need" this *wonder*? The reason it's not in the client is to keep the JVM size smaller (read: footprint) and keep compile time down - I would laugh if you sold your customers a program which had function inlining and bounds check removal but took ages to start up. Btw, the source is downloadable - grab yours and implement it - I think Sun would be very happy about your contribution.

Btw, multiple compilation modes are planned for Dolphin (Java 7.0), where methods are first compiled with client and later with server optimizations, as IBM's JVM currently does (it also does not help a lot). Personally I think a JIT cache would be more useful (since almost all optimizations of the client JVM can be shared, unlike server-generated code).

AIUI the intent for the next release is to fold client back into server so you have a single compiler that does multi-stage compilation. The first pass would be more or less equivalent to client today; it would then go back and do the server stuff on hotspots.

Got a question about Java and game programming? Just new to the Java Game Development Community? Try my FAQ. It's likely you'll learn something!

AIUI the intent for the next release is to fold client back into server so you have a single compiler that does multi-stage compilation. The first pass would be more or less equivalent to client today; it would then go back and do the server stuff on hotspots.

Oh coolness! If a JIT cache were added too, we'd have a perfect VM for games

Quote

And why should Java6 "need" this *wonder*?

Since we're all doing game stuff, most of us would benefit from a good optimizing JVM. I also have the feeling the client VM is used a lot more for games than for Swing stuff. For example, on the current client VM my own project JEmu2 runs quite slowly, while on the server VM it performs more than twice as fast. Even on the ancient MS VM, it performs about 30-40% better. On the IBM VM, it performs about as well as the Sun server VM (although last time I checked that was 1.3 IBM vs 1.4 Sun server), but without the start-up sluggishness of the Sun VM. The bad performance of JEmu2 on the client I mostly blame on the lack of bounds check removal, but there might be more reasons. The client VM is quite OK for Swing stuff, but it doesn't seem very fast for games.
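For what it's worth, the kind of inner loop an emulator spends its time in is exactly where bounds check elimination pays off. A sketch (the buffer size and method names are just illustrative):

```java
// Each pixels[i] access carries an implicit range check unless the JIT can
// prove 0 <= i < pixels.length for the whole loop. The server VM can hoist
// or eliminate the check; the client VM (at the time of this thread) could not.
public class Blit {
    static void fill(int[] pixels, int color) {
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] = color;
        }
    }

    public static void main(String[] args) {
        int[] frame = new int[256 * 224]; // illustrative arcade-style resolution
        fill(frame, 0xFF00FF);
        System.out.println(frame[0] == 0xFF00FF && frame[frame.length - 1] == 0xFF00FF);
    }
}
```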

The 2-stage VM would do far more than just optimise games. Swing would be faster, Eclipse would be faster, everything would just be faster. All those promises and all those nerds with statements containing the words "theoretically" and "potentially" and "C++" will finally, after 10 years, be a reality instead of something that people s**** about.

And 2-stage will more or less render JIT caching totally redundant. Trust me on that one. The client VM compiles so fast that it'll be far faster than a cache, and the whole point of the server VM was that it only bothered with genuine hotspots anyway. A few command-line parameter tweaks to tune it and you're sorted.

Btw, the source is downloadable - grab yours and implement it - I think Sun would be very happy about your contribution.

Btw, multiple compilation modes are planned for Dolphin (Java 7.0), where methods are first compiled with client and later with server optimizations, as IBM's JVM currently does (it also does not help a lot). Personally I think a JIT cache would be more useful (since almost all optimizations of the client JVM can be shared, unlike server-generated code).

It's not exactly difficult to cache pieces of HotSpot-optimized code. The JIT cache would be nice, but some kind of caching of frequently used classes, compiled in a manner similar to -Xcomp, might be even better, especially if they were prepared for fast reading. Of course, how much it would be slowed by code verification, and how nice it would be towards a shared VM, is another question. BTW, how would a shared VM implement a firewall between applications?

As for your recommendation to try to compile the source code: I'm on dial-up. While I could download the Java SDK in 2-3 hours, I don't have enough money to download that source code. Whether I could have broadband depends on whether my neighbours would allow me to run a simple cable through the stairs. It looks like I would need to wait a few weeks to months. Not to mention that I'm currently working on adding 128-bit numbers to Java.

There seems to be quite a bit of misinformation on this thread but I'll just mention that the Java HotSpot client compiler has supported deoptimization and inlining through virtual calls via class hierarchy analysis since 1.4.

And 2-stage will more or less render JIT caching totally redundant. Trust me on that one. The client VM compiles so fast that it'll be far faster than a cache, and the whole point of the server VM was that it only bothered with genuine hotspots anyway. A few command-line parameter tweaks to tune it and you're sorted.

I am not so optimistic when looking at large Java applications. They all have the same problem when running on the client JVM: a very large codebase (Swing) with an almost flat profile - and a very strict responsiveness requirement. And in my eyes the biggest problem is the compilation threshold, which has to be awaited every time, even with n-stage compilation. If a lot of commonly compiled methods could be cached in a way that makes them usable by other callees too (no very hard optimizations), most of the code could run at about 80% of client-code performance, while the more critical methods (the real hotspots) could be given to the 2-stage compilation engine. This would also have the benefit that the first 1500 (or whatever) invocations would run already-compiled code.

There seems to be quite a bit of misinformation on this thread but I'll just mention that the Java HotSpot client compiler has supported deoptimization and inlining through virtual calls via class hierarchy analysis since 1.4.

Ken,

It would be good for you to, just once, give the misinformation list. As a genuine insider into what's going on inside (Ken's part of the HotSpot VM team, folks), I think everyone would appreciate it and benefit from it.

Jeff


There seems to be quite a bit of misinformation on this thread but I'll just mention that the Java HotSpot client compiler has supported deoptimization and inlining through virtual calls via class hierarchy analysis since 1.4.

Ken,

It would be good for you to, just once, give the misinformation list. As a genuine insider into what's going on inside (Ken's part of the HotSpot VM team, folks), I think everyone would appreciate it and benefit from it.

Starting from the beginning of the thread: the client JVM has the same inlining capabilities as the server JVM (i.e., class hierarchy-based inlining with support for dynamic deoptimization). It is true that the client compiler does not currently eliminate range checks. The server compiler maintains integer ranges for values and should be able to perform the optimization of loading only an integer value, but I think it currently does not. See src/share/vm/opto/mulnode.cpp in the Mustang HotSpot workspace, and feel free to file an RFE or even to implement and contribute this.

Implementing a cache for dynamically compiled code is pretty complicated, and as far as I've heard BEA recently moved away from caching compiled code in JRockit, which would support this assertion. Multi-stage compilation seems to be a better and more maintainable approach.

I'm surprised the 1.1-era Microsoft VM would be faster than the HotSpot client JVM at any pure computational tasks, and I wonder whether any 1.1-style library calls are being used which are simply better optimized in the Microsoft stack. If you have a concrete test case which runs faster on the MSVM, please provide it.

For general information about HotSpot please see the white papers at http://java.sun.com/products/hotspot/ . For release-specific information please see the documentation for each individual release. There are also several presentations on various portions of HotSpot in archived JavaOne talks at http://java.sun.com/javaone/ .

Implementing a cache for dynamically compiled code is pretty complicated, and as far as I've heard BEA recently moved away from caching compiled code in JRockit, which would support this assertion. Multi-stage compilation seems to be a better and more maintainable approach.

Well, multi-stage compilation will do the following:
* More time will go into compilation
* Responsiveness like client, but performance like server

However, it won't solve the problem larger Swing apps suffer from today: performance immediately after startup. It's not really funny that I need to tell my customer that he has to drag the table header around a few times until it won't feel sluggish, or why scrolling this or that panel is slow after start. At least saving some profiling data would make the whole scenario much better.

It's really hard to try to share/cache code that has been optimized with closed-world optimizations; however, generically compiled code can be shared easily and replaced by more optimized methods if required. The approach BEA used was to cache fully optimized code - which is hard, of course. GIJ is able to cache "JIT" code and it works quite well (~ the speed of the client JVM).

Implementing a cache for dynamically compiled code is pretty complicated, and as far as I've heard BEA recently moved away from caching compiled code in JRockit, which would support this assertion. Multi-stage compilation seems to be a better and more maintainable approach.

Well, multi-stage compilation will do the following:
* More time will go into compilation
* Responsiveness like client, but performance like server

However, it won't solve the problem larger Swing apps suffer from today: performance immediately after startup. It's not really funny that I need to tell my customer that he has to drag the table header around a few times until it won't feel sluggish, or why scrolling this or that panel is slow after start.

Is this with the client VM??


I have a listener on a table which only sets the text of 6 JTextFields, and it feels sluggish on an 800 MHz Athlon for the first 50 selections. Another example is that reordering columns in a JTable really feels sluggish (my TableRenderer is pretty optimized), but after some moving around it's really fast.

I have a listener on a table which only sets the text of 6 JTextFields, and it feels sluggish on an 800 MHz Athlon for the first 50 selections. Another example is that reordering columns in a JTable really feels sluggish (my TableRenderer is pretty optimized), but after some moving around it's really fast.

That's very odd, have you profiled it?

A good way to see if it's somehow not getting compiled soon enough is to force everything to compile at start-up (I believe the option is -Xcomp).


A good way to see if it's somehow not getting compiled soon enough is to force everything to compile at start-up (I believe the option is -Xcomp).

I think it makes sense that GUI apps feel this way. The compiler appears to be designed to optimize a method once it has been executed a certain number of times and is also taking a certain proportion of the execution time. If you consider methods that respond to UI events, such as the user dragging a table heading or scrolling, those methods are not executed in tight loops. So it stands to reason that they only hit the compile threshold after a bit of user interaction... so the user's initial experience is that the application is slow.

The sad thing is that the CPU is often idle while waiting for user input. Too bad there isn't a way to take advantage of that idle time to do some basic compiling... but I think that would involve mind reading
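A toy illustration of the point above (the threshold value is invented, not HotSpot's actual default): a UI handler accumulates invocations far more slowly than a loop body, so it crosses the compile threshold much later.

```java
// Hypothetical numbers only: a handler fired ~50 times by early interaction
// stays interpreted, while a game's inner loop crosses the threshold at once.
public class ThresholdDemo {
    static final int COMPILE_THRESHOLD = 1500; // illustrative value only

    static boolean wouldCompile(int invocations) {
        return invocations >= COMPILE_THRESHOLD;
    }

    public static void main(String[] args) {
        int handlerCalls = 50;        // e.g. 50 table selections by the user
        int loopIterations = 100_000; // e.g. a game loop in its first seconds
        System.out.println("handler compiled: " + wouldCompile(handlerCalls));
        System.out.println("loop compiled: " + wouldCompile(loopIterations));
    }
}
```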

Just compile the method with the most invocations so far. If all that's left is invocations == 1, do nothing. If you've got spare CPU and spare memory, there's no reason not to... after all, HotSpot could always decompile stuff it thinks isn't relevant any more, couldn't it? (Another optimisation)

But there are other threads running in the background besides the compiler thread: GC, user threads, the interpreter, etc. It would require a lot of effort to figure out that *nothing* is going on and then force compiles. One idea that I've had (it was my master's thesis, actually) was to compile methods as logical units rather than discrete ones. It required keeping track of methods that were invoked together in the interpreter, and when one of those methods was compiled, the whole logical unit was compiled. Startup suffered, but the advantage was that more things got compiled, and GUI apps improved the most (I think it was something like 10% on SwingMark).
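The "logical unit" idea could be sketched roughly like this (all names and data structures are invented for illustration; the real thesis work tracked this inside the interpreter):

```java
import java.util.*;

// Sketch: link methods that are invoked consecutively into one "logical
// unit"; when any member crosses the compile threshold, the whole unit is
// handed to the compiler together.
class LogicalUnits {
    private final Map<String, Set<String>> unitOf = new HashMap<>();
    private String previous = null;

    // Interpreter hook: called on every method invocation.
    void recordInvocation(String method) {
        unitOf.computeIfAbsent(method, m -> new HashSet<>(Collections.singleton(m)));
        if (previous != null && !previous.equals(method)) {
            Set<String> merged = new HashSet<>(unitOf.get(previous));
            merged.addAll(unitOf.get(method));
            for (String m : merged) {
                unitOf.put(m, merged); // all members share the merged unit
            }
        }
        previous = method;
    }

    // When `method` gets compiled, compile its whole logical unit.
    Set<String> unitToCompile(String method) {
        return unitOf.getOrDefault(method, Collections.singleton(method));
    }
}

public class LogicalUnitsDemo {
    public static void main(String[] args) {
        LogicalUnits units = new LogicalUnits();
        units.recordInvocation("paintComponent");
        units.recordInvocation("getTableCellRendererComponent");
        units.recordInvocation("paintComponent");
        System.out.println(units.unitToCompile("paintComponent"));
    }
}
```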

I know that particular tidbit... but those threads are not sensitive to actual computer idle time. If they detected low CPU usage, they could conceivably get to work compiling things pre-emptively rather than waiting for some invocation threshold to be reached. Just a thought experiment. Might be interesting to try, and I suspect not too much work.

It's the same problem as making sure your timer doesn't block your game. Compilation could be planned ahead, and some scheduler could coordinate actions between high-priority tasks.

Fast precompilation on load, then extensive background compilation with a small time slice, would probably work best, especially if the results of optimizations were saved. (Just wait for all the new bugs: the compiler optimized out the graphics surface... it killed events from the system... jerky jerky jerky, you like jerky for food? ~_^)

I would love to see something like the following; however, SUN's engineers will say it's not needed at all (as they said Swing isn't slow, and showed us benchmarks which pointed out how fast JButtons can be instantiated ;-) ):

A virtual machine that:

- Uses the interpreter on code that has not been cached

- Generates cacheable code for methods that are called more than x/y times and are not in the cache. This isn't as hard as engineers make it sound; even gcj is able to do it. Just take inlining and some other tricky optimizations out of this phase and keep the code small - execution would still be about 3-?x faster than interpreter-only. This would especially help large applications (like Swing-based desktop apps).

- Optimizes the real hotspots using the server engine and replaces the cached compiled code with the highly optimized new code. In this stage, optimizations could also be done that would be hard to cache (escape analysis, inlining of virtual functions, ...).
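The three stages above could be modeled as a tiny state machine (the thresholds are invented for illustration, not real VM values):

```java
// Toy model of the proposed pipeline: interpret cold code, give warm methods
// a quick cacheable "generic" compile (still profiling), and recompile true
// hotspots with the heavy server-style optimizations.
enum Tier { INTERPRETED, GENERIC_COMPILED, SERVER_OPTIMIZED }

public class TieredModel {
    static final int WARM = 100;    // illustrative thresholds only
    static final int HOT = 10_000;

    static Tier tierFor(int invocations) {
        if (invocations >= HOT) return Tier.SERVER_OPTIMIZED;
        if (invocations >= WARM) return Tier.GENERIC_COMPILED;
        return Tier.INTERPRETED;
    }

    public static void main(String[] args) {
        System.out.println(tierFor(10));     // INTERPRETED
        System.out.println(tierFor(500));    // GENERIC_COMPILED
        System.out.println(tierFor(50_000)); // SERVER_OPTIMIZED
    }
}
```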

However, for Mustang we should not expect major improvements, as it looks for now; it's just 9 months to the planned release date, and I think they want to get it stable. Like so many improvements, 2-phase compilation (which does not solve the slow-at-startup/slow-after-startup problem at all) has been deferred to Dolphin, and who knows whether it will be realized then. From this point of view, Mustang has not had really impressive improvements in terms of the runtime :-(

Just compile the method with the most invocations so far. If all that's left is invocations == 1, do nothing.

The problem is, compiling a method which has not been run many times so far will not provide all optimizations (as not enough runtime data is present - it's gathered during interpreted runs). So the only thing you could do in the background is a 'generic' compilation of such methods, with profiling instructions still embedded inside, to recompile them later with full optimization. Which again points to the need for tiered compilation.

Actually, I was interested in something Azeem once claimed here, which was that by setting -XX:CompileThreshold=500 we'd get suboptimal compilation in the server VM. However, it also occurred to me that he might be incorrect. Our games tend to do one thing over and over again, and once they've done that thing, they pretty much never do anything else. It should only take a few frames of a game loop to determine just about every possible code path in one of my games.
