How do you configure your runtime and the compiler to get maximum performance?

First, I guess one should use the compiler flag -g:none to disable the generation of debugging info. Then one has to choose the runtime. AFAIK, the server VM starts up more slowly than the client VM but tends to be faster afterwards, right?

Are there any other important flags (like -Xmx) or configurations that have a noticeable effect on performance?
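Before comparing flags, it helps to confirm which VM and heap limit a run actually uses. The following is a minimal sketch, not part of the original post; the class name RuntimeInfo is made up for illustration, and it only reads standard system properties and Runtime values.

```java
public class RuntimeInfo {
    /** Returns a one-line summary of the VM this program is running on. */
    public static String describe() {
        Runtime rt = Runtime.getRuntime();
        return System.getProperty("java.vm.name")
                + " " + System.getProperty("java.vm.version")
                + ", max heap " + (rt.maxMemory() / (1024 * 1024)) + " MB"
                + ", processors " + rt.availableProcessors();
    }

    public static void main(String[] args) {
        // Prints e.g. the VM name (client or server) and the heap ceiling set via -Xmx.
        System.out.println(describe());
    }
}
```

Running it once as `java -client RuntimeInfo` and once as `java -server -Xmx256m RuntimeInfo` should show whether the chosen flags took effect on a VM that ships both compilers.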

The reason I am asking is that, in my experience, the bytecode compiler and, even more importantly, the JIT do very few optimizations. In my test cases the compiler doesn't often inline property accessors (especially when the methods override an abstract one), and autoboxing like the following isn't optimized well (any ideas why?):

The compiler does NO optimization, regardless of what flags you set. That's on purpose. Trying to optimize early just makes the important optimization period, the run-time optimization, harder on the VM.

The VM (JIT really is an outdated term, as it's far more than just a JIT these days) does *extensive* optimization, more in fact than any C or C++ compiler I know of. If you are using the client VM, then don't, as it leaves a number of important optimizations on the table. Use server and "warm up" your code before engaging the user.

Got a question about Java and game programming? Just new to the Java Game Development Community? Try my FAQ. It's likely you'll learn something!

As I repeated the tests several times (see the loop in the main method), I am asking myself how long such a "warm-up" phase can take. The amount is probably a ratio between the algorithm's execution period and the number of loops, right?

As for the C/C++ comparison: my experience is that Java code is almost as fast when comparing pure arrays of value types and arithmetic operations. Unfortunately, the VM doesn't optimize more object-oriented code that well. Of course, optimizing C++ code takes quite an effort, since inlining and other flags (like inline depth for recursive methods) have to be set manually. Furthermore, they are static and can't change at run time, which is where a (Java) VM has great potential. On the other hand, I believe that Sun's JDK/JRE is still not capable of 'deferred evaluation' in order to simulate the compile-time functionality of C++ expression templates (see Just when you thought your little language was safe: ``Expression Templates'' in Java). Maybe JET can, but I don't have a license to test this.

It really depends on the JVM you are using when the methods are compiled, but Sun has a threshold of 1500 for client and 10000 for server, and you can also set these values manually. However, really expensive optimizations, like inlining of virtual methods and so on, are only done by the server VM.
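To make the threshold discussion concrete, here is a minimal warm-up sketch. It is not the benchmark from this thread; the class name WarmupBench and the sum-of-squares workload are hypothetical. It assumes that calling the workload more often than the quoted server threshold gives the VM a chance to compile it before the measured pass, and it stores every result so the work is actually used.

```java
public class WarmupBench {
    // Keeps every computed result reachable so the work is not thrown away.
    public static float sink;

    // Hypothetical workload: sum of squares over a float array.
    public static float sumOfSquares(float[] data) {
        float sum = 0f;
        for (float v : data) {
            sum += v * v;
        }
        return sum;
    }

    // Runs the workload `runs` times and returns the elapsed time in nanoseconds.
    public static long time(float[] data, int runs) {
        float acc = 0f;
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            acc += sumOfSquares(data);
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return elapsed;
    }

    public static void main(String[] args) {
        float[] data = new float[1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = i * 0.5f;
        }
        // Warm-up pass: 20000 calls, above the 10000 figure quoted for the server VM.
        long first = time(data, 20000);
        // Measured pass, run only after the warm-up.
        long second = time(data, 20000);
        System.out.println("first pass:  " + first + " ns");
        System.out.println("second pass: " + second + " ns");
        System.out.println("checksum:    " + sink);
    }
}
```

On HotSpot the threshold can be changed with -XX:CompileThreshold=<n>; comparing the two passes under -client and -server is one way to see how long the warm-up takes for a given workload.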

I personally think it's absolute nonsense to warm up code by hand. It's just a workaround for some design weaknesses and, in my mind, absolutely stupid. A JIT cache that stores compiled methods on disk (as JRockit is able to do) would be much more elegant. Furthermore, profile information from previous runs could be cached on disk, which would especially help short-running applications a lot!

I can see why a naive JIT cache wouldn't work for Hotspot (as the precise runtime compiled state actually changes over time) but it would be a nice thing to serialize the state of a JVM and just load it in one big file load operation

Well, not for everything. As far as I know, a JIT cache could even hinder more advanced optimizations like inlining or virtual method optimizations, but today's GUI applications (Swing) have a lot of code and an almost flat profile, except for some hotspots. So what could be done is to compile the flat code optimized for code size, without any problematic optimizations, and re-optimize the hotspot parts. As far as I know, JRockit has an experimental feature doing exactly this.

However, I am not a JVM specialist, and I would not even think about giving the Sun/HotSpot guys tips on how to do their work. Both JVMs (server/client) are impressive work!

lg Clemens

PS: Does anybody know for which release 2-phase compilation is scheduled? (mustang or dolphin)

Okay, so pardon my periodic rant, but I need to explain to this newb his primary error.

Your primary error is this: You wrote a microbenchmark.

Hotspot is brilliant at optimizing real code. But microbenchmarks don't execute like real code. Ergo, you will not get meaningful results from microbenchmarks. The simpler compilers and optimizers in your C compiler may actually perform better because they AREN'T tuned for real code in the same way Hotspot is.

What microbenchmarks do most often is turn up interesting corners in the work Hotspot does. I haven't analyzed your code in detail because, to be honest, I just don't really have the "oomph" to do that. In the scores of microbenchmarks I've seen and analyzed in my time in the JDK performance tuning team, and then afterward in this community, the answer was almost always that the benchmark was doing something that real code wouldn't, and that was biasing the benchmark.

If you have a good understanding of the rather complex things the system is doing under the hood, then certain very specific and carefully written microbenchmarks can produce useful results. The vast majority, however, just serve to illustrate one part or another of how the system is designed to eat real code well and simplistic benchmarks poorly.

I personally think the microbenchmarking issue is often used just as a lame excuse for the JVM.

I have tested code like the mentioned expression templates in a 'real world' application. More precisely, I changed the vertex skinning code in my character animation system to use deferred evaluation, and the JVM definitely cannot handle it. The result is a huge drop in FPS.

Furthermore, it is a serious problem that microbenchmarks are not meaningful. They should be, at least if the timed test starts only after the code has already been executing for a given time and every computed result gets used afterwards (e.g. by printing the sum). Otherwise it is a pain in the XXX to write performance-critical code that does not depend on complexity.

As far as I can see, your benchmark primarily shows how a design error on your part negatively affects performance (i.e. using autoboxing where you should not).

I don't think Jeff suggests that the JVM can only optimize complex code well, he's just rehashing the (IMHO still valid) general point that microbenchmarks often show meaningless results if not done correctly.

As far as I can see, your benchmark primarily shows how a design error on your part negatively affects performance (i.e. using autoboxing where you should not).

That's my point: auto-(un)boxing should never affect performance. Since the wrapper classes are final and there are only getters [ xxxValue() ], they could be replaced with primitive types in native code, and all other methods could be inlined. So IMHO this isn't a design error; this is a limitation of Sun's JIT.

Ridiculous.

The whole point of auto-boxing is that primitive types can't fit into the existing code, at least not without a LOT of special-case optimizations in the JVM that would be extremely difficult to deal with. You can't call methods on primitive types, for example. All the code that calls equals(), all the comparators, would have to be magically rewritten by the VM. It makes no sense.

You might as well say that the compiler should figure out when you should have used a linked list instead of an array and magically change it behind the scenes for you. Sure, some day compilers might be advanced enough to do that sort of thing, but they simply aren't yet.
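One concrete reason why code calling equals() cannot simply be swapped for a primitive comparison: Float.equals() and the == operator on float disagree for signed zeros and NaN. A minimal sketch (the class and method names are made up for illustration):

```java
public class BoxedSemantics {
    // Comparison as the == operator on primitives defines it.
    public static boolean primitiveEquals(float a, float b) {
        return a == b;
    }

    // Comparison as Float.equals() defines it (based on the bit pattern).
    public static boolean boxedEquals(float a, float b) {
        return Float.valueOf(a).equals(Float.valueOf(b));
    }

    public static void main(String[] args) {
        // 0.0f and -0.0f: equal as primitives, different as Float objects.
        System.out.println(primitiveEquals(0.0f, -0.0f)); // true
        System.out.println(boxedEquals(0.0f, -0.0f));     // false
        // NaN: never equal as a primitive, equal to itself as a Float object.
        System.out.println(primitiveEquals(Float.NaN, Float.NaN)); // false
        System.out.println(boxedEquals(Float.NaN, Float.NaN));     // true
    }
}
```

So a VM that replaced a boxed Float with a raw float would also have to preserve these differences wherever equals() or a comparator is involved.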

Once an instance of a wrapper class is created, the values returned by its methods are constants, because these classes are declared final (no overriding is possible). Furthermore, a wrapper object created from a primitive value is bound to this value, which is constant by definition (a value type cannot be changed, only copied). The evaluation graph is as simple as it can be, just a 3-node chain:

PrimitiveType -> WrapperClass -> PrimitiveType

and the algorithm to remove nodes (here the first two) only has to check whether the expression evaluated for its parent is constant. More precisely, in order to make sure an expression doesn't change:

1. Test if the method depends on any variable.
1.1. If yes, recursively start at 1. to evaluate these variables.
1.2. If no, make sure that both the method and the variable(s) are declared final and therefore cannot change.
1.2.1. If everything is final, remove this expression (optimization).
1.2.2. If not, heavy runtime analysis may be needed to decide whether the expression can change, or it might not even be possible in multi-threaded environments, but these cases should not occur with the basic wrapper classes.
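The recursive finality check described in these steps can be sketched on a toy dependency graph. This is only an illustration under stated assumptions: Node, declaredFinal and isConstant are hypothetical names, the graph is assumed to be acyclic, and nothing here reflects how HotSpot actually analyzes code.

```java
import java.util.Arrays;
import java.util.List;

public class ConstCheck {
    /** A method or variable in a toy expression graph (illustrative only). */
    public static final class Node {
        final String name;
        final boolean declaredFinal;
        final List<Node> dependsOn;

        public Node(String name, boolean declaredFinal, Node... dependsOn) {
            this.name = name;
            this.declaredFinal = declaredFinal;
            this.dependsOn = Arrays.asList(dependsOn);
        }
    }

    /**
     * Returns true if the node and everything it depends on is declared final,
     * i.e. the case in which the expression could be folded away.
     */
    public static boolean isConstant(Node n) {
        if (!n.declaredFinal) {
            return false; // would need heavier runtime analysis
        }
        for (Node dependency : n.dependsOn) {
            if (!isConstant(dependency)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Node array = new Node("final float[] values", true);
        Node box = new Node("new Float(values[i])", true, array);
        Node unbox = new Node("floatValue()", true, box);
        System.out.println(isConstant(unbox)); // true: whole chain is final

        Node mutable = new Node("float[] other", false);
        Node unbox2 = new Node("floatValue()", true, new Node("new Float(other[i])", true, mutable));
        System.out.println(isConstant(unbox2)); // false: depends on a non-final variable
    }
}
```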

Nothing in the code you posted was declared 'final', though I doubt that will matter much - maybe the server compiler will do something with it.

The thing is as is pointed out above, this is a very trivial example. In "real" code 'foo' is likely to be called from many other places, some of which might actually need a Float class, not a float primitive.

Also, you went to the trouble of explicitly telling the compiler that you wanted 'foo' to return a Float object, not a float primitive. If you simply use 'bar' everywhere and eliminate 'foo' entirely, then the autoboxing will happen when it is truly needed; that is, when the return value of bar *needs* to be a Float, it will be converted to one. By writing the code as you did, you told the compiler that the return value of 'foo' *needs* to be a Float, so the value is boxed prematurely. I consider that simply poor code, not a poor compiler, though you could argue either way.
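The benchmark code under discussion is not reproduced in this thread, so the following is only a hypothetical reconstruction of the 'foo' versus 'bar' distinction. The method names come from the posts; the class name, the array contents and the summing loops are assumptions.

```java
public class BoxingSketch {
    private static final float[] values = {1.5f, 2.5f, 3.0f};

    // Declared to return the wrapper: every call boxes the float,
    // even when the caller immediately unboxes it again.
    public static Float foo(int i) {
        return values[i];
    }

    // Returns the primitive: boxing happens only where a caller
    // actually needs a Float object.
    public static float bar(int i) {
        return values[i];
    }

    public static float sumViaFoo() {
        float sum = 0f;
        for (int i = 0; i < values.length; i++) {
            sum += foo(i); // box in foo, unbox here
        }
        return sum;
    }

    public static float sumViaBar() {
        float sum = 0f;
        for (int i = 0; i < values.length; i++) {
            sum += bar(i); // no wrapper object involved
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumViaFoo()); // 7.0
        System.out.println(sumViaBar()); // 7.0
        Float needed = bar(0);           // boxed only here, where an object is required
        System.out.println(needed);      // 1.5
    }
}
```

Both sums produce the same value; the difference is only in whether a Float object is created on each call.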

Nothing in the code you posted was declared 'final', though I doubt that will matter much - maybe the server compiler will do something with it.

I was talking about the wrapper classes (and indirectly their methods) when I mentioned the 'final' thing, which is sufficient for the desired optimization. But you are wrong anyway, since you didn't notice the final before the float array, which is declared in the Test class.

The thing is as is pointed out above, this is a very trivial example. In "real" code 'foo' is likely to be called from many other places, some of which might actually need a Float class, not a float primitive.

Again, this kind of optimization can be performed at every wrapper class method call in the code by performing a partial evaluation. Therefore it should fit into all 'real code'! The trick is that a wrapper can be seen as a kind of const pointer to a single primitive value: since it is impossible to change either the pointer itself or the value it points to, it can be replaced with the primitive value under all circumstances I can imagine.

Also, you went to the trouble of explicitly telling the compiler that you wanted 'foo' to return a Float object, not a float primitive. If you simply use 'bar' everywhere and eliminate 'foo' entirely, then the autoboxing will happen when it is truly needed; that is, when the return value of bar *needs* to be a Float, it will be converted to one. By writing the code as you did, you told the compiler that the return value of 'foo' *needs* to be a Float, so the value is boxed prematurely. I consider that simply poor code, not a poor compiler, though you could argue either way.

As you noticed, the code isn't real code. I wrote it only to see whether wrapper class conversion is handled efficiently. And just using the primitive type versions, as you recommend, doesn't work with generics.

Again, this kind of optimization can be performed at every wrapper class method call in the code by performing a partial evaluation. Therefore it should fit into all 'real code'! The trick is that a wrapper can be seen as a kind of const pointer to a single primitive value: since it is impossible to change either the pointer itself or the value it points to, it can be replaced with the primitive value under all circumstances I can imagine.

Perhaps for methods that would be candidates for inlining: the process of inlining would eliminate the temporary float. That much makes sense. But I'm not sure it is "easy" to do more than that. The code that wraps the primitive type is in a method that, at least sometimes, *does* need to autobox. Are you suggesting that the VM build some synthetic method that doesn't autobox and use it automatically in cases where it is just going to unbox anyway? Seems it is reaching for corner cases that are less worth pursuing than other optimizations.

Quote

And just using the primitive type versions, as you recommend, doesn't work with generics.

Perhaps for methods that would be candidates for inlining - the process of inlining would eliminate the temporary float. That much makes sense. But I'm not sure it is "easy" to do more than that.. the code that wraps the primitive type is in a method that, at least sometimes, *does* need to autobox. Are you suggesting that the VM build some synthetic method that doesn't autobox and use it automatically in cases where it is just going to unbox anyway?

I agree. Inlining is the best possible optimization here, and this is exactly the result of a (partial) evaluation from new Float(array).floatValue() to array, using an algorithm like the one I mentioned above.

What I was wondering about is that, according to my microbenchmark, no inlining is performed. That's why I asked for flags or something like that, because I remember doing a similar test with the Java 1.4 version, in which Hotspot first occurred, with much better optimization results.

Meanwhile, I read a white paper about Excelsior's JET technology and I can't wait to try the demo version, because it seems this JVM already performs all these optimizations using partial evaluation. In fairness to the Sun JVMs, one has to emphasize that JET uses ahead-of-time compilation and can therefore use much more heavyweight/complex optimization techniques.

Seems it is reaching for corner cases that are less worth pursuing than other optimizations.

That may be right. Unfortunately, I don't have enough insight into current JVM technology to confirm or refute that, but it makes sense to me. Moreover, different optimization techniques often conflict, and one has to choose the one that produces the best results overall.

What I was wondering about is that, according to my microbenchmark, no inlining is performed. That's why I asked for flags or something like that, because I remember doing a similar test with the Java 1.4 version, in which Hotspot first occurred, with much better optimization results.

Most likely either it wasn't sufficiently warmed up or you weren't running the server VM.

Microbenchmarks are most often misleading. That's really the important take-away here. On real-world apps we are seeing about a 5% to 10% improvement in performance in 1.5 over 1.4, and that again in 1.6 over 1.5.

Quote

I personally think the microbenchmarking issue is often used just as a lame excuse for the JVM.

Then you don't know enough about either subject.

I'm sorry, I'm really not trying to flame here, but that is a statement that illustrates fundamental ignorance of the problem space.

Quote

I have tested code like the mentioned expression templates in a 'real world' application. More precisely, I changed the vertex skinning code in my character animation system to use deferred evaluation, and the JVM definitely cannot handle it. The result is a huge drop in FPS.

**sigh**

Show me the exact code and I guess I will once AGAIN go through the exercise of explaining to you why you hurt yourself.

Unless of course SWP has already adequately explained it.

Quote

Furthermore, it is a serious problem that microbenchmarks are not meaningful. They should be, at least if the timed test starts only after the code has already been executing for a given time and every computed result gets used afterwards (e.g. by printing the sum). Otherwise it is a pain in the XXX to write performance-critical code that does not depend on complexity.

The answer is well known.

Write clear, clean well encapsulated REAL code. Profile. Tune based on the profile.

A wise man said "never automate sharp objects". Frankly, I think autoboxing is one of these cases. You push the bar low enough that any idiot can write code that compiles and runs, and what you get is idiotically written code.

We've already seen this happen with some of the Java networking stuff, which made it so easy to write networked code that anyone could do it, including those who had no idea what they were really doing. The result was a lot of very badly performing network applications.

I have similar, though slightly different, issues with generics, and eventually I may blog on both of those subjects and annoy quite a few people...

Meanwhile, I read a white paper about Excelsior's JET technology and I can't wait to try the demo version, because it seems this JVM already performs all these optimizations using partial evaluation. In fairness to the Sun JVMs, one has to emphasize that JET uses ahead-of-time compilation and can therefore use much more heavyweight/complex optimization techniques.

Flame bait.

I'm not going to even rise to why this is nonsense.

Go try it, if it works for you then terrific.

The results of people around here have not borne out that it buys you anything beyond maybe a start-up time improvement, and it costs you in other places, such as code bloat. You can find a whole lot of discussion of this if you search on "JET".

But hey, use whatever works for you.

Nothing in the code you posted was declared 'final', though I doubt that will matter much - maybe the server compiler will do something with it.

The thing is as is pointed out above, this is a very trivial example. In "real" code 'foo' is likely to be called from many other places, some of which might actually need a Float class, not a float primitive.

Clearly the float needs to be put in an object box. If you have a brilliant way to somehow know, even given Java's late binding, which primitive will actually be used as an object and needs to be used as an object, without using the coercion inherent in parameter types, I suggest you (1) write an example compiler that does this, (2) write your thesis on it, and (3) send both to the manager of the Hotspot team along with your resume.

Tell you what, you do (1) and (2) and I'll get you the name for (3)!

Now a totally separate issue is languages that don't HAVE primitive types, where everything is an object, including your primitives. Smalltalk works this way. There are some optimizations you can do within the VM of such a system to reduce the penalties and store primitives without needing the wrappers, BUT Java is not one of those languages. We DO have true primitives. This allows us to do something Smalltalk never could: approach C speeds.

The price is that they are separate syntactic constructs and always will be. Autoboxing just allows sloppily written code that doesn't really care about the penalties of conversion to pretend that there aren't any. It's a coercion mechanism. That's all.

Show me the exact code and I guess I will once AGAIN go through the exercise of explaining to you why you hurt yourself.

Oh, what a nice offer. You can find the extracted code here. The two different approaches can be found in the forjeff.VertexBlend class, more precisely in the expression and the javaTran methods. The algorithms are all based on the expression templates article I mentioned before. Unfortunately, I can't upload the whole application due to licensing reasons:
1. I have no rights to the character (but if someone has a 3DSMAX 6 model with a physique or skin modifier attached, I can use it with my exporter).
2. The final license isn't decided yet. Since the extracted code is not expected to be part of the final application, I can simply put it under the GPL to show it here.
Anyway, I believe it already takes a serious amount of time to go through the extracted code.

The answer is well known. Write clear, clean well encapsulated REAL code. Profile. Tune based on the profile.

That's my point: going through a profiler's output for everything consumes much more time than performing a simple microbenchmark. So if the latter can't be used, production time rises. Furthermore, isn't it possible that code could be efficient in real scenario A but worse in real scenario B?

Most likely either it wasn't sufficiently warmed up or you weren't running the server VM.

I used a warm-up phase of about 5 min, so if this isn't sufficient, then Java games have a big problem with a low FPS for the first 5 min of playing an FPS. I tested both client and server for comparison, but one thing I don't understand is that server was barely faster than the client VM. That's why I asked for flags and that stuff.

Finally, the method I mentioned would only work with the generic way:

HashMap<String,Float> map = new HashMap<String,Float>();

since the Float class is final, something like

HashMap<String,Float> map = new HashMap<String,? extends Float>();

cannot occur. (Bad example, I know, but I want to emphasize that you cannot put subclasses of Float into the map, which is important for the optimization.) Furthermore, assume that Float can be compared with a (const) float*, since it is a reference type and these are like pointers (NullPointerException in Java), and that the map values are organized in an array style. It would then look like:

float** values; // dynamic array of float pointers

My argument was that, because of all that final/const stuff, it could be replaced with

float* values; // dynamic array of floats

Unfortunately, I already have the topic for my M.Sc. thesis, but a sample app without a parser, with simplified expressions, left expressions and literals all annotated with the modifiers (final, public, ...), should be enough to illustrate. Maybe I will have time after my final exams on Saturday. (Ugh, that reminds me I have to study instead of having nice discussions.)

I found my error: wrapper references can be null, which prohibits replacing float** with float*. Some construct like 'NotNull', as available in C# 2.0, would be needed to allow the optimization under all circumstances. A solution would be HashMap<String,NotNull<Float>>. (No flaming please, since IMHO providing only anonymous methods instead of classes prohibits clean code. Needless to say, the Java implementation is much more sophisticated, since the accessed local variables have to be declared final, which makes it more robust.)

Summing up: in the test situation I posted (the first code) and in the math class for Jeff, this problem (null references) cannot occur, since only float arrays are used and the wrapper class conversions occur only to allow generic interfaces. On the other hand, with null being possible, the optimization becomes a special case, which is probably hard for a JIT to implement.
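The null problem is easy to demonstrate: a Map<String,Float> may hold a null value, and auto-unboxing it throws a NullPointerException, which a plain float could never do. A minimal sketch (the class and method names are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class NullUnboxing {
    /** Returns true if unboxing the value stored under key throws a NullPointerException. */
    public static boolean unboxingFails(Map<String, Float> map, String key) {
        try {
            float f = map.get(key); // auto-unboxing calls floatValue() on the reference
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Float> map = new HashMap<String, Float>();
        map.put("a", 1.0f);
        map.put("b", null); // legal for a wrapper, impossible for a primitive float
        System.out.println(unboxingFails(map, "a")); // false
        System.out.println(unboxingFails(map, "b")); // true
    }
}
```

Any optimization that stores the values as raw floats would first have to prove that no entry can ever be null.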

That's my point: going through a profiler's output for everything consumes much more time than performing a simple microbenchmark. So if the latter can't be used, production time rises.

False economy. You will waste far more time running and analyzing microbenchmarks on what turn out to be non-issues in your real code than firing up a profiler and profiling your app. Read Abrash's Zen of Code Optimization. One of my favorite quotes of his...

"Premature optimization is the root of all evil."

Maybe you haven't tried a Java profiler, btw. Once you have the NetBeans profiler installed, running it is as easy as clicking the button next to the run button in the IDE. Running any of the other professional profilers (e.g. OptimizeIt) isn't any harder.

Quote

Furthermore, isn't it possible that code could be efficient in real scenario A but worse in real scenario B?

Which, to reiterate, is why you profile *your* app, so you know what matters and why in your use case.
