As far as I know, escape-analysis-based optimizations would make at least two closed-world optimizations possible, at the price of a very high (= expensive) analysis overhead:
1.) Lock elimination (especially interesting on multiprocessor servers)
2.) Stack allocation
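A hypothetical sketch of the kind of code those two optimizations target (class and method names are made up for illustration):

```java
// Hypothetical examples of code an escape analysis could optimize.
// Neither object ever escapes its method, so a JIT that proves this
// could (1) skip the monitor operations and (2) allocate on the stack.
public class EscapeDemo {

    // 1) Lock elimination: StringBuffer's methods are synchronized,
    //    but `sb` is only reachable from this thread, so the locks
    //    are provably uncontended and can be removed.
    public static String concat(String a, String b) {
        StringBuffer sb = new StringBuffer(); // never escapes
        sb.append(a);
        sb.append(b);
        return sb.toString();
    }

    // 2) Stack allocation (scalar replacement): `p` never leaves the
    //    method, so its fields can live in registers or stack slots
    //    instead of on the heap.
    public static int lengthSquared(int x, int y) {
        Point p = new Point(x, y); // candidate for stack allocation
        return p.x * p.x + p.y * p.y;
    }

    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }
}
```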

Will escape-analysis-based optimizations be part of Mustang? As far as I can see, no work is being done on this task at all, although several groups argue that it could be helpful (one engineer stated that Mustang has support for lock widening but not lock elimination, because that would require escape analysis; a GC engineer stated that somewhere in Java's future, short-lived objects may become even faster, but ...). Or will we have to wait until Dolphin or even longer?

"Escape analysis is an optimization that has been talked about for a long time, and it is finally here -- the current builds of Mustang (Java SE 6) can do escape analysis and convert heap allocation to stack allocation (or no allocation) where appropriate. The use of escape analysis to eliminate some allocations results in even faster average allocation times, reduced memory footprint, and fewer cache misses. Further, optimizing away some allocations reduces pressure on the garbage collector and allows collection to run less often."

Well, the IBM article is definitely wrong, since I already knew about this, and I also know that escape-analysis-based stack allocation is not enabled in current Mustang builds. It is/was(?) planned for Mustang; I guess that is why the author mentioned it.

"I have read the last column of Java theory and practice: "Urban performance legends, revisited". I have been playing with all recent builds of Mustang (Java 6), but there is no evidence that escape analysis is implemented or enabled.

Best regards,
ARM"

And Brian Goetz replies

"What evidence would you expect to find? The operation is invisible to the program. Also, optimizations like this are dynamic and may not kick in until a program runs for a long time. Also, it is often common that advanced optimizations are not all "turned on" by default for pre-release JVMs."

I wonder how this guy knows whether escape analysis is already integrated in Mustang, since he is not even a Sun engineer. However, the Mustang code is free (for somebody who does not plan to work on a free JVM project), so it should be possible to find out whether it's already implemented.

"What evidence would you expect to find? The operation is invisible to the program.

Well, there are JVM analysis tools, and if Mustang did stack allocation (which it currently doesn't), the heap would not grow at all and there would be at least a 25-??% performance gain for tons of small allocations.

Quote

Also, it is often common that advanced optimizations are not all "turned on" by default for pre-release JVMs.

That is really my hope, *please*! However, they will want to test it, won't they?

It seems that there is no difference in the generated code, but the escape analysis correctly detects that the Integer allocation is local. Most probably you are right: they have implemented the analysis, but the part which performs stack allocation is missing.
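The poster's actual test case isn't shown, but a method of roughly this shape is the kind where the analysis can prove the boxed Integer is local (names are illustrative):

```java
public class BoxDemo {
    // The boxed Integer is provably method-local: the analysis can see
    // that `boxed` never escapes the loop body, so the allocation could
    // in principle be elided even if the code generator does not yet
    // take advantage of that.
    public static int sumBoxed(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            Integer boxed = Integer.valueOf(i); // never escapes
            sum += boxed.intValue();
        }
        return sum;
    }
}
```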

I wonder if monitor synchronization is skipped for method-local objects; I will have to investigate.

Hats off to the HotSpot team. I'm playing with various options, looking at compiler output. I just noticed that String.hashCode has its loop unrolled (four steps). While it is not the most complicated method in the world, the fact that HotSpot unrolls loops for you is a nice thing. And to be honest, it unrolls it in a good way: fetch all the data from memory in one place and then perform 4 steps on registers only (as opposed to the more obvious fetch/compute/fetch/compute/etc.).
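A hand-written sketch of what a four-step unrolling of that loop amounts to; the JIT's actual machine code differs in detail, but the arithmetic collapses like this:

```java
public class UnrollDemo {
    // String.hashCode computes h = 31*h + c per character; four steps
    // collapse to h = 31^4*h + 31^3*c0 + 31^2*c1 + 31*c2 + c3
    // (31^2 = 961, 31^3 = 29791, 31^4 = 923521).
    public static int unrolledHash(char[] s) {
        int h = 0, i = 0;
        int limit = s.length - 3;
        for (; i < limit; i += 4) {
            // fetch all four chars first, then compute on registers only
            char c0 = s[i], c1 = s[i + 1], c2 = s[i + 2], c3 = s[i + 3];
            h = 923521 * h + 29791 * c0 + 961 * c1 + 31 * c2 + c3;
        }
        for (; i < s.length; i++) { // remainder loop for the last 0-3 chars
            h = 31 * h + s[i];
        }
        return h;
    }
}
```

The result is bit-for-bit identical to `String.hashCode()`; only the number of loop-control branches and memory fetches per iteration changes.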

No, you use object pools for precisely the opposite case: they are for objects that are expensive to construct.
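As a sketch of what is meant, a minimal typed pool (a hypothetical helper, not a library class) only pays off when the factory call it avoids is genuinely expensive:

```java
import java.util.ArrayDeque;

// Hypothetical factory interface for the sketch.
interface Factory<T> {
    T create();
}

// Minimal object pool sketch: worthwhile only when create() is costly
// (e.g. opens a connection, parses a file), not for cheap objects.
final class Pool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<T>();
    private final Factory<T> factory;

    Pool(Factory<T> factory) { this.factory = factory; }

    T acquire() {
        T obj = free.poll();                        // reuse if available,
        return obj != null ? obj : factory.create(); // else pay the costly path
    }

    void release(T obj) { free.push(obj); } // caller must reset state first
}
```

For cheap, short-lived objects this bookkeeping (plus the manual release discipline) costs more than letting a generational collector handle them.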

Since you can count collection time towards construction time, this is quite a valid comparison. Imagine operations like creating some kind of immutable objects in a loop, like Integers: very cheap to construct, and with almost no way around it if the API forces you to go this way, but the masses of small objects add up.

This is probably a very newbie question, but just how long is it before my short-lived object makes it out of eden space? And just how long is "short"? Should I stop using object pools for things like particles, where say 20-40 might be made each frame and last say 200-1500 ms?

Pooling of short-lived, easily created objects has been a net loss in HotSpot for quite a while.

I'm going to have to disagree with you there, Jeff...

We were doing Doom3 level loading in our game engine and decided to go with the route of adding events to signal the addition of objects to a render bin. We noticed an FPS drop of 10 fps, from 40 (without events) on good hardware down to 30. Pooling the objects and using some clever tricks brought the fps back up to 40...

"This is probably a very newbie question, but just how long is it before my short-lived object makes it out of eden space? And just how long is "short"? Should I stop using object pools for things like particles, where say 20-40 might be made each frame and last say 200-1500 ms?"

Assuming I have a basic understanding of this from watching the cool jconsole tools and such... it works approximately like this:

Your object will sit in the eden space until the eden space is full, then only objects that are still alive are copied out of the eden space into the survivor space and the free memory pointer is reset to the beginning of the eden space. Big objects that don't fit in the eden space are automatically allocated in the older generation space. The survivor space may not actually exist (i.e. it could be part of the eden space to begin with), but conceptually it is simply there to delay promotion into the older generation for objects that were only recently allocated before eden filled. When an object survives N collections of the eden space it is promoted. Therefore the older generation tends to fill much slower, possibly it never fills and never needs collecting because only a few objects make it there and they are the ones that are alive forever as far as your application is concerned (most likely a few objects 'die' in the older generation but a collection is never needed so they just sit there until your program ends).
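A small sketch of that lifetime behavior, using a WeakReference to observe collection (whether System.gc() actually collects anything is JVM-dependent, so treat the output as typical HotSpot behavior, not a guarantee):

```java
import java.lang.ref.WeakReference;

public class LifetimeDemo {
    // Strongly reachable for the whole run: survives every young
    // collection and will eventually be promoted out of eden.
    static Object longLived = new Object();

    public static void main(String[] args) {
        Object shortLived = new Object();
        WeakReference<Object> ref = new WeakReference<Object>(shortLived);
        shortLived = null;  // no strong references remain

        System.gc();        // request a collection; on HotSpot this
                            // usually clears unreachable young objects

        System.out.println("short-lived collected: " + (ref.get() == null));
        System.out.println("long-lived still here: " + (longLived != null));
    }
}
```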

That's actually a gross oversimplification... there are all sorts of other ratios and rules to tune the GC and various things depend on the GC algorithm you choose to use, and I certainly don't know all the details.

The KEY is to use tools to view the memory profile and TUNE things to fit the characteristics of your application. For example, the sizes of the young generation and the old generation will affect collection times and how quickly objects are promoted; you have to strike the best balance you can. The latest GC stuff in HotSpot will actually self-tune to a degree... you just have to tell it what your target collection times are, and it can adjust various ratios on its own in an attempt to meet that time. This may mean that collections occur much more frequently, but they take far less time on each run... e.g. if you collect every other frame, but the collection only takes 1 ms, then your game won't likely be affected... but if you collect every 200 frames and the collection takes 100 ms, you just got an ugly bump in your frame rate.
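For illustration, flag combinations of the kind being described (MyGame is a placeholder class name; exact flags and defaults vary by JVM version):

```shell
# Print collection activity so you can see pause times and frequency
java -verbose:gc -XX:+PrintGCDetails MyGame

# Give the young generation more room so short-lived objects die in eden
java -Xmn64m -Xms256m -Xmx256m MyGame

# Let the parallel collector self-tune toward a pause-time goal
java -XX:+UseParallelGC -XX:MaxGCPauseMillis=10 MyGame
```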

I have been trying a few different approaches to avoid the garbage collection making the fps of my engine unsteady:

- Limit the amount of created garbage: this can be thought of as a good solution, but in fact it makes me design my API in an unnatural way. I think this can be kept in mind when writing the code, but it should not be something that drives design choices.
- Use object pools: at first I thought they were a good solution, but in the end I was wrong and removed them:
  . first, object pools degrade the quality of the API of my engine (it is very important for me to keep things simple and readable);
  . object pools can be slow, since they need some sort of explicit garbage collection like reference counting (for example, with reference counting, a render frame that consists of 3000 render commands, each of which holds 1 render state which holds 6 matrices, leads you to decrement the references on 1 + 3000 + 3000 x 1 + 3000 x 1 x 6 reference-counted objects when you collect the render frame!);
  . object pools have a very bad impact on the memory requirements of the application (you need your object pools to be typed, therefore you end up with lots of object pools which do not share their free memory area);
  . object pools increase the work needed to maintain your code, since they are a source of memory leaks.
- Tune the garbage collector: it just worked. I have an average of 1 ms or less per frame for garbage collection, which I consider very low (memory management has to have a cost) and far lower than what object pools were costing me.

Object pool design can be an interesting option for objects for which the cost of creation or destruction is high. I'm still wondering if this is the case with direct buffers (I have some tests where it seems they are the cause of very long 'other full gc' pauses, but I have not finished profiling this).

There is a good online chapter about GC tuning from the Killer Game Programming book, referenced somewhere in this forum.

To return to the main subject, I can see one big benefit of escape analysis: the use of the new iteration construct in Java 5. I used it everywhere, but it was creating lots of iterators, so I moved back to simple ArrayList iteration. That is not satisfying, since it constrains my API to explicitly return ArrayList instead of List.
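To make that trade-off concrete, a small sketch of the two iteration styles (class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class IterationDemo {
    // The for-each construct compiles down to list.iterator(), so it
    // allocates one Iterator per loop. Escape analysis could elide
    // that allocation, since the iterator never escapes the loop.
    public static int sumForEach(List<Integer> list) {
        int sum = 0;
        for (Integer v : list) { // allocates an Iterator behind the scenes
            sum += v;
        }
        return sum;
    }

    // The allocation-free workaround: indexed access, which forces the
    // API to expose ArrayList instead of the more general List.
    public static int sumIndexed(ArrayList<Integer> list) {
        int sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }
}
```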

java-gaming.org is not responsible for the content posted by its members, including references to external websites and other references that may or may not have a relation with our primarily gaming and game production oriented community. Inquiries and complaints can be sent via email to the info account of the company managing the website of java-gaming.org.