It's nice to realize that that code is so slow. When you remove it, the speed of the benchmark increases 50-fold. Also, the GC doesn't have to collect anything anymore. But object pooling is still slower.

Yes indeed. However, I'm not the one claiming anything is free. The only plus point I gave to pooling is that (if properly set up) it allows data-flow optimizations. In general, which approach is 'better' comes down to: it depends.

It's nice to realize that that code is so slow. When you remove it, the speed of the benchmark increases 50-fold. Also, the GC doesn't have to collect anything anymore. But object pooling is still slower.

I find it hard to believe that you could write an object pool that is slower than doing a new. Also, the GC will eventually have to collect when you're doing new, and it is the unpredictable nature of "eventually" and the duration of said collection that pooling solves.

I find it hard to believe that you could write an object pool that is slower than doing a new.

I don't! It depends on what changes (if any) you need to make to an object to make it suitable for pooling. Some objects that might have been immutable now have to be mutable, which can add cost (and if you're working with threads at all, immutable objects can be a very good thing).

I personally tend to think of pooling for memory purposes rather than for objects (i.e. int[] buffers for pixel data). However, in that scenario you either end up using lots of extra memory or needing to check dimensions - at some point the size you want is probably so small that it's faster to create the buffer from scratch.
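To make that trade-off concrete, here is a minimal sketch of such a buffer pool (the class and method names are hypothetical, not from any particular library): it reuses any pooled buffer that is at least the requested size, and falls back to a fresh allocation otherwise - exactly the extra-memory-versus-dimension-check trade-off described above.

```java
import java.util.ArrayDeque;

// Minimal sketch of pooling int[] buffers for pixel data.
// Not thread-safe; a real pool would also cap its size.
final class IntBufferPool {
    private final ArrayDeque<int[]> free = new ArrayDeque<>();

    int[] acquire(int minSize) {
        int[] buf = free.poll();
        // The dimension check mentioned above: a pooled buffer may be too small.
        if (buf == null || buf.length < minSize) {
            return new int[minSize]; // fall back to a fresh allocation
        }
        return buf;
    }

    void release(int[] buf) {
        free.push(buf); // caller promises not to touch buf afterwards
    }
}
```

Note that a returned buffer may be larger than requested (the extra-memory cost), and that releasing too many buffers simply grows the pool - both weaknesses the post above alludes to.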

Also, the GC will eventually have to collect when you're doing new, and it is the unpredictable nature of "eventually" and the duration of said collection that pooling solves.

Also, the GC is collecting the objects (as you can see in the console output I posted). One thing I have to agree with is the unpredictable nature of the pauses when you don't have object pooling. With object pooling, the pauses are usually between 60 and 80 ms in my example. Without object pooling, a pause can be as low as 2 ms or as high as 24 ms (12 times longer than the lowest). But still, the pauses are much shorter.

5. Available Collectors

The discussion to this point has been about the serial collector. The Java HotSpot VM includes three different collectors, each with different performance characteristics.

1. The serial collector uses a single thread to perform all garbage collection work, which makes it relatively efficient since there is no communication overhead between threads. It is best-suited to single processor machines, since it cannot take advantage of multiprocessor hardware, although it can be useful on multiprocessors for applications with small data sets (up to approximately 100MB). The serial collector is selected by default on certain hardware and operating system configurations, or can be explicitly enabled with the option -XX:+UseSerialGC.

2. The parallel collector (also known as the throughput collector) performs minor collections in parallel, which can significantly reduce garbage collection overhead. It is intended for applications with medium- to large-sized data sets that are run on multiprocessor or multi-threaded hardware. The parallel collector is selected by default on certain hardware and operating system configurations, or can be explicitly enabled with the option -XX:+UseParallelGC.

New: parallel compaction is a feature introduced in J2SE 5.0 update 6 and enhanced in Java SE 6 that allows the parallel collector to perform major collections in parallel. Without parallel compaction, major collections are performed using a single thread, which can significantly limit scalability. Parallel compaction is enabled by adding the option -XX:+UseParallelOldGC to the command line.

3. The concurrent collector performs most of its work concurrently (i.e., while the application is still running) to keep garbage collection pauses short. It is designed for applications with medium- to large-sized data sets for which response time is more important than overall throughput, since the techniques used to minimize pauses can reduce application performance. The concurrent collector is enabled with the option -XX:+UseConcMarkSweepGC.

The Garbage-First (G1) garbage collector is fully supported in Oracle JDK 7 update 4 and later releases. The G1 collector is a server-style garbage collector, targeted for multi-processor machines with large memories. It meets garbage collection (GC) pause time goals with high probability, while achieving high throughput. Whole-heap operations, such as global marking, are performed concurrently with the application threads. This prevents interruptions proportional to heap or live-data size.
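The collectors above are selected with the listed -XX flags. One quick way to confirm which collector a running JVM actually picked is the standard management API; this small sketch (the class name is ours) prints the garbage-collector beans the VM registered. The exact names printed vary by JVM version and flags, so no particular output is assumed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints the garbage collectors the running JVM actually selected,
// which is a quick way to confirm that flags such as -XX:+UseSerialGC,
// -XX:+UseParallelGC or -XX:+UseConcMarkSweepGC took effect.
public class ShowCollectors {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + " (collections so far: " + gc.getCollectionCount() + ")");
        }
    }
}
```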

The JIT compiler can perform additional optimizations that can reduce the cost of object allocation to zero. Consider the code in Listing 2, where the getPosition() method creates a temporary object to hold the coordinates of a point, and the calling method uses the Point object briefly and then discards it. The JIT will likely inline the call to getPosition() and, using a technique called escape analysis, can recognize that no reference to the Point object leaves the doSomething() method. Knowing this, the JIT can then allocate the object on the stack instead of the heap or, even better, optimize the allocation away completely and simply hoist the fields of the Point into registers. While the current Sun JVMs do not yet perform this optimization, future JVMs probably will. The fact that allocation can get even cheaper in the future, with no changes to your code, is just one more reason not to compromise the correctness or maintainability of your program for the sake of avoiding a few extra allocations.
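Listing 2 itself is not reproduced in the thread, but the pattern the quoted article describes is roughly the following (names are an approximate reconstruction, not the article's actual code):

```java
// Rough reconstruction of the pattern described above: getPosition()
// creates a short-lived temporary, and the caller discards it.
public class Component {
    private final int x, y;

    public Component(int x, int y) { this.x = x; this.y = y; }

    // Returns a temporary object holding the coordinates of a point.
    public Point getPosition() {
        return new Point(x, y);
    }

    public int doSomething() {
        Point p = getPosition();   // likely inlined by the JIT
        // No reference to p escapes this method, so escape analysis may
        // allocate it on the stack, or hoist x and y into registers and
        // eliminate the allocation entirely.
        return p.x + p.y;
    }
}

class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}
```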

Interestingly though, watching the GC and heaps in jvisualvm on various games, and on what I've got here, I don't ever see any evidence of escape analysis actually doing anything. The heap still fills at the same rate.

I've never been motivated enough to test it... too much work. But even if it worked ideally (note: this is why contracts rule - @NoReference would be awesome), it would still only be useful in a subset of cases (notably tuples).

Escape analysis might have eliminated the need for GC of your actual object (though not the strings) in the sans-pool approach (messing up the results completely). To really compare them you would want to create a benchmark that better reflects the object link structures you will have in a game situation.

The basic essence of how GC works is testing reachability from root nodes. Root nodes in this case are object references declared in currently executing functions, plus static class variables. The state of all object pointers needs to be frozen while this process is carried out, hence the use of stop-the-world GCs. That is costly, because increasing the size and complexity of your object structures increases the cost of carrying out a GC mark and sweep.
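The reachability idea can be seen directly with a weak reference: once the last strong reference from a root is dropped, the object becomes unreachable and eligible for collection. (System.gc() is only a hint, so collection is typical on HotSpot but not strictly guaranteed; this sketch is ours, not from the thread.)

```java
import java.lang.ref.WeakReference;

// Small illustration of reachability: a local variable is a GC root,
// and clearing the only strong reference makes the object collectable.
public class ReachabilityDemo {
    public static void main(String[] args) {
        Object obj = new Object();                    // reachable via a local (a root)
        WeakReference<Object> weak = new WeakReference<>(obj);

        obj = null;                                   // drop the strong reference
        System.gc();                                  // request a collection (a hint only)

        // On HotSpot this typically prints null, since the object is
        // no longer reachable from any root.
        System.out.println(weak.get());
    }
}
```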

Now pooling wouldn't affect GC that much in that case, but it would eliminate the memory-management cost. Of course, as you've seen, implementations and technologies differ a lot:
- there are incremental garbage collectors that don't need to stop the world and can operate in a separate thread, adjusting the time spent GC'ing based on the behaviour of the program
- there are generational garbage collectors that in a sense partition the structure, traversing some partitions more often than others - reducing the cost of GC'ing significantly, since they traverse far fewer nodes and the nodes-traversed/nodes-deleted ratio is much lower (and therefore more efficient)
- we now have fancy algorithms that JIT compilers use to pretty much rewrite your code, sometimes eliminating your `new` call completely (exhibiting pool like behaviour, depending on how you look at it)

Either way, we've not yet reached a point where we don't need to manage memory or objects, especially for real-time applications where predictability* is crucial. So it's not so cut and dried whether pooling will or will not give you a boost. As your structures become more complex, you will be able to see things in your system that G1 will not be privy to. And most important of all: trust real-world results more than theory.

In fact, I once had a funny story about using floats on an embedded system. Despite all the available documentation on a certain 100% integer-based processor, a co-worker of mine insisted on using floating point rather than integers. He was not aware of, or bothered with, the technical workings of the device and ignored my warnings, so to show him the penalty and risks of using floats I wrote some benchmarks performing various arithmetic on random numbers. To my surprise, the benchmarks showed the floating-point operations to be faster than the integer operations in most (if not all) cases. I was baffled. I shared this information with other developers and they could not understand it either, as it should just be impossible. There was no fault in the code, and others could reproduce the behaviour. Well, the moral here is that sometimes all theories and knowledge are trumped, so don't be too presumptuous. I might add (for reference) that we were using a fast floating-point library with associated cache risks.

So, because of the documentation, I'd hazard a guess that none of the developers on that system ever used floating point in their physics code, or even tested it. Dogma is a dangerous thing, so I'd say it's good to listen to real-world results - or better yet, keep improving your benchmark, and perhaps you'll be able to share some interesting referential results with us.

*: I understand G1 has high predictability for suspending its GC thread; that still doesn't change the fact that there are associated GC and allocation costs that can be reduced.

- we now have fancy algorithms that JIT compilers use to pretty much rewrite your code, sometimes eliminating your `new` call completely (exhibiting pool like behaviour, depending on how you look at it)

Aside from stack allocation, which is a different thing entirely, I've only ever seen this behavior with String constants and small values of Integer or other boxed types. I really don't believe this behavior is even possible with most objects: Java isn't referentially transparent, and even if there were a Sufficiently Smart Compiler™ that could infer it by way of noticing an aliased object is never mutated, such a compiler couldn't support separate compilation, which is a cornerstone of Java.

- we now have fancy algorithms that JIT compilers use to pretty much rewrite your code, sometimes eliminating your `new` call completely (exhibiting pool like behaviour, depending on how you look at it)

Aside from stack allocation, which is a different thing entirely, I've only ever seen this behavior with String constants and small values of Integer or other boxed types. I really don't believe this behavior is even possible with most objects: Java isn't referentially transparent, and even if there were a Sufficiently Smart Compiler™ that could infer it by way of noticing an aliased object is never mutated, such a compiler couldn't support separate compilation, which is a cornerstone of Java.

I was alluding to Escape Analysis there. Google's V8 is an excellent example of what can be achieved with this technology. In fact they've taken it a step further and even get a fair amount of code compiled directly into integer operations (despite Javascript's weak typing).

The main point, though, is just to draw attention to what's going on, to highlight the dangers of dogmatically ignoring pooling, and to encourage creating a benchmark much more accurate to the case at hand.

Excelsior JET does a very efficient job of escape analysis, replacing heap allocations with stack allocations. The problem is that JDK 7 doesn't seem to be doing this yet - or if it is, it doesn't have any noticeable effect on the garbage I've seen in some game code. Stefan's blog entry is pretty interesting though - it's clearly doing it there. I wonder whether increasing the compiler's inlining size would make it more effective. I might try that on Project Zomboid later.

I have an OpenGL-based game which creates tonnes of particles and bullets without any pooling, and the worst GC pause I ever saw was about 1/100th of a second, with most pauses in the 1/1000th - 1/10000th second range. That's using -Xincgc.

It's important to note that all reported "findings" should be taken with a large dash of salt. And even when the findings are not suspect, they are only useful in the context in which they were written. Most Java writing is in terms of application and server programming, which has drastically different needs from computationally expensive programming (such as games or scientific computing), soft real-time programming (such as games), etc.

Also, we seem to be mixing up terms (unless HotSpot is using non-standard terminology). In Java terms:

escape analysis: determine whether a reference can escape its creating call frame. If it cannot, the object can be stack-allocated and some other optimizations can be performed.

scalar replacement: determine whether the object itself may be broken apart into scalar components. At the extreme, the object itself can be removed and its fields live on the stack and/or in registers.
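As a concrete illustration (the names here are ours, not from the thread): in the method below the Point never escapes its call frame, so HotSpot's escape analysis (-XX:+DoEscapeAnalysis, on by default in modern HotSpot) can scalar-replace it, keeping the fields in registers. Running the loop with -XX:-DoEscapeAnalysis versus the default and watching the allocation rate in jvisualvm, as posters above did, is one way to probe whether it actually fires.

```java
// A shape where escape analysis / scalar replacement can apply:
// the Point allocated in distSq() never escapes that frame.
public class ScalarReplacementDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static int distSq(int x, int y) {
        Point p = new Point(x, y);     // does not escape this frame
        return p.x * p.x + p.y * p.y;  // fields can live in registers
    }

    public static void main(String[] args) {
        long sum = 0;
        // Hot loop: with scalar replacement, the Point allocations
        // here may be eliminated entirely by the JIT.
        for (int i = 0; i < 1_000_000; i++) {
            sum += distSq(i & 7, i & 3);
        }
        System.out.println(sum);
    }
}
```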

@kaffiene: I'm not sure what you're saying here: 1/100 of a second is terrible.

Indeed, 10 ms is awful; we'd be looking at juddering frames with that sort of pause. The maximum we want to spend on GC in a frame is maybe 3 ms on a 1.6 GHz-class single-core sort of system. This is one area where the G1 GC excels: when you give it a target collection time, it's really pretty good at achieving it, which means that with a bit of pooling and careful coding not to generate too much crap in a frame, we get a more or less rock-steady 60 Hz even on low-end systems. Awesome.

The sweet spot of trouble is having (hundreds of) millions of objects that mostly need to be retained, while also generating many objects of which a small percentage makes it out of eden. The 'full GC' that eventually occurs will have to move most data around between the different heaps and rewrite all pointers to the moved data. This is relatively slow.

Haha, pedant. One more observation about games and GC: irregular, sometimes even long, GC pauses are acceptable, even in realtime arcade games. What is not acceptable is regular GC pauses, as they seriously ruin the experience by breaking the brain's ingenious ability to correctly predict motion.

Also, of course, for games that don't rely on constant realtime action... who cares about GC?

java-gaming.org is not responsible for the content posted by its members, including references to external websites and other references that may or may not have a relation to our primarily gaming and game production oriented community. Inquiries and complaints can be sent via email to the info-account of the company managing the website of java-gaming.org.