Hi Peter,
Interesting email. I think it is a thoughtful contribution and these are great responses to concerns and questions. I hope it receives the consideration it deserves.
Kind regards,
Kirk
On May 31, 2015, at 9:32 PM, Peter Levart <peter.levart at gmail.com> wrote:
> Hi,
>> Thanks for views and opinions. I'll try to confront them in-line...
>> On 05/29/2015 04:18 AM, David Holmes wrote:
>> Hi Peter,
>>>> I guess I'm very concerned about the premise that finalization should scale to millions of objects and be performed highly concurrently. To me that's sending the wrong message about finalization. It also isn't the most effective use of CPU resources - most people would want to do useful work on most CPUs most of the time.
>>>> Cheers,
>> David
>> @David
>> Ok, fair enough. It shouldn't be necessary to scale finalization to millions of objects or to perform it concurrently. Normal programs don't need this. But there is a diagnostic command being developed at this moment that displays the finalization queue. The utility of such a command, as I understand it, is precisely to show when the finalization thread cannot cope and Finalizer(s) accumulate. So there must be some hypothetical programs that (ab)use finalization or are buggy (deadlock) so that the queue builds up. To diagnose this, a diagnostic command is helpful. To fix it, one has to fix the code. But what if the problem is not so much the allocation/death rate of finalizable instances as it is the heavy code in the finalize() methods of those instances? I agree that such programs have a smell and should be rewritten to not use finalization but other means of cleanup, such as multiple threads removing WeakReferences from the queue, or something completely different and not based on Reference(s). But wouldn't it be nice if one could simply set a system property for the max. number of threads processing Finalizer(s)?
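For readers unfamiliar with the alternative mentioned above (several threads removing WeakReferences from one queue), here is a minimal sketch. It is not taken from the prototype: the class name `WeakRefCleanupDemo`, the `CleanupRef` subclass and the `drainWithThreads` method are hypothetical, and the references are enqueued explicitly with `Reference.enqueue()` so that the example does not depend on GC timing.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class WeakRefCleanupDemo {
    // Hypothetical reference type that carries its own cleanup action.
    static final class CleanupRef extends WeakReference<Object> {
        private final Runnable cleanup;

        CleanupRef(Object referent, ReferenceQueue<Object> queue, Runnable cleanup) {
            super(referent, queue);
            this.cleanup = cleanup;
        }

        void clean() {
            cleanup.run();
        }
    }

    // Registers `refs` references, starts `threads` workers that all remove
    // from the same queue, and returns how many cleanup actions ran.
    static int drainWithThreads(int refs, int threads) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        AtomicInteger cleaned = new AtomicInteger();
        CountDownLatch latch = new CountDownLatch(refs);

        // The list keeps the Reference objects themselves strongly reachable.
        List<CleanupRef> registered = new ArrayList<>();
        for (int i = 0; i < refs; i++) {
            registered.add(new CleanupRef(new Object(), queue, () -> {
                cleaned.incrementAndGet();
                latch.countDown();
            }));
        }

        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        Reference<?> r = queue.remove(); // blocks until a reference is enqueued
                        ((CleanupRef) r).clean();
                    }
                } catch (InterruptedException e) {
                    // Interrupted by the caller: stop polling.
                }
            });
            worker.setDaemon(true);
            worker.start();
            workers.add(worker);
        }

        // Stand-in for the GC: enqueue every reference explicitly.
        // enqueue() is a no-op for a reference the GC already enqueued.
        for (CleanupRef r : registered) {
            r.enqueue();
        }

        try {
            latch.await(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        for (Thread worker : workers) {
            worker.interrupt();
        }
        return cleaned.get();
    }
}
```

Each reference is handed to exactly one worker because `ReferenceQueue.remove()` dequeues it, but all workers contend on the same queue, which is the overhead the thread discusses.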
>> I have prepared an improved variant of the prototype that employs a single ReferenceHandler thread and adds a ForkJoinPool that by default has a single worker thread which replaces the single finalization thread. So by default, no more threads are used than currently. If one wants (s)he can increase the concurrency of finalization with a system property.
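As an illustration of the configuration idea described above, the following sketch creates a ForkJoinPool whose parallelism defaults to 1 and can be raised with a system property. It is only a sketch under assumptions: the property name `example.finalizer.parallelism` and the class `FinalizerPoolConfig` are made up for this example and are not the names used in the webrev.

```java
import java.util.concurrent.ForkJoinPool;

public class FinalizerPoolConfig {
    // Hypothetical property name, chosen only for this illustration.
    static final String PARALLELISM_PROPERTY = "example.finalizer.parallelism";

    // Reads the requested parallelism; falls back to 1 (a single worker)
    // when the property is missing, unparsable, or less than 1.
    static int configuredParallelism() {
        int requested = Integer.getInteger(PARALLELISM_PROPERTY, 1);
        return Math.max(1, requested);
    }

    // Creates a pool with the configured number of worker threads.
    static ForkJoinPool newFinalizerPool() {
        return new ForkJoinPool(configuredParallelism());
    }

    public static void main(String[] args) {
        ForkJoinPool pool = newFinalizerPool();
        System.out.println("parallelism = " + pool.getParallelism());
        pool.shutdown();
    }
}
```

Running it with `-Dexample.finalizer.parallelism=4` would request four workers; without the property the pool keeps a single worker, which matches the "no more threads than currently" default described above.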
>> I have also improved the benchmarks that now focus on CPU overhead when processing references at more typical rates, rather than maximum throughput. They show that all changes taken together practically halve the CPU time overhead of finalization processing. So the freed CPU time can be used for more useful work. I have also benchmarked the typical asynchronous WeakReference processing scenario where one thread removes enqueued WeakReferences from the queue. Results show about a 25% decrease in CPU time overhead.
>> Why does the prototype reduce more overhead for finalization than for WeakReference processing? The main improvement in the change is the use of multiple doubly-linked lists for registration of Finalizer(s) and the use of a lock-less algorithm for the lists. The WeakReference processing benchmark also uses such lists internally to handle registration/deregistration of WeakReferences, so that the impact of this part is minimal and the difference in processing overhead between the original and the changed JDK code is more obvious. (De)registration of Finalizer(s) OTOH is part of the JDK infrastructure, so the improvement to the registration list(s) also shows in the results. The results of the WeakReference processing benchmark also indicate that reverting to the use of a single finalization thread that just removes Finalizer(s) from the ReferenceQueue could lower the overhead even a bit further, but then it would not be possible to leverage the FJ pool to simply configure the parallelism of finalization. If parallel processing of Finalizer(s) is an undesirable feature, I could restore the single finalization thread and the CPU overhead of finalization would be reduced to about 40% of the current overhead with just the changes to data structures.
>> So, for the curious, here's the improved prototype:
>> http://cr.openjdk.java.net/~plevart/misc/JEP132/ReferenceHandling/webrev.02/
>> And here are the improved benchmarks (with some results inline):
>> http://cr.openjdk.java.net/~plevart/misc/JEP132/ReferenceHandling/refproc/
>> The benchmark results in the ThroughputBench.java show the output of the test(s) when run with the Linux "time" command which shows the elapsed real time and the consumed user and system CPU times. I think this is relevant for measuring CPU overhead.
>> So my question is: Is it or is it not desirable to have a configurable means to parallelize the finalization processing? The reduction of CPU overhead of infrastructure code should always be desirable, right?
>> On 05/29/2015 05:57 AM, Kirk Pepperdine wrote:
>> Hi Peter,
>>>> It is a very interesting proposal, but to further David’s comments, the life-cycle cost of reference objects is horrendous, and the actual process of finalizing an object is only a fraction of that total cost. Unfortunately your micro-benchmark only focuses on one aspect of that cost. In other words, it isn’t very representative of a real concern. In the real world the finalizer *must* compete with mutator threads, and since F-J is an “all threads on deck” implementation, it doesn’t play well with others. It creates a “tragedy of the commons”, that is, a situation where everyone behaves rationally with a common resource but to the detriment of the whole group. In short, parallelizing (F-Jing) *everything* in an application is simply not a good idea. We do not live in an infinite compute environment, which means we have to consider the impact of our actions on the entire group.
>> @Kirk
>> I changed the prototype to only use a single FJ thread by default (configurable with a system property). Lowering the CPU overhead of finalizer processing by 50% is also an improvement. I'm still keeping the finalization FJ-pool for now because it is more scalable and has less overhead than a solution with multiple threads removing references from the same ReferenceQueue. Multiple threads get involved when the FJ-pool is configured with parallelism > 1 or when user code calls Runtime.runFinalization(), which translates to ForkJoinPool.awaitQuiescence() and lends the calling thread to help the pool execute the tasks.
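To make the awaitQuiescence() behaviour mentioned above concrete, here is a small sketch using only the public ForkJoinPool API. It does not reproduce the prototype's Runtime.runFinalization() wiring; the class `QuiescenceDemo` and the method `runAndAwait` are hypothetical names, and the submitted tasks merely stand in for finalization work.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class QuiescenceDemo {
    // Submits `tasks` small tasks to a single-worker pool and then calls
    // awaitQuiescence(), which lets the calling thread wait for and/or
    // help execute pending tasks until the pool is quiescent.
    static int runAndAwait(int tasks) {
        ForkJoinPool pool = new ForkJoinPool(1);
        AtomicInteger done = new AtomicInteger();

        Runnable work = () -> done.incrementAndGet(); // stand-in for a finalization task
        for (int i = 0; i < tasks; i++) {
            pool.execute(work);
        }

        // The caller may run some of the queued tasks itself here.
        pool.awaitQuiescence(1, TimeUnit.MINUTES);

        // Orderly shutdown: previously submitted tasks still run to completion.
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }
}
```

The explicit shutdown and awaitTermination at the end are only there so the returned count is deterministic in a standalone example; a long-lived finalization pool would not be shut down like this.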
>>> This was one of the points of my recent article in Java Magazine, which I wrote to try to counter some of the rhetoric I was hearing at conferences about the universal benefits of being able to easily parallelize streams in Java 8. Yes, I agree it’s a great feature but it must be used with discretion. Case in point: after I finished writing the article, I started running into a couple of early adopters who had swallowed the parallel message whole, indiscriminately parallelizing all of their streams. As you can imagine, they were quite surprised by the results and quickly worked to de-parallelize *all* of the streams in the application.
>>>> To add some ability to parallelize the handling of reference objects seems like a good idea if you are collecting large numbers of reference objects (>10,000 per GC cycle). However if you are collecting large numbers of reference objects you’re most likely doing something else wrong. IME, finalization is extremely useful but really only for a limited number of use cases and none of them (to date) have resulted in the app burning through 1000s of final objects / sec.
>>>> It would be interesting to know why you picked on this particular issue.
>> Well, JEP-132 was filed by Oracle, so I thought I'd try to tackle some of its goals. I think I at least showed that the VM part of reference handling is mostly not the performance problem (if there is a problem at all), but the Java side could be modernized a bit.
>>> Kind regards,
>> Kirk
>> On 05/29/2015 07:20 PM, Rezaei, Mohammad A. wrote:
>> For what it's worth, I fully agree with David and Kirk around finalization not necessarily needing this treatment.
>>>> However, I was hoping this would have the effect of improving (non-finalizable) reference handling. We've seen serious issues in WeakReference handling and have had to write some twisted code to deal with this.
>> @Moh
>> Can you elaborate some more on what twists were necessary or what problems you had?
>>> So I guess the question I have to Kirk and David is: do you feel a GC load of 10K WeakReferences per cycle is also "doing something else wrong"?
>> If there is an elegant way to achieve your goal without using WeakReferences then it might be better not to use them. But it is also true that WeakReferences frequently offer an elegant way to solve a problem. The same goes for finalization, which is sometimes even more elegant.
>>> Sorry if this is going off-topic.
>> You're spot on topic and thanks for your comment.
>>> Thanks
>> Moh
>>>>>>> Regards, Peter
>