Hi Martin,
thanks for the hints, please see comments inline.
On 25.08.2015 17:01, Doerr, Martin wrote:
> thanks for your quick response. Unfortunately, we neither have a good reproduction case nor a regression test which is actually the reason why we did not post this earlier.
> We had observed very sporadic assertions or freeing of unallocated memory.
Okay, same here.
> Basically, I believe that cleaning the inline caches before transitioning from unloaded to zombie is the right thing. However, there's still the problem that it's hard to test.
>> Additionally, it may be required to adapt a couple of assertions.
> We modified the following ASSERT code (based on hotspot 25)
> 1. in CompiledIC::is_call_to_compiled():
> Accessing cached_metadata() may be unsafe if !caller->is_alive().
I assume this could be a problem in the following case:
state of A state of B
-------------------------------
not-entrant
S [not-on-stack]
S zombie
unloaded
Now the IC of A still references the unloaded nmethod B and is_call_to_compiled() will access the unloaded metadata. Right?
I added an assert to cached_metadata() to make sure we don't access metadata that belongs to unloaded nmethods.
> 2. in CompiledIC::is_call_to_interpreted():
> CodeBlob* db may be gone if cb is unloaded.
How can that happen?
I think if db is really flushed, it may as well be replaced by another allocation (for example, an adapter blob) and then your if(!db) check returns false and still triggers the assert(!db->is_adapter_blob(), ...).
> 3. in CompiledIC::verify():
> is_call_to_compiled() has crashed. Seems to be unsafe in megamorphic case so we changed the order of the checks.
I don't see why is_call_to_compiled() is unsafe in the megamorphic case. Could you explain that?
> 4. in nmethod::verify_clean_inline_caches():
> In case of relocInfo::virtual_call_type CompiledIC may still point to a zombie method.
How can that be? After method unloading all ICs of live nmethods were cleaned in CodeCache::gc_epilogue(). Transitions to zombie in the sweeper also lead to IC cleaning of other nmethods.
Thanks,
Tobias
> For details, please see below.
>> Best regards,
> Martin
>>>> 1.
> assert( is_c1_method ||
> !is_monomorphic ||
> is_optimized() ||
> !caller->is_alive() ||
> (cached_metadata() != NULL && cached_metadata()->is_klass()), "sanity check");
>> 2.
> #ifdef ASSERT
> {
> CodeBlob* db = CodeCache::find_blob_unsafe(dest);
> if (!db) {
> nmethod *nm = cb->as_nmethod_or_null();
> assert(nm, "sanity");
> if ( nm->is_in_use() ||
> (nm->is_not_entrant() && (!SafepointSynchronize::is_at_safepoint() || !nm->is_marked_for_deoptimization())) ) {
> { // Dump some information.
> ttyLocker ttyl;
> tty->print_cr("ERROR: Did not find codeblob for destination %p", dest);
> nm->print(tty);
> Method *m = nm->method();
> if (m) {
> m->print_on(tty);
> }
> }
> assert(false, err_msg("nmethod is in state %d but destination blob is gone", (int)(nm->state())));
> }
> } else
> assert(!db->is_adapter_blob(), "must use stub!");
> }
> #endif /* ASSERT */
>> 3.
> assert(is_clean()
> || is_optimized() || is_megamorphic()
> || is_call_to_compiled() || is_call_to_interpreted()
> , "sanity check");
>> 4.
> case relocInfo::virtual_call_type: {
> CompiledIC *ic = CompiledIC_at(&iter);
> // Ok, to lookup references to zombies here
> CodeBlob *cb = CodeCache::find_blob_unsafe(ic->ic_destination());
> if( cb != NULL && cb->is_nmethod() ) {
> nmethod* nm = (nmethod*)cb;
> // Verify that inline caches pointing to not_entrant methods are clean
> if (nm->is_not_entrant()) {
> assert(ic->is_clean(), "IC should be clean");
> }
> }
> break;
> }
> case relocInfo::opt_virtual_call_type: {
>>> -----Original Message-----
> From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com]
> Sent: Dienstag, 25. August 2015 16:34
> To: Doerr, Martin; Igor Veresov
> Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net> Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder
>> I missed that we have to be careful when cleaning ICs of a zombie nmethod to not create transition stubs because backpatching of those is unnecessary and fails with "unsafe access to zombie method". I changed the code to not create a transition stub if the corresponding nmethod is dead and updated the webrev in place.
>> Best,
> Tobias
>> On 25.08.2015 14:52, Tobias Hartmann wrote:
>> Hi Martin,
>>>> thanks for looking at this!
>>>> It's interesting that you encountered and fixed the same problem. Were you able to create a regression test?
>>>> I did not evaluate the impact on the safepoint duration but you are right that it may be affected. I talked to Mikael Gerdin from the GC team and he told me that they had issues before because the scanning of nmethods and their relocations took a significant amount of time.
>>>> I would therefore like to go with the alternative solution in the case of unloaded nmethods, i.e., cleaning their ICs in the sweeper. Unfortunately, I already pushed the fix so here is the incremental webrev:
>>http://cr.openjdk.java.net/~thartmann/8075805/webrev.02/>>>> I left the fix for the nmethods that are marked-for-deoptimization as it is, because it avoids cleaning the IC's of all zombie nmethods and simplifies the possible state transitions.
>>>> If you guys are fine with the change, I open a new bug/enhancement and send out a separate RFR.
>>>> Thanks,
>> Tobias
>>>> On 25.08.2015 12:34, Doerr, Martin wrote:
>>> Hi all,
>>>>>> we appreciate that this code gets cleaned up.
>>>>>> Iterating over all nmethods in gc_epilogue should fix the problem. Did anybody check the impact on the safepoint duration?
>>> We have also fixed this problem. We use the other approach and added following code to nmethod::make_not_entrant_or_zombie:
>>>>>> if (state == zombie) {
>>> MutexLockerEx ml(SafepointSynchronize::is_at_safepoint() ? NULL : CompiledIC_lock);
>>> address low_boundary = verified_entry_point () + NativeJump::instruction_size; // See cleanup_inline_caches.
>>> RelocIterator iter(this, low_boundary);
>>> while (iter.next()) {
>>> if (iter.type() == relocInfo::virtual_call_type) {
>>> CompiledIC *ic = CompiledIC_at(&iter);
>>> ic->set_ic_destination_and_value(SharedRuntime::get_resolve_virtual_call_stub(), (Metadata*)NULL);
>>> }
>>> }
>>> }
>>>>>> (Note: set_ic_destination_and_value is currently private.)
>>>>>> As discussed in earlier emails, this also fixes the problem. An advantage is that this approach does the job in a concurrent phase without impacting the safepoint duration.
>>> Not sure which approach is the better one.
>>>>>> Best regards,
>>> Martin
>>>>>>>>> -----Original Message-----
>>> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Tobias Hartmann
>>> Sent: Dienstag, 25. August 2015 07:43
>>> To: Igor Veresov
>>> Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net>>> Subject: Re: [9] RFR(S): 8075805: Crash while trying to release CompiledICHolder
>>>>>> On 24.08.2015 22:10, Igor Veresov wrote:
>>>> Seems good to me.
>>>>>> Thanks, Igor.
>>>>>>> Btw, did you find why there is a need for “marked for reclamation” state?
>>>>>> No, I couldn't find a reason yet. I did some testing without this state and didn't run into any obvious problems. I'll file a bug and further investigate. It would be nice if we could save this transition.
>>>>>> Best,
>>> Tobias
>>>>>>>>>>> igor
>>>>>>>>> On Aug 24, 2015, at 12:58 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
>>>>>>>>>> Thanks, Vladimir! Please see comments inline.
>>>>>>>>>> On 21.08.2015 19:28, Vladimir Kozlov wrote:
>>>>>> During our discussion about this problem we thought that we may need additional call nm->cleanup_inline_caches() by sweeper when we convert not_entrant to zombie to prevent zombie pointing to an other nmethods. You think we don't need it?
>>>>>>>>>> I looked at all possible nmethod transitions and came to the conclusion that the problem is only possible if there is a direct transition from non-entrant to zombie without a sweeper cylce in-between.
>>>>>>>>>> The solution we discussed, i.e., always cleaning ICs for the non-entrant -> zombie transition, would fix the problem as well but would be more invasive than the proposed solution because it affects all nmethods.
>>>>>>>>>>> Please, clarify changes in states transition of unloaded nmethods - "The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie."
>>>>>> I don't see how changing make_marked_nmethods_zombies() call to make_marked_nmethods_not_entrant() affects unloaded nmethods. Both make_*() methods iterates only over is_alive() nmethods (iter.next_alive()), so they skip unloaded.
>>>>>>>>>> Yes, my description is wrong. It should be "The only impact is that _marked_ nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie." The change does not affect the state transitions of unloaded nmethods.
>>>>>>>>>> In other words, the change removes the shortcut that allowed nmethods that were marked for deoptimization to be converted to zombie at a safepoint if they were already non-entrant and not on the stack before. These nmethods now need an additional sweeper cycle to be converted to zombie.
>>>>>>>>>>> What spacing you changed in compiledIC.cpp because webrev does not show them?
>>>>>>>>>> I fixed the wrong indentation in CompiledIC::set_to_clean(). You can see it in the webrev if you click on "Patch":
>>>>>http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/src/share/vm/code/compiledIC.cpp.patch>>>>>>>>>>> Please, fix comment in vm_operations.cpp
>>>>>>>>>>>> // Make the dependent methods zombies
>>>>>> - CodeCache::make_marked_nmethods_zombies();
>>>>>> + CodeCache::make_marked_nmethods_not_entrant();
>>>>>>>>>> Fixed.
>>>>>>>>>> New webrev:
>>>>>http://cr.openjdk.java.net/~thartmann/8075805/webrev.01/>>>>>>>>>> Thanks,
>>>>> Tobias
>>>>>>>>>>>>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>>>>>>> On 8/21/15 7:36 AM, Tobias Hartmann wrote:
>>>>>>> Hi,
>>>>>>>>>>>>>> please review the following patch.
>>>>>>>>>>>>>>https://bugs.openjdk.java.net/browse/JDK-8075805>>>>>>>http://cr.openjdk.java.net/~thartmann/8075805/webrev.00/>>>>>>>>>>>>>> Problem:
>>>>>>> The VM crashes at a safepoint while trying to free a CompiledICHolder object that was enqueued for release after flushing a nmethod. The crash happens because the object is not a CompiledICHolder but Metadata which should not be removed. The problem is that at nmethod release, "CompiledIC::is_icholder_entry" is used to determine if the ICs of that nmethod still reference CompiledICHolder objects and if so, those objects are enqueued for release at the next safepoint. The method returns true if the destination of the IC is a C2I adapter, assuming that in this case the IC is in the to-interpreter state and the cached value must be a CompiledICHolder object. However, there are very rare cases where the IC is actually in the to-compiled state but the destination nmethod was already flushed and replaced by another allocation. Since the IC is still pointing to the same address in the code cache, the state of the IC is confused.
>>>>>>>>>>>>>> Cleaning of inline caches that point to dead nmethods should prevent this. However, we do not clean ICs of nmethods that are converted to zombie themselves. Usually, that's okay because a zombie nmethod will be flushed before any dead nmethod it references. This is guaranteed because each nmethod goes through the states alive -> non-entrant -> zombie -> marked-for-reclamation before being flushed.
>>>>>>>>>>>>>> Suppose we have two nmethods A and B, where A references B through an IC and B is always processed first by the sweeper. The following table shows the state transitions from top to bottom where lines marked with "S" show a transition in the corresponding iteration of the sweeper.
>>>>>>>>>>>>>> state of A state of B
>>>>>>> -----------------------------------------
>>>>>>> non-entrant non-entrant
>>>>>>> S [not on stack] [not on stack]
>>>>>>> S zombie zombie
>>>>>>> S marked marked
>>>>>>> S flushed flushed/re-allocated
>>>>>>>>>>>>>> The IC of A will be cleaned in the first sweeper cycle because B is non-entrant so we don't need to clean ICs again if A is converted to zombie.
>>>>>>>>>>>>>> Let's look at the following setting:
>>>>>>>>>>>>>> state of A state of B
>>>>>>> -----------------------------------------
>>>>>>> non-entrant
>>>>>>> S [not on stack]
>>>>>>> non-entrant
>>>>>>> S zombie [not on stack]
>>>>>>> zombie
>>>>>>> S marked marked
>>>>>>> S flushed flushed/re-allocated
>>>>>>>>>>>>>> There are two problems here:
>>>>>>> - the IC of A is not cleaned because B is not yet non-entrant in the first iteration of the sweeper and afterwards A becomes zombie itself,
>>>>>>> - the transition from B to zombie happens outside the sweeper in 'CodeCache::make_marked_nmethods_zombies()' because the previous sweeper iteration already determined that the nmethod is not on the stack.
>>>>>>>>>>>>>> The VM now crashes while flushing A because it still references B. Since B was replaced by an C2I adapter, we assume that A's IC is in the to-interpreter state and try to free a CompiledICHolder object which is actually Klass-Metadata for B. The detailed logs are below [1].
>>>>>>>>>>>>>> A similar problem occurs with nmethod unloading because unloaded nmethods transition directly to zombie:
>>>>>>>>>>>>>> state of A state of B
>>>>>>> -----------------------------------------
>>>>>>> unloaded unloaded
>>>>>>> S zombie zombie
>>>>>>> S marked marked
>>>>>>> S flushed flushed/re-allocated
>>>>>>>>>>>>>> Again, we crash while flushing A.
>>>>>>>>>>>>>> Solution:
>>>>>>> I removed the 'make_marked_nmethods_zombies()' and replaced it by calls to 'make_marked_nmethods_not_entrant()'. This avoids the non-entrant -> zombie transition outside of the sweeper. The only impact is that unloaded nmethods that are already non-entrant and not on the stack need another iteration of the sweeper to become zombie. I verified that this has no impact on performance. I also removed the code that was added by JDK-8059735 because now only the sweeper can set a nmethod to zombie.
>>>>>>>>>>>>>> To fix the nmethod unloading case, I changed the implementation of CodeCache::gc_epilogue to clean ICs of unloaded nmethods as well.
>>>>>>>>>>>>>> Testing:
>>>>>>> - Executed failing tests for a week (still running)
>>>>>>> - JPRT
>>>>>>> - Performance (SPECjbb2005, SPECjbb2013, SPECjvm2008), no differences
>>>>>>>>>>>>>> Thanks,
>>>>>>> Tobias
>>>>>>>>>>>>>>>>>>>>> [1] Detailed logs for nmethod A (1178) and nmethod B (552):
>>>>>>>>>>>>>> Inline cache at 0xffff80ffad89b017, calling 0xffff80ffad1063c0 cached_value 0x0000000000000000 changing destination to 0xffff80ffad66ae20 changing cached metadata to 0x0000000800034a18
>>>>>>>IC at 0xffff80ffad89b017: monomorphic to compiled (rcvr klass) 'java/util/concurrent/ConcurrentHashMap':
>>>>>>> ### IC at 0xffff80ffad89b017: set to Nmethod 552/
>>>>>>> ### Nmethod 1178/0xffff80ffad89ac50 (not entrant) being made zombie
>>>>>>> ### Nmethod 552/0xffff80ffad66ac10 (not entrant) being made zombie from make_marked_nmethods_zombies()
>>>>>>> ### Nmethod 552/0xffff80ffad66ac10 (zombie) being marked for reclamation
>>>>>>> ### Nmethod 1178/0xffff80ffad89ac50 (zombie) being marked for reclamation
>>>>>>> ### Nmethod 552/0xffff80ffad66ac10 (marked for reclamation) being flushed
>>>>>>> *flushing nmethod 552/0xffff80ffad66ac10. Live blobs:2325/Free CodeCache:235986Kb
>>>>>>> ### I2C/C2I adapter 0xffff80ffad66ac10 allocated
>>>>>>> ### Nmethod 1178/0xffff80ffad89ac50 (marked for reclamation) being flushed
>>>>>>> cleanup_call_site 0x0000000800034a18 to be freed, destination 0xffff80ffad66ae20 inline cache at 0xffff80ffad89b017
>>>>>>> enqueueing icholder 0x0000000800034a18 to be freed
>>>>>>> *flushing nmethod 1178/0xffff80ffad89ac50. Live blobs:2346/Free CodeCache:235955Kb
>>>>>>> deleting icholder 0x0000000800034a18
>>>>>>> ## nof_mallocs = 211209, nof_frees = 105760
>>>>>>> ## memory stomp:
>>>>>>> GuardedMemory(0xffff80ff623ca180) base_addr=0x00000008000349f8 tag=0x0000000800034a18 user_size=18446604433140746280 user_data=0x0000000800034a18
>>>>>>> Header guard @0x00000008000349f8 is BROKEN
>>>>>>>>>>>