vframeStream stack traversal can assert on x86 when trying to decode an interpreter frame that is in the process of being migrated for On-Stack Replacement (OSR). This is because the interpreter frame does not have a valid bcp, it instead has an nmethod in the bcp slot (the OSR nmethod).

There is currently a save operation that uses r13 for saving the OSR nmethod over the VM call into SharedRuntime::OSR_migration_begin(). This has the side-effect of installing the OSR nmethod into the interpreter frame bcp slot.

src/cpu/x86/vm/templateTable_x86.cpp
(old) L2228: __ load_unsigned_byte(rbx, Address(rbcp, 0));
// restore target bytecode
I was a little concerned about not restoring the target
bytecode into the 'rbx' register, but when I looked at
dispatch code:

set_method_data_pointer_for_bcp() doesn't use the rbx
value that we bothered to restore.

OK, so on 64-bit the code saved the nmethod before the call to
SharedRuntime::OSR_migration_begin() and it was using 'r13'
which is the register we use for 'bcp' in 64-bit. This use of
r13/bcp was visible to stack walkers (because of save_bcp()) and
caused the assert() failure. What's the failure mode in release
bits?

This is outstanding sleuthing!

You've switched the save to use 'rbx' on both 64-bit and 32-bit
and you've removed stale code that was using 'rbx' for saving
the target bytecode unnecessarily.

Thumbs up on this change!

Please don't forget to update the copyright before you push.

Dan

P.S.
It feels like we're missing some infrastructure to prevent
the accidental use of 'r13'. We have other "special" registers
that we guard against being used... Perhaps we need something
for 'r13'.

>
>
>
> Summary:
>
>
>
> vframeStream stack traversal can assert on x86 when trying to decode an interpreter frame that is in the process of being migrated for On-Stack Replacement (OSR). This is because the interpreter frame does not have a valid bcp, it instead has an nmethod in the bcp slot (the OSR nmethod).
>
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # Internal Error (distro/s/hotspot/src/share/vm/runtime/vframe.cpp:472), pid=3624, tid=3640
> # assert(false) failed: invalid bci or invalid scope desc
> #
>
> There is currently a save operation that uses r13 for saving the OSR nmethod over the VM call into SharedRuntime::OSR_migration_begin(). This has the side-effect of installing the OSR nmethod into the interpreter frame bcp slot.
>
>
>
> Thanks
>
> Markus

The "failure mode" in a release build is that any unresolved bci will be unconditionally re-set to 0 (see vframeStreamCommon::fill_from_interpreter_frame()). The effect of this is that stacktraces are currently reporting erroneous bci information, especially in the context of OSR frames, which would normally have a bci > 0 if the OSR is triggered by backedge counter overflow.

Thanks also for pointing to the other bytecode restore location (@L2213).

I have updated the webrev with your inputs in light of the other redundant bytecode restore in addition to updates to the copyright header(s).

In addition, I took the opportunity to turn vframeStreamCommon::found_bad_method_frame() into a DEBUG_ONLY conditional function.

src/cpu/x86/vm/templateTable_x86.cpp
(old) L2228: __ load_unsigned_byte(rbx, Address(rbcp, 0));
// restore target bytecode
I was a little concerned about not restoring the target
bytecode into the 'rbx' register, but when I looked at
dispatch code:

set_method_data_pointer_for_bcp() doesn't use the rbx
value that we bothered to restore.

OK, so on 64-bit the code saved the nmethod before the call to
SharedRuntime::OSR_migration_begin() and it was using 'r13'
which is the register we use for 'bcp' in 64-bit. This use of r13/bcp was visible to stack walkers (because of save_bcp()) and caused the assert() failure. What's the failure mode in release bits?

This is outstanding sleuthing!

You've switched the save to use 'rbx' on both 64-bit and 32-bit and you've removed stale code that was using 'rbx' for saving the target bytecode unnecessarily.

Thumbs up on this change!

Please don't forget to update the copyright before you push.

Dan

P.S.
It feels like we're missing some infrastructure to prevent the accidental use of 'r13'. We have other "special" registers that we guard against being used... Perhaps we need something for 'r13'.

>
>
>
> Summary:
>
>
>
> vframeStream stack traversal can assert on x86 when trying to decode an interpreter frame that is in the process of being migrated for On-Stack Replacement (OSR). This is because the interpreter frame does not have a valid bcp, it instead has an nmethod in the bcp slot (the OSR nmethod).
>
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # Internal Error
> (distro/s/hotspot/src/share/vm/runtime/vframe.cpp:472), pid=3624,
> tid=3640 # assert(false) failed: invalid bci or invalid scope desc #
>
> There is currently a save operation that uses r13 for saving the OSR nmethod over the VM call into SharedRuntime::OSR_migration_begin(). This has the side-effect of installing the OSR nmethod into the interpreter frame bcp slot.
>
>
>
> Thanks
>
> Markus

> Thanks a lot Dan for taking a look at this.
>
> The "failure mode" in a release build is that any unresolved bci will be unconditionally re-set to 0 (see vframeStreamCommon::fill_from_interpreter_frame()). The effect of this is that stacktraces are currently reporting erroneous bci information, especially in the context of OSR frames, which would normally have a bci > 0 if the OSR is triggered by backedge counter overflow.
>
> Thanks also for pointing to the other bytecode restore location (@L2213).
>
> I have updated the webrev with your inputs in light of the other redundant bytecode restore in addition to updates to the copyright header(s).
>
> In addition, I took the opportunity to turn vframeStreamCommon::found_bad_method_frame() into a DEBUG_ONLY conditional function.
>
> Updated webrev:
>
> http://cr.openjdk.java.net/~mgronlun/8175178/webrev02/>
> Regarding:
>
> "It feels like we're missing some infrastructure to prevent the accidental use of 'r13'."

We use r13 (and rlocals) as a scratch register in lots of places due to
not having enough registers on x86. As long as there isn't a safepoint
(or call into the runtime as you found), this should be perfectly safe,
as long as there's a restore_bcp() or restore_locals() call.

> Thanks a lot Dan for taking a look at this.
>
> The "failure mode" in a release build is that any unresolved bci will be unconditionally re-set to 0 (see vframeStreamCommon::fill_from_interpreter_frame()). The effect of this is that stacktraces are currently reporting erroneous bci information, especially in the context of OSR frames, which would normally have a bci > 0 if the OSR is triggered by backedge counter overflow.
>
> Thanks also for pointing to the other bytecode restore location (@L2213).
>
> I have updated the webrev with your inputs in light of the other redundant bytecode restore in addition to updates to the copyright header(s).
>
> In addition, I took the opportunity to turn vframeStreamCommon::found_bad_method_frame() into a DEBUG_ONLY conditional function.
>
> Updated webrev:
>
> http://cr.openjdk.java.net/~mgronlun/8175178/webrev02/>
> Regarding:
>
> "It feels like we're missing some infrastructure to prevent the accidental use of 'r13'."

We use r13 (and rlocals) as a scratch register in lots of places due to not having enough registers on x86. As long as there isn't a safepoint (or call into the runtime as you found), this should be perfectly safe, as long as there's a restore_bcp() or restore_locals() call.

> Thanks a lot Dan for taking a look at this.
>
> The "failure mode" in a release build is that any unresolved bci will be unconditionally re-set to 0 (see vframeStreamCommon::fill_from_interpreter_frame()). The effect of this is that stacktraces are currently reporting erroneous bci information, especially in the context of OSR frames, which would normally have a bci > 0 if the OSR is triggered by backedge counter overflow.
>
> Thanks also for pointing to the other bytecode restore location (@L2213).
>
> I have updated the webrev with your inputs in light of the other redundant bytecode restore in addition to updates to the copyright header(s).
>
> In addition, I took the opportunity to turn vframeStreamCommon::found_bad_method_frame() into a DEBUG_ONLY conditional function.
>
> Updated webrev:
>
> http://cr.openjdk.java.net/~mgronlun/8175178/webrev02/>
> Regarding:
>
> "It feels like we're missing some infrastructure to prevent the accidental use of 'r13'."

We use r13 (and rlocals) as a scratch register in lots of places due to not having enough registers on x86. As long as there isn't a safepoint (or call into the runtime as you found), this should be perfectly safe, as long as there's a restore_bcp() or restore_locals() call.

> Thanks a lot Dan for taking a look at this.
>
> The "failure mode" in a release build is that any unresolved bci will be unconditionally re-set to 0 (see vframeStreamCommon::fill_from_interpreter_frame()). The effect of this is that stacktraces are currently reporting erroneous bci information, especially in the context of OSR frames, which would normally have a bci > 0 if the OSR is triggered by backedge counter overflow.
>
> Thanks also for pointing to the other bytecode restore location (@L2213).
>
> I have updated the webrev with your inputs in light of the other redundant bytecode restore in addition to updates to the copyright header(s).
>
> In addition, I took the opportunity to turn vframeStreamCommon::found_bad_method_frame() into a DEBUG_ONLY conditional function.
>
> Updated webrev:
>
> http://cr.openjdk.java.net/~mgronlun/8175178/webrev02/

> Thanks a lot Dan for taking a look at this.
>
> The "failure mode" in a release build is that any unresolved bci will be unconditionally re-set to 0 (see vframeStreamCommon::fill_from_interpreter_frame()). The effect of this is that stacktraces are currently reporting erroneous bci information, especially in the context of OSR frames, which would normally have a bci > 0 if the OSR is triggered by backedge counter overflow.
>
> Thanks also for pointing to the other bytecode restore location (@L2213).
>
> I have updated the webrev with your inputs in light of the other redundant bytecode restore in addition to updates to the copyright header(s).
>
> In addition, I took the opportunity to turn vframeStreamCommon::found_bad_method_frame() into a DEBUG_ONLY conditional function.
>
> Updated webrev:
>
> http://cr.openjdk.java.net/~mgronlun/8175178/webrev02/

> Hi again Coleen,
>
> I looked into changing "noreg" to "rax" for the call to SharedRuntime::OSR_migration_begin(), but this is not correct.
>
> By explicitly passing rax, we are indicating the interest in receiving some kind of "managed" return value (like an oop in this example).
>
> MacroAssembler::call_VM_base() will proceed with storing back the thread->vm_result() oop into rax, instead of leaving the platform specific return value holding the natively allocated buffer.
>
> This is because CONSTANT_REGISTER_DECLARATION(rax) == 0 entails oop_result->is_valid() while CONSTANT_REGISTER_DECLARATION(noreg) == -1 entails !oop_result->is_valid().
>
> Therefore I will keep the noreg in place as-is.

Okay, thanks for the explanation! This is reviewed by me.
Coleen

>
> Thanks
> Markus
>
> -----Original Message-----
> From: Markus Gronlund
> Sent: den 23 februari 2017 00:55
> To: Coleen Phillimore; Daniel Daugherty
> Cc: [hidden email]> Subject: RE: RFR(S): 8175178: Stack traversal during OSR migration asserts with invalid bci or invalid scope desc on x86
>
> Thanks Coleen,
>
> I will also update the "noreg" to "rax" before putback.
>
> Thanks again
> Markus
>
> -----Original Message-----
> From: Coleen Phillimore
> Sent: den 22 februari 2017 22:21
> To: [hidden email]> Subject: Re: RFR(S): 8175178: Stack traversal during OSR migration asserts with invalid bci or invalid scope desc on x86
>
>
> This change looks good to me. Although I think this
>
> call_VM(noreg, CAST_FROM_FN_PTR(address, SharedRuntime::OSR_migration_begin));
>
>
> Should be:
>
> call_VM(rax, CAST_FROM_FN_PTR(address, SharedRuntime::OSR_migration_begin));
>
>
> Since this returns the OSR buffer in rax.
>
>
> On 2/19/17 8:07 AM, Markus Gronlund wrote:
>> Thanks a lot Dan for taking a look at this.
>>
>> The "failure mode" in a release build is that any unresolved bci will be unconditionally re-set to 0 (see vframeStreamCommon::fill_from_interpreter_frame()). The effect of this is that stacktraces are currently reporting erroneous bci information, especially in the context of OSR frames, which would normally have a bci > 0 if the OSR is triggered by backedge counter overflow.
>>
>> Thanks also for pointing to the other bytecode restore location (@L2213).
>>
>> I have updated the webrev with your inputs in light of the other redundant bytecode restore in addition to updates to the copyright header(s).
>>
>> In addition, I took the opportunity to turn vframeStreamCommon::found_bad_method_frame() into a DEBUG_ONLY conditional function.
>>
>> Updated webrev:
>>
>> http://cr.openjdk.java.net/~mgronlun/8175178/webrev02/>>
>> Regarding:
>>
>> "It feels like we're missing some infrastructure to prevent the accidental use of 'r13'."
> We use r13 (and rlocals) as a scratch register in lots of places due to not having enough registers on x86. As long as there isn't a safepoint (or call into the runtime as you found), this should be perfectly safe, as long as there's a restore_bcp() or restore_locals() call.
>
> Coleen
>
>> I agree with you, lets see what we can do to improve this.
>>
>> Thanks again
>> Markus
>>
>>
>> -----Original Message-----
>> From: Daniel D. Daugherty
>> Sent: den 17 februari 2017 18:51
>> To: Markus Gronlund; [hidden email]>> Subject: Re: RFR(S): 8175178: Stack traversal during OSR migration
>> asserts with invalid bci or invalid scope desc on x86
>>
>> On 2/17/17 5:29 AM, Markus Gronlund wrote:
>>> Greetings,
>>>
>>>
>>>
>>> Kindly asking for reviews for the following changeset:
>>>
>>>
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8175178>>>
>>> Webrev: http://cr.openjdk.java.net/~mgronlun/8175178/webrev01/>> src/cpu/x86/vm/templateTable_x86.cpp
>> (old) L2228: __ load_unsigned_byte(rbx, Address(rbcp, 0));
>> // restore target bytecode
>> I was a little concerned about not restoring the target
>> bytecode into the 'rbx' register, but when I looked at
>> dispatch code:
>>
>> L2196: __ bind(dispatch);
>> L2197: }
>> L2198:
>> L2199: // Pre-load the next target bytecode into rbx
>> L2200: __ load_unsigned_byte(rbx, Address(rbcp, 0));
>>
>> so it looks like rbx gets the target bytecode just fine.
>>
>> L2213: __ load_unsigned_byte(rbx, Address(rbcp, 0)); //
>> restore target bytecode
>> L2214: __ set_method_data_pointer_for_bcp();
>> L2215: __ jmp(dispatch);
>> I think the "restore target bytecode" here is also redundant.
>>
>> src/cpu/x86/vm/interp_masm_x86.cpp:
>>
>> InterpreterMacroAssembler::set_method_data_pointer_for_bcp() {
>> <snip>
>>
>> push(rbx);
>>
>> get_method(rbx);
>>
>> src/cpu/x86/vm/interp_masm_x86.hpp:
>>
>> void get_method(Register reg) {
>> movptr(reg, Address(rbp,
>> frame::interpreter_frame_method_offset * wordSize));
>> }
>>
>> set_method_data_pointer_for_bcp() doesn't use the rbx
>> value that we bothered to restore.
>>
>> OK, so on 64-bit the code saved the nmethod before the call to
>> SharedRuntime::OSR_migration_begin() and it was using 'r13'
>> which is the register we use for 'bcp' in 64-bit. This use of r13/bcp was visible to stack walkers (because of save_bcp()) and caused the assert() failure. What's the failure mode in release bits?
>>
>> This is outstanding sleuthing!
>>
>> You've switched the save to use 'rbx' on both 64-bit and 32-bit and you've removed stale code that was using 'rbx' for saving the target bytecode unnecessarily.
>>
>> Thumbs up on this change!
>>
>> Please don't forget to update the copyright before you push.
>>
>> Dan
>>
>> P.S.
>> It feels like we're missing some infrastructure to prevent the accidental use of 'r13'. We have other "special" registers that we guard against being used... Perhaps we need something for 'r13'.
>>
>>
>>>
>>>
>>> Summary:
>>>
>>>
>>>
>>> vframeStream stack traversal can assert on x86 when trying to decode an interpreter frame that is in the process of being migrated for On-Stack Replacement (OSR). This is because the interpreter frame does not have a valid bcp, it instead has an nmethod in the bcp slot (the OSR nmethod).
>>>
>>> #
>>> # A fatal error has been detected by the Java Runtime Environment:
>>> #
>>> # Internal Error
>>> (distro/s/hotspot/src/share/vm/runtime/vframe.cpp:472), pid=3624,
>>> tid=3640 # assert(false) failed: invalid bci or invalid scope desc #
>>>
>>> There is currently a save operation that uses r13 for saving the OSR nmethod over the VM call into SharedRuntime::OSR_migration_begin(). This has the side-effect of installing the OSR nmethod into the interpreter frame bcp slot.
>>>
>>>
>>>
>>> Thanks
>>>
>>> Markus