I got the assembly listing below as the result of JIT compilation of my Java program.

My understanding is that the TEST instruction is useless here, because the main idea of TEST is:

The flags SF, ZF, PF are modified while the result of the AND is discarded.

and here none of these resulting flags are used.
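To make that reasoning concrete, here is a small Java sketch that models what a 32-bit TEST computes architecturally: the AND result is thrown away and only flags survive. It is an illustration, not HotSpot code, and the class and method names are hypothetical. What such a model cannot capture is that when one operand is in memory, the CPU still has to perform the load, and that load can fault.

```java
// Hypothetical model of the architectural effect of a 32-bit x86 TEST a, b.
// Only the flags are observable; the AND result itself is discarded.
final class TestFlags {
    final boolean zf; // zero flag: (a & b) == 0
    final boolean sf; // sign flag: bit 31 of (a & b)
    final boolean pf; // parity flag: low byte of (a & b) has an even number of 1 bits

    private TestFlags(boolean zf, boolean sf, boolean pf) {
        this.zf = zf;
        this.sf = sf;
        this.pf = pf;
    }

    // TEST also clears CF and OF and leaves AF undefined; those are omitted here.
    static TestFlags test32(int a, int b) {
        int result = a & b;                  // computed only to derive the flags
        boolean zf = result == 0;
        boolean sf = result < 0;             // bit 31 set means negative as an int
        boolean pf = (Integer.bitCount(result & 0xFF) & 1) == 0;
        return new TestFlags(zf, sf, pf);
    }
}
```

For example, TestFlags.test32(0xFF, 0x03) gives ZF = false, SF = false, PF = true, since the discarded result 3 has two set bits in its low byte.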

Is this a bug in the JIT, or am I missing something?
If it is a bug, where is the best place to report it?
Thanks!

java assembly jvm jit jvm-hotspot


edited Jan 6 at 11:40 by Henrik Schumacher

asked Jan 5 at 18:09 by QIvan


This instruction does indeed seem useless.

– fuz, Jan 5 at 18:23


FWIW, it implicitly checks that r11 contains a valid pointer, and raises an exception if not. Is that intentional? I can’t tell without more context.

– another-dave, Jan 5 at 18:45


Now that we know the answer: if the JVM had more time to analyze the surrounding code, it could have used mov (%r11), %r9d, because r9 is about to be written by another instruction. MOV is the same number of code bytes, but it’s a pure load without an ALU uop. This is a minor optimization, because ALU port pressure is almost certainly not a problem here, and modern x86 CPUs keep the load micro-fused with the ALU operation as a single uop through most of the pipeline, so it doesn’t hurt front-end throughput.

– Peter Cordes, Jan 6 at 1:36

But it does take an extra scheduler entry until the load is ready and the ALU uop can execute, and 2 ROB entries on Sandy Bridge and earlier Intel CPUs. Ivy Bridge and later have a fused-domain ROB, but SnB has an unfused-domain ReOrder Buffer. Source: a row in Table 3 of this paper: publications.vpw.me/publications/2015_uop_flow_simulation.pdf. See also "Understanding the impact of lfence on a loop with two long dependency chains, for increasing lengths".

– Peter Cordes, Jan 6 at 1:38

@PeterCordes That’s pretty counterintuitive and strange. I always thought micro-fused uops stay fused until they are dispatched to an execution port. I double-checked Agner Fog’s manual, and it also says the uop stays fused up to the RS. It even says on page 92 that saving a ROB entry has been an advantage of micro-fusion since the PM (Pentium M), which is quite reasonable. Are you sure the ROB is in the unfused domain until Ivy Bridge?

– llllllllll, Jan 6 at 4:46

1 Answer


That must be the thread-local handshake poll.
Look at where %r11 is read from. If it is read from some offset off %r15 (thread-local storage), that’s the guy. See the example here:

It is not useless: it would cause a SEGV once the guard page is marked non-readable, and that would transfer control to the JVM’s SEGV handler. This is part of the JVM’s mechanism for bringing Java threads to a safepoint, e.g. for GC.
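The division of labor behind this poll can be illustrated with a plain-Java analogy. This is a hypothetical sketch, not HotSpot code: a volatile flag stands in for the guard page, an ordinary method call stands in for the SEGV handler, and all class and method names are invented for illustration. What it shows is that the hot loop pays only for a cheap check on every iteration, and the expensive path is taken only while a coordinator (the VM thread, in the JVM) has armed the poll.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical software analogy of a thread-local handshake poll.
// Per the answer, compiled code instead loads from a per-thread page and relies
// on the SEGV raised once that page is made unreadable; here a volatile flag
// stands in for the page and slowPath() for the SEGV handler.
final class HandshakeDemo {
    private volatile boolean armed = false;  // "guard page is unreadable"
    private volatile boolean stop = false;
    private final CountDownLatch reached = new CountDownLatch(1); // worker has stopped
    private final CountDownLatch resume = new CountDownLatch(1);  // VM operation done

    // Written by the worker only; read by the coordinator after join().
    long iterations = 0;
    int handshakesTaken = 0;

    // The Java thread's hot loop: every iteration ends with a cheap poll.
    private void workerLoop() {
        long n = 0;
        while (!stop) {
            n++;                  // stands in for the loop's real work
            if (armed) {          // the poll: normally just falls through
                slowPath();       // stands in for the JVM's SEGV handler
            }
        }
        iterations = n;
    }

    // Reached only while the poll is armed; parks until the coordinator is done.
    private void slowPath() {
        handshakesTaken++;
        reached.countDown();
        awaitQuietly(resume);
    }

    // Stands in for the VM thread requesting a handshake, e.g. for GC.
    int runOnce() {
        Thread worker = new Thread(this::workerLoop, "java-thread");
        worker.start();
        armed = true;             // "protect the guard page"
        awaitQuietly(reached);    // wait until the worker has parked in slowPath()
        // ... a VM operation would run here while the worker is parked ...
        armed = false;            // "unprotect the page" before releasing the worker
        resume.countDown();
        stop = true;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handshakesTaken;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Calling new HandshakeDemo().runOnce() should return 1: the worker diverts into slowPath() exactly once, because the coordinator clears the flag before releasing it. In the compiled code from the question, the fast path is the single TEST with a memory operand and no branch at all; the sketch's if (armed) check is only the software stand-in for that.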


UPD: Hopefully, more details here.


edited Jan 5 at 22:48

answered Jan 5 at 19:07 by Aleksey Shipilev
