
This patch fixes PR rtl-optimization/52175, which is another case of
dbr_schedule mishandling the result of shrink-wrapping. There is code
to allow:
    add rN,rN,CONST
to be put into a delay slot by adding a compensating:
    add rN,rN,-CONST
to the opposite thread. (Which has caused headaches before because
it increases code size.) In the testcase, we end up doing this for
a frame-related stack allocation: we allocate stack in the delay
slot, then immediately deallocate it again on the opposite thread.
This triggers an ICE because the deallocation is not marked as
frame-related, so it looks to the cfi code as though we can reach
the same point with two different CFAs.
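
The ICE can be reproduced in a toy model. This is a hypothetical sketch, not GCC code. `struct insn`, `run_thread` and `fallthrough_cfi_consistent` are invented for illustration. The model tracks only one stack-pointer adjustment, and it assumes that the CFI machinery learns only about frame-related insns. On the opposite thread the real stack pointer ends up where it started, so the generated code is correct. The CFI view, however, still includes the delay-slot allocation.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model: each insn adjusts the stack pointer by DELTA
   bytes; FRAME_RELATED stands in for an insn whose effect is
   recorded in the CFI (unwind) information.  */
struct insn
{
  long delta;
  bool frame_related;
};

/* Execute N insns.  *SP tracks the real stack pointer adjustment;
   *CFI_SP tracks the adjustment the CFI machinery knows about,
   which only frame-related insns update.  */
static void
run_thread (const struct insn *insns, int n, long *sp, long *cfi_sp)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    {
      *sp += insns[i].delta;
      if (insns[i].frame_related)
        *cfi_sp += insns[i].delta;
    }
}

/* Model the opposite thread after the add/sub transformation: the
   frame-related allocation ALLOC (negative, e.g. -32) sits in the
   delay slot and is undone by an unannotated compensating add of
   -ALLOC.  Returns true if the CFI view matches reality.  */
static bool
fallthrough_cfi_consistent (long alloc, long *sp_out, long *cfi_out)
{
  assert (alloc != LONG_MIN);  /* -ALLOC must be representable */
  const struct insn path[] = {
    { alloc, true },    /* delay slot: add sp,sp,ALLOC (frame-related) */
    { -alloc, false },  /* compensation: add sp,sp,-ALLOC (no CFI note) */
  };
  long sp = 0, cfi_sp = 0;
  run_thread (path, 2, &sp, &cfi_sp);
  *sp_out = sp;
  *cfi_out = cfi_sp;
  return sp == cfi_sp;
}
```

With `alloc = -32`, the real adjustment finishes at 0 but the CFI view finishes at -32, so the function returns false. This is how the cfi code ends up seeing two different CFAs for the same point.
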

We don't really want the optimisers to add new frame-related
annotations, and it's difficult to do that anyway in the general
case where the annotation is based on an instruction that had
REG_FRAME_RELATED_EXPR notes attached. I also can't really think
of many cases where it would be worthwhile. So this patch stops
us applying the optimisation in that case, just like we did
for PR 51471.
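
The policy amounts to a small eligibility rule: build the compensating insn only when the original insn is not frame-related, because the optimiser has no CFI note to give the compensation. The sketch below is hypothetical. `struct add_insn` and `make_compensation` are invented, the rule is restricted to the `add rN,rN,CONST` form discussed here, and the real reorg.c works on RTL and checks further conditions.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of "add DEST,SRC,CONSTANT".  */
struct add_insn
{
  int dest;            /* destination register number */
  int src;             /* source register number */
  long constant;       /* the increment */
  bool frame_related;  /* stands in for a frame-related insn */
};

/* If INSN may go into the delay slot, store the compensating insn
   for the opposite thread in *COMP and return true.  */
static bool
make_compensation (const struct add_insn *insn, struct add_insn *comp)
{
  assert (insn != NULL && comp != NULL);
  if (insn->dest != insn->src        /* only the rN,rN,CONST form */
      || insn->constant == LONG_MIN  /* -CONST would overflow */
      || insn->frame_related)        /* the new restriction */
    return false;
  comp->dest = insn->dest;
  comp->src = insn->dest;
  comp->constant = -insn->constant;
  comp->frame_related = false;  /* no CFI annotation can be synthesised */
  return true;
}
```

A plain `add r4,r4,16` yields the compensation `add r4,r4,-16`. A frame-related stack allocation such as `add $29,$29,-32` (register 29 is `$sp` on MIPS) is rejected.
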

I strongly suspect that doing this optimisation for the stack pointer is
independently wrong -- e.g. if an alloca is protected by a bounds check --
but since that part isn't a regression, I'm not tackling it here.

And, to really over-egg this one-liner, I thought I'd better
mention why I fixed it this way. The code reads:

      /* If we haven't found anything for this delay slot and it is very
         likely that the branch will be taken, see if the insn at our target
         increments or decrements a register with an increment that does not
         depend on the destination register. If so, try to place the opposite
         arithmetic insn after the jump insn and put the arithmetic insn in the
         delay slot. If we can't do this, return. */
      if (delay_list == 0 && likely
          && new_thread && !ANY_RETURN_P (new_thread)
          && NONJUMP_INSN_P (new_thread)
          && GET_CODE (PATTERN (new_thread)) != ASM_INPUT
          && asm_noperands (PATTERN (new_thread)) < 0)
        {
          rtx pat = PATTERN (new_thread);
          rtx dest;
          rtx src;

          trial = new_thread;
          pat = PATTERN (trial);

          if (!NONJUMP_INSN_P (trial)
              || GET_CODE (pat) != SET
              || ! eligible_for_delay (insn, 0, trial, flags)
              || can_throw_internal (trial))
            return 0;

And there's obviously some redundancy between the two if blocks here.
So which is the right way to handle the unoptimisable case?
Returning, or falling through? I think it's falling through,
so that we still redirect the jump in cases where we have skipped
a redundant insn without actually filling a delay slot. I therefore
added the check to the first "if" statement rather than the second.
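
The trade-off between returning and falling through fits in a few lines of C. The sketch is hypothetical and heavily simplified. `fill_tail` and `enum outcome` are invented, and the add/sub transformation is assumed to succeed whenever it is attempted. With the check in the first "if", a frame-related target still lets the jump be redirected past a skipped redundant insn. With the check in the second "if", the early return loses that redirection.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the tail of the function achieves.  */
enum outcome
{
  NOTHING,               /* nothing changed */
  REDIRECTED_ONLY,       /* no slot filled, but jump redirected */
  FILLED_AND_REDIRECTED  /* add/sub trick applied */
};

/* FRAME_RELATED: the insn at the branch target is frame-related.
   SKIPPED_REDUNDANT: earlier code skipped a redundant insn, so the
   jump is worth redirecting even if no delay slot gets filled.
   CHECK_IN_FIRST_IF: where the frame-related test is placed.  */
static enum outcome
fill_tail (bool frame_related, bool skipped_redundant,
           bool check_in_first_if)
{
  bool filled = false;

  /* First "if": is the add/sub trick worth attempting at all?  */
  if (!(check_in_first_if && frame_related))
    {
      /* Second "if": failing here returns from the whole function.  */
      if (!check_in_first_if && frame_related)
        return NOTHING;
      filled = true;  /* assume the trick succeeds otherwise */
    }

  /* A frame-related insn is never moved into the delay slot.  */
  assert (!(filled && frame_related));

  /* Common code, reached by falling through.  */
  if (filled)
    return FILLED_AND_REDIRECTED;
  return skipped_redundant ? REDIRECTED_ONLY : NOTHING;
}
```

`fill_tail (true, true, true)` yields `REDIRECTED_ONLY`. `fill_tail (true, true, false)` yields `NOTHING`, so the redirection is lost.
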

Despite all that, this felt obvious enough to self-approve.
Tested on various mips* targets and applied.

Richard


gcc/
	PR rtl-optimization/52175
	* reorg.c (fill_slots_from_thread): Don't apply add/sub optimization
	to frame-related instructions.

gcc/testsuite/
	PR rtl-optimization/52175
	* gcc.c-torture/compile/pr52175.c: New test.