Commit Message

On 08/06/2012 11:34 AM, Ulrich Weigand wrote:
> There is one particular inefficiency I have noticed. This code:> > if (!__atomic_compare_exchange_n (&v, &expected, max, 0 , 0, 0))> abort ();> > from atomic-compare-exchange-3.c gets compiled into:> > l %r3,0(%r2)> larl %r1,v> cs %r3,%r4,0(%r1)> ipm %r1> sra %r1,28> st %r3,0(%r2)> ltr %r1,%r1> jne .L3> > which is extremely inefficient; it converts the condition code into> an integer using the slow ipm, sra sequence, just so that it can> convert the integer back into a condition code via ltr and branch> on it ...
This was caused (or perhaps abetted by) the representation of EQ
as NE ^ 1. With the subsequent truncation and zero-extend, I
think combine reached its insn limit of 3 before seeing everything
it needed to see.
I'm able to fix this problem by representing EQ as EQ before reload.
For extimm targets this results in identical code; for older targets
it requires avoidance of the constant pool, i.e. LHI+XR instead of X.
l %r2,0(%r3)
larl %r1,v
cs %r2,%r5,0(%r1)
st %r2,0(%r3)
jne .L3
That fixed, we see the second CAS in that file:
.loc 1 27 0
cs %r2,%r2,0(%r1)
ipm %r5
sll %r5,28
lhi %r0,1
xr %r5,%r0
st %r2,0(%r3)
ltr %r5,%r5
je .L20
This happens because CSE notices the cbranch vs 0, and sets r116
to zero along the path to
32 if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG,
__ATOMIC_RELEASE, __ATOMIC_ACQUIRE))
at which point CSE decides that it would be cheaper to "re-use"
the zero already in r116 instead of load another constant 0 here.
After that, combine is ham-strung because r116 is not dead.
I'm not quite sure the best way to fix this, since rtx_costs already
has all constants cost 0. CSE ought not believe that r116 is better
than a plain constant. CSE also shouldn't be extending the life of
pseudos this way.
A short-term possibility is to have the CAS insns accept general_operand,
so that the 0 gets merged. With reload inheritance and post-reload cse,
that might produce code that is "good enough". Certainly it's effective
for the atomic-compare-exchange-3.c testcase. I'm less than happy with
that, since the non-optimization of CAS depends on following code that
is totally unrelated.
This patch ought to be independent of any other patch so far.
r~

Comments

> This was caused (or perhaps abetted by) the representation of EQ> as NE ^ 1. With the subsequent truncation and zero-extend, I> think combine reached its insn limit of 3 before seeing everything> it needed to see.
This can be 4 now, if you tweak the initial heuristic.