Commit Message

On 20 September 2012 09:55, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> On 20 September 2012 09:12, Eric Botcazou <ebotcazou@adacore.com> wrote:>>> The attached patch catches C constructs:>>> (A << 8) | (A >> 8)>>> where A is unsigned 16 bits>>> and maps them to builtin_bswap16(A) which can provide more efficient>>> implementations on some targets.>>>> This belongs in tree-ssa-math-opts.c:execute_optimize_bswap instead.>>> OK I'll have a look at that. Actually I modified fold-const.c because> it's here that the similar 32 bits pattern is turned into a rotate.>>> When I implemented __builtin_bswap16, I didn't add this because I thought this>> would be overkill since the RTL combiner should be able to catch the pattern.>> Have you investigated on this front? But I don't have a strong opinion.>>> No I didn't. As I said above, I looked at where the 32 bits pattern> was handled and added the 16 bits one.>> Christophe.
Here is a new patch, modifying tree-ssa-math-opts.c as you suggested.
It's indeed simpler :-)
Validated with qemu-arm on target arm-none-linux-gnueabi.
OK?
Christophe.
2012-09-21 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* tree-ssa-math-opts.c (bswap_stats): Add found_16bit field.
(execute_optimize_bswap): Add support for builtin_bswap16.
gcc/testsuite/
* gcc.target/arm/builtin-bswap16-1.c: New testcase.

Comments

> Here is a new patch, modifying tree-ssa-math-opts.c as you suggested.> It's indeed simpler :-)> > Validated with qemu-arm on target arm-none-linux-gnueabi.
I cannot formally approve, but this looks good to me.
However, could you check that this is also an improvement for PowerPC, which
is the only other mainstream architecture with a bswaphi pattern AFAIK?

On Fri, Sep 21, 2012 at 1:04 AM, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
> On 20 September 2012 09:55, Christophe Lyon <christophe.lyon@linaro.org> wrote:>> On 20 September 2012 09:12, Eric Botcazou <ebotcazou@adacore.com> wrote:>>>> The attached patch catches C constructs:>>>> (A << 8) | (A >> 8)>>>> where A is unsigned 16 bits>>>> and maps them to builtin_bswap16(A) which can provide more efficient>>>> implementations on some targets.>>>>>> This belongs in tree-ssa-math-opts.c:execute_optimize_bswap instead.>>>>> OK I'll have a look at that. Actually I modified fold-const.c because>> it's here that the similar 32 bits pattern is turned into a rotate.>>>>> When I implemented __builtin_bswap16, I didn't add this because I thought this>>> would be overkill since the RTL combiner should be able to catch the pattern.>>> Have you investigated on this front? But I don't have a strong opinion.>>>>> No I didn't. As I said above, I looked at where the 32 bits pattern>> was handled and added the 16 bits one.>>>> Christophe.>> Here is a new patch, modifying tree-ssa-math-opts.c as you suggested.> It's indeed simpler :-)>> Validated with qemu-arm on target arm-none-linux-gnueabi.
This fixes PR 43550 also.
Thanks,
Andrew

On 21 September 2012 12:56, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> Here is a new patch, modifying tree-ssa-math-opts.c as you suggested.>> It's indeed simpler :-)>>>> Validated with qemu-arm on target arm-none-linux-gnueabi.>> I cannot formally approve, but this looks good to me.>> However, could you check that this is also an improvement for PowerPC, which> is the only other mainstream architecture with a bswaphi pattern AFAIK?>> --> Eric Botcazou
I have not yet been able to build an environment to run the testsuite
with qemu-ppc (not sure about the best target & dejagnu board
selection).
However, when compiling a sample test
short myswaps16(short x) {
return (x << 8) | (x >> 8);
}
unsigned short myswapu16(unsigned short x) {
return (x << 8) | (x >> 8);
}
The generated code is now:
myswaps16:
rlwinm 10,3,8,16,23
rlwinm 9,3,24,24,31
or 9,9,10
extsh 3,9
blr
myswapu16:
rlwinm 10,3,8,16,23
rlwinm 9,3,24,24,31
or 9,9,10
rlwinm 3,9,0,0xffff
blr
While it was (without my patch):
myswaps16:
slwi 9,3,8
srawi 3,3,8
or 3,9,3
extsh 3,3
blr
myswapu16:
srwi 9,3,8
rlwinm 3,3,8,16,23
or 3,3,9
blr
I don't know PowerPC, but I am not sure it's an improvement. Is it?
Christophe.

>> The generated code is now:>> myswaps16:>> rlwinm 10,3,8,16,23>> rlwinm 9,3,24,24,31>> or 9,9,10>> extsh 3,9>> blr>>>> myswapu16:>> rlwinm 10,3,8,16,23>> rlwinm 9,3,24,24,31>> or 9,9,10>> rlwinm 3,9,0,0xffff>> blr>>>> While it was (without my patch):>>>> myswaps16:>> slwi 9,3,8>> srawi 3,3,8>> or 3,9,3>> extsh 3,3>> blr>>>> myswapu16:>> srwi 9,3,8>> rlwinm 3,3,8,16,23>> or 3,3,9>> blr>>>> I don't know PowerPC, but I am not sure it's an improvement. Is it?
slwi and srwi are just extended mnemonics for the same rlwinm
instruction,
so that's the same. The last instruction in the new unsigned variant is
superfluous, since it is just setting the top bits to zero, and they
already are. rlwinm is ever so slightly better than srawi (in the
signed
version), because srawi sets the carry bit in addition to the GPR, so
that
is an improvement.
But we can do this sequence in just two instructions:
rlwimi 3,3,16,0,15
rlwinm 3,3,8,16,31
blr
so some more work is needed to make this optimal ;-)
Christophe, it looks like the zero-extend in the unsigned case is not
needed on any target? Assuming the shifts are at least SImode, of
course (I'm too lazy to check, sorry).
Segher

On 25 September 2012 07:00, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> Christophe, it looks like the zero-extend in the unsigned case is not> needed on any target? Assuming the shifts are at least SImode, of> course (I'm too lazy to check, sorry).>
It's also present when compiling:
unsigned short swapu16(unsigned short x) {
return __builtin_bswap16(x);
}
so it's not directly caused by my patch I think.
We have to look at the __builtin_bswap16 expansion with an unsigned argument.
Christophe.

>> Christophe, it looks like the zero-extend in the unsigned case is not>> needed on any target? Assuming the shifts are at least SImode, of>> course (I'm too lazy to check, sorry).>> It's also present when compiling:> unsigned short swapu16(unsigned short x) {> return __builtin_bswap16(x);> }>> so it's not directly caused by my patch I think.
The RTL is (set (reg:HI) (bswap:HI (reg:HI))) which then gets
extended for the SI (or DI) function return. Nothing to see here,
it's a target problem. Your results look good.
Segher

On 25 September 2012 13:32, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>>> Christophe, it looks like the zero-extend in the unsigned case is not>>> needed on any target? Assuming the shifts are at least SImode, of>>> course (I'm too lazy to check, sorry).>>>>>> It's also present when compiling:>> unsigned short swapu16(unsigned short x) {>> return __builtin_bswap16(x);>> }>>>> so it's not directly caused by my patch I think.>>> The RTL is (set (reg:HI) (bswap:HI (reg:HI))) which then gets> extended for the SI (or DI) function return. Nothing to see here,> it's a target problem. Your results look good.>
OK thank for the confirmation.
I guess I just have to wait for approval by the right maintainer now?

On Fri, Sep 21, 2012 at 10:04 AM, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
> On 20 September 2012 09:55, Christophe Lyon <christophe.lyon@linaro.org> wrote:>> On 20 September 2012 09:12, Eric Botcazou <ebotcazou@adacore.com> wrote:>>>> The attached patch catches C constructs:>>>> (A << 8) | (A >> 8)>>>> where A is unsigned 16 bits>>>> and maps them to builtin_bswap16(A) which can provide more efficient>>>> implementations on some targets.>>>>>> This belongs in tree-ssa-math-opts.c:execute_optimize_bswap instead.>>>>> OK I'll have a look at that. Actually I modified fold-const.c because>> it's here that the similar 32 bits pattern is turned into a rotate.>>>>> When I implemented __builtin_bswap16, I didn't add this because I thought this>>> would be overkill since the RTL combiner should be able to catch the pattern.>>> Have you investigated on this front? But I don't have a strong opinion.>>>>> No I didn't. As I said above, I looked at where the 32 bits pattern>> was handled and added the 16 bits one.>>>> Christophe.>> Here is a new patch, modifying tree-ssa-math-opts.c as you suggested.> It's indeed simpler :-)>> Validated with qemu-arm on target arm-none-linux-gnueabi.>> OK?
Assuming this one was the lastest patch ...
Ok.
Thanks,
Richard.
> Christophe.