Comments

Hello,
this patch provides an alternate pattern to let combine recognize scalar
operations that preserve the high part of a vector. If the strategy is all
right, I could do the same for more operations (mul, div, ...). Something
similar is also possible for V4SF (different pattern though), but probably
not as useful.
bootstrap+testsuite ok.
2012-10-13 Marc Glisse <marc.glisse@inria.fr>
PR target/54855
gcc/
* config/i386/sse.md (*sse2_vm<plusminus_insn>v2df3): New define_insn.
gcc/testsuite/
* gcc.target/i386/pr54855.c: New testcase.

On Sat, Oct 13, 2012 at 10:52 AM, Marc Glisse <marc.glisse@inria.fr> wrote:
> Hello,>> this patch provides an alternate pattern to let combine recognize scalar> operations that preserve the high part of a vector. If the strategy is all> right, I could do the same for more operations (mul, div, ...). Something> similar is also possible for V4SF (different pattern though), but probably> not as useful.
But, we _do_ have vec_merge pattern that describes the operation.
Adding another one to each operation just to satisfy combine is IMO
not correct approach. I'd rather see generic RTX simplification that
simplifies your proposed pattern to vec_merge pattern. Also, as you
mention in PR54855, Comment #5, the approach is too fragile...
Uros.

On Sun, 14 Oct 2012, Uros Bizjak wrote:
> On Sat, Oct 13, 2012 at 10:52 AM, Marc Glisse <marc.glisse@inria.fr> wrote:>> Hello,>>>> this patch provides an alternate pattern to let combine recognize scalar>> operations that preserve the high part of a vector. If the strategy is all>> right, I could do the same for more operations (mul, div, ...). Something>> similar is also possible for V4SF (different pattern though), but probably>> not as useful.>> But, we _do_ have vec_merge pattern that describes the operation.> Adding another one to each operation just to satisfy combine is IMO> not correct approach.
At some point I wondered about _replacing_ the existing pattern, so there
would only be one ;-)
The vec_merge pattern takes as argument 2 vectors instead of a vector and
a scalar, and describes the operation as a vector operation where we drop
half of the result, instead of a scalar operation where we re-add the top
half of the vector. I don't know if that's the most convenient choice.
Adding code in simplify-rtx to replace vec_merge with vec_concat /
vec_select might be easier than the other way around.
If the middle-end somehow gave us:
(plus X (vec_concat Y 0))
it would seem a bit strange to add an optimization that turns it into:
(vec_merge (plus X (subreg:V2DF Y)) X 1)
but then producing:
(vec_concat (plus (vec_select X 0) Y) (vec_select X 1))
would be strange as well.
(ignoring the signed zero issues here)
> I'd rather see generic RTX simplification that> simplifies your proposed pattern to vec_merge pattern.
Ok, I'll see what I can do.
> Also, as you mention in PR54855, Comment #5, the approach is too > fragile...
I am not sure I can make the RTX simplification much less fragile...
Whenever I see (vec_concat X (vec_select Y 1)), I would have to check
whether X is some (possibly large) tree of scalar computations involving
Y[0], move it all to vec_merge computations, and fix other users of some
of those scalars to now use S[0]. Seems too hard, I would stop at
single-operation X that is used only once. Besides, the gain is larger in
proportion when there is a single operation :-)
Thank you for your comments,

On Sun, 14 Oct 2012, Marc Glisse wrote:
> On Sun, 14 Oct 2012, Uros Bizjak wrote:>>> On Sat, Oct 13, 2012 at 10:52 AM, Marc Glisse <marc.glisse@inria.fr> wrote:>>> Hello,>>> >>> this patch provides an alternate pattern to let combine recognize scalar>>> operations that preserve the high part of a vector. If the strategy is all>>> right, I could do the same for more operations (mul, div, ...). Something>>> similar is also possible for V4SF (different pattern though), but probably>>> not as useful.>> >> But, we _do_ have vec_merge pattern that describes the operation.>> Adding another one to each operation just to satisfy combine is IMO>> not correct approach.>> At some point I wondered about _replacing_ the existing pattern, so there > would only be one ;-)>> The vec_merge pattern takes as argument 2 vectors instead of a vector and a > scalar, and describes the operation as a vector operation where we drop half > of the result, instead of a scalar operation where we re-add the top half of > the vector. I don't know if that's the most convenient choice. Adding code in > simplify-rtx to replace vec_merge with vec_concat / vec_select might be > easier than the other way around.>>> If the middle-end somehow gave us:> (plus X (vec_concat Y 0))> it would seem a bit strange to add an optimization that turns it into:> (vec_merge (plus X (subreg:V2DF Y)) X 1)> but then producing:> (vec_concat (plus (vec_select X 0) Y) (vec_select X 1))> would be strange as well.> (ignoring the signed zero issues here)>>> I'd rather see generic RTX simplification that>> simplifies your proposed pattern to vec_merge pattern.>> Ok, I'll see what I can do.>>> Also, as you mention in PR54855, Comment #5, the approach is too fragile...>> I am not sure I can make the RTX simplification much less fragile... Whenever > I see (vec_concat X (vec_select Y 1)), I would have to check whether X is > some (possibly large) tree of scalar computations involving Y[0], move it all > to vec_merge computations, and fix other users of some of those scalars to > now use S[0]. Seems too hard, I would stop at single-operation X that is used > only once. Besides, the gain is larger in proportion when there is a single > operation :-)>> Thank you for your comments,
Hello,
I experimented with the simplify-rtx transformation you suggested, see:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855
It works when the argument is a register, but not for memory (which is
where the constant is in the testcase). And the description of the
operation in sse.md does seem problematic. It says the second argument is:
(match_operand:VF_128 2 "nonimmediate_operand" "xm,xm"))
but Intel's documentation says "The source operand can be an XMM register
or a 64-bit memory location", not quite the same.
Do you think the .md description should really stay this way, or could we
change it to something that better reflects "64-bit memory location"?