Hi!
This patch started when I noticed, while working on PR50596, that

#define N 1024
long long a[N];
char b[N];

void
foo (void)
{
  int i;
  for (i = 0; i < N; i++)
    b[i] = a[i];
}

is vectorized with only 16-byte vectors instead of 32-byte vectors,
even with -O3 -mavx2. The patch also contains various fixes for
problems I noticed while diving into it. Vector permutations with
AVX2 aren't easy, because some instructions don't shuffle across
128-bit lanes, and some do but only for some modes. The patch adds
AVX2 vec_pack_trunc* expanders so that the above can be vectorized,
and implements a couple of permutation sequences, including a 4 insn
sequence for single operand __builtin_vec_shuffle that handles
arbitrary V32QI/V16HI constant permutations (plus some cases where
1 insn is enough), and also variable mask V{32Q,16H,8S,4D}I
permutations.
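
To illustrate the cross-lane problem, here is a minimal scalar sketch of how an arbitrary single operand V32QI permutation could be built from two in-lane byte shuffles, a swap of the two 128-bit lanes and an OR. It is only one plausible reading of the vpshufb2_vpermq name in the ChangeLog, not the code from the patch; model_vpshufb256 and model_perm_v32qi are hypothetical helper names, and the vectors are modelled as plain byte arrays.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a 256-bit vpshufb: each 16-byte lane is shuffled
   independently, and a control byte with bit 7 set yields zero.  */
static void
model_vpshufb256 (uint8_t dst[32], const uint8_t src[32],
                  const uint8_t ctl[32])
{
  uint8_t tmp[32];
  int i;

  for (i = 0; i < 32; i++)
    {
      int lane_base = i & 16;   /* 0 for the low lane, 16 for the high.  */
      tmp[i] = (ctl[i] & 0x80) ? 0 : src[lane_base + (ctl[i] & 15)];
    }
  memcpy (dst, tmp, 32);
}

/* Hypothetical helper (an assumption, not the patch's code):
   dst[i] = src[perm[i]] for any perm with entries in 0..31.  */
static void
model_perm_v32qi (uint8_t dst[32], const uint8_t src[32],
                  const uint8_t perm[32])
{
  uint8_t same_ctl[32], cross_ctl[32], same[32], cross[32];
  int i;

  for (i = 0; i < 32; i++)
    {
      assert (perm[i] < 32);
      if ((perm[i] & 16) == (i & 16))
        {
          /* Source byte lives in the destination lane already.  */
          same_ctl[i] = perm[i] & 15;
          cross_ctl[i ^ 16] = 0x80;
        }
      else
        {
          /* Source byte lives in the other lane: shuffle it to the
             mirrored position there, the lane swap moves it to i.  */
          same_ctl[i] = 0x80;
          cross_ctl[i ^ 16] = perm[i] & 15;
        }
    }

  model_vpshufb256 (same, src, same_ctl);    /* first in-lane shuffle */
  model_vpshufb256 (cross, src, cross_ctl);  /* second in-lane shuffle */

  /* Swap the 128-bit lanes of CROSS and OR the two halves together.  */
  for (i = 0; i < 32; i++)
    dst[i] = same[i] | cross[i ^ 16];
}
```

A full byte reversal (perm[i] = 31 - i) is a case where every element crosses lanes, so the first shuffle contributes only zeros and the whole result comes from the swapped second shuffle.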

I think we badly need a testcase that tries all possible
constant permutations (probably one testcase per mode);
even for V32QImode that's just 32x32 plus 32x64 tests (if
split into 32 tests per function times 96 noinline functions).
I'd like to wait for Richard's permutation improvements before
doing that, though, because although the backend currently signals
that it can handle some constant permutations, e.g. for V32QImode,
there is no V32QImode permutation builtin, so __builtin_shuffle
emits them as variable mask operations.

Bootstrapped/regtested on x86_64-linux and i686-linux. OK for trunk?

2011-10-12  Jakub Jelinek  <jakub@redhat.com>

	* config/i386/i386.md (UNSPEC_VPERMDI): Remove.
	* config/i386/i386.c (ix86_expand_vec_perm): Handle
	V16QImode and V32QImode for TARGET_AVX2.
	(MAX_VECT_LEN): Increase to 32.
	(expand_vec_perm_blend): Add support for 32-byte integer
	vectors with TARGET_AVX2.
	(valid_perm_using_mode_p): New function.
	(expand_vec_perm_pshufb): Add support for 32-byte integer
	vectors with TARGET_AVX2.
	(expand_vec_perm_vpshufb2_vpermq): New function.
	(expand_vec_perm_vpshufb2_vpermq_even_odd): New function.
	(expand_vec_perm_even_odd_1): Handle 32-byte integer vectors
	with TARGET_AVX2.
	(ix86_expand_vec_perm_builtin_1): Try expand_vec_perm_vpshufb2_vpermq
	and expand_vec_perm_vpshufb2_vpermq_even_odd.
	* config/i386/sse.md (VEC_EXTRACT_EVENODD_MODE): Add for TARGET_AVX2
	32-byte integer vector modes.
	(vec_pack_trunc_<mode>): Use VI248_AVX2 instead of VI248_128.
	(avx2_interleave_highv32qi, avx2_interleave_lowv32qi): Remove pasto.
	(avx2_pshufdv3, avx2_pshuflwv3, avx2_pshufhwv3): Generate
	4 new operands.
	(avx2_pshufd_1, avx2_pshuflw_1, avx2_pshufhw_1): Don't use
	match_dup, instead add 4 new operands and require they have
	right cross-lane values.
	(avx2_permv4di): Change into define_expand.
	(avx2_permv4di_1): New instruction.
	(avx2_permv2ti): Use nonimmediate_operand instead of register_operand
	for "xm" constrained operand.
	(VEC_PERM_AVX2): Add V32QI and V16QI for TARGET_AVX2.

	Jakub