Details

We currently have no way to emit the xxspltw for a word splat and a call to vec_splat for vectors with 8-byte elements get translated to vperm with a mask vector that comes from memory.
This patch provides intrinsics to get at xxspltw and xxspltd (extended mnemonic). In a subsequent patch, altivec.h will be modified to use these intrinsics for the respective vec_splat definitions.

This provides a significant improvement in one benchmark that uses vec_splat.

Yes, D18593 has the FE portions of this patch.
I don't think having this intrinsic prevents us from using these for the shufflevector instruction. These can be subsequently added as options to the framework for emitting vector shuffles. Namely, shufflevector is probably too general of an instruction to do this all in the .td files.
Notice that all the Altivec instructions that can be used for vector shuffles have code fragments that check whether the node has a mask that is eligible for that instruction.
The canonical type for all the vector shuffles is v16i8 which isn't a type that can be in vsrc. Of course, this can be changed...

I think that the simplest thing to do would be similar to the way the QPX QVESPLATI instruction is handled (namely, check for VSX, check that the node is a splat of the correct type, emit xxspltw/xxpermdi).

Of course, this is just my opinion and if people feel that I should implement this as a vector shuffle immediately, I can do so.

You definitely know more about our codegen than I do and I may be missing something here. Also it will be good to hear from other reviewers about this (@hfinkel, @kbarton, @wschmidt, @seurer). The way I see it (and I maybe wrong) is this. We have two options:

1- Current implementation which is simpler.
2- Using vector shuffle, which requires extra C++ code, to get it right. It is more complicated.

I think (1) results in introducing a new target specific intrinsic which is not understood by optimizations. Also this intrinsic maybe used for other purposes in the future. Potentially causing optimization problems. So even though (2) is more complicated, to me it seems that (2) is the right way to go.

One reason that I see we may want to choose (1) is that the complexity of (2) is too much. There might be other reasons that I have not realized.

I don't really see a difference between what the two of you are proposing, except whether the frontend uses an intrinsic or not. Is this right?

Our general preference, across all targets, is that the frontend headers emit generic IR whenever possible (including using things like __builtin_shufflevector, or equivalent syntax. Then we match it in the backend.

OK, I think it's settled. I'll update the patch so that we retain the information that a vec_splat is a vector_shuffle rather than using a target-specific builtin. Thank you all for your comments. New patch coming up tomorrow :).

I've changed this around a bit to match an SDAG pattern that will be emitted when lowering vector_shuffle nodes rather than matching an intrinsic. This way, we'll get the FE to emit vector shuffles for splats (in the update to D18593).

Also, since XXSPLTW is a splat, I've added the handling for it to the swap removal pass.