Details

The DAG combiner folds two shifts (with different shift amounts) into
a single shift plus an AND with a bitmask. For vectors, materializing
the bitmask takes either quite a few instructions or extra constant
pool space. Avoid this fold for vectors, since leaving the two vector
shifts as they are produces better end results.
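
To make the fold concrete, here is a minimal scalar sketch of the two forms, modeling a single 32-bit lane. It assumes the pair is a logical right shift followed by a left shift; the function names are hypothetical and are not taken from the DAG combiner source. In the vector case the mask would have to be a splat constant, which is where the extra instructions or constant pool entry come from.

```cpp
#include <cstdint>

// Original form: logical shift right by c1, then shift left by c2.
// Both shift amounts are assumed to be in the range [0, 31].
uint32_t shiftPair(uint32_t x, unsigned c1, unsigned c2) {
  return (x >> c1) << c2;
}

// Folded form: one shift by the difference of the amounts, then an AND
// with a mask that clears the low c2 bits.
uint32_t shiftAndMask(uint32_t x, unsigned c1, unsigned c2) {
  const uint32_t mask = ~uint32_t{0} << c2;
  const uint32_t shifted = (c1 >= c2) ? (x >> (c1 - c2)) : (x << (c2 - c1));
  return shifted & mask;
}

// Compare both forms over every in-range pair of shift amounts for one input.
bool formsAgree(uint32_t x) {
  for (unsigned c1 = 0; c1 < 32; ++c1)
    for (unsigned c2 = 0; c2 < 32; ++c2)
      if (shiftPair(x, c1, c2) != shiftAndMask(x, c1, c2))
        return false;
  return true;
}
```

Both forms compute the same value, so the choice is purely about cost: for scalars the mask is a cheap immediate, whereas for vectors it is a full-width constant that has to be built or loaded.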

Intel targets tend to accept vector shifts only on Port0, while vector logic can use Port0/1/5. Not sure how much of an issue that would be, @craig.topper? Some AMD targets are almost as bad, while others (Jaguar) can issue vector immediate shifts to any vector integer pipe.

If I remember right, only Skylake CPUs have more than one shift execution port. I'm not sure what the impact would be. Can we limit this to MIPS MSA using the TLI.shouldFoldShiftPairToMask hook? I'm also not sure why that hook is called before we know the other value is a constant.