@dtemirbulatov @ABataev I'm not sure how much this will cross over, but I'm investigating how to extend the alternate opcode mechanism to work with non-binary instructions. Initially I'm just looking at cast and call operators (e.g. sitofp/uitofp, floor/ceil), but even that requires moderate refactoring of the ShuffleVector handling in getEntryCost and vectorizeTree. What I'm not sure of is whether we could extend this idea to accept any partial vectorization, with the alternate becoming a pass-through. What do you think?

If not, there is scope to further simplify this patch. For instance, @spatel's work in InstCombine for PR37806 should mean that you can rely on InstCombine to perform much of the work in getDefaultConstantForOpcode etc., and all SLP needs to do is create a "passthrough" SK_Select shuffle stage.

For reference in D48830, I'm proposing that we use (and extend) ConstantExpr::getBinOpIdentity(). IIUC, that would be the same thing that is shown as getDefaultConstantForOpcode() here.
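To make the identity-constant idea concrete, here is a minimal, self-contained sketch that does not use LLVM at all. The `Opcode` enum and the `rhsIdentity`, `apply` and `vectorizeWithIdentity` helpers are hypothetical names invented for illustration; they only model what getDefaultConstantForOpcode() or an identity lookup would provide, namely a right-hand constant `C` such that `x op C == x`, so that lanes with no real operation pass through unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a few binary opcodes (not LLVM's enum).
enum class Opcode { Add, Sub, Mul, And, Or, Xor, Shl };

// Right-hand identity constant: apply(Op, X, rhsIdentity(Op)) == X.
uint32_t rhsIdentity(Opcode Op) {
  switch (Op) {
  case Opcode::Mul:
    return 1u;
  case Opcode::And:
    return ~0u; // all ones
  case Opcode::Add:
  case Opcode::Sub:
  case Opcode::Or:
  case Opcode::Xor:
  case Opcode::Shl:
    return 0u;
  }
  return 0u; // not reached for the opcodes above
}

// Scalar semantics of each opcode on unsigned 32-bit values.
uint32_t apply(Opcode Op, uint32_t L, uint32_t R) {
  switch (Op) {
  case Opcode::Add: return L + R;
  case Opcode::Sub: return L - R;
  case Opcode::Mul: return L * R;
  case Opcode::And: return L & R;
  case Opcode::Or:  return L | R;
  case Opcode::Xor: return L ^ R;
  case Opcode::Shl: return L << (R & 31u);
  }
  return L;
}

// Lanes whose RHS is std::nullopt have no real operation; they are padded
// with the identity constant so one "vector" operation covers every lane.
std::vector<uint32_t>
vectorizeWithIdentity(Opcode Op, const std::vector<uint32_t> &LHS,
                      const std::vector<std::optional<uint32_t>> &RHS) {
  assert(LHS.size() == RHS.size() && "lane count mismatch");
  std::vector<uint32_t> Out;
  Out.reserve(LHS.size());
  for (std::size_t I = 0; I < LHS.size(); ++I)
    Out.push_back(apply(Op, LHS[I], RHS[I].value_or(rhsIdentity(Op))));
  return Out;
}
```

With an add and right-hand operands `{10, none, 30, none}`, lanes 1 and 3 are computed as `x + 0` and so keep their input values. Whether this or a select shuffle is cheaper is a cost-model question that the sketch does not address.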

@dtemirbulatov @ABataev I'm not sure how much this will cross over, but I'm investigating how to extend the alternate opcode mechanism to work with non-binary instructions. Initially I'm just looking at cast and call operators (e.g. sitofp/uitofp, floor/ceil), but even that requires moderate refactoring of the ShuffleVector handling in getEntryCost and vectorizeTree. What I'm not sure of is whether we could extend this idea to accept any partial vectorization, with the alternate becoming a pass-through. What do you think?

If not, there is scope to further simplify this patch. For instance, @spatel's work in InstCombine for PR37806 should mean that you can rely on InstCombine to perform much of the work in getDefaultConstantForOpcode etc., and all SLP needs to do is create a "passthrough" SK_Select shuffle stage.

I think this is possible, if I understood your idea correctly. But we need to land this patch first, and to do that we need to resolve the problems with the scheduling.

If not, there is scope to further simplify this patch. For instance, @spatel's work in InstCombine for PR37806 should mean that you can rely on InstCombine to perform much of the work in getDefaultConstantForOpcode etc., and all SLP needs to do is create a "passthrough" SK_Select shuffle stage.

Yes, this is also a possible solution. In this case we need to tweak the cost model and improve the gathering algorithm. That might work.

If not, there is scope to further simplify this patch. For instance, @spatel's work in InstCombine for PR37806 should mean that you can rely on InstCombine to perform much of the work in getDefaultConstantForOpcode etc., and all SLP needs to do is create a "passthrough" SK_Select shuffle stage.

For reference in D48830, I'm proposing that we use (and extend) ConstantExpr::getBinOpIdentity(). IIUC, that would be the same thing that is shown as getDefaultConstantForOpcode() here.

Yes, I see, we can try to use it. At the least, we need to think about this solution and estimate all the pros and cons.

Implemented pair<Value*, Opcode> as the key in all maps/sets.
Fixed the incorrect memory dependency issue covered by the test case memory-dep.ll.
Allowed non-alternative operations to be stored in InstScheduleDataMap.
Removed the IsNonAlt field from InstructionsState.
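As an illustration of the pair-keyed maps in the first item, the sketch below shows why keying on (value, opcode) lets one value carry separate scheduling entries. It is plain standard C++; `Value`, `ScheduleData` and `ScheduleMap` here are hypothetical stand-ins, not the classes from the patch or from LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for the real IR and scheduling types.
struct Value {
  std::string Name;
};
struct ScheduleData {
  int BundleId = -1;
};

// Key on (value, opcode) so the same value can own one entry per opcode,
// e.g. once as a real operation and once as a pseudo operation.
using ScheduleKey = std::pair<const Value *, unsigned>;

class ScheduleMap {
  std::map<ScheduleKey, ScheduleData> Data;

public:
  void add(const Value *V, unsigned Opcode, int BundleId) {
    assert(V && "cannot schedule a null value");
    Data.insert_or_assign(ScheduleKey{V, Opcode}, ScheduleData{BundleId});
  }

  // Returns nullptr when this (value, opcode) pair has no entry.
  const ScheduleData *lookup(const Value *V, unsigned Opcode) const {
    auto It = Data.find(ScheduleKey{V, Opcode});
    return It == Data.end() ? nullptr : &It->second;
  }

  std::size_t size() const { return Data.size(); }
};
```

Registering the same `Value` under two different opcodes produces two independent entries, which a map keyed on `Value *` alone cannot express.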

Updated after additional testing of the change uncovered a couple more errors. The changes are:
Removed the OpValue field from PseudoScheduleData.
Forbid any bundle that combines non-alternative operations with a remainder operation; see rem-bundle.ll.
Fixed an error in setInsertPointAfterBundle() by using getScheduleData() instead of getInstrScheduleData(). If a bundle member is present in multiple bundles at the same time, we now walk through the bundle to find its last scheduled member; see insert-after-multiple-bundle.ll.
Restored MemoryDependencies to a SmallVector; we don't need to count a member's occurrences in calculateDependencies().

Implemented Map<Instruction *, std::pair<Value *Parent, unsigned Opcode>> indexing for ScalarToTreeEntry and PseudoInstScheduleDataMap.
Added a reorderBundles() function to reorder bundles that share common instructions, where those common instructions were scheduled at least twice. We don't want to record which bundle was scheduled first; we can determine the instruction layout after SLP scheduling.

It is very strange that you still need a separate scheduling class for the pseudo instructions. I think you can use the original data structure if you implement the pseudo instruction itself correctly. I still don't see all the changes we talked about.

Instead of ArrayRef<Value *> I expected to see something like ArrayRef<InstructionOrPseudoInstruction>, where InstructionOrPseudoInstruction is the class that represents the instruction/Value itself, or the pseudo instruction.

Again, you need all this code because you did not implement what we discussed. Try to use an InstructionOrPseudoInstruction-like class to represent values/instructions and pseudo-instructions. It should be much easier to implement, and a lot of the changes will just go away.
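One possible shape for such a wrapper, purely as a sketch: everything below apart from the InstructionOrPseudoInstruction name is an assumption made for illustration (the `Instruction` struct, the `real`/`pseudo` factories and `allSameOpcode` are hypothetical and use only the standard library). The point is that bundle-level code can query a uniform opcode without knowing whether a lane is a real instruction or a pseudo one.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR instruction.
struct Instruction {
  unsigned Opcode;
  std::string Name;
};

// Wraps either a real instruction or a pseudo instruction that reuses an
// existing value but presents a different opcode to the vectorizer.
class InstructionOrPseudoInstruction {
  const Instruction *Underlying;
  unsigned Opcode;
  bool Pseudo;

  InstructionOrPseudoInstruction(const Instruction *I, unsigned Op, bool P)
      : Underlying(I), Opcode(Op), Pseudo(P) {}

public:
  static InstructionOrPseudoInstruction real(const Instruction &I) {
    return {&I, I.Opcode, false};
  }
  static InstructionOrPseudoInstruction pseudo(const Instruction &I,
                                               unsigned Op) {
    return {&I, Op, true};
  }

  bool isPseudo() const { return Pseudo; }
  unsigned getOpcode() const { return Opcode; }
  const Instruction *getUnderlying() const { return Underlying; }
};

// Bundle-level query that treats real and pseudo lanes uniformly.
bool allSameOpcode(const std::vector<InstructionOrPseudoInstruction> &Bundle) {
  assert(!Bundle.empty() && "expected a non-empty bundle");
  for (const auto &Lane : Bundle)
    if (Lane.getOpcode() != Bundle.front().getOpcode())
      return false;
  return true;
}
```

A function that currently takes ArrayRef<Value *> would, under this assumption, take a sequence of these wrappers instead, and the pseudo lanes would flow through the same scheduling data structure as the real ones.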