SROA executed before the method inlining replaces std::pair by i64 without splitting in both callee and func since at this point no access to the individual fields is seen to SROA.
After inlining, jump threading fails to identify that the incoming value is a constant due to additional instructions (like or, and, trunc).

This series of patch add two patterns in InstructionSimplify to fold extraction of members of std::pair. To help jump threading, actually we need to optimize the code sequence spanning multiple BBs.
These patches does not handle phi by itself, but these additional patterns help NewGVN pass, which calls instsimplify to check opportunities for simplifying instructions over phi, apply phi-of-ops optimization to result in successful jump threading.

Later, in the CFG simplification pass, the similar code modification happens. But it is too late to help jump threading.
This series of patches replaces my old patch D44626 which do similar optimization in InstCombine as suggested by reviewers.

This first patch in the series handles code sequences that merges two values using shl and or and then extracts one value using lshr.

All of the tests for instsimplify are already folded if you run -instcombine. Could the motivating problem also be considered a pass ordering bug?

If this is going to be an instsimplify patch, then I agree with the previous feedback: the code comments should use IR examples and explain what is happening to analyze/transform the IR instructions. C++ examples just confuse things.

I updated the test cases using update_test_checks.py and moved them into existing ll files since pair.ll is no longer good file name.

@spatel
The simplified test cases can be optimized by current instcombine pass. But to enable jump threading for std::pair, which is the original motivation of the patch, we must apply this folding over a phi node as discussed in https://reviews.llvm.org/D44626.
Executing jump threading pass again after CFG simplification may catch this opportunity. But I think it is better to do jump threading early to help other optimizers.

So i guess my question is, what in instcombine does this fold?
Is it very general, and this is only one of the cases it handles?
If not, maybe it should be refactored into instsimplify.

For the shl->or->lshr case, instcombine first identifies or is redundant and eliminates it. Then shlshr pair is eliminated. I think it is not general compared to my code.
For the shl->or->and case, instcombine does more general but more costly analysis in SimplifyDemandedInstructionBits. The same analysis seems too costly, but I expand the scope of my code for non-boolean cases.

Can you explain that in layman terms?
I don't see anything phi-related in instsimplify changes.

To help jump threading, actually we need to optimize the code sequence spanning multiple BBs. For an example of shl->or->and case looks like

Yeah, ok, i've convinced myself that this is ok. LGTM.
I do believe that the reasoning for this to be in instsimplify are sound (gvn needs it, running instcombine won't cut it.)

@inouehrs please commit testsnow (as of the current trunk, to not angry the bots).
And please wait a bit (2 days?) before committing the transform itself (and the test changes themselves) in case @spatel / others want to comment.

Yeah, ok, i've convinced myself that this is ok. LGTM.
I do believe that the reasoning for this to be in instsimplify are sound (gvn needs it, running instcombine won't cut it.)

@inouehrs please commit testsnow (as of the current trunk, to not angry the bots).

Yes, I'd also like to see the tests get committed now with baseline CHECKs.

We've made a good argument for having this in instsimplify, but I don't think we answered a related question: does adding either or both of these to instsimplify allow us to remove code from instcombine? Hopefully, there are existing regression tests for instcombine to verify its transforms - would those tests pass when this patch is applied?

This is 2 independent patches in 1 review (1 transform for 'lshr'; 1 transform for 'and'). It's best if we split them into separate patches.

At least some of the simplify tests do not appear to be minimized. I would expect that all tests end with 'lshr' or 'and' since that is where the pattern matching starts.

Why does the code use isa<ConstantInt> rather than match(op, m_APInt(C))? Using the matcher would give us vector splat functionality for free IIUC.

We've made a good argument for having this in instsimplify, but I don't think we answered a related question: does adding either or both of these to instsimplify allow us to remove code from instcombine?

I have investigated related instcombine patterns. But so far, I do not find something redundant with this instsimplify patch.
The pattern for 'and' is handled by SimplifyDemandedInstructionBits in instcombine and it has much wider coverage than this pattern.
The pattern for 'lshr' is handled by a combination of multiple patterns for or and shifts in instcombine. My instsimplify patch fully covers neither patterns in instcombine.

This is 2 independent patches in 1 review (1 transform for 'lshr'; 1 transform for 'and'). It's best if we split them into separate patches.

I will commit the transformation in lshr and and separately.

At least some of the simplify tests do not appear to be minimized. I would expect that all tests end with 'lshr' or 'and' since that is where the pattern matching starts.

I fixed the test. I intend to make the generated code simpler.

Why does the code use isa<ConstantInt> rather than match(op, m_APInt(C))? Using the matcher would give us vector splat functionality for free IIUC.

We've made a good argument for having this in instsimplify, but I don't think we answered a related question: does adding either or both of these to instsimplify allow us to remove code from instcombine?

I have investigated related instcombine patterns. But so far, I do not find something redundant with this instsimplify patch.
The pattern for 'and' is handled by SimplifyDemandedInstructionBits in instcombine and it has much wider coverage than this pattern.
The pattern for 'lshr' is handled by a combination of multiple patterns for or and shifts in instcombine. My instsimplify patch fully covers neither patterns in instcombine.

This is 2 independent patches in 1 review (1 transform for 'lshr'; 1 transform for 'and'). It's best if we split them into separate patches.

I will commit the transformation in lshr and and separately.

Commit - yes. But it wouldn't be bad to review (concurrently) these things as two differentials, too.

At least some of the simplify tests do not appear to be minimized. I would expect that all tests end with 'lshr' or 'and' since that is where the pattern matching starts.

I fixed the test. I intend to make the generated code simpler.

Why does the code use isa<ConstantInt> rather than match(op, m_APInt(C))? Using the matcher would give us vector splat functionality for free IIUC.

Fixed. Thank you for the suggestion.

I'm not sure of the exact problem (vector tests needed?), but added one nit.

Why does the code use isa<ConstantInt> rather than match(op, m_APInt(C))? Using the matcher would give us vector splat functionality for free IIUC.

Fixed. Thank you for the suggestion.

I'm not sure of the exact problem (vector tests needed?), but added one nit.

Yes - I think we should have at least 1 minimal test with vector types for each transform, so we know if that's working as expected. I think it will work, but I haven't stepped through to confirm that.

I'm not sure it matters right now.m_APInt() could potentially (not right now) match splat constant with undef's - <i32 42, i32 undef, i32 42>
But m_Specific() compares the pointers, not the underlying data. So if ShAmt0 and ShAmt1 are both splat,
but have different undefs (e.g. only one of them has undef elements), they would not have the same constant.

Theoretically, that code in my comment would still match this case.
But this does not matter right now since m_APInt() does not accept constants with undef elements.

Thanks for splitting it up. This is close to good IMO - just a few minor points:

Please add/adjust the tests with baseline checks as a preliminary step; we don't want to lose those in case the code change gets reverted.

Seeing this diff on its own makes it clear that we're overspecifying the more general SimplifyDemandedBits transform (what about the case where the shifts are in the opposite order?). That should be noted as a code comment and in the commit message.*

I don't expect any compile-time problems given that the computeKnownBits is buried under the other pattern checks, but be aware of that concern and watch for regressions.

It has been discussed before that SimplifyDemandedBits really shouldn't be included in InstCombine; it should be its own pass. If that structural change was made, would it make adjusting the optimization pipeline a more appealing solution than this?

Swap the 'or' operands here so we have coverage for the commuted case?
In general, I like to see a test comment that points that out too, so we know what's changing between the tests.
Also, it's a matter of taste, but the more common test format would put the test comment above the test definition, so it's clearly separated from the auto-generated CHECK lines.

Please add/adjust the tests with baseline checks as a preliminary step; we don't want to lose those in case the code change gets reverted.

I have updated baseline checks.

Seeing this diff on its own makes it clear that we're overspecifying the more general SimplifyDemandedBits transform (what about the case where the shifts are in the opposite order?). That should be noted as a code comment and in the commit message.*

I added comments. I will also mention in the commit message.

It has been discussed before that SimplifyDemandedBits really shouldn't be included in InstCombine; it should be its own pass. If that structural change was made, would it make adjusting the optimization pipeline a more appealing solution than this?

For the original motivating examples on jump threading, it needs inter-BB optimization (e.g. code below). So If SimplifyDemandedBits pass can support inter-BB opt as well as intra-BB opt and executed before the jump threading, it will be more general than this patch.