I'm not sure I see why LowerSwitch needs to worry about this optimization. Why doesn't SimplifyCFG, DCE, or some other control-flow optimization pass handle this, so that LowerSwitch doesn't have to worry about it?

Feb 5 2019

The type used for the result in this case has more to do with the type required for the operations after legalization. The G_SUB can be done in either width, but the narrower sub seems like the preferable canonical form.
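A quick way to see why the narrower sub is a legitimate canonical form: the low bits of a subtraction never depend on the high bits of the operands, so truncating before or after the sub gives the same result. A small sketch of that fact (illustrative 16/32-bit widths, not taken from the patch):

```python
MASK16 = 0xFFFF  # 16-bit result width (illustrative, not from the patch)

def sub_wide_then_trunc(a, b):
    # Perform the subtract in 32 bits, then truncate the result to 16 bits.
    return ((a - b) & 0xFFFFFFFF) & MASK16

def sub_narrow(a, b):
    # Truncate the operands first and subtract in 16 bits.
    return ((a & MASK16) - (b & MASK16)) & MASK16

# The low bits of a subtraction do not depend on the high bits of the
# operands, so both orders of trunc/sub give the same result.
for a, b in [(0x12345, 0x0FFFF), (7, 0x1_0000_0007), (0, 1)]:
    assert sub_wide_then_trunc(a, b) == sub_narrow(a, b)
```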

Jan 17 2019

Can we run EarlyCSE after CGP (instead of trying to do these kinds of point fixes)?

Hi Hal,

Thank you for looking into this!

That's a great suggestion! Originally I just eyeballed it as too expensive compile-time wise for the little effect that is achieved here.

This time around, with you pointing it out, I performed the actual measurements on a very large suite of shaders (the downstream target in question is a GPU). I see about a 1.0 (+/-0.3)% increase in overall compile time (the part of it that happens at application run time) on average. The generated-code quality improvement that comes out of this may be important, but it doesn't trigger often enough to justify a 1% compile-time increase across the board.

Oct 31 2018

Do you think it's better to remove the NFC tag from the patch? It doesn't look like it's completely NFC in principle; that said, I've tested this out for a major (though out-of-tree) GPU target on a very large suite of shaders and found no difference.

Aug 29 2018

I'm not against a separate IR canonicalization stage, even though it's a bit of overkill for this particular case. At the moment it looks like InstCombine is doing the canonicalization for this case, so re-running that is out of the question. A new pass looks to be on the cards. Does anyone else have opinions on this?

Does that mean that something after the last InstCombine run produces icmps with the non-canonical operand order? If so, maybe that something needs to be chased down and fixed instead.


Renamed MaxRecurse to Depth as requested, and updated the logic around it: MaxRecurse counted down from the positive limit to 0, which makes no sense for a parameter called Depth, so Depth now counts up from 0 to MaxDepth.

Renamed RecursionLimit to MaxDepth to make the connection with the Depth parameter (previously MaxRecurse) clearer.
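As a toy sketch of the renamed convention (a hypothetical simplifier, not the actual InstSimplify code): Depth starts at 0 and increments on each recursive call, giving up once it reaches MaxDepth, instead of a MaxRecurse counter that decremented toward 0.

```python
MAX_DEPTH = 6  # counterpart of the old RecursionLimit, now MaxDepth

def simplify(expr, depth=0):
    # Old style: a MaxRecurse counter started at the limit and counted
    # down to 0. With the parameter named Depth, it is more natural to
    # start at 0 and count up toward MaxDepth.
    if depth >= MAX_DEPTH:
        return expr  # give up: recursion budget exhausted
    if isinstance(expr, tuple) and expr[0] == "neg":
        inner = expr[1]
        # Fold neg(neg(x)) -> x, recursing with depth + 1
        # instead of the old MaxRecurse - 1.
        if isinstance(inner, tuple) and inner[0] == "neg":
            return simplify(inner[1], depth + 1)
    return expr

assert simplify(("neg", ("neg", "x"))) == "x"
```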

I think if we're going to do this, we need to implement it on top of a SCEV-based known-bits implementation; introducing a separate getZeroExtendExprForValue API is going to lead to weird results if SCEV creates a zero-extend expression for some other reason.

Whether we should do this in general, I'm not really sure. I mean, yes, I can see how this particular form is a bit more convenient for the load-store vectorizer, but it doesn't seem very general; it seems more intuitive to canonicalize towards reducing the number of AddExprs. But maybe pulling as much information as possible outside of the zext is generally useful enough to make this worthwhile?
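For reference, pulling an add out from under the zext is only sound when the narrow add cannot wrap. A small numeric model of that fact (plain Python masks standing in for i32/i64; this is not SCEV itself):

```python
M32 = (1 << 32) - 1  # all-ones mask for a 32-bit value

def zext32_to_64(x):
    # Model of (zext i32 x to i64): keep only the low 32 bits; the
    # widened value is the same non-negative integer.
    return x & M32

def add_then_zext(a, b):
    # zext (add a, b): add in i32 (wrapping), then widen.
    return zext32_to_64((a + b) & M32)

def zext_then_add(a, b):
    # (add (zext a), (zext b)): widen first, then add in i64.
    return zext32_to_64(a) + zext32_to_64(b)

# When the i32 add cannot wrap (nuw), the two forms agree. Operands
# below are chosen so the 32-bit sum does not overflow.
for a, b in [(1, 2), (0x7FFF0000, 0x0000FFFF), (0, M32)]:
    assert a + b <= M32  # the no-wrap precondition
    assert add_then_zext(a, b) == zext_then_add(a, b)
```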

Do you also plan to implement a similar transform for AddRecs? (e.g. (zext i32 {1,+,2}<%while.body> to i64)).
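For the AddRec case, the analogous rewrite would turn (zext i32 {1,+,2}<%while.body> to i64) into the i64 recurrence {1,+,2}, which is valid only while the narrow recurrence has not wrapped. A toy model (plain Python, not SCEV):

```python
M32 = (1 << 32) - 1

def addrec_i32(start, step, n):
    # n-th value of {start,+,step} computed in i32 (wrapping) arithmetic.
    return (start + step * n) & M32

def addrec_i64(start, step, n):
    # The same recurrence evaluated directly in the wider type.
    return start + step * n

# Zero-extending each narrow iterate matches the wide recurrence for as
# long as the narrow recurrence has not wrapped (small n here).
for n in range(100):
    assert addrec_i32(1, 2, n) == addrec_i64(1, 2, n)
```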