The current problem i am trying to resolve in somewhat analogous to hoisting 1/2 of the 64 bit add instruction pair. Although in this particular situation we are actually sinking 1/2 of the instruction pair into a later position within the same block. And yes, i can see how in the future a new machine instruction pass might choose to hoist one of the instructions into a pred BB. I realize i can write additional code to scan a previous block. However i think its better that passes not hoist part of an instruction pair, especially ones such as these. To that end i would rather see my patch assert so that we are forced to deal with such a situation should it arise.
Your example, btw, is a good one for why we should have an IR test for the current problem, rather than an MIR test. An MIR test that runs just before SILoadStoreOptimizer will not detect the affects of a new pass. Whereas the IR test attached to this patch stands a better chance of detecting the issue.

after looking at the suggestion of using computeRegisterLiveness, I noticed that it does not return the MI where the register in question is most recently defined.
Rather, it informs on liveness within a range. I dont really see how I would use this method effectively?
The problem I am trying to solve requires identifying a specific instruction that is needed by a subsequent instruction and then adding the identified instruction to a list constructed by SILoadStoreOptimizer.

correction: the original input test did NOT have the instructions separated by more than 1 or 2 instructions The resultant output showed the large separation.
The default neighborhood of 10 is probably more than enough.

test will convert to MIR form.
Patch will change to use computeRegisterLiveness. i will have to use a pretty large neighborhood , as the original code this error occurred in (before running bugpoint) , had the s_add_u32 instruction was separated by over 400 instructions from the s_addc_u32 instruction. We will assert fail if we cannot find the s_add_u32 instruction, so that will alert us to increase neighborhood size. This patch will also handle the corresponding sub instructions.

Dec 30 2018

generally seems fine to me.
Would it be reasonable/useful to have a lit test that somewhat represents what we observed in the DeviceMemory test ?
if you think the fdiv32-to-rcp-folding.ll adequately covers it, then thats fine by me.

Adding the shrink pass just before Peephole SDWA does not help the lit test, it made no difference.
I think at this point i should proceed with adding the MIR test to verify when we cannot fold.

Do you know why was it unable to shrink these instructions?

SIInstrInfo::splitScalar64BitAddSub converts the S_ADD_U64_PSEUDO into the two add instructions which use the SReg_64_XEXECRegClass instead of VCC.
Later when SIShrinkInstructions::runOnMachineFunction pass runs, it sees that the Carry regs are not VCC and simply marks them with a hint to later convert to VCC ,
and then continues without doing a transformation.

OK, that makes sense. It just leaves it to post-RA shrink that way. Probably shrink pass could do the same, but it is not desirable as it limits scheduling opportunities.

But I see the problem in your code now: you do not check that vcc is not clobbered or used in between of two instructions.
I also think you need to shrink both instructions, otherwise you have carry-in of addc and carry-out of add in different registers, which just happen to be allocated to the same vcc. Note, that isConvertibleToSDWA() returning true does not guarantee final sdwa conversion, so you can end up with vop3 form for the first instruction anyway.

i think the following code does make sure that there are no intervening uses, i can also strengthen it to make sure the defining instruction of the CarryIn is the first ADD instruction.
+ if (!MRI->hasOneUse(CarryIn->getReg()) || !MRI->use_empty(CarryOut->getReg()))
+ return false;

It does not. It only checks that original sreg is not used. However you are replacing original carry sreg with vcc by shrinking instruction, and you do not check vcc uses.

Adding the shrink pass just before Peephole SDWA does not help the lit test, it made no difference.
I think at this point i should proceed with adding the MIR test to verify when we cannot fold.

Do you know why was it unable to shrink these instructions?

SIInstrInfo::splitScalar64BitAddSub converts the S_ADD_U64_PSEUDO into the two add instructions which use the SReg_64_XEXECRegClass instead of VCC.
Later when SIShrinkInstructions::runOnMachineFunction pass runs, it sees that the Carry regs are not VCC and simply marks them with a hint to later convert to VCC ,
and then continues without doing a transformation.

OK, that makes sense. It just leaves it to post-RA shrink that way. Probably shrink pass could do the same, but it is not desirable as it limits scheduling opportunities.

But I see the problem in your code now: you do not check that vcc is not clobbered or used in between of two instructions.
I also think you need to shrink both instructions, otherwise you have carry-in of addc and carry-out of add in different registers, which just happen to be allocated to the same vcc. Note, that isConvertibleToSDWA() returning true does not guarantee final sdwa conversion, so you can end up with vop3 form for the first instruction anyway.

Which may be a good thing if these failures are progressions (as I suspect) and not regressions. Are they progressions?
That is the point of other comments too, this patch is limited to handle just two instructions while there is a clear possibility to do it for almost any VOP3.

I would also assume many of these failures are just commute which is attempted by shrink pass. That is normal and would only need to change the tests.

Which may be a good thing if these failures are progressions (as I suspect) and not regressions. Are they progressions?
That is the point of other comments too, this patch is limited to handle just two instructions while there is a clear possibility to do it for almost any VOP3.

I would also assume many of these failures are just commute which is attempted by shrink pass. That is normal and would only need to change the tests.

Matt, thanks for review, I have addressed most of the issues although the inreg/sgpr one may require some more investigation on my end.
see my responses to each of your review comments.
A revision to this patch will be uploaded in the next hour or so...