Knowing foo returns a nonnull value cannot be used to annotate A or B.

Good catch, I'd missed that.

I think this is generally problematic as it basically special cases a situation which we can handle in general just fine.
Let's assume we inline and keep the nonnull on the return values around, we run nonnull deduction on the caller and we get the nonnull where it is correct.
The benefit of doing it this way (D75825) is that it will also work with all other attributes as well without reinventing all the logic in the Attributor.

I disagree, mostly in the framing of this as an either/or. If we can cheaply match IR patterns during inlining and use attributes instead of assumes, we should. The other option requires IR churn for minimal value since we'll fold the inserted assumes back into attributes in the end anyway. We clearly should use assumes for the general case, but that doesn't mean we shouldn't specialize the common case.

addressed review comments. This handles all attributes on the call and only done after the cloning of the inlined body.
Tests added for attributes and making sure the original callee does not have the attributes added within its body.

Knowing foo returns a nonnull value cannot be used to annotate A or B.

Good catch, I'd missed that.

Just noticed this. I think we can special case this to: the return and def call are in the same basic block (it will take care of the if-else in the example) and has only one use which is the return. The latter takes care of avoiding incorrect semantics in that call do_sth_with_ptr_and_may_use_nonnull or any other use that depends on knowing which of the ptr is non-null.

I think this is generally problematic as it basically special cases a situation which we can handle in general just fine.
Let's assume we inline and keep the nonnull on the return values around, we run nonnull deduction on the caller and we get the nonnull where it is correct.
The benefit of doing it this way (D75825) is that it will also work with all other attributes as well without reinventing all the logic in the Attributor.

I disagree, mostly in the framing of this as an either/or. If we can cheaply match IR patterns during inlining and use attributes instead of assumes, we should. The other option requires IR churn for minimal value since we'll fold the inserted assumes back into attributes in the end anyway. We clearly should use assumes for the general case, but that doesn't mean we shouldn't specialize the common case.

The above restrictions (which makes it conservative, but we have cases like that) should allow us cleanly place all the return attributes on the calls within the body, I think.

Knowing foo returns a nonnull value cannot be used to annotate A or B.

Good catch, I'd missed that.

Just noticed this. I think we can special case this to: the return and def call are in the same basic block (it will take care of the if-else in the example) and has only one use which is the return. The latter takes care of avoiding incorrect semantics in that call do_sth_with_ptr_and_may_use_nonnull or any other use that depends on knowing which of the ptr is non-null.

None of this is sufficient. We are repeating a lot of deduction logic here now...

AFAICT, this should be fine because the only operations in callee context is the load and return. We are not backward propagating something incorrect into the callee context. W.r.t the caller context, what was true before inlining, remains true after inlining as well.

AFAICT, this should be fine because the only operations in callee context is the load and return. We are not backward propagating something incorrect into the callee context. W.r.t the caller context, what was true before inlining, remains true after inlining as well.

If callee is internal and all call site are non-null we can easily make sure the Attributor catches this (if it does not already).

Anna, I'd encourage you to go very narrow here. We can resolve the correlated throw case with the following: require operand of return to be call instruction which is less than small constant window of non-trapping instructions before the return. (i.e. start with previous node) If we allow bitcasts, that provides reasonable coverage. We can always fallback to assumes as noted.

We don't need to be hugely general analysis wise to be very useful. Calls in tail positions or loads in analogous are very common. We should handle that obvious case.

Anna, I'd encourage you to go very narrow here. We can resolve the correlated throw case with the following: require operand of return to be call instruction which is less than small constant window of non-trapping instructions before the return. (i.e. start with previous node) If we allow bitcasts, that provides reasonable coverage. We can always fallback to assumes as noted.

We don't need to be hugely general analysis wise to be very useful. Calls in tail positions or loads in analogous are very common. We should handle that obvious case.

Philip, I think we need to be even more conservative. I don't see how this will handle the parameter of the call instruction being incorrectly optimized away (second example with "returned" attribute on the parameter). IIUC, these are the two restrictions we need:

no throwing instructions between the call (i.e. the operand of the return) and the returnInstruction - possible restrict to small constant window like you described.

the arguments for the call should feed directly from the arguments in the callee (we can have bitcasts/geps here - i.e. use stripAndAccumulateConstantOffsets). This avoids possible incorrect propagation to the argument of the call

talked with Philip offline. We just need the check for non-trapping instructions between the return value (i.e. the call) and the return instruction. In other cases, it is either correct to propagate the nonnull-ness or it showcased a UB that already existed in the program.

the test failures are related to attributes now being added to the calls in clang tests. All of these tests are using llvm intrinsics. Should we just disable this for intrinsic calls for now? Not sure how to update clang tests in the same patch as llvm project. Should be doable though.

the test failures are related to attributes now being added to the calls in clang tests. All of these tests are using llvm intrinsics. Should we just disable this for intrinsic calls for now? Not sure how to update clang tests in the same patch as llvm project. Should be doable though.

Er, git is monorepo. Those tests should be checked out alongside your LLVM copy. You just need to build clang. You can and should simply update the tests in the same patch.

the test failures are related to attributes now being added to the calls in clang tests. All of these tests are using llvm intrinsics. Should we just disable this for intrinsic calls for now? Not sure how to update clang tests in the same patch as llvm project. Should be doable though.

Er, git is monorepo. Those tests should be checked out alongside your LLVM copy. You just need to build clang. You can and should simply update the tests in the same patch.

fixed clang tests. rot-intrinsics.c testcase has 5 different RUNs with 3 prefixes. Depending on target-triple, the attribute is added to the caller, so I've disabled the optimization for that specific test with -update-return-attrs=false

the isGuaranteedToTransferExecutionToSuccessor check should be inverted

make_range should be until the return instruction - so we do not want std::prev on the returnInstruction. what's needed is: make_range(RVal->getIterator(), RInst->getIterator())

This means that from the callsite until (and excluding) the return instruction should be guaranteed to transfer execution to successor - only then we can backward propagate the attribute to that callsite.

1 the isGuaranteedToTransferExecutionToSuccessor check should be inverted

make_range should be until the return instruction - so we do not want std::prev on the returnInstruction. what's needed is: make_range(RVal->getIterator(), RInst->getIterator())

This means that from the callsite until (and excluding) the return instruction should be guaranteed to transfer execution to successor - only then we can backward propagate the attribute to that callsite.
Updated patch and added test cases.

as stated in a previous comment (https://reviews.llvm.org/D76140#1922292), adding Assumes here for simple cases seems like an overkill. It has significant IR churn and it also adds a use for something which can be easily inferred.
Consider optimizations that depend on facts such as hasOneUse or a limited number of uses. We will now be inhibiting those optimizations.

While i venomously disagree with the avoidance of the usage of one versatile interface
and hope things will change once there's more progress on attributor & assume bundles,
in this case, as it can be seen even from the signature of the isValidAssumeForContext() function,
it implies/forces nothing about using assumes, but only performs a validity checking,
similar to the MayContainThrowingOrExitingCall()

isValidAssumeForContext(Inv, CxtI, DT) does not force anything about assumes, but AFAICT all code which uses this function either has some sort of guard in the caller that the instruction is an assume. Also, the comments in the code state that it is for an assume. In fact, I believe if we intend to use that function more widely for other purposes, we should rename the function before using it (just a thought), and currently we should assert that Inv is an assume. It captures the intent of the function.

That being said, I checked the code in isValidAssumeForContext and it does not fit the bill here for multiple reasons. We either do:

isValidAssumeForContext(RVal /* Inv */, RInst /* CxtI */) which fails when we do not have DT and just return true when RVal comes before RInst - this is always the case, since RVal will come before RInst.

Callee and caller have the same attributes w/different values (i.e. deref)

And thinking through the code, I think there might be a bug here. It's not a serious one, but the if the callee specifies a larger deref than the caller, it looks like the the smaller value is being written over the larger.

Actually, digging through the attribute code, I think I'm wrong about the bug. However, you should definitely write the test to confirm and document merging behaviour!

If it does turn out I'm correct, I'm fine with this being addressed in a follow up patch provided that the test is added in this one and isn't a functional issue.

yes, thanks for pointing it out. I realized it after our offline discussion :)
For now, I will add a FIXME testcase which showcases the difference in code and handle that testcase in a followon change.

The key difference is when the caller directly returns the result vs uses it locally. The result here is that your transform is much more narrow in applicability than it first appears.

I tried multiple test cases to showcase the difference between the two ideas above but failed. Specifically, simplifyInstruction used during inlining the callee is not too great at optimizing the body. For example, see added testcase test7.

I also tried the less restrictive version (check the safety of the optimization in the callee itself, and do the attribute update on the cloned instruction), but didn't see any testcases in clang that needed update. Of course, that doesn't mean anything :)

I believe this just showcases undefined behaviour since we were having a returnValue (i.e. call) with an incomptable attribute compared to the return attribute on the callsite.

The last statement is not true. Had a discussion offline with Philip and he pointed out that we missed the fact that attributes such as signext and zeroext are part of the *call* itself. We cannot propagate these attributes into the callee since such attributes are part of the ABI for the call it is attached to.
I'm reopening this review to fix this issue.

Clarified this with Philip offline. The current patch is not restrictive. In fact, now that I think of it, sometimes, it may be better - simplifyInstruction can fold away instructions and reduce the "window size" between the RV and the ReturnInst.

p.s. Sorry for missing the functional issue the first time. All of the test changes should have made the issue obvious, but despite reading the LangRef description of signext, I somehow managed to miss the separation between ABI and optimization attributes.

p.s. Sorry for missing the functional issue the first time. All of the test changes should have made the issue obvious, but despite reading the LangRef description of signext, I somehow managed to miss the separation between ABI and optimization attributes.

thanks for the review Philip and pointing out the problem. All of us had missed the functional issue the first time around.