...this is a step in cleaning up our fast-math-flags implementation in IR.

As proposed in the above threads, I've replaced the 'UnsafeAlgebra' bit (which had the 'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic reassociation - 'AllowReassoc'.

I've also added a bit to allow relaxed precision for transcendental functions called 'AllowTrans' (this was initially proposed as 'libm' or similar).

...and we're out of bits. 7 bits ought to be enough for anyone, right? :) FWIW, I did look at getting this out of SubclassOptionalData via SubclassData (spacious 16-bits), but that's apparently already used for other purposes. Also, I don't think we can just add a field to FPMathOperator because Operator is not intended to be instantiated. We'll defer movement of FMF to another day.

I'm not sure, but I may be diverging from the last proposal by keeping the 'fast' keyword. I thought about removing that, but seeing IR like this:
%f.fast = fadd reassoc nnan ninf nsz arcp contract trans float %op1, %op2
...made me think we want to keep the shortcut synonym.

I've also gone ahead and renamed the getter/setters, and mechanically added 'TODO' comments where we need to review how the old setUnsafeAlgebra() was used. In some cases, it's obvious that should be translated to setAllowReassoc(), but others may need to be discussed.

Finally, this change is binary incompatible with existing IR as seen in the compatibility tests. I'm hoping this:
"Newer releases can ignore features from older releases, but they cannot miscompile them. For example, if nsw is ever replaced with something else, dropping it would be a valid way to upgrade the IR." ( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility )
...provides the flexibility we want to make this change without requiring a new IR version. Ie, I don't think we're ever loosening the FP strictness of existing IR. At worst, we will fail to optimize some previously 'fast' code because it's no longer recognized as 'fast'. This should get fixed as we squash all of the TODO comments.

Could you mark up the LLVM intrinsics affected by the "trans" flag with the meaning with/without the flag?

Please clarify the language here to indicate this affects the semantics of both LLVM intrinsics and known library functions. And include a more complete description of what counts as a known library function. And explain what "relaxed" means, given that libm generally doesn't provide correctly rounded versions of transcendental functions.

Also, do we want to optimize sqrt() based on this flag? It technically isn't transcendental, but we currently generate an approximation in some cases based on fast-math flags.

I think we'd include sqrt() in the 'trans' bucket (so maybe 'libm' was the better name). But looking back through the dev thread, I don't see an actual definition of that term or what this flag would map to as a clang command-line param. @hfinkel / @wristow - suggestions?

D39319 (Set contract flag when setting unsafe algebra flag) proposes to fix a bug introduced with D31164 (Add AllowContract to FastMathFlags). That bug is also fixed here - but I should add that test.

This might suggest that we should split 'trans' into its own small follow-up patch. That would also allow the unsafe TODO fixes to proceed independently of 'trans' in case that takes longer to sort out. Let me know if splitting this patch up seems better.

before every call to isFast. I can grep for isFast more easily than I can grep for the comment. Moreover, if something really needs 'isFast' because the transformation really needs "every possible liberty", then there should definitely be a comment explaining that.

Also, unless we make hasAllowReassoc() also imply hasNoSignedZeros() (as GCC does, FWIW), we'll almost certainly need that check as well in many places. Many need NoNans and NoInfs too.

You're correct, sqrt is not a transcendental function (it is an algebraic function because it is the root of a polynomial equation). The problem is that algebraic functions also include things that we don't want to include here (e.g., addition, division). I don't like 'libm' because it refers to a very specific set of functions, but maybe that's the best we can do. The best description I have of this set is: "All of the mathematical operations that generally produce irrational numbers and are not in the set of functions specified by the IEEE specification (e.g., +,-,*,/,%,sqrt)."

For context, the original suggestion of the flag-name 'libm' was changed to 'trans' based on an observation that OpenCL has an option '-cl-fast-relaxed-math' that includes the semantics "This option also relaxes the precision of commonly used math functions" (see http://man.opencl.org/clCompileProgram.html), and so 'libm' may incorrectly imply it only applies to a more limited set of functions (math operations) only implemented in "libm.a". That led to the suggestion of 'trans', for transcendental functions. I originally liked the 'trans' suggestion, moving away from the libm.a implication. But I do feel that sqrt() should also be optimized on this flag, so as people said, 'trans' isn't perfect either.

I don't have a precise formal definition of what things would be controlled by this, but loosely, I'd say it's "All mathematical operations that have runtime library support on many platforms." In practice in LLVM, I think this means many (most?) of the math operations handled by "SimplifyLibCalls.cpp". So for me, in addition to transcendental functions, it would include sqrt(), and even simpler things like fmin() and fmax().

So after giving it more thought, I'm now back to preferring 'libm' over 'trans'. If someone has a better suggestion, that would be great. But I haven't thought of one.

...
This might suggest that we should split 'trans' into its own small follow-up patch. That would also allow the unsafe TODO fixes to proceed independently of 'trans' in case that takes longer to sort out. Let me know if splitting this patch up seems better.

I'd like to see the new enumerations of the FastMathFlags bits all defined in one patch. So removing 'UnsafeAlgebra', and adding 'AllowReassoc' and 'AllowTrans' (along with all the changes required since 'UnsafeAlgebra' can no longer be referenced) in the first patch, is my preference. (As an aside, probably 'AllowTrans' will be changed to something like 'AllowMathLib', due to other discussions here.)

...
This might suggest that we should split 'trans' into its own small follow-up patch. That would also allow the unsafe TODO fixes to proceed independently of 'trans' in case that takes longer to sort out. Let me know if splitting this patch up seems better.

I'd like to see the new enumerations of the FastMathFlags bits all defined in one patch. So removing 'UnsafeAlgebra', and adding 'AllowReassoc' and 'AllowTrans' (along with all the changes required since 'UnsafeAlgebra' can no longer be referenced) in the first patch, is my preference. (As an aside, probably 'AllowTrans' will be changed to something like 'AllowMathLib', due to other discussions here.)

So after giving it more thought, I'm now back to preferring 'libm' over 'trans'. If someone has a better suggestion, that would be great. But I haven't thought of one.

In the spirit of the flag 'arcp' for 'AllowReciprocal', and the possibility of 'AllowMathLib' for the internal enumeration name, how about 'amlib'?

With that, there's no direct implication of "this is only for libm.a operations", or "this is only for transcendental functions". That said, the 'am' part does give it an awkward "morning library" feeling. Maybe 'amathlib' instead?

It seems that we're allowing something kind of open-ended here. That is, we don't seem to have an exact set of functions that will be covered. If that's the case then we should probably document it as such -- something like "allow substitution of approximate calculations for functions whose meaning are recognized by the optimizer." And maybe the flag could be "approx".

Yes, I think it's open-ended. And I very much like the description "allow substitution of approximate calculations for functions whose meaning are recognized by the optimizer.".

I'm less enthusiastic about the flag-name 'approx', although I'm not horribly opposed to it (especially since I cannot come up with anything I really like). On it's own, a flag named 'approx' sounds too wide of scope. To me, it sounds like it might be describing all of what is enabled by -ffast-math. In short, it doesn't explicitly convey the concept of it being approximate calculations for functions whose meanings are recognized. To describe my concern "from the other direction", the reciprocal transformation is also an approximation (as is virtually everything else enabled by fast-math), and we don't intend to control the reciprocal transformation via this flag.

Sadly, I didn't get email notifications for the last couple of name suggestions before I posted the updated patch. I agree that 'ApproxFunc' and 'afn' are better than 'AllowMathLib' and 'aml', so I'll change that unless there are objections or better suggestions.

This conformance language is only necessary for sqrt. For the other functions, there is no standard for their accuracy/precision. You might say that with 'afn' the result may not match what would have been returned by the system's libm implementation.

I don't expect that afn would affect fma. I recommend removing the statement here. It might be true that fma is computed differently under -ffast-math because of denormal handling, but that applies to everything. The same is true of the conversion functions below (floor, rint, etc.). maxnum/minnum too.

This language might give the wrong idea. Even without 'afn', the result may not match the target's libm: we constant-fold using a different implementation. The part to call out is that we might substitute an implementation which is less accurate.

True. On the other hand, the system's libm sqrt should be IEEE compliant, so saying that this differs from the libm result covers that (and also covers other cases, such as PPC long double, which aren't IEEE).

One loose end that needs to be taken care of more or less simultaneously is a Clang change. Specifically, the constructor for CodeGenFunction (in "CodeGenFunction.cpp") invokes FastMathFlags::setUnsafeAlgebra(), so it will need to be changed to setFast().

This is correct - I didn't post it, but I have that one line patch in place locally, so I was planning to submit it to the clang repo as close as possible after this patch and reference this commit (if there's a way to avoid the build breakage cleanly, please let me know).

We'll need to add flags to SDNodeFlags that are analogous to AllowReassoc and ApproxFunc. Adding them in a separate patch seems fine, but in case the lack of that change in this patch was an oversight, I wanted to raise the point here.

I'll fix the comment to match the current code. Since this is the reassociation pass, I would guess that 'reassoc' is all we need to enable transforms here, but we'll have to verify that that is correct.

It is surprising but intentional. I want to highlight the fact that we allow any FMF on any FPMathOperator. So things like this or 'fadd arcp ...' are legal but I'm not sure how that would be used for optimization. We may want to refine that someday?

To be clear, I'm not suggesting that in this patch we change the code here to check for 'reassoc' (i.e., I'm not suggesting we change the isFast() call to hasAllowReassoc() at this time). I view that as a separate piece of work, where we go through and carefully audit existing FMF-related checks, and decide how to use the more precise flags. Possibly it's just 'reassoc' that is needed for this case, or possibly it's 'reassoc' and some other conditions.

All I was suggesting by my comment/question, is that code-comments referring to a no longer-existing "unsafe algebra" umbrella flag, are a bit misleading. So I wondered whether we wanted to change those comments to better match the new implementation. It's a pretty minor point, in my view.

From my POV, the main purpose of this patch is to fix the underlying implementation to allow us to go through and do that audit, and fix issues like this "use isFast() or use some finer check?" example, here.

Patch updated:
Change code comments that reference 'unsafe' to the new 'fast' vocabulary. This is not condoning what may be a wrongful use of 'fast'; it's just trying to keep our code and comments in sync. I didn't go out of my way to look beyond a few lines in the diffs, so there may still be 'unsafe' refs and wrapper functions out there, but we'll squash those as we audit the usage of 'isFast'.

As an llvm vendor for Apple targets we have actually run into the incompatibility issues with old fast and current fast, we elected to use version markers in the IR for in house code to tell us that the old fast is relevant in the bit code reader during auto upgrade for IR into new Fast format.

As an llvm vendor for Apple targets we have actually run into the incompatibility issues with old fast and current fast, we elected to use version markers in the IR for in house code to tell us that the old fast is relevant in the bit code reader during auto upgrade for IR into new Fast format.

Thanks for letting me know. Did this manifest as a performance problem only or was there a visible functional difference too?

Did this manifest as a performance problem only or was there a visible functional difference too?

I guess it depends how you qualify visible functional difference :).
With the previous layout of the bits, reassociation was performed, but with the new layout, it is not.
The reason behind that reassociation checks isFast, which with the new layout is true only if all the 7 bits are set to true and that doesn't happen when the upgrade path is used (we get only 5 of them).

So this is a performance problem that is exposed by a functional difference in the compiler (an optimization was triggered and now it is not).

Given this new format was not released yet, do you see a way we could make the autoupgrade path to preserve the isFast semantic from the old format to the new format?

I haven't audited the code, but potential we are going to regress all the code that relied on isFast and given this is not publicly available yet (unless I am mistaken), I was wondering if there is a way to fix that.

Did this manifest as a performance problem only or was there a visible functional difference too?

I guess it depends how you qualify visible functional difference :).
With the previous layout of the bits, reassociation was performed, but with the new layout, it is not.
The reason behind that reassociation checks isFast, which with the new layout is true only if all the 7 bits are set to true and that doesn't happen when the upgrade path is used (we get only 5 of them).

So this is a performance problem that is exposed by a functional difference in the compiler (an optimization was triggered and now it is not).

Right - we expected that could happen with existing IR compiled with -ffast-math. I don't think we could go wrong in that scenario given that 'fast' gives us license to do all kinds of transforms, but doesn't require it - although people have different expectations once they get accustomed to the optimizations. :)

Given this new format was not released yet, do you see a way we could make the autoupgrade path to preserve the isFast semantic from the old format to the new format?
I haven't audited the code, but potential we are going to regress all the code that relied on isFast and given this is not publicly available yet (unless I am mistaken), I was wondering if there is a way to fix that.

Michael has this patch already, right? I think we have to create a new version of the IR since the bits changed meaning (we can't just flip 'on' new bits).

I have no objection to that, but that use case wasn't important to me, so that's why I didn't bother in this patch. AFAIK, this is new for v6.0, so yes, it's still possible to do this before that window closes.

I think we have to create a new version of the IR since the bits changed meaning (we can't just flip 'on' new bits).

Yeah, exactly.

I have no objection to that, but that use case wasn't important to me, so that's why I didn't bother in this patch. AFAIK, this is new for v6.0, so yes, it's still possible to do this before that window closes.

That would be ideal, but on the other hand, like you said, the new code won't be wrong.
I don't know what it takes to bump the bitcode version nor what are the implications, so maybe that's the right call, but let us have this conversation on LLVM dev.

BTW, while writing the RFC, I realized that we could potentially generated incorrect code if we were silently downgrade a post-r317488 bitcode with a pre-r317488 compiler. (I.e., running fast math optimizations whereas we only wanted reassoc)