Recent Activity

Yesterday

I tried reverting my patch for the DAG type legalizer, and found that vec-trunc-to-i1.ll still fails with this new patch applied. It fails during type legalization, while this patch at least in this case gets enabled only after type legalization. So I think the test still serves its purpose.

This looks like a promising direction. I particularly like the idea of having a way to intersect information from the backend instruction definitions with the constraints coming from the IR. However, I also have some concerns.

Sat, Dec 15

Well, mayRaise Exception is purely a MI level flag. I struggle to see where optimizations on the MI level would ever care about rounding modes in the sense you describe: note that currently, MI optimizations don't even know which operation an MI instruction performs -- if you don't even know whether you're dealing with addition or subtraction, why would you care which rounding mode the operation is performed in? MI transformations instead care about what I'd call "structural" properties of the operation: what are the operands, what is input vs. output, which memory may be accessed, which special registers may involved, which other side effects may the operation have. This is the type of knowledge you need for the types of transformations that are done on the MI level: mostly about moving instructions around, arriving at an optimal schedule, de-duplicating identical operations performed multiple times etc. (Even things like simply changing a register operand to a memory operand for the same operation cannot be done solely by common MI optimizations but require per-target support.)

Fri, Dec 14

This patch does seem FP exception centric and rounding mode agnostic though. Should FPExcept and friends be named something more general to cover both? To be clear, I'm okay with the current naming scheme, so just playing Devil's advocate.

Mon, Dec 10

Tue, Dec 4

The one problem with your "new" code is that it now forces back-ends to implement something, or else code involving constrained intrinsics will trigger internal compiler errors. It might be preferable to avoid those ...

GCC has never supported the __float128 type on SystemZ, because "long double" is already IEEE-128 on the platform. GCC only supports a separate __float128 type on platforms where "long double" is some other type (like x86 or ppc64).

Digressing a bit, but has anyone given thought to how this implementation will play out with libraries? When running with traps enabled, libraries must be compiled for trap-safety. E.g. optimizations on a lib's code could introduce a NaN or cause a trap that does not exist in the code itself.

Might wait a little before you land this to see if @uweigand got anything more to say. But afaict this only removes the warning without affecting the result in any way, so it should not make anything worse.

If so, maybe we can fix this by swapping around the low and high subregs of all other register definitions, so that then subreg_h32 *always* maps to subreg_h32(subreg_h64)), and instead of explicit subreg_hh32 and subreg_hl32 we have rather subreg_lh32 and subreg_ll32 ?

But given that there is still infrastructure missing in the IR optimizers, I also think that at least in the first implementation, we probably should go with the original approach and just use constrained intrinsics everywhere in the function, and possibly add some function attribute that prevent any cross-inlining of functions built with constrained intrinsics with functions built with regular floating-point operations.

Subtle. This last sentence seems to imply that cross-inlining should be allowed when there are no regular floating point operations in the function to be inlined. This makes sense due to, for example, the common use of tiny functions just to retrieve a value. Do I interpret your statement correctly?

Tue, Nov 20

I agree that it's preferable to re-use these existing options if possible. I have some concerns that -ftrapping-math has a partial implementation in place that doesn't seem to be well aligned with the way fast-math flags are handled, so it might require some work to have that working as expected without breaking existing users. In general though these seem like they should do what we need.

Mon, Nov 19

As I understand the approach, it can never introduce a new trap. Now of course, if the original source value is outside the range of the target integer type, some of the instructions introduced by this approach may trap, but in that case, it is expected for the fp_to_uint operation to trap somewhere.

As a future enhancement, we might prefer to choose the initial value to replicate such that the number of VLVGx instruction is minimized. E.g. if we load two integers LA and LB from memory and construct the vector { LA, LB, LB, LB }, the current code, even with your change, would load and replicate LA, and then use three VLVGx to insert the LB copies. We could save two instructions by instead using load-and-replicate for LB.

Oct 15 2018

The first one seems to indicate that your libclang.so is broken in release mode (optimization error?). The second one is correct (some of the tests test for errors, and apparently don't silence the messages).

How is this going to work? I don't think this can work unless we can start guaranteeing that MMOs are never dropped. Most FP instructions aren't considered as mayLoad or mayStore, and will probably easily lose the MMO if it's the only thing indicating they care about the FP mode

Aug 17 2018

I believe we still don't need an intrinsic for constrained fneg. As you say, floating-point negation is implemented as -0-x, using a regular fsub (*not* the constrained intrinsic). The fsub semantics says that it cannot trap, so this would still be fine -- as long as we're sure the -0-x is always implemented via a dedicated negate instruction, and never via a subtraction. But this seems to be true, the idiom -0-x is recognized early during codegen and always treated specially.

Aug 15 2018

Ah, that's good to know! So if I understand this correctly, accessing even the 32-bit part (F0S) would be considered to clobber F0D. This is not really necessary, but probably doesn't hurt at this point. We could do the same thing as the HAX you mention to get this modeled exactly.