Scalar Target Information

Return true if the specified immediate is legal add immediate, that is the target has add instructions which can add a register with the immediate without having to materialize the immediate into a register. More...

Return true if the specified immediate is legal icmp immediate, that is the target has icmp instructions which can compare a register against the immediate without having to materialize the immediate into a register. More...

Compared to the SW implementation, HW support is supposed to significantly boost the performance when the population is dense, and it may or may not degrade performance if the population is sparse. A HW support is considered as "Fast" if it can outperform, or is on a par with, SW implementation when the population is sparse; otherwise, it is considered as "Slow".

Many APIs in this interface return a cost. This enum defines the fundamental values that should be used to interpret (and produce) those costs. The costs are returned as an int rather than a member of this enumeration because it is expected that the cost of one IR instruction may have a multiplicative factor to it or otherwise won't fit directly into the enum. Moreover, it is common to sum or average costs which works better as simple integral values. Thus this enum only provides constants. Also note that the returned costs are signed integers to make it natural to add, subtract, and test with zero (a common boundary condition). It is not expected that 2^32 is a realistic cost to be modeling at any point.

Note that these costs should usually reflect the intersection of code-size cost and execution cost. A free instruction is typically one that folds into another instruction. For example, reg-to-reg moves can often be skipped by renaming the registers in the CPU, but they still are encoded and thus wouldn't be considered 'free' here.

The cost of the address computation. For most targets this can be merged into the instruction indexing mode. Some targets might want to distinguish between address computation for memory operations on vector types and scalar types. Such targets should override this function. The 'SE' parameter holds pointer for the scalar evolution object which is used in order to get the Ptr step value in case of constant stride. The 'Ptr' parameter holds SCEV of the access pointer.

A higher cost indicates less expected throughput. From Agner Fog's guides, reciprocal throughput is "the average number of clock cycles per instruction when the instructions are not part of a limiting dependency chain." Therefore, costs should be scaled to account for multiple execution units on the target that can process this type of instruction. For example, if there are 5 scalar integer units and 2 vector integer units that can calculate an 'add' in a single cycle, this model should indicate that the cost of the vector add instruction is 2.5 times the cost of the scalar add instruction. Args is an optional argument which holds the instruction operands values so the TTI can analyze those values searching for special cases or optimizations based on those values.

This is the cost of reducing the vector value of type Ty to a scalar value using the operation denoted by Opcode. The form of the reduction can either be a pairwise reduction or a reduction that splits the vector at every reduction level.

The contract for this is the same as getOperationCost except that it supports an interface that provides extra information specific to call instructions.

This is the most basic query for estimating call cost: it only knows the function type and (potentially) the number of arguments at the call site. The latter is only interesting for varargs function types.

Note this is not necessarily the same as addrspace(0), which LLVM sometimes refers to as the generic address space. The flat address space is a generic address space that can be used access multiple segments of memory with different address spaces. Access of a memory location through a pointer with this address space is expected to be legal but slower compared to the same memory location accessed through a pointer with a different address space. This is for targets with different pointer representations which can be converted with the addrspacecast instruction. If a pointer is converted to this address space, optimizations should attempt to replace the access with the source address space.

Returns

~0u if the target does not have such a flat address space to optimize away.

The cost of Gather or Scatter operation Opcode - is a type of memory access Load or Store DataTy - a vector type of the data to be loaded or stored Ptr - pointer [or vector of pointers] - address[es] in memory VariableMask - true when the memory access is predicated with a mask that is not a compile-time constant Alignment - alignment of single element

The cost of the interleaved memory operation. Opcode is the memory operation code VecTy is the vector type of the interleaved access. Factor is the interleave factor Indices is the indices for interleaved load members (as interleaved load allows gaps) Alignment is the alignment of the memory operation AddressSpace is address space of the pointer.

Return the expected cost for the given integer when optimising for size.

This is different than the other integer immediate cost functions in that it is subtarget agnostic. This is useful when you e.g. target one ISA such as Aarch32 but smaller encodings could be possible with another such as Thumb. This return value is used as a penalty when the total costs for a constant is calculated (the bigger the cost, the more beneficial constant hoisting is).

Some HW prefetchers can handle accesses up to a certain constant stride. This is the minimum stride in bytes where it makes sense to start adding SW prefetches. The default is 1, i.e. prefetch with any stride.

Note that this is designed to work on an arbitrary synthetic opcode, and thus work for hypothetical queries before an instruction has even been formed. However, this does not work for GEPs, and must not be called for a GEP instruction. Instead, use the dedicated getGEPCost interface as analyzing a GEP's cost required more information.

Typically only the result type is required, and the operand type can be omitted. However, if the opcode is one of the cast instructions, the operand type is required.

The returned cost is defined in terms of TargetCostConstants, see its comments for a detailed explanation of the cost values.

A value which is the result of the given memory intrinsic. New instructions may be created to extract the result from the given intrinsic memory operation. Returns nullptr if the target cannot create a result from the given intrinsic.

True if the intrinsic is a supported memory intrinsic. Info will contain additional information - whether the intrinsic may write or read to memory, volatility and the pointer. Info is undefined if false is returned.

This can estimate the cost of either a ConstantExpr or Instruction when lowered. It has two primary advantages over the getOperationCost and getGEPCost above, and one significant disadvantage: it can only be used when the IR construct has already been formed.

The advantages are that it can inspect the SSA use graph to reason more accurately about the cost. For example, all-constant-GEPs can often be folded into a load or other instruction, but if they are used in some other context they may not be folded. This routine can distinguish such cases.

Operands is a list of operands which can be a result of transformations of the current operands. The number of the operands on the list must equal to the number of the current operands the IR user has. Their order on the list must be the same as the order of the current operands the IR user has.

The returned cost is defined in terms of TargetCostConstants, see its comments for a detailed explanation of the cost values.

Return true if the target has a unified operation to calculate division and remainder.

If so, the additional implicit multiplication and subtraction required to calculate a remainder from division are free. This can enable more aggressive transformations for division and remainder than would typically be allowed using throughput or size cost models.

Indicate that it is potentially unsafe to automatically vectorize floating-point operations because the semantics of vector and scalar floating-point semantics may differ.

For example, ARM NEON v7 SIMD math does not support IEEE-754 denormal numbers, while depending on the platform, scalar floating-point math does. This applies to floating-point math operations and calls, not memory operations, shuffles, or casts.

Return true if the specified immediate is legal add immediate, that is the target has add instructions which can add a register with the immediate without having to materialize the immediate into a register.

Return true if the addressing mode represented by AM is legal for this target, for a load/store of the specified type.

The type may be VoidTy, in which case only return true if the addressing mode is legal for a load/store of any legal type. If target returns true in LSRWithInstrQueries(), I may be valid. TODO: Handle pre/postinc as well.

Return true if the specified immediate is legal icmp immediate, that is the target has icmp instructions which can compare a register against the immediate without having to materialize the immediate into a register.

Test whether calls to a function lower to actual program function calls.

The idea is to test whether the program is likely to require a 'call' instruction or equivalent in order to call the given function.

FIXME: It's not clear that this is a good or useful query API. Client's should probably move to simpler cost metrics using the above. Alternatively, we could split the cost interface into distinct code-size and execution-speed costs. This would allow modelling the core of this query more accurately as a call is a single small instruction, but incurs significant execution cost.

This function provides the target-dependent information for the target-independent DivergenceAnalysis. DivergenceAnalysis first builds the dependency graph, and then runs the reachability algorithm starting with the sources of divergence.