
Why can LLVM call a function that is never called?

<i>Whatever your dragon told you, he lied. Dragons are deceitful. You do not know what awaits you on the other side.</i>
Michael Swanwick, "The Iron Dragon's Daughter"

Not so long ago, a post titled "<a rel="nofollow" href="https://habrahabr.ru/company/infopulse/blog/338812/">How can a function that is never called be called?</a>" was published on Habr. Its conclusion is simple: in the presence of undefined behaviour, the compiler has the right to take any actions, even completely unexpected ones. What interested me, however, was the mechanism behind this particular optimization, and I want to share the results of my small investigation with the distinguished Habr community.

Let me briefly recall the essence. In the source code below, the EraseAll function should never be called from main, and indeed it is not called when compiled with -O0, but it suddenly is called with optimization -O1 and higher.

#include <cstdlib>

typedef int (*Function)();

static Function Do;

static int EraseAll() {
  return system("rm -rf /");
}

void NeverCalled() {
  Do = EraseAll;
}

int main() {
  return Do();
}
This is explained as follows: in this code, the variable Do is a pointer to a function, and its initial value is null. Calling a function through a null pointer is undefined behaviour (UB), and the compiler has the right to optimize UB in whatever way is convenient for it. In this case, the compiler simply assumed that the assignment Do = EraseAll had already been performed.

We will now try to figure out why it did so. In the rest of the text, LLVM and Clang version 5.0.0 are used as the compiler.

To begin with, let's look at the IR code when optimizing at -O0 and -O1. We'll change the source code slightly to make it less dramatic:

#include <stdio.h>

typedef int (*Function)();

static Function Do;

static int PrintHello() {
  return printf("hello world\n");
}

void NeverCalled() {
  Do = PrintHello;
}

int main() {
  return Do();
}
And let's compile it to IR code at -O0 (the debugging information is omitted for clarity):

declare i32 @printf(i8*, ...) local_unnamed_addr #1
You can compile the executables and verify that in the first case a segmentation fault occurs, while in the second case "hello world" is printed. With other optimization options, the result is the same as with -O1.

Now let's find the part of the compiler code that performs this optimization. Recall that in the LLVM architecture the frontend does not optimize anything itself: cfe (Clang Frontend) always generates code without optimizations, which is what we saw in the -O0 version, while the optimizations are performed by the opt utility:

// TODO: Move all global ctors functions to the end of the module for code
// layout.

return Changed;
}
Let's try to describe in words what this function does. For each global variable in the module, it requests a Comdat object. What is a Comdat? A Comdat section is a section in an object file that contains objects which may be duplicated in other object files. Each object carries information for the linker, telling it what to do when duplicates are detected. The options are: Any - take any of them; ExactMatch - the duplicates must match completely, otherwise an error occurs; Largest - take the object with the largest value; NoDuplicates - there must be no duplicate; SameSize - the duplicates must have the same size, otherwise an error occurs.

In LLVM, Comdat data is represented by an enumeration:

enum SelectionKind {
  Any,          ///< The linker may choose any COMDAT.
  ExactMatch,   ///< The data referenced by the COMDAT must be the same.
  Largest,      ///< The linker will choose the largest COMDAT.
  NoDuplicates, ///< No other Module may specify this COMDAT.
  SameSize,     ///< The data referenced by the COMDAT must be the same size.
};
, and the Comdat class actually represents a pair (Name, SelectionKind). (In reality, everything is more complex.) All variables that for some reason cannot be deleted are placed in the set NotDiscardableComdats. Functions and global aliases are handled the same way: whatever cannot be deleted goes into NotDiscardableComdats. Then the separate optimization functions for global constructors, global functions, global variables, global aliases, and global destructors are called. The optimizations continue in a loop until no more optimization is performed. On each iteration of the loop, the NotDiscardableComdats set is cleared.
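The driver loop described above can be sketched schematically. This is an illustrative model, not LLVM's actual code; the names are invented:

```cpp
#include <functional>
#include <vector>

// Illustrative model of the pass driver loop (not LLVM's actual code):
// rerun every optimization until a whole round changes nothing.
bool runUntilFixedPoint(const std::vector<std::function<bool()>> &passes) {
    bool everChanged = false;
    bool roundChanged = true;
    while (roundChanged) {
        roundChanged = false;
        for (const auto &pass : passes)
            if (pass())            // each pass reports whether it changed the module
                roundChanged = true;
        everChanged |= roundChanged;
    }
    return everChanged;
}
```

In the real optimizeGlobalsInModule, each round additionally recomputes the NotDiscardableComdats set from scratch, as described above.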

Let's now walk through this optimization pass using our example and see how this result is obtained. Of course, analyzing all the optimization variants of even a single pass is a very sizeable task, since it provides for many different special cases. We will concentrate on our example, examining along the way the functions and data structures that are important for understanding how this optimization pass works.

Initially, the optimizer performs various checks that are uninteresting in this case, and then calls the function processInternalGlobal, which tries to optimize global variables. This function is also quite complex and does many different things, but we are interested in one of them:

if (GS.StoredType == GlobalStatus::StoredOnce && GS.StoredOnceValue) {
  ...
  // Try to optimize globals that are known to be assigned a value
  // exactly once, not counting the initializing value.
  if (optimizeOnceStoredGlobal(GV, GS.StoredOnceValue, GS.Ordering, DL, TLI))
    return true;
  ...
}
The information that the global variable is assigned a value once and only once is extracted from the GS (GlobalStatus) structure, which is populated in the calling function:

if (GlobalStatus::analyzeGlobal(&GV, GS))
  return false;
...
Here we see another interesting fact: objects whose names begin with "llvm." are not subject to optimization (they are reserved for LLVM's own runtime machinery). And, incidentally, variable names in LLVM IR may contain dots (and may even consist of a single dot with the @ or % prefix). The analyzeGlobal function is a call into the LLVM API, and we will not consider its internal workings. The GlobalStatus structure, however, deserves a closer look, since it holds information that is very important for optimization passes.

/// When we analyze a global variable, we store some information about it. If
/// we find that the address of the variable is taken, none of the information
/// in this structure will be valid.
struct GlobalStatus {
  /// True if the address of the global variable is used in comparison
  /// operations.
  bool IsCompared = false;

  /// True if the value of the global variable is read. If not, the variable
  /// can be deleted.
  bool IsLoaded = false;

  /// How the value is assigned.
  enum StoredType {
    /// No value is ever stored. The variable can be marked as a constant.
    NotStored,

    /// A store occurs, but the stored value is the same constant the variable
    /// was initialized with. This is only tracked for scalar types.
    InitializerStored,

    /// A store occurs, but only the initialization and one more time, with a
    /// different value. If the variable is StoredOnce, the stored value is
    /// recorded in the StoredOnceValue field below. This is only checked for
    /// scalar variables.
    StoredOnce,

    /// Stores occur many times, or in a way that we cannot track.
    Stored
  } StoredType = NotStored;

  /// If only one value (besides the initialization constant) is ever stored
  /// to this global variable, the value is kept here.
  Value *StoredOnceValue = nullptr;
  ...
};
It is probably worth explaining why "if we find that the address of the variable is taken, none of the information in this structure will be valid". Indeed, if we take the address of a global variable and then write through that address rather than by name, tracking the writes becomes extremely difficult, and it is better to leave such variables as is, without trying to optimize them.
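A tiny C++ illustration of the problem (a made-up example, not from the article's code): once the address of a global escapes, stores happen through a pointer rather than through the name, so a by-name scan of assignments would be misled:

```cpp
static int G = 1;   // global under analysis

// Store through a pointer: the name "G" never appears on the left-hand side,
// so an analysis that only looks for "G = ..." would miss this write.
static void writeThrough(int *p) { *p = 42; }

static int demo() {
    writeThrough(&G);   // the address of G is taken here and escapes
    return G;           // 42, although no assignment to "G" by name is visible
}
```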

So, we arrive at the function optimizeOnceStoredGlobal, which receives the variable (GV) and the stored value (StoredOnceVal) as arguments. Here is what happens to them:

if (GV->getInitializer()->getType()->isPointerTy() &&
    GV->getInitializer()->isNullValue()) {
  ...
that is, the variable must be initialized with a null pointer. If that is the case, we create a new value SOVC corresponding to StoredOnceVal cast to the type of GV:

if (Constant *SOVC = dyn_cast<Constant>(StoredOnceVal)) {
  if (GV->getInitializer()->getType() != SOVC->getType())
    SOVC = ConstantExpr::getBitCast(SOVC, GV->getInitializer()->getType());
Here, getBitCast is the method that returns the bitcast instruction, which casts values between types in the LLVM IR language.
After that, the function OptimizeAwayTrappingUsesOfLoads is called; it receives the global variable GV and the constant LV.
The optimization itself is performed by the function OptimizeAwayTrappingUsesOfValue(Value *V, Constant *NewV).
For each use of the variable:

for (auto UI = V->user_begin(), E = V->user_end(); UI != E; ) {
  Instruction *I = cast<Instruction>(*UI++);
if it is a load instruction, we replace its operand with the new value:

if (LoadInst *LI = dyn_cast<LoadInst>(I)) {
  LI->setOperand(0, NewV);
  Changed = true;
}
If the variable is used in a call or invoke instruction (which is exactly what happens in our example), we turn it into a direct call by substituting the new value for the called value, also replacing the pointer wherever it is passed as an argument:

if (isa<CallInst>(I) || isa<InvokeInst>(I)) {
  CallSite CS(I);
  if (CS.getCalledValue() == V) {
    // Calling through the pointer! Turn into a direct call, but be careful
    // that the pointer is not also being passed as an argument.
    CS.setCalledFunction(NewV);
    Changed = true;
    bool PassedAsArg = false;
    for (unsigned i = 0, e = CS.arg_size(); i != e; ++i)
      if (CS.getArgument(i) == V) {
        PassedAsArg = true;
        CS.setArgument(i, NewV);
      }
All other arguments of the call are simply copied.
Similar replacement algorithms also exist for the cast and GEP instructions, but in our case they are not triggered.

The subsequent actions are as follows: we go through all uses of the global variable, trying to delete everything except the assignment of the value. If that succeeds, we can delete the Do variable itself.

So, we have briefly examined how an LLVM optimization pass works on a concrete example. In principle, there is nothing super-complicated here, but careful programming is required to provide for all possible combinations of instructions and variable types, and of course all of this must be covered by tests. Studying the source code of the LLVM optimizers will help you write your own optimizations that improve the code in specific cases.

Examples of some interesting optimizations in LLVM

Here are a few more examples of how LLVM can optimize code. They are unrelated to the example we have just dissected and are performed in other optimization passes, but they are quite unusual and interesting.
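The first example: folding a summation loop into a closed formula. The original source listing is absent from this copy; it was presumably a plain loop along these lines (a reconstruction, not the article's verbatim code):

```cpp
// Presumed source (a reconstruction): sum the integers from 1 to n-1 in a
// plain loop. With optimization, Clang replaces the loop with the closed-form
// expression (n-1)*(n-2)/2 + n - 1 discussed below, widening intermediates
// to i33 to guard against 32-bit overflow.
int sum(int n) {
    int s = 0;
    for (int i = 1; i < n; i++)
        s += i;
    return s;
}
```

Compiled with optimizations enabled, only the loop-free tail of the function remains in the IR: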

for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
%s.0.lcssa = phi i32 [ 0, %entry ], [ %8, %for.cond.cleanup.loopexit ]
ret i32 %s.0.lcssa
}
Suddenly, there is no loop, and strange variables of type i33 (that is, a 33-bit integer) have appeared. How did this happen? LLVM converted the sum of the series into a formula: (n-1)*(n-2)/2 + n - 1. Since the intermediate computations may overflow 32 bits, LLVM inserted a variable of type i33. Note that it did this by analyzing the unoptimized IR code, which is rather difficult. Under the spoiler is the unoptimized code of the same function, where the sum is computed directly in the loop:

for.end: ; preds = %for.cond
%5 = load i32, i32* %s, align 4
ret i32 %5
}
It is even more interesting to see what happens to this example in the backend. The i33 variable is converted to i64, and if your processor is 32-bit, sequences of instructions are generated for multiplying and adding 64-bit numbers on a 32-bit machine. Even more interestingly, if you change the data type in the original example to long, the argument and the return value become i64, and the intermediate variables become i65!

The second example

Suppose we decided to write a function that changes the sign of a float by flipping bit 31 of its binary representation:
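The C source is absent from this copy; it presumably looked something like this (a reconstruction, and, as the text notes further down, the pointer cast violates strict aliasing):

```cpp
// Presumed source (a reconstruction): negate a float by toggling bit 31 of
// its binary representation. The cast violates the strict aliasing rule,
// which the text discusses below; a std::memcpy-based version would be the
// portable way to do this. Assumes a 32-bit unsigned and IEEE-754 floats.
float invert(float f) {
    unsigned *u = (unsigned *)&f;  // reinterpret the float's bits
    *u ^= 0x80000000u;             // flip the sign bit (bit 31)
    return f;
}
```

Clang compiles this, with optimizations, to: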

invert: // @invert
// BB#0: // %entry
fneg s0, s0
ret
LLVM recognized the manipulation of bit 31 and reduced it to the fneg instruction, which negates a float. For comparison, GCC cannot do this and emits a "literal" version.
GCC 6.3 (ARM 64):

invert(float):
fmov w0, s0
eor w0, w0, -2147483648
fmov s0, w0
ret
This is an example of target-specific optimization, which is done in the backend rather than in the opt utility.

A few more words need to be said about this example. The pointer casts here violate the strict aliasing rule, which can lead to undefined behaviour when compiling with -fstrict-aliasing on some compilers and platforms (in fact, in a vanishingly small number of cases). For this example, the error occurs when compiling with gcc 4.4.7 -m32 -O2 and disappears in more recent versions of gcc. However, in the list of links at the end I have included a link to an interesting lecture on aliasing.

The third example

Another example of target-specific optimization, this time for x86-64, or more precisely, for the Haswell architecture.
Let's write a function that counts the set bits in a word:
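The source listing is missing from this copy as well; a version that Clang recognizes is the classic bit-clearing loop (a reconstruction, not the article's verbatim code):

```cpp
// Presumed source (a reconstruction): count the set bits of a word by
// repeatedly clearing the lowest set bit (Kernighan's method). LLVM's
// loop-idiom recognition turns this pattern into the llvm.ctpop intrinsic.
int cntSetBits(int a) {
    int cnt = 0;
    while (a) {
        a &= a - 1;  // clear the lowest set bit
        ++cnt;
    }
    return cnt;
}
```

Compiled for Haswell, the function becomes: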

cntSetBits(int): # @cntSetBits(int)
popcnt eax, edi
ret
The whole function fits into a single popcnt instruction, which counts the number of set bits in a word.
Let's look at IR:

; Function Attrs: norecurse nounwind readnone uwtable
define i32 @cntSetBits(i32 %a) local_unnamed_addr #0 {
entry:
%0 = call i32 @llvm.ctpop.i32(i32 %a), !range !2
ret i32 %0
}
Here we see that the intrinsic function llvm.ctpop.i32 is used. It was inserted by the frontend, which uses high-level information about the code; the backend for this architecture knows about this intrinsic and can replace it with the dedicated instruction.