GNU* Compiler Collection 8 (GCC 8) - Transitioning to a new compiler

Every year, the Linux* community awaits the release of a new version of the GNU* Compiler Collection. The collection includes front ends for C, C++, Objective-C, Fortran, Ada, and Go, as well as libraries for these languages. The GCC community works hard to provide usability improvements, bug fixes, new security features, and performance improvements.

The GCC 8 Release Series changes list includes a full list of changes, new features, and fixes for this release. This blog article uses code examples to show how to use the following new compiler features:

Interprocedural optimization improvements

Control-flow enforcement technology

Changes in loop nest optimization flags

Interprocedural optimization improvements

As the Linux community continues to redefine the boundaries of what is possible in a Linux distribution running on new silicon, performance plays an increasingly important role in the industry. Compile-time optimizations have played a growing role over the last few years. Interprocedural Optimization (IPO) is an automatic, multi-step process that allows the compiler to analyze your entire code base and determine where specific optimizations pay off, particularly in programs containing many frequently used functions.

In GCC 8, there are two major changes for interprocedural optimizations. The first is a rework of the run-time estimation metrics, which leads to more realistic guesses driving the inlining and cloning heuristics. This is an internal change in how GCC represents the frequencies of basic blocks of code. In GCC 7, this representation was prone to overflow.

A basic block (BB) is a sequence of instructions with a single entry at the start and a single exit at the end. These blocks are linked together in the Control Flow Graph (CFG). Block frequency is a relative metric that represents the number of times a block executes. The ratio of a block's frequency to the entry block's frequency is the expected number of times the block will execute per entry to the function. The following figure shows a simple if statement and the corresponding CFG generated with gcc test.c -fdump-tree-all-graph, which produces dot files.

Figure 1. Simple If and its basic control flow graph
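Since the figure itself is not reproduced here, the following is a minimal sketch of the kind of source file that could sit behind it. The file name test.c comes from the command above; the function abs_value is a hypothetical stand-in, not the original listing.

```c
/* test.c: hypothetical stand-in for the snippet behind Figure 1. */

/* Returns the absolute value of x. The if statement splits the function
   into several basic blocks: the condition check, the "then" body, and
   the join point that returns the result. */
int abs_value(int x)
{
    int result = x;

    if (x < 0) {
        result = -x;    /* reached only when the condition holds */
    }

    return result;      /* join point: both paths end up here */
}
```

Because this sketch has no main function, it would be compiled with the -c flag added (gcc -c test.c -fdump-tree-all-graph). The resulting dot files can then be rendered with Graphviz.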

The change made in GCC 8 to improve the accuracy of basic block counts can affect all optimizations (including Profile Guided Optimizations, inlining, and cloning heuristics). Basic block frequencies are a core component of compiler optimizations.

Another important change in GCC 8 is in Interprocedural Analysis (IPA). IPA is a form of dataflow analysis between functions. GCC builds a “call graph” that records which functions call which other functions. In GCC 8, the ipa-pure-const pass is extended to propagate the malloc attribute.

The keyword __attribute__ allows you to specify special attributes when making a declaration. This keyword is followed by an attribute specification inside double parentheses. One of these is the malloc attribute: __attribute__((malloc))

The malloc attribute tells the compiler that a function may be treated as if any non-NULL pointer it returns cannot alias any other pointer that is valid when the function returns. In compilers, aliasing is the case where the same memory location can be accessed using different names. It is vitally important that a compiler can detect which accesses may alias each other, so that optimizations can be performed correctly. The following example shows the use of the malloc attribute:
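The original listing is not available here, so what follows is a minimal sketch of the idea. The names make_counter and bump are hypothetical. The attribute on make_counter promises that the returned pointer is fresh, so the compiler may assume that writing through it cannot change *existing.

```c
#include <stdlib.h>

/* Hypothetical allocator wrapper. The malloc attribute promises that any
   non-NULL pointer it returns does not alias any other valid pointer. */
__attribute__((malloc)) int *make_counter(void);

int *make_counter(void)
{
    int *p = malloc(sizeof *p);
    if (p != NULL) {
        *p = 0;
    }
    return p;
}

/* Writes through both pointers and returns the value read back from
   *existing, or -1 if the allocation failed. */
int bump(int *existing)
{
    int *fresh = make_counter();
    if (fresh == NULL) {
        return -1;
    }

    *existing = 1;
    *fresh = 2;             /* cannot modify *existing: fresh is not an alias */
    int seen = *existing;   /* the compiler may assume this is still 1 */

    free(fresh);
    return seen;
}
```

Without the attribute, the compiler would have to consider that fresh and existing might point to the same object, and reload *existing after the store through fresh.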

In GCC 8, the corresponding warning option -Wsuggest-attribute=malloc emits a diagnostic for functions that can be annotated with the malloc attribute.

When we enable the __attribute__((malloc)), the code looks like the following example:
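As a hedged sketch of what such an annotated malloc.c might contain (the function name alloc_buffer is an assumption, not the original code):

```c
#include <stdlib.h>

/* Hypothetical wrapper in malloc.c. It only returns memory obtained from
   malloc, so it is a candidate for the malloc attribute. With the
   annotation in place, the compiler has nothing left to suggest. */
__attribute__((malloc))
char *alloc_buffer(size_t size)
{
    return malloc(size);
}
```

The suggestion comes from an interprocedural analysis pass, so the diagnostic for an unannotated version may only appear when optimization is enabled (for example with -O2). Whether it is emitted also depends on the compiler version.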

After this, the following compilation command line works without warnings:

$ gcc malloc.c -Wsuggest-attribute=malloc

As we have seen, Interprocedural Optimization (IPO) allows the compiler to analyze your entire code base and apply or suggest optimizations. The improvements that GCC 8 makes to this technology will play an important role in the performance of end users' applications.

Control-flow enforcement technology

Another important area for compilers is security. Among the attacks that GCC 8 helps to prevent are Return Oriented Programming (ROP) and call/jmp-oriented programming (COP/JOP). These attack methods have the following common elements:

Diverting the control flow instruction (e.g. RET, CALL, JMP) from its original target address to a new target (via modification in the data stack or in the register).

Relying on a code module with execution privilege that contains small snippets of code (often called gadgets). Each snippet contains at least one control transfer instruction whose target address depends on data either in the return stack or in a register.

The new -fcf-protection option enables support for the Control-Flow Enforcement Technology (CET) feature in future Intel CPUs by instrumenting control-flow transfers to increase program security. The instrumentation allows valid target addresses of control-flow transfer instructions (such as indirect function calls, function returns, and indirect jumps) to be checked. For example, the instruction at the target of an indirect jump must be an ENDBRANCH instruction, a particular form of NOP. This prevents diverting the flow of control to an unexpected target.
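As a rough illustration of what gets instrumented, the sketch below makes an indirect call through a function pointer table. The names handler_fn, double_it, negate_it, and dispatch are hypothetical and do not come from the original article.

```c
#include <stddef.h>

/* Hypothetical handler type used for the indirect call below. */
typedef int (*handler_fn)(int);

static int double_it(int v) { return 2 * v; }
static int negate_it(int v) { return -v; }

/* Selects a handler by index and calls it through a function pointer.
   The call target is only known at run time, which is the kind of
   control-flow transfer that -fcf-protection instruments. */
int dispatch(size_t index, int value)
{
    static const handler_fn table[] = { double_it, negate_it };

    return table[index % 2](value);   /* indirect call */
}
```

Assuming an x86-64 GNU/Linux target and a GCC build that supports the option, compiling with something like gcc -O2 -fcf-protection=full -S should place an ENDBR64 instruction at the entry of functions that can be reached indirectly. The exact assembly depends on the compiler version and target.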

As an additional protection, the Clear Linux project provides the option -mzero-caller-saved-regs=[skip | used | all]. This option clears caller-saved general registers upon function return. It is intended to make attacks such as ROP, COP, and JOP much harder.

Changes in loop nest optimization flags

There are a few changes in the optimization flags for GCC 8. The -floop-interchange flag applies a classical loop nest optimization and is enabled by default at the -O3 optimization level and above. Consider the following code:
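The original listing is not reproduced here. A minimal sketch consistent with the array accesses discussed in this section could look like the following. The array name k and its 1000 x 100 shape are taken from the text, while ROWS, COLS, fill_column_first, and the stored values are assumptions.

```c
#define ROWS 1000
#define COLS 100

static int k[ROWS][COLS];

/* Column-first traversal: the inner loop walks down one column, so
   consecutive accesses are COLS elements apart in memory. */
void fill_column_first(void)
{
    for (int j = 0; j < COLS; j++) {
        for (int i = 0; i < ROWS; i++) {
            k[i][j] = i + j;
        }
    }
}
```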

In C, arrays are stored in row-major order. When the processor accesses an array element for the first time, it fetches an entire cache line of data from main memory into the cache. If the rest of that line is used soon afterward, this is a major performance boost. If, on the other hand, the rest of the line is not used, it is a net performance loss. If the array is traversed in the wrong order, we will see this loss.

In this example, the -floop-interchange flag exchanges the loops so that the array is accessed in the optimal order: the loop variable that was iterated in the inner loop moves to the outer loop. We can see this in the transformed code, which accesses k[0][0], k[0][1], … k[0][99], k[1][0] … k[999][99] rather than k[0][0], k[1][0], k[2][0] … k[999][0], k[0][1] … k[999][99].
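Written out by hand, the effect of the interchange can be sketched as follows. This is an illustration of the access order described above, not the compiler's actual output; ROWS, COLS, fill_row_first, and the stored values are assumptions.

```c
#define ROWS 1000
#define COLS 100

static int k[ROWS][COLS];

/* Row-first traversal: the inner loop walks along one row, so
   consecutive accesses touch adjacent elements and each cache line
   is fully used before the next one is fetched. */
void fill_row_first(void)
{
    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < COLS; j++) {
            k[i][j] = i + j;
        }
    }
}
```

Both loop orders store the same values; only the order of the memory accesses differs, which is why the compiler is free to make this transformation.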

Conclusion

The Linux community continues to redefine the boundaries of what is possible in a Linux distribution running on new silicon. Both performance and security play an increasingly important role in the industry. In the Clear Linux* Project, we decided to use and improve the latest GCC compiler technology to boost the performance and security of a Linux-based system for open source developers. We encourage users to employ the latest technologies that can improve applications for customers by boosting their performance and also providing a more robust layer of protection against security attacks.