by
Jinpyo Park, Soo-mook Moon
- In Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, 1998

Graph-coloring register allocators eliminate copies by coalescing the source and target node of a copy if they do not interfere in the interference graph. Coalescing is, however, known to be harmful to the colorability of the graph because it tends to yield a graph with nodes of higher degrees. Unlike aggressive coalescing, which coalesces any pair of non-interfering copy-related nodes, conservative coalescing or iterated coalescing perform safe coalescing that preserves the colorability. Unfortunately, these heuristics give up coalescing too early, losing many opportunities for coalescing that would turn out to be safe. Moreover, they ignore the fact that coalescing may even improve the colorability of the graph by reducing the degree of neighbor nodes that interfere with both the source and target nodes being coalesced. This paper proposes a new heuristic called optimistic coalescing which optimistically performs aggressive coalescing, thus fully exploiting the positive impact of ...
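The optimistic idea above can be sketched in a few lines: coalesce copy-related pairs aggressively, then keep a merge only if the graph still passes a Chaitin-style colorability test. A real allocator would undo failed merges individually rather than discard them wholesale; the graph, copy list, and K below are invented for illustration, not taken from the paper.

```python
# Minimal sketch of optimistic coalescing on an interference graph.
# Node names, the copy list, and K are illustrative assumptions.

K = 2  # number of registers/colors

# interference graph as adjacency sets
interfere = {
    "a": {"c"}, "b": {"c"}, "c": {"a", "b"},
}
copies = [("a", "b")]  # copy-related pairs (moves we would like to eliminate)

def coalesce(g, u, v):
    """Merge node v into node u, returning a new adjacency dict."""
    merged = {n: set(adj) for n, adj in g.items()}
    merged[u] |= merged.pop(v)
    merged[u].discard(u)
    for adj in merged.values():
        if v in adj:
            adj.discard(v)
            adj.add(u)
    return merged

def greedy_colorable(g, k):
    """Chaitin-style simplify test: repeatedly remove nodes of degree < k."""
    g = {n: set(adj) for n, adj in g.items()}
    changed = True
    while changed and g:
        changed = False
        for n in list(g):
            if len(g[n]) < k:
                for m in g[n]:
                    g[m].discard(n)
                del g[n]
                changed = True
    return not g  # empty remaining graph => greedily colorable

# Aggressive phase: coalesce every non-interfering copy pair...
g = interfere
for u, v in copies:
    if v not in g[u]:
        g = coalesce(g, u, v)

# ...then optimistically keep the merges only while the graph stays colorable.
ok = greedy_colorable(g, K)
```

In this toy graph, merging a and b drops c's degree from 2 to 1, illustrating the positive effect on colorability that the abstract describes.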

...e made for a coalesced node, we attempt to reduce the spill cost by a technique called live range splitting. Live range splitting is a spill cost reduction technique used by many optimizing compilers [2, 3, 8, 9, 10, 11]. A long live range is split into shorter ones by copies and load/stores inserted at carefully selected places. A register allocator can avoid spills by live range splitting since a shorter live range...

by
Florent Bouchez, Alain Darte, Fabrice Rastello
- In Proc. of the International Symposium on Code Generation and Optimization (CGO ’07), 2006

Memory transfers are becoming more important to optimize, for both performance and power consumption. With this goal in mind, new register allocation schemes are developed, which revisit not only the spilling problem but also the coalescing problem. Indeed, a more aggressive strategy to avoid load/store instructions may increase the constraints to suppress (coalesce) move instructions. This paper is devoted to the complexity of the coalescing phase, in particular in the light of recent developments on the SSA form. We distinguish several optimizations that occur in coalescing heuristics: a) aggressive coalescing removes as many moves as possible, regardless of the colorability of the resulting interference graph; b) conservative coalescing removes as many moves as possible while keeping the colorability of the graph; c) incremental conservative coalescing removes one particular move while keeping the colorability of the graph; d) optimistic coalescing coalesces moves aggressively, then gives up as few moves as possible so that the graph becomes colorable again. We almost completely classify the NP-completeness of these problems, also discussing the structure of the interference graph: arbitrary, chordal, or k-colorable in a greedy fashion. We believe that such a study is a necessary step for designing new coalescing strategies.
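Variants b) and c) above are typified by Briggs' conservative test: merge two copy-related nodes only if the combined node would have fewer than k neighbors of significant degree (>= k), which guarantees the merged node can still be simplified. The graph below is a hypothetical example, not from the paper.

```python
# Sketch of Briggs' conservative coalescing criterion.
# The interference graph and k are invented for illustration.

def briggs_conservative(g, u, v, k):
    """Safe to coalesce u and v if the merged node has fewer than k
    neighbors of significant degree (>= k)."""
    combined = (g[u] | g[v]) - {u, v}
    significant = [n for n in combined if len(g[n]) >= k]
    return len(significant) < k

g = {"x": {"t"}, "y": {"t"}, "t": {"x", "y", "z"}, "z": {"t"}}
# Merging x and y gives combined neighbors {t}; t has degree 3 >= k = 2,
# so there is 1 significant neighbor, which is < 2: the merge is safe.
safe = briggs_conservative(g, "x", "y", 2)
```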

... the same register all along its live-range. In other words, borrowing the subtle title of Cytron and Ferrante’s paper [15], what’s in a name has already been decided and no more live-range splitting [13] will be done. In addition to interferences, usually represented as solid lines, each copy instruction u = v is represented by an affinity (u, v), usually represented as a dotted line. If u and v are ...

by
David Koes, Seth Copen Goldstein
- In Proceedings of the International Symposium on Code Generation and Optimization, CGO ’05, 2005

Register allocation is one of the most important optimizations a compiler performs. Conventional graph-coloring based register allocators are fast and do well on regular, RISC-like, architectures, but perform poorly on irregular, CISC-like, architectures with few registers and non-orthogonal instruction sets. At the other extreme, optimal register allocators based on integer linear programming are capable of fully modeling and exploiting the peculiarities of irregular architectures but do not scale well. We introduce the idea of a progressive allocator. A progressive allocator finds an initial allocation of quality comparable to a conventional allocator, but as more time is allowed for computation the quality of the allocation approaches optimal. This paper presents a progressive register allocator which uses a multi-commodity network flow model to elegantly represent the intricacies of irregular architectures. We evaluate our allocator as a substitute for gcc’s local register allocation pass.

...itectures. Spill code optimization has been addressed by modifying the spilling heuristic [4] and by splitting the live range of a variable so that a variable will only be partially spilled to memory [9, 3]. Although these techniques can significantly improve the quality of the register allocator, they are limited in that they are based on graph coloring. They are not proper or progressive, nor do they ...

Register allocation is one of the most studied problems in compilation. It is considered an NP-complete problem since Chaitin et al., in 1981, modeled the problem of assigning temporary variables to k machine registers as the problem of coloring, with k colors, the interference graph associated with the variables. The fact that the interference graph can be arbitrary proves the NP-completeness of this formulation. However, this original proof does not really show where the complexity of register allocation comes from. Recently, the re-discovery that interference graphs of SSA programs can be colored in polynomial time raised the question: Can we exploit SSA form to perform register allocation in polynomial time, without contradicting Chaitin et al.’s NP-completeness result? To address such a question and, more generally, the complexity of register allocation, we revisit Chaitin et al.’s proof to better identify the interactions between spilling (load/store insertion), coalescing/splitting (removal/insertion of moves between registers), critical edges (a property of the control-flow graph), and coloring (assignment to registers). In particular, we show that, in general (we will make clear when), it is easy to decide if temporary variables can be assigned to k registers or if some spilling is necessary. In other words, the real complexity does not come from the coloring itself (as a wrong interpretation of the proof of Chaitin et al. may suggest) but comes from the presence of critical edges and from the optimizations of spilling and coalescing.
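The polynomial-time result mentioned above rests on interference graphs of SSA programs being chordal, and chordal graphs are colored optimally by a greedy pass over the reverse of a perfect elimination order. A minimal sketch on an invented chordal graph:

```python
# Greedy coloring over a fixed vertex order; optimal for chordal graphs
# when the order is the reverse of a perfect elimination order.
# The graph and order below are illustrative assumptions.

def greedy_color(g, order):
    color = {}
    for v in order:
        used = {color[n] for n in g[v] if n in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

# A small chordal graph: triangle a-b-c plus a pendant vertex d on c.
g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
order = ["a", "b", "c", "d"]  # reverse of the perfect elimination order d, c, b, a
colors = greedy_color(g, order)
# Uses 3 colors, matching the size of the largest clique {a, b, c}.
```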

...ciding if one can assign the variables, this way, to k ≥ 4 registers is thus NP-complete. Chaitin et al.’s proof, at least in its original interpretation, does not address the possibility of splitting [10] the live-range of a variable (set of program points where the variable is live). In other words, each vertex of the interference graph represents the complete live-range as an atomic object, and i...

Just-in-time compilers are invoked during application execution and therefore need to ensure fast compilation times. Consequently, runtime compiler designers are averse to implementing compile-time intensive optimization algorithms. Instead, they tend to select faster but less effective transformations. In this paper, we explore this trade-off for an important optimization – global register allocation. We present a graph-coloring register allocator that has been redesigned for runtime compilation. Compared to Chaitin-Briggs [7], a standard graph-coloring technique, the reformulated algorithm requires considerably less allocation time and produces allocations that are only marginally worse than those of Chaitin-Briggs. Our experimental results indicate that the allocator performs better than the linear-scan and Chaitin-Briggs allocators on most benchmarks in a runtime compilation environment. By increasing allocation efficiency and preserving optimization quality, the presented algorithm increases the suitability and profitability of a graph-coloring register allocation strategy for a runtime compiler.
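For reference, the Chaitin-Briggs baseline named above follows a simplify/select scheme: push nodes on a stack (optimistically, even at significant degree), then pop and color, recording a spill candidate only when no color is free. This is a simplified sketch with an invented graph, not the paper's reformulated algorithm:

```python
# Simplify/select core of a Briggs-style allocator (sketch).
# The interference graph and register count are illustrative.

def briggs_allocate(g, k):
    g_work = {n: set(adj) for n, adj in g.items()}
    stack = []
    while g_work:
        # Prefer a low-degree node; at significant degree this becomes
        # an optimistic push rather than an immediate spill.
        n = min(g_work, key=lambda x: len(g_work[x]))
        stack.append(n)
        for m in g_work[n]:
            g_work[m].discard(n)
        del g_work[n]
    color, spills = {}, []
    while stack:
        n = stack.pop()
        used = {color[m] for m in g[n] if m in color}
        free = [c for c in range(k) if c not in used]
        if free:
            color[n] = free[0]
        else:
            spills.append(n)  # only now does the node become a real spill
    return color, spills

g = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
color, spills = briggs_allocate(g, 2)  # two registers suffice for this chain
```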

...directed graph, DG, must maintain the one-one mapping between itself and an undirected graph – if <n1, n2> ∈ DG ⇒ <n2, n1> ∈ DG. This is similar in structure to Cooper and Simpson’s containment graph [10] but encodes very different semantics ... I in block B of the program, and a new temporary register T created in its place. The allocator must, as before, compute the interference edges for T. Prior to i...

Abstract. In achieving higher instruction level parallelism, software pipelining increases the register pressure in the loop. The usefulness of the generated schedule may be restricted to cases where the register pressure is less than the available number of registers. Spill instructions need to be introduced otherwise. But scheduling these spill instructions in the compact schedule is a difficult task. Several heuristics have been proposed to schedule spill code. These heuristics may generate more spill code than necessary, and scheduling them may necessitate increasing the initiation interval. We model the problem of register allocation with spill code generation and scheduling in software pipelined loops as a 0-1 integer linear program. The formulation minimizes the increase in initiation interval (II) by optimally placing spill code and simultaneously minimizes the amount of spill code produced. To the best of our knowledge, this is the first integrated formulation for register allocation, optimal spill code generation and scheduling for software pipelined loops. The proposed formulation performs better than the existing heuristics by preventing an increase in II in 11.11% of the loops and generating 18.48% less spill code on average among the loops extracted from Perfect Club and SPEC benchmarks with a moderate increase in compilation time.
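The paper solves its 0-1 program with an ILP solver; as a toy stand-in, the sketch below brute-forces all 0-1 spill decisions on a tiny invented instance, minimizing the pair (increase in II, amount of spill code) lexicographically. All pressure and penalty numbers are hypothetical.

```python
# Toy 0-1 "spill selection" by exhaustive enumeration, standing in for
# the paper's ILP formulation. Every number here is an assumption.
from itertools import product

REGISTERS = 2
variables = ["v0", "v1", "v2"]               # loop variables (hypothetical)
pressure_if_kept = {"v0": 1, "v1": 1, "v2": 1}
ii_penalty_if_spilled = {"v0": 1, "v1": 0, "v2": 1}

best = None
for spill in product((0, 1), repeat=len(variables)):
    kept = [v for v, s in zip(variables, spill) if not s]
    if sum(pressure_if_kept[v] for v in kept) > REGISTERS:
        continue  # infeasible: register pressure exceeds the register count
    delta_ii = sum(ii_penalty_if_spilled[v]
                   for v, s in zip(variables, spill) if s)
    cost = (delta_ii, sum(spill))  # lexicographic: II increase, then amount
    if best is None or cost < best[0]:
        best = (cost, spill)
# Here spilling only v1 is optimal: no II increase and one spilled variable.
```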

..., thereby improving the code quality. The proposed formulation takes into account both the compactness of the schedule and memory unit usage. Further, the formulation incorporates live range splitting [4], which allows a live range to be assigned to a register at specific time instances and be resident in memory in the rest of the time instances. To the best of our knowledge, this is the first integrated f...

Register allocation is a fundamental part of any optimizing compiler. Effectively managing the limited register resources of the constrained architectures commonly found in embedded systems is essential in order to maximize code quality. In this paper we deconstruct the register allocation problem into distinct components: coalescing, spilling, move insertion, and assignment. Using an optimal register allocation framework, we empirically evaluate the importance of each of the components, the impact of component integration, and the effectiveness of existing heuristics. We evaluate code quality both in terms of code performance and code size and consider four distinct instruction set architectures: ARM, Thumb, x86, and x86-64. The results of our investigation reveal general principles for register allocation design.

Effective global instruction scheduling techniques have become an important component in modern compilers for exposing more instruction-level parallelism (ILP) and exploiting the ever-increasing number of parallel function units. Effective register allocation has long been an essential component of a good compiler for reducing memory references. While instruction scheduling and register allocation are both essential compiler optimizations for fully exploiting the capability of modern high-performance microprocessors, there is a phase-ordering problem when we perform these two optimizations separately: instruction scheduling before register allocation may create insatiable demands for registers; register allocation before instruction scheduling may reduce the amount of parallelism that instruction scheduling can exploit. In this thesis, we propose to solve this phase-ordering problem by inserting a moderating optimization called code reorganization between prepass instruction scheduling and register allocation. Code reorganization adjusts the prepass scheduling results to make them demand fewer registers (i.e. exhibit lower register pressure) and guides register allocation to insert spill code that has less impact on schedule length. Our new approach avoids the complexity of simultaneous instruction scheduling and register allocation algorithms. In fact, it does not modify either instruction scheduling or register allocation algorithms. Therefore instruction scheduling can focus on maximizing instruction-level parallelism, and register allocation can focus on minimizing the cost of spill code. We compare the performance of our approach with a particular successful register-pressure-sensitive scheduling algorithm, and show an average of 18% improvement in speedup for an 8...

Techniques for global register allocation via graph coloring have been extensively studied and widely implemented in compiler frameworks. This paper examines a particular variant – the Callahan-Koblenz allocator – and compares it to the Chaitin-Briggs graph coloring register allocator. Both algorithms were published in the 1990s, yet the academic literature does not contain an assessment of the Callahan-Koblenz allocator. This paper evaluates and contrasts the allocation decisions made by both algorithms. In particular, we focus on two key differences between the allocators: Spill code: The Callahan-Koblenz allocator attempts to minimize the effect of spill code by using program structure to guide allocation and spill code placement. We evaluate the impact of this strategy on allocated code. Copy elimination: Effective register-to-register copy removal is important for producing good code. The allocators use different techniques to eliminate these copies. We compare the mechanisms and provide insights into the relative performance of the contrasting techniques. The Callahan-Koblenz allocator may potentially insert extra branches as part of the allocation process. We also measure the performance overhead due to these branches.

...hat end, we did not consider adding improvements in the Chaitin-Briggs spilling strategy as suggested in various research publications. Specifically, modifications proposed by Bergner [1] and Simpson [11] would reduce the number of spills produced by the allocator. Briggs also suggests that aggressively splitting live ranges could help reduce spill code in loops [3]. Rematerialization is a technique t...

by
Pramod G. Joisha
- In Proceedings of the International Symposium on Memory Management, 2006

Reference counting is a well-known technique for automatic memory management, offering unique advantages over other forms of garbage collection. However, on account of the high costs associated with the maintenance of up-to-date tallies of references from the stack, deferred variants are typically used in modern implementations. This partially sacrifices some of the benefits of nondeferred reference-counting (RC) garbage collection, like the immediate reclamation of garbage and short collector pause times. This paper presents a series of optimizations that target the stack and substantially enhance the throughput of nondeferred RC collection. A key enabler is a new static analysis and optimization called RC subsumption that significantly reduces the overhead of maintaining the stack contribution to reference counts. We report execution time improvements on a benchmark suite of ten C# programs, and show how RC subsumption, aided with other optimizations, improves the performance of nondeferred RC collection by as much as a factor of 10, making possible running times that are within 32% of that with an advanced traversal-based collector on seven programs, and 19% of that with a deferred RC collector on eight programs. This is in the context of a baseline RC implementation that is typically at least a factor of 6 slower than the tracing collector and a factor of 5 slower than the deferred RC collector.
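For context, nondeferred RC as described above applies the counting discipline even to stack writes, which is exactly the overhead RC subsumption targets. A minimal sketch (the subsumption analysis itself is not modeled, and the object/frame representation is an assumption):

```python
# Nondeferred reference counting: every pointer store, including stores
# to stack slots, adjusts counts immediately, and an object is reclaimed
# the moment its count drops to zero.

class Obj:
    def __init__(self, name):
        self.name = name
        self.rc = 0
        self.freed = False

def inc(o):
    if o is not None:
        o.rc += 1

def dec(o):
    if o is not None:
        o.rc -= 1
        if o.rc == 0:
            o.freed = True  # immediate reclamation, no collector pause

def store(frame, slot, new):
    """Write barrier applied even to stack slots (nondeferred RC)."""
    inc(new)
    dec(frame.get(slot))
    frame[slot] = new

frame = {}
a, b = Obj("a"), Obj("b")
store(frame, "x", a)   # a.rc == 1
store(frame, "x", b)   # a is reclaimed immediately; b.rc == 1
```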

...heap locations. The motivation was that write barriers for such operations could be removed. A data structure similar to the live-range subsumption graph called the containment graph was described in [9]. Nodes in the containment graph denote live ranges (unlike the live-range subsumption graph where they represent local variables). A directed edge is inserted from a node j to a node i if i is live a...