Register Allocation using Bipartite Liveness Graphs
Register allocation is an essential optimization for all compilers. A number of sophisticated register allocation algorithms have been developed based on Graph Coloring (GC) over the years. However, these algorithms pose three major limitations in practice. First, construction of a full interference graph can be a major source of space and time overhead in the compiler. Second, the interference graph lacks information on the program points at which two variables may interfere. Third, integration of coloring and coalescing leads to a coupling between register allocation and register assignment, which can further compromise the effectiveness of the solution. This paper addresses these limitations by making a clean separation between the register allocation and register assignment phases. Allocation is modeled as an optimization problem on a new data structure called the Bipartite Liveness Graph (BLG). We model the register assignment phase as a separate optimization problem that avoids some spill instructions by generating register-to-register moves and exchange instructions and at the same time, performs move coalescing and handles register class constraints. We implemented our BLG allocator in both the LLVM static compiler infrastructure and the Jikes RVM dynamic compiler infrastructure. In the LLVM evaluation, our BLG register allocator results in a performance improvement of up to 7.8% for SpecCPU 2006 benchmarks and a significantly lower compile-time overhead compared to a Chaitin-Briggs GC register allocator with both allocators using the same spill-code generator. The BLG allocator delivers performance comparable to the existing LLVM 2.7 Linear Scan register allocator that includes additional optimizations such as live-range splitting and backtracking techniques that are currently not present in the BLG allocator. In the Jikes RVM evaluation, the BLG register allocator delivers runtime performance improvements of up to 30.7% for Java Grande Forum benchmarks and up to 9% for Dacapo benchmarks relative to Linear Scan (LS) with a modest increase in compile-time.