I have recently implemented a graph-coloring register allocator for
Java .class files and I am trying to analyze the effectiveness of my
work. I have benchmarks for which I measured the total run time
with a local register allocator and with my global register
allocator. I'm not sure exactly what kind of speedup I was
expecting, but the gains weren't as pronounced as I had hoped.
I have tried to find references comparing local vs. global register
allocators, but almost everything I have found discusses the number of
loads and stores removed. A few mention the percentage reduction in
cycle counts, but most of these are from an era when
instruction-level parallelism didn't exist, so I'm not sure whether the
cycle reduction numbers still mean anything. Have there been any recent
comparisons of local vs. global register allocators?