Abstract

Feedback-directed optimization (FDO) is effective in improving application runtime
performance, but has not been widely adopted due to the tedious dual-compilation
model, the difficulties in generating representative training data sets, and the
high runtime overhead of profile collection. Using hardware-event sampling to
generate estimated edge profiles overcomes these drawbacks. However, hardware-event
samples are typically not precise at the instruction or basic-block granularity,
and these inaccuracies lead to missed performance gains when compared to
instrumentation-based FDO. In this paper, we use multiple hardware-event profiles
and supervised learning techniques to generate heuristics for improved precision of
basic-block-level sample profiles, and to further improve the smoothing algorithms
used to construct edge profiles. We demonstrate that sampling-based FDO can achieve
an average of 78% of the performance gains obtained using instrumentation-based
exact edge profiles for SPEC2000 benchmarks, matching or beating
instrumentation-based FDO in many cases. The overhead of profile collection is only
0.74% on average, while compiler-based instrumentation incurs 6.8%–53.5% overhead
(and 10x overhead on an industrial web search application), and dynamic
instrumentation incurs 28.6%–1639.2% overhead.