Abstract. Indirect branch prediction is likely to become increasingly important in the future
because indirect branches occur more frequently in object-oriented programs. With misprediction
rates of around 25% on current processors, indirect branches can account for a significant fraction of
branch misprediction overhead even though they remain less frequent than the more predictable
conditional branches. We investigate a wide range of two-level predictors dedicated exclusively to
indirect branches. Starting with predictors that use full-precision addresses and unlimited tables,
we progressively introduce hardware constraints and minimize the loss of predictor performance
at each step. For programs from the SPECint95 suite as well as a suite of large C++ applications,
a two-level predictor achieves a misprediction rate of 9.8% with a 1K-entry table and 7.3% with an
8K-entry table, representing more than a threefold improvement over an ideal BTB. A hybrid predictor
further reduces the misprediction rates to 8.98% and 5.95%, respectively.

1. Introduction

Current high-performance superscalar processors use branch prediction to speculatively execute
instructions beyond an unresolved branch. If the branch is mispredicted, this work is lost, and
execution must restart right after the branch instruction. Newer designs increase instruction issue
width and pipeline depth, increasing the relative overhead of mispredicted branches and making
accurate branch prediction even more critical to performance.

Conditional direct branches, whose target is encoded in the instruction itself, can already be
predicted with reported hit rates of up to 97% ([YP93]). In contrast, indirect branches, which
transfer control to an address stored in a register, are harder to predict accurately. Unlike conditional
branches, they can have more than two targets, so that prediction requires a full 32-bit or 64-bit
address rather than just a "taken" or "not taken" bit. Current processors predict indirect branches
with a branch target buffer (BTB), which caches the most recent target address of a branch. Unfortunately,
BTBs typically have much lower prediction rates than the best predictors for conditional
branches. For example, an ideal (unconstrained) BTB achieves an average prediction hit ratio of
only 64% on the SPECint95 benchmarks.
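
As a minimal sketch of the BTB behavior described above, the following Python function simulates an ideal (unconstrained) BTB over a trace of indirect branches: it predicts that each branch will jump to whatever target it used last time. The function name, the trace format (pairs of branch address and actual target), and the sample addresses are illustrative assumptions, not taken from the simulator used in this study.

```python
from typing import Dict, Iterable, Tuple


def ideal_btb_hit_ratio(trace: Iterable[Tuple[int, int]]) -> float:
    """Fraction of indirect branches predicted correctly by an ideal BTB.

    Each trace element is (branch_address, actual_target). The table is
    unbounded and indexed by the full branch address, so no two branches
    ever share an entry (hypothetical model of an "unconstrained" BTB).
    """
    last_target: Dict[int, int] = {}  # branch address -> most recent target
    hits = 0
    total = 0
    for branch_addr, target in trace:
        # Predict the most recently seen target; a first encounter
        # has no entry and therefore counts as a misprediction.
        if last_target.get(branch_addr) == target:
            hits += 1
        # Always update the entry with the target actually taken.
        last_target[branch_addr] = target
        total += 1
    return hits / total if total else 0.0


# Made-up trace: one branch that alternates targets, one that is stable.
example_trace = [
    (0x400, 0xA00),
    (0x400, 0xA00),
    (0x400, 0xB00),
    (0x400, 0xA00),
    (0x500, 0xC00),
    (0x500, 0xC00),
]
print(ideal_btb_hit_ratio(example_trace))
```

In this toy trace only two of the six branch executions hit. The branch at 0x400 mispredicts every time its target changes, which illustrates why a predictor that remembers only the last target does poorly on branches with several frequently used targets.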

Though not as common as conditional branches, indirect branches occur frequently enough to
cause substantial overhead. Chang et al. [CHP97] predict reductions in execution time of 14% and
5% for the perl and gcc benchmarks, respectively, on a wide-issue superscalar processor with an
improved prediction mechanism for indirect branches (the Target Cache).