Conference

Authors

Andrea Pellegrini, Joseph L. Greathouse, Valeria Bertacco

Abstract

The reliability of future processors is threatened by decreasing transistor robustness. Current
architectures focus on delivering high performance at low cost; lifetime device reliability is a secondary
concern. As the rate of permanent hardware faults increases, robustness will become a first-class constraint
for even low-cost systems. Current research into reliable architectures has focused on ad-hoc solutions to
improve designs without altering their centralized control logic. Unfortunately, this centralized control
presents a single point of failure, which limits long-term robustness.

To address this issue, we introduce Viper, an architecture built from a redundant collection of
fine-grained hardware components. Instructions are perceived as customers that require a sequence of services
in order to execute properly. The hardware components vie to perform whatever services they can, dynamically
forming virtual pipelines that avoid defective hardware. This is done using distributed control logic, which
avoids a single point of failure by construction.
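
The service-oriented idea above can be illustrated with a small software model. This is only a sketch under stated assumptions, not Viper's actual hardware protocol: the Unit class, the bid method, the one-cycle service latency, and the earliest-start tie-breaking rule are all hypothetical names and choices made for illustration. The loop in form_virtual_pipeline merely collects the bids; in the architecture described here, that negotiation would be carried out by distributed control logic rather than by a central routine.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Unit:
    """A hypothetical fine-grained hardware component offering one service."""
    name: str
    service: str
    faulty: bool = False   # assumption: permanent faults disable the whole unit
    busy_until: int = 0    # first cycle at which the unit is free again

    def bid(self, service: str, now: int) -> Optional[int]:
        """Return the earliest cycle this unit could start, or None if it cannot serve."""
        if self.faulty or self.service != service:
            return None
        return max(now, self.busy_until)


def form_virtual_pipeline(services: List[str], units: List[Unit],
                          now: int = 0) -> Optional[List[str]]:
    """Pick one healthy unit per required service, in order.

    Returns the names of the chosen units (the "virtual pipeline"),
    or None if some service has no working provider left.
    """
    pipeline = []
    t = now
    for service in services:
        # Every unit answers the request; faulty or unsuitable units decline.
        bids = [(u.bid(service, t), u) for u in units]
        bids = [(start, u) for start, u in bids if start is not None]
        if not bids:
            return None  # no healthy unit offers this service
        # Assumed policy: earliest start wins, ties broken by unit name.
        start, winner = min(bids, key=lambda b: (b[0], b[1].name))
        winner.busy_until = start + 1  # assumed one-cycle service latency
        pipeline.append(winner.name)
        t = start + 1
    return pipeline


units = [
    Unit("fetch0", "fetch"),
    Unit("fetch1", "fetch"),
    Unit("decode0", "decode", faulty=True),  # a permanently defective unit
    Unit("decode1", "decode"),
    Unit("alu0", "execute"),
    Unit("alu1", "execute"),
]

# The instruction is a "customer" needing three services in sequence.
print(form_virtual_pipeline(["fetch", "decode", "execute"], units))
# prints ['fetch0', 'decode1', 'alu0']
```

In this model the defective decode0 never bids, so the pipeline routes around it. Execution becomes impossible only when every provider of some required service has failed, which is the graceful-degradation behavior the abstract argues for.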

Viper can tolerate a high number of permanent faults due to its inherent redundancy. As fault counts
increase, its performance degrades more gracefully than traditional centralized-logic architectures. We
estimate that fault rates higher than one permanent fault per 12 million transistors, on average, cause
the throughput of a classic CMP design to fall below that of a Viper design of similar size.