Fault-Tolerant Communication in Networks-on-Chip

The network-on-chip (NoC) architecture proposes to connect multiple
heterogeneous cores using an on-chip network instead of a shared bus,
and requires network protocols with end-to-end reliability guarantees.
The design of NoC protocols must revisit the core assumptions of large-scale
networking: because high bandwidth is available and computational resources
are scarce, NoC communication can utilize excess network capacity rather
than implement sophisticated fault-tolerance schemes [ASP-DAC 2003].
We introduced the first
pragmatic approach for fault-tolerant communication in NoC, stochastic
communication, based on randomized gossip protocols. Stochastic communication
provides sustainable throughput and gracefully degrading latency even when
up to 70% of network packets are corrupted by soft errors
[DATE 2003; VLSI Design 2007].
Stochastic communication advocated a fundamental paradigm shift: instead of
guaranteeing the correctness of devices and interconnects, as traditional
chip-design approaches do, it tolerates network-on-chip faults at the
system level.
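
As a rough illustration of the gossip idea behind stochastic communication,
the following Python sketch simulates a message spreading over a 2D mesh of
tiles. It is not the protocol from the cited papers. The mesh topology, the
per-link forwarding probability (forward_prob), the packet corruption
probability (error_prob), and the assumption that a receiver detects and
drops corrupted packets are all illustrative choices, and every function
name is hypothetical.

```python
import random


def neighbors(tile, width, height):
    """Return the mesh neighbours (north, south, west, east) of a tile."""
    x, y = tile
    candidates = [(x, y - 1), (x, y + 1), (x - 1, y), (x + 1, y)]
    return [(nx, ny) for nx, ny in candidates
            if 0 <= nx < width and 0 <= ny < height]


def gossip_rounds(width, height, source, dest,
                  forward_prob, error_prob, max_rounds, rng):
    """Simulate randomized gossip on a width x height mesh.

    Returns the round in which `dest` first holds an intact copy of the
    message, or None if that does not happen within `max_rounds`.
    All parameters are assumptions made for this sketch.
    """
    informed = {source}          # tiles holding an intact copy
    if dest in informed:
        return 0
    for round_no in range(1, max_rounds + 1):
        newly_informed = set()
        # Iterate over a sorted snapshot so a seeded run is reproducible
        # and a message travels at most one hop per round.
        for tile in sorted(informed):
            for nb in neighbors(tile, width, height):
                if rng.random() >= forward_prob:
                    continue     # this tile chose not to forward on this link
                if rng.random() < error_prob:
                    continue     # packet corrupted in transit; receiver drops it
                newly_informed.add(nb)
        informed |= newly_informed
        if dest in informed:
            return round_no
    return None


def mean_latency(error_prob, trials=200, seed=0):
    """Average delivery latency over several trials (None if none delivered)."""
    rng = random.Random(seed)
    results = [gossip_rounds(4, 4, (0, 0), (3, 3), 0.5, error_prob, 1000, rng)
               for _ in range(trials)]
    delivered = [r for r in results if r is not None]
    return sum(delivered) / len(delivered) if delivered else None
```

In this toy model, comparing mean_latency(0.0) with mean_latency(0.7) gives
a feel for how redundant forwarding trades extra traffic for delivery in the
presence of corrupted packets. The numbers it produces say nothing about the
measured results reported in the cited work.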