Abstract

The significance of soft errors in logic has grown because of reduced memory
vulnerability and the shrinking dimensions of semiconductor technology coupled
with the increasing amount of logic integrated into a chip. Consequently, some
of ARM’s customers are concerned about how soft errors on the bus interconnect
will affect the dependability of their systems, since the interconnect is a critical
hub of communication in a SoC and represents a substantial and growing amount
of logic. With the rising complexity of their systems, the interconnect will
become larger and more complex in the future, adding to their concern. In this
work the impact of soft errors on the bus interconnect logic was investigated
and a product was developed to ameliorate the effects of such errors on ARM’s
customers’ products.
Methods to measure the soft error rate (SER) of ARM IP were investigated by
focusing on logical masking, which is one component in the calculation of the
SER. The effect
that the topology of a combinational logic circuit has on its logical masking rate
was considered by performing gate-level statistical fault injection on different
implementations of adder circuits. Logical masking was found to vary
significantly between implementations, by a factor ranging from 3.1 at a
synthesis frequency of 100 MHz to 2.1 at 900 MHz. This difference is explained
in an original way by correlating
logical masking with the circuit’s path length and fan-out. These properties
could be used to create a static method of measuring the logical masking rather
than the current time-consuming method of dynamic simulation. Additionally,
nearly 30% of faults injected cause more than one error, which means that the
combinational SER will be underestimated if research does not take gate fan-out
into consideration. Using this methodology, a circuit designer can now base the
choice or development of a circuit on its reliability as well as its performance,
power, and area. Studying the variation in the factors that affect the SER is
important to ensure accuracy in addressing customer requirements.
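As a rough illustration of what gate-level statistical fault injection measures,
the following Python sketch flips the output of one randomly chosen gate in a
small ripple-carry adder and compares the result with the fault-free run. It is
not the circuits, tools, or fault model used in this work: the adder structure,
the gate names, and the functions ripple_adder and inject_faults are all
hypothetical and chosen only for the example. A fault that leaves every output
bit unchanged is logically masked; a fault that corrupts more than one output
bit shows how a single fault can fan out into several errors.

```python
import random

# Hypothetical gate names for one full-adder stage (illustrative only).
GATES = ("xor1", "xor2", "and1", "and2", "or")


def ripple_adder(a_bits, b_bits, flip=None):
    """Evaluate an n-bit ripple-carry adder gate by gate (LSB first).

    flip is an optional (bit_index, gate_name) pair; the output of that gate
    is inverted to model a single transient fault.
    Returns the sum bits followed by the final carry-out.
    """
    def gate(i, name, value):
        # Invert the gate output only at the selected fault site.
        return value ^ 1 if flip == (i, name) else value

    carry = 0
    outputs = []
    for i, (a, b) in enumerate(zip(a_bits, b_bits)):
        x = gate(i, "xor1", a ^ b)
        s = gate(i, "xor2", x ^ carry)
        g = gate(i, "and1", a & b)
        p = gate(i, "and2", x & carry)
        carry = gate(i, "or", g | p)
        outputs.append(s)
    outputs.append(carry)
    return outputs


def inject_faults(n_bits=8, trials=10000, seed=0):
    """Return (masked fraction, multi-bit-error fraction) over random trials."""
    rng = random.Random(seed)
    masked = multi = 0
    for _ in range(trials):
        a = [rng.randint(0, 1) for _ in range(n_bits)]
        b = [rng.randint(0, 1) for _ in range(n_bits)]
        site = (rng.randrange(n_bits), rng.choice(GATES))
        golden = ripple_adder(a, b)
        faulty = ripple_adder(a, b, flip=site)
        wrong_bits = sum(g != f for g, f in zip(golden, faulty))
        if wrong_bits == 0:
            masked += 1   # fault never reached an output: logically masked
        elif wrong_bits > 1:
            multi += 1    # one fault produced several output errors
    return masked / trials, multi / trials


if __name__ == "__main__":
    masked_rate, multi_rate = inject_faults()
    print(f"logically masked: {masked_rate:.1%}, multi-bit errors: {multi_rate:.1%}")
```

The fractions printed by this sketch depend entirely on the assumed adder and
uniform random inputs, so they should not be compared with the figures reported
above.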
Although it is important to consider the rate at which soft errors occur, this
work demonstrates that the impact of those errors is also critical. Using
protocol-level fault injection, it is shown that faults on the ARM AXI bus
interconnect can seriously affect the reliability of the entire SoC, with effects
such as deadlock, memory corruption, or undefined behaviour. Using a fault-path
traversal algorithm, it is demonstrated that traditional error detection codes
are not sufficient to prevent these failures when faults occur on certain AXI
bus signals. This led
to the development of novel fault-tolerant methods that protect these
identified signals. Based on these developments, an add-on product for the AXI
bus interconnect was proposed that can detect, correct, and report logic soft
errors without changing the AMBA standard or the customer’s connecting IP.