Creating highly reliable FPGA designs

Editor's Note: In this article, Angela Sutton of Synopsys explains how design teams can use automated features within the Synopsys Synplify design solution to protect their FPGA designs from soft errors.

Radiation-induced soft errors ("glitches") became widely known in the 1970s with the introduction of dynamic RAM chips. The problem emerged as a result of radioactive contaminants in chip packaging, which emit alpha particles as they decay and subsequently disturb electrons in the semiconductor. This disturbance can result in an unwelcome change in voltage levels in digital logic.

In combinational logic, the voltage disturbance will most likely be transient; an unwanted transient signal is known as a single event transient (SET). However, synchronous logic, such as state machines, registers and memory, can store and propagate the transient error, which is likely to result in hardware failure. Such a stored error is known as a single event upset (SEU).

As far back as 1996, researchers at IBM estimated that each 256MB of RAM suffers one soft error per month. The error rate grows as logic densities increase, switching voltage levels decrease and switching speeds rise. Today's bigger, faster FPGAs will suffer from higher soft-error rates.

Beyond aerospace and defense applications

Indeed, soft errors still occur today as a result of radiation from space, even within electronic equipment operating at sea level. For many years, design teams working in aerospace and defense have been aware of the need to protect their designs against SEUs. Today, engineers working in other market sectors are adopting techniques to guard against SEUs. We are increasingly dependent on the safe operation of automotive systems and medical equipment, but high reliability is no longer purely a safety-critical issue; it is a growing concern even for networking and industrial automation systems that demand high quality-of-service and uptime.

Detecting and protecting against SEUs

For some applications, design teams choose to use radiation-hardened ("rad-hard") devices that are physically resistant to soft errors. However, rad-hard devices such as Microsemi's RT ProASIC3 and Xilinx's Virtex-5QV FPGAs come at a price premium and, as a consequence, find use mainly in mission-critical space projects.

Fortunately, there are design-based techniques that engineers can use to detect and protect against soft errors in normal sequential logic FPGA structures. Synopsys' Synplify Premier enables design teams to automatically apply techniques that build safety into the design. These techniques include triple modular redundancy (TMR) and fault-tolerant Finite State Machine (FSM) implementation.

Safe Finite State Machines

A flipped bit in a state machine's state register can put the FSM into what the design team assumed would be an "unreachable" state under normal circumstances. The FSM can become stuck in the invalid state, which is potentially disastrous in, for example, a control logic module.

Safe FSM implementation involves using error-detection circuitry to force a state machine into a reset state or into a user-defined error state so the error can be handled in a specific way. The Synplify synthesis software can be instructed to automatically add error-detection circuitry that identifies errors, along with error-mitigation circuitry that returns the FSM to a safe state so that the chip resumes correct operation.

For state machines that use "1-hot" state encoding, the error detection circuitry could be a parity checker, which flags any single-bit flip because a valid 1-hot state has exactly one state register bit high at any time. Once an error is detected, the state machine is returned to a "safe" or "reset" state.
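
The check described above can be modeled in a few lines of software. The following is a minimal behavioral sketch in Python, not the circuitry that Synplify generates; the function names and the SAFE_STATE encoding are assumptions made for illustration. It shows both the parity check mentioned in the text and a stricter "exactly one bit set" test.

```python
# Behavioral model only; names and the SAFE_STATE value are hypothetical.
SAFE_STATE = 0b0001  # assumed encoding of the reset/safe state


def parity_ok(state: int) -> bool:
    """A valid 1-hot word has odd parity, so even parity signals an error."""
    return bin(state).count("1") % 2 == 1


def is_one_hot(state: int) -> bool:
    """Stricter check: exactly one bit of the state word is set."""
    return state != 0 and (state & (state - 1)) == 0


def recover(state: int) -> int:
    """Keep a valid state, otherwise force the FSM to the safe state."""
    return state if is_one_hot(state) else SAFE_STATE
```

A single flipped bit turns 0b0100 into 0b0110 or 0b0000, both of which have even parity and fail the check. A parity check alone would miss a triple-bit upset such as 0b0111, which is why the sketch also includes the stricter test.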

Fault-tolerant FSMs use Hamming-3 encoding, in which any two valid state encodings differ by a Hamming distance of at least 3, to detect and correct single-bit errors. A state register that erroneously lands one bit away from a valid state is detected and corrected, so correct operation of the FSM continues.
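
To make the distance-3 property concrete, here is a small Python sketch. The four state names and their 5-bit codewords are invented for this example and are not taken from Synplify; the point is only that when every pair of codewords differs in at least three bits, a word with one flipped bit is still closest to exactly one valid state.

```python
from itertools import combinations

# Hypothetical 4-state FSM; the codewords were chosen so that every pair
# differs in at least 3 bit positions.
ENCODING = {"IDLE": 0b00000, "LOAD": 0b00111, "RUN": 0b11001, "DONE": 0b11110}


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def min_distance(codes) -> int:
    """Smallest pairwise Hamming distance in a set of codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(codes, 2))


def correct(word: int):
    """Return the state within one bit of `word`, or None if there is none."""
    for name, code in ENCODING.items():
        if hamming_distance(word, code) <= 1:
            return name
    return None
```

With these codewords, min_distance(ENCODING.values()) is 3, and flipping any single bit of the LOAD codeword still decodes to LOAD. A word that is two or more bits from every codeword returns None, which corresponds to a detected but uncorrectable error.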

Deadlock occurs when a state machine enters a state from which it is not able to exit. Design teams can avoid deadlock by automatically inserting timeout counters on critical state machines.
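
One way to picture a timeout counter is as a watchdog that counts consecutive clock cycles spent in a monitored state and forces a reset when a limit is reached. The Python class below is a behavioral sketch under that assumption; the class name, its parameters and the choice of IDLE as the recovery state are illustrative rather than a description of any tool's implementation.

```python
class TimeoutWatchdog:
    """Behavioral model of a timeout counter on selected FSM states."""

    def __init__(self, watched, limit, reset_state="IDLE"):
        self.watched = set(watched)  # states that must not persist forever
        self.limit = limit           # allowed consecutive repeat cycles
        self.reset_state = reset_state
        self.count = 0
        self.last = None

    def step(self, state):
        """Call once per clock cycle; returns the state the FSM should hold."""
        if state in self.watched and state == self.last:
            self.count += 1
        else:
            self.count = 0
        self.last = state
        if self.count >= self.limit:
            # Timeout: assume the FSM is stuck and force the recovery state.
            self.count = 0
            self.last = self.reset_state
            return self.reset_state
        return state
```

For example, with watched={"WAIT"} and limit=3, the fourth consecutive cycle in WAIT returns IDLE, while states that are not watched can persist indefinitely.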

Protecting redundant logic

Synthesis tools are designed to optimize away redundant logic, since the tool seeks to meet timing goals in the smallest possible chip area. Many of the structures that help to mitigate soft errors contain logic that synthesis tools would like to remove. Synopsys provides synthesis tool attributes such as "syn_keep" and "syn_preserve" in order to preserve the error detection and mitigation logic that has been created to improve reliability.

Designers can use the RTL "others" clause to specify a fault-tolerant or safe FSM. The "others" clause describes how the FSM or sequential logic behaves if an SEU bit flip puts it into a state that is nominally unused (that is, unreachable in normal operation). For example, the code fragment below specifies that the FSM returns to the IDLE state if it enters an unused state:

when others => next_state <= IDLE ;

By default, synthesis would optimize away the "others" clause. Designers can now instruct the Synplify synthesis tool to preserve the "others" clause when optimizing Safe FSMs or sequential logic.
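
The effect of a preserved "others" clause can be illustrated with a software model. The Python sketch below assumes a hypothetical three-state FSM held in a 2-bit register, so that the encoding 0b11 is unused; the state names and input signals are invented for the example. The final return statement plays the role of the "others" branch.

```python
# Hypothetical 3-state FSM in a 2-bit state register; 0b11 is unused.
IDLE, BUSY, DONE = 0b00, 0b01, 0b10


def next_state(state: int, start: bool, finished: bool) -> int:
    """Next-state function with an explicit recovery branch."""
    if state == IDLE:
        return BUSY if start else IDLE
    if state == BUSY:
        return DONE if finished else BUSY
    if state == DONE:
        return IDLE
    # Equivalent of the "others" branch: any unused encoding,
    # reachable only through a bit flip, recovers to IDLE.
    return IDLE
```

If an SEU flips the upper bit of BUSY (0b01), the register holds 0b11, and the model returns to IDLE on the next cycle instead of staying stuck. If synthesis removed the equivalent hardware branch, the behavior in that state would be undefined.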

Error correcting code (ECC) memories

Design teams can use error-correcting codes (ECCs) to detect and correct single-bit errors. Designers simply have to indicate in the RTL or constraints file which memory functions are safety critical for the design. The Synplify Premier software infers the ECC memories offered by many FPGA vendors and automatically makes the proper connections.
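
As an illustration of how an ECC corrects a single-bit error, the sketch below implements the classic Hamming(7,4) code in Python. This is a textbook example chosen for brevity, not the scheme used by any particular vendor's ECC memory, which would protect wider words with a vendor-specific code; the function names are assumptions.

```python
def encode(nibble: int) -> list:
    """Encode 4 data bits as a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = [(nibble >> i) & 1 for i in range(4)]
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(bits: list) -> tuple:
    """Correct up to one flipped bit; return (data, error_position)."""
    b = list(bits)
    s1 = b[0] ^ b[2] ^ b[4] ^ b[6]
    s2 = b[1] ^ b[2] ^ b[5] ^ b[6]
    s4 = b[3] ^ b[4] ^ b[5] ^ b[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the error, 0 if none
    if syndrome:
        b[syndrome - 1] ^= 1  # flip the faulty bit back
    data = b[2] | (b[4] << 1) | (b[5] << 2) | (b[6] << 3)
    return data, syndrome
```

Encoding a value, flipping any one of the seven stored bits and decoding returns the original value together with the position of the corrected bit. A second flipped bit in the same word would be miscorrected by this simple code, which is why practical ECC memories commonly add an extra parity bit to detect double-bit errors.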

Distributed TMR with voting logic

Design teams have used Triple Modular Redundancy (TMR) for years to help mitigate SEUs in sequential circuits. TMR triplicates part or all of the logic in a circuit and then uses "voting" logic to select the majority (two out of three) result in case a signal is changed due to a soft error.

Figure 3 shows how a cone of logic is replicated three times to create identical cones along with voting logic. If one cone fails, the voting logic passes the value on which the other two cones agree (the two-of-three majority) through to the output.
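
The voting function itself is simple. The Python sketch below models a bitwise two-of-three majority voter on integer-valued signals; the function names are assumptions for illustration, and the mismatch flag is an optional extra that the text does not describe.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def mismatch(a: int, b: int, c: int) -> bool:
    """True when the three copies disagree, i.e. a fault was masked."""
    return not (a == b == c)
```

For example, majority(0b1010, 0b1010, 0b0010) returns 0b1010: the copy with the flipped upper bit is outvoted, and mismatch reports that a disagreement occurred.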

For certain applications, especially those that cannot tolerate going into a reset or error handling state, TMR can be a good way to mitigate soft errors. The disadvantage of TMR is that it consumes considerable extra chip resources, since the protected logic is triplicated and voters are added, and it can impose additional latency on the output of the cone of logic.

In general, the design team will want to selectively implement TMR at a local, block or system level. The Synplify Premier software lets designers decide which parts of the design would benefit from redundancy and automatically implements TMR for those areas.

Summary

Radiation-induced soft errors pose an increasing threat to the reliable operation of mil-aero, communications, automotive and industrial designs alike. Design teams can protect their FPGA designs against soft errors by incorporating redundancy and by developing safe sequential logic and fault-tolerant state machines with custom error mitigation logic. Such techniques ensure safe design operation by returning the design to a known safe state of operation, should a soft error occur. This logic can ensure high system availability in the field and provide reliable system operation. Synopsys Synplify Premier provides designers with the ability to automatically create this circuitry in FPGAs that are not radiation hardened, and the flexibility to control where and how these techniques are applied to the design.

About the author

Angela Sutton brings over 20 years of experience in the field of semiconductor and design tools to her role as staff product marketing manager for FPGA Implementation products at Synopsys. Before joining Synopsys, Ms. Sutton worked as senior product marketing manager in charge of FPGA implementation tools at Synplicity, Inc., which was acquired by Synopsys in May 2008. She has a B.Sc. in Applied Physics from Durham University, UK, and a Ph.D. in Engineering from Aberdeen University, UK.
