1.1 Introduction

Since the mid-1990s, the demand for reliability and fault tolerance in
semiconductor memories has increased tremendously. This demand has been fueled
largely by the revolutionary growth of system-on-a-chip (SoC) and embedded
controller architectures that require large amounts of on-chip SRAM, DRAM, and
flash memories integrated with digital (e.g., ASIC or FPGA), analog, or
mixed-signal technologies. As the minimum feature sizes of MOS transistors in very
deep submicron technologies are scaled down toward 0.05 µm,
embedded memories have become very vulnerable to even minor process variations,
resulting in low manufacturing yield and reliability. Associated with these
process variations are various parametric faults, such as substrate and gate
oxide leakage currents and threshold voltage shifts. These phenomena can allow
a memory device to pass manufacturing tests and yet fail later during field use.
Such permanent, or hard, errors therefore result from process technology
variations and scaling-related weaknesses that cause layout defects and leakage
currents in memory devices.

RAM reliability has also been a major concern among designers of integrated
circuits used in mission-critical space and real-time applications, including
those used in harsh, high-radiation environments. In such an environment,
single-event effects (SEE) caused by, for example, heavy-ion or alpha-particle
strikes, neutron strikes, and ground-level cosmic radiation may upset stored
data; such upsets are known as single-event upsets (SEU) or soft errors. Other
single-event effects, such as single-event gate rupture (SEGR), single-event
hard error (SHE), and single-event burnout (SEB), may have destructive effects on
memory devices. Because the cost of field maintenance for memory devices used in
space and real-time applications may be prohibitive, an automated built-in
self-repair mechanism, system-level error correction, or a proactive circuit- or
technology-hardening approach is warranted to reduce the risk of such problems.

Therefore, reliability and fault tolerance of RAMs are important during both
manufacture and field use. Also, from the manufacturing test standpoint, quality
and reliability testing early in the lifetime of the device are as crucial as
functional testing.

Unlike functional testing, in which the expected behavior of a fault-free device
can be characterized precisely by a binary bit or a bit vector, quality and
reliability testing approaches rely on statistical characterization of a set of
measurements of observed behavior, combined with economic decision-making. The
decision-making process involves forecasting manufacturing cost, operational
effectiveness, warranty cost, marketing strategy, and logistical support. This
entire process ultimately determines the intrinsic quality and reliability of
the manufactured device. In this chapter, we describe the parameters that affect
intrinsic quality and reliability and explain how to measure and test them.