Using nvSRAMs for power-fail reliability in enterprise SSDs

During a data transfer operation in an Enterprise storage system, such as reading or writing a location in the Enterprise SSD flash memory, the power supply to all the components involved (the storage system host, SSD controller, SDRAM cache, and NAND flash memory) must remain stable to ensure a successful transaction. However, electronic systems are vulnerable to power disruptions such as voltage spikes, blackouts, surges, and brownouts. These can result in loss or corruption of:

Cache data in transit to flash memory

Metadata

Enterprise SSDs cannot lose data that they have reported as “committed to NAND flash” back to the storage system controller. The enterprise SAS/SATA market has a hot swap specification that requires no “committed” data be lost at any time, even if the power is suddenly cut. An example of this would be an operator error where the wrong drive is ejected during a hot swap servicing session.

There are two mechanisms with which Enterprise SSD controllers report the status of the data received back to the storage system controller. The Enterprise SSD can behave in a “write-thru” fashion, in which the Enterprise SSD controller does not tell the storage system controller that the data and modified metadata are “committed” until it is, in fact, safely committed to the NAND flash memory.

The Enterprise SSD can also behave in a “write-back” fashion, in which some data streams and/or corresponding modified metadata are not yet “committed” to flash, but are reported as “committed” back to the storage system controller. Any data that is reported “committed” back to the storage system controller needs to be made nonvolatile in the event of a power failure. Any other data in the Enterprise SSD’s cache is assumed to be lost at power failure. The “write-back” approach significantly improves random IOPS over the “write-thru” approach and is therefore preferred for high-random-IOPS drives.
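The difference between the two reporting mechanisms can be illustrated with a small toy model. This is only a sketch: the class SsdModel, its fields, and its methods are hypothetical names invented for illustration and do not correspond to any real controller firmware or interface. It tracks which logical block addresses (LBAs) have been reported as “committed” and which of them would be lost if power failed with or without a working back-up mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class SsdModel:
    """Toy model of an SSD write path (hypothetical, for illustration only)."""
    write_back: bool
    cache: dict = field(default_factory=dict)  # volatile SDRAM cache
    nand: dict = field(default_factory=dict)   # nonvolatile NAND flash
    acked: set = field(default_factory=set)    # LBAs reported "committed"

    def write(self, lba: int, data: bytes) -> None:
        self.cache[lba] = data
        if not self.write_back:
            # Write-thru: data reaches NAND before it is acknowledged.
            self.flush()
        # Write-back: acknowledged while the data may still be only in cache.
        self.acked.add(lba)

    def flush(self) -> None:
        """Move everything in the cache to NAND."""
        self.nand.update(self.cache)
        self.cache.clear()

    def power_fail(self, backup_ok: bool) -> set:
        """Return the acknowledged LBAs that are lost after a power failure."""
        if backup_ok:
            self.flush()    # hold-up energy lets the cache drain to NAND
        self.cache.clear()  # whatever remains in SDRAM is gone
        return {lba for lba in self.acked if lba not in self.nand}


if __name__ == "__main__":
    wb = SsdModel(write_back=True)
    wb.write(7, b"payload")
    print("lost without back-up:", wb.power_fail(backup_ok=False))
```

In this model a write-thru drive never loses acknowledged data, while a write-back drive does so only when the back-up path fails, which is why the back-up mechanism has to be reliable.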

To ensure that a “write-back” implementation works properly, Enterprise SSDs contain a power failure detection circuit that monitors the supply voltage and sends a signal to the SSD controller if the voltage drops below a predefined threshold. A secondary voltage hold-up circuit ensures the drive has enough power, for long enough, to back up the SDRAM cache data. When power is lost, this secondary voltage source provides the energy needed to transfer the contents from SDRAM to NAND flash. Figure 2 below shows a typical power failure detect circuit block diagram for Enterprise SSDs.

Figure 2: Typical Power Failure Detect Circuit Block Diagram

The secondary voltage source can be either a high-capacity supercapacitor or a bank of discrete tantalum capacitors.
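As a rough illustration of how such a hold-up source might be sized, the sketch below uses the standard capacitor energy relation E = ½·C·V²: the usable energy when a capacitor discharges from v_start to v_min is ½·C·(v_start² − v_min²). The function names, the efficiency parameter, and all numeric values in the example are assumptions chosen for illustration; they are not taken from this article or from any specific drive.

```python
def flush_time_s(cache_bytes: float, nand_write_bytes_per_s: float) -> float:
    """Time needed to copy the cache to NAND at a sustained write rate."""
    if nand_write_bytes_per_s <= 0:
        raise ValueError("write rate must be positive")
    return cache_bytes / nand_write_bytes_per_s


def required_capacitance_f(power_w: float, holdup_s: float,
                           v_start: float, v_min: float,
                           efficiency: float = 1.0) -> float:
    """Capacitance (farads) needed to supply power_w for holdup_s seconds.

    The capacitor discharges from v_start down to v_min, the lowest voltage
    at which the regulator still works. efficiency (0..1] is an assumed
    conversion efficiency between the capacitor and the load.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    if v_start <= v_min:
        raise ValueError("v_start must exceed v_min")
    energy_j = power_w * holdup_s / efficiency
    # Usable energy: 0.5 * C * (v_start^2 - v_min^2), solved for C.
    return 2.0 * energy_j / (v_start ** 2 - v_min ** 2)


if __name__ == "__main__":
    # Illustrative numbers only: 16 MB of cache, 800 MB/s NAND write rate,
    # 5 W load, capacitor discharging from 5 V to 3 V.
    t = flush_time_s(16e6, 800e6)
    c = required_capacitance_f(5.0, t, 5.0, 3.0)
    print(f"hold-up time {t * 1e3:.1f} ms, capacitance {c * 1e3:.2f} mF")
```

A real design would add margin for capacitor aging, tolerance, and ESR, all of which this sketch ignores.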

Supercapacitor

Supercapacitors, also known as ultracapacitors or electric double layer capacitors (EDLC), are capacitors with significantly higher energy density than any other capacitor type available and are used as reliable alternatives to batteries in battery back-up applications.

However, supercapacitors suffer from a known set of long-term reliability deficiencies, much like aluminum electrolytic capacitors. They have a limited service life because electrolyte is gradually lost from the component over time, and faster at higher operating temperatures, which wears the component out. The performance of the supercapacitor degrades slowly with electrolyte loss until total failure occurs with little or no warning. In addition, the loss rate increases at higher operating voltages and in hotter environments, whether operating or non-operating. For every 10 °C increase in ambient operating temperature, the life expectancy of a supercapacitor can be cut approximately in half.
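The 10 °C rule of thumb above can be written as a simple scaling function. This is a minimal sketch: the function name and the example figures are hypothetical, and applying the same factor below the rated temperature (life doubling per 10 °C decrease) is an assumption that goes beyond what is stated here. Actual derating should come from the capacitor manufacturer's data.

```python
def estimated_life_hours(rated_life_h: float, rated_temp_c: float,
                         ambient_temp_c: float) -> float:
    """Scale a rated life by a factor of 2 for every 10 degC of difference.

    Hotter than the rated temperature halves the life per 10 degC; cooler
    doubles it (the latter is an assumed symmetric extension of the rule).
    """
    return rated_life_h * 2.0 ** ((rated_temp_c - ambient_temp_c) / 10.0)


if __name__ == "__main__":
    # Illustrative values only: a part rated for 1000 h at 65 degC.
    for temp in (55, 65, 75, 85):
        print(temp, "degC:", estimated_life_hours(1000.0, 65.0, temp), "h")
```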

Supercapacitor failure modes are:

Cell opening due to the electrochemical decomposition overpressure.

The voltage and the temperature generate a gas pressure inside the cell that slowly increases over time. When the pressure reaches a certain limit, a mechanical fuse, generally a groove on the can, opens softly.

When used for long periods at high operating temperatures, the moisture of the electrolyte evaporates and the equivalent series resistance (ESR) increases. The fundamental failure mode is the open mode with ESR increase. All supercapacitors come with the warning: “When using these capacitors, incorporate appropriate safety measures in your design, such as redundancy and protection measures”.

Bank of Discrete Capacitors

A bank of discrete capacitors provides a more reliable alternative, but requires more careful design. A discrete capacitor-based voltage hold-up circuit employs a bank of discrete capacitors connected in parallel. The capacitors used can be aluminum, tantalum, or niobium types. A discrete solution lacks the compactness of a supercapacitor: because of its lower capacitance-to-size ratio, it takes up significant board space. Tantalum capacitors are also known to be susceptible to shorting and smoking failure mechanisms.

The nvSRAM Solution

The nonvolatile SRAM (nvSRAM) value proposition for Enterprise SSDs is to eliminate or minimize the supercapacitor or bank of discrete capacitors, and to reliably back up the in-transit SDRAM cache data and metadata with a single-chip, battery-less, nonvolatile RAM-based technology. A brief description of nvSRAM operation is presented below before discussing the specific details of using nvSRAM devices in Enterprise SSDs.

How does an nvSRAM compare with the new MRAM technology in terms of cost and reliability? It seems that MRAM is ultimately a better solution, since it is non-volatile, does not require a data transfer, and does not require a back-up power source. MRAM also has SRAM-like interfaces and timing...
Clearly, nvSRAM has an advantage in terms of density, but how much is actually required for an SSD write cache?

nvSRAM is built entirely on a standard CMOS manufacturing process, which is a big advantage in terms of cost, manufacturability, and supplier reliability. MRAM, on the other hand, uses a non-standard process, which translates to higher costs.
SSD write cache density would be in the range of hundreds of MB to GBs, but only a portion of that data (the metadata) needs to be backed up reliably on power loss, and that is where the nvSRAM solution comes in. Replacing all of the SDRAM write cache with nvSRAM would be the ideal solution, but those densities are currently out of reach.

MRAM is a proven solution for protecting data in the event of system power loss: it is inherently non-volatile, needs no additional time to write data to a secondary non-volatile array, simplifies the controller design, offers the speed of RAM, and has superior endurance. MRAM is processed using standard CMOS base wafers to provide low cost as well as lower overall TCO to the user. Everspin, the leader in MRAM technology, announced customer sampling of the world's first commercial Spin Torque MRAM in a 64Mb density with a DDR3 interface. Now, in one device, the user can have power-fail safety, non-volatility, and DRAM DDR3 compatibility with ultra-low latency. With MRAM, users can choose an SRAM-like interface with existing products in volume production, as well as look to the future with ST-MRAM in DRAM-like interfaces.

Pramodh, a couple of questions. What is the timeline for bit capacities greater than the 16-Mbit device planned for production in the first quarter of 2013? What is the lithographic node for the 16-Mbit device you (Cypress) are now sampling?
Looking at the future requirements of enterprise SSDs and enterprise servers, what in your view is likely to be the bit capacity requirement for SRAM or your nvSRAM?
What is your view of PCM as a competitor for this SSD role?

Neale, 16Mb is planned for first quarter of 2013, with 32Mb in fourth quarter.
16Mb devices are on 130nm node.
Based on the future requirements and feedback from customers, the density requirement would be in the range of 32Mb - 128Mb.
We have not evaluated PCM as a major competitor. I will check on PCM and get back to you.