Software and hardware challenges due to the dynamic raw NAND market

NAND flash is the dominant type of non-volatile memory technology used today. Developers commonly face difficulties developing and maintaining firmware, middleware and hardware IP for interfacing with raw NAND devices. After reviewing the history and differentiated features of various memory devices, we’ll take a detailed look at common obstacles to NAND device development and maintenance, particularly for embedded and system-on-chip (SoC) developers, and provide some recommendations for handling these challenges.

Background on non-volatile memory technologies
There are many different non-volatile memory technologies in use today.

EEPROM
Electrically erasable programmable read-only memory, or EEPROM, is one of the oldest forms of technology still in use for user-modifiable, non-volatile memories. In modern usage, EEPROM means any non-volatile memory where individual bytes can be read, erased, or written independently of all other bytes in the memory device. This capability requires more chip area, as each memory cell requires its own read, write, and erase transistor. As a result, the size of EEPROM devices is small (64 Kbytes or less).

EEPROM devices are typically wrapped in a low-pin-count serial interface, such as I2C or SPI. Parallel interfaces are now uncommon because of the larger pin count, footprint, and layout costs they require. Like almost all available non-volatile memory types, EEPROMs use floating-gate technology in a complementary metal-oxide-semiconductor (CMOS) process.

Flash
Flash memory is a modified form of EEPROM memory in which some operations happen on blocks of memory instead of on individual bytes. This allows higher densities to be achieved, as much of the circuitry surrounding each memory cell is removed and placed around entire blocks of memory cells.

There are two types of flash memory arrays in the marketplace — NOR flash and NAND flash. Though these names are derived from the internal organization and connection of the memory cells, the types have come to signify a particular external interface as well. Both types of memory use floating gates as the storage mechanism, though the operations used to erase and write the cells may be different.

NOR Flash
NOR flash was the first version of flash memory. Until about 2005, it was the most popular flash memory type (measured by market revenue).[1] In NOR flash, bytes of memory can be read or written individually, but erasures happen over a large block of bytes. Because of their ability to read and write individual bytes, NOR flash devices aren’t suitable for use with block error correction. Therefore, NOR memory must be robust to errors.

The capability to read individual bytes also means it can act as a random access memory (RAM), and NOR flash devices will typically employ an asynchronous parallel memory interface with separate address and data buses. This allows NOR flash devices to be used for storing code that can be directly executed by a processor. NOR flash can also be found wrapped in serial interfaces, where they act similar to SPI EEPROM implementations for reading and writing.

NAND Flash
The organization and interface of NOR flash devices place limitations on how they can scale with process shrinks. With a goal of replacing spinning hard disk drives, the inventor of NOR flash later created NAND flash. He aimed to sacrifice some of the speed offered by NOR flash to gain compactness and a lower cost per byte [2]. This goal has largely been met in recent years, with NAND sizes increasing to multiple gigabytes per die while NOR sizes have stagnated at around 128 MB. This has come at a cost, as will be discussed later.

Raw NAND memory is organized into blocks, where each block is further divided into pages.

In NAND memories, read and write operations happen on a per-page basis, but erase operations happen per block. Because reads and writes are done in page-sized chunks, it's practical to employ block error correction algorithms on the data. As a result, NAND manufacturers have built in spare bytes of memory for each page to hold this and other metadata. NOR flash doesn't have such spare bytes.
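The block/page organization above can be sketched with a little arithmetic. The geometry values below (2,048-byte pages, 64 pages per block) are illustrative examples only, not any particular part's datasheet figures:

```python
# Sketch: translating a linear page number into NAND block/page
# coordinates. Geometry values are illustrative, not from a datasheet.
PAGE_SIZE = 2048        # main-area bytes per page
SPARE_SIZE = 64         # spare bytes per page (holds ECC and metadata)
PAGES_PER_BLOCK = 64    # pages per erase block

def page_to_block(page_num):
    """Return (block index, page offset within that block)."""
    return page_num // PAGES_PER_BLOCK, page_num % PAGES_PER_BLOCK

def erase_span(page_num):
    """First and last linear page wiped when this page's block is erased."""
    block, _ = page_to_block(page_num)
    first = block * PAGES_PER_BLOCK
    return first, first + PAGES_PER_BLOCK - 1
```

For instance, rewriting a single page such as page 130 means erasing its entire block first: `erase_span(130)` shows that pages 128 through 191 are all affected.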

Also in contrast to NOR flash, the NAND flash interface isn’t directly addressable, and code cannot be executed from it. The NAND flash has a single bus for sending command and address information as well as for sending and receiving memory contents. Therefore, reading a NAND device requires a software device driver.
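Because there is no separate address bus, the driver issues a read as a sequence of command and address cycles over the shared bus. The sketch below follows the common asynchronous READ sequence (opcodes 0x00/0x30 with five address cycles); the `bus` object and the `MockBus` recorder are hypothetical abstractions used purely for illustration:

```python
# Minimal sketch of a raw-NAND page-read driver. The `bus` object is a
# hypothetical abstraction of the command/address/data latch signals;
# the 0x00/0x30 opcodes and five address cycles follow the common
# asynchronous READ sequence.
def read_page(bus, page, column=0, length=2048):
    bus.command(0x00)                      # READ, first cycle
    bus.address(column & 0xFF)             # column address, low byte
    bus.address((column >> 8) & 0xFF)      # column address, high byte
    bus.address(page & 0xFF)               # row address, 3 cycles
    bus.address((page >> 8) & 0xFF)
    bus.address((page >> 16) & 0xFF)
    bus.command(0x30)                      # READ, confirm cycle
    bus.wait_ready()                       # poll R/B# until data is ready
    return bus.read_data(length)           # clock data out of the page buffer

class MockBus:
    """Records the driver's command/address sequence for illustration."""
    def __init__(self):
        self.log = []
    def command(self, op): self.log.append(("cmd", op))
    def address(self, byte): self.log.append(("addr", byte))
    def wait_ready(self): self.log.append(("wait",))
    def read_data(self, n): return bytes(n)
```

Running `read_page(MockBus(), page=0x102, column=5)` records the full cycle sequence, which is essentially what the "software device driver" mentioned above must generate on real hardware.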

NAND flash is the underlying memory type for USB memory sticks, memory cards (e.g. SD cards and compact flash cards) and solid state hard drives. In all cases, the raw NAND flash devices are coupled with a controller that translates between the defined interface (e.g. USB, SD and SATA) and the NAND’s own interface. In addition, these controllers are responsible for handling a number of important tasks for maintaining the reliability of the NAND memory array.

Raw NAND issues and requirements
Let’s take a detailed look at the issues and challenges presented by incorporating raw NAND devices into an embedded system or SoC.

Errors and error correction
Despite being based on the same underlying floating gate technology, NAND flash has scaled in size quickly since overtaking NOR flash. This has come at a cost of introducing errors into the memory array.

To increase density, NAND producers have resorted to two main techniques. One is the standard process node and lithography shrinks, making each memory cell and the associated circuitry smaller. The other has been to store more than one bit per cell. Early NAND devices could store one of two states in a memory cell, depending on the amount of charge stored on the floating gate. Now, raw NAND comes in three flavors: single-level cell (SLC), multi-level cell (MLC) and triple-level cell (TLC). These differ in the number of distinct charge levels used in each cell, which corresponds to the number of bits stored per cell. SLC, with the original 2 levels per cell, stores 1 bit of information per cell; MLC uses 4 levels and stores 2 bits; TLC uses 8 levels and stores 3 bits.
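The level-to-bit relationship above is simply a power of two: storing b bits per cell requires 2^b distinguishable charge levels, which is why each added bit doubles the number of levels the sense circuitry must resolve:

```python
# Each bit stored per cell doubles the number of charge levels needed:
# SLC (2 levels) -> 1 bit, MLC (4 levels) -> 2 bits, TLC (8 levels) -> 3 bits.
def bits_per_cell(levels):
    b = levels.bit_length() - 1
    assert 2 ** b == levels, "level count must be a power of two"
    return b
```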

While reducing silicon feature sizes and storing more bits per cell reduces the cost of NAND flash and allows for higher density, it increases the bit error rate (BER). Overcoming the increasing noisiness of this storage medium requires larger and larger error-correcting codes (ECCs). An ECC is redundant data added to the original data; in the event of errors, the combined data allows recovery of the original data, with the number of correctable errors depending on the algorithm used. The growth in ECC requirements has been rapid: four years ago, SLC NANDs required only 1 bit of ECC per 512 bytes and the first MLC NANDs only 4 bits, while the latest SLC NANDs in the market require 4 or 8 bits of ECC per 512 bytes and MLC NAND requires more than 16 bits per 512 bytes.

Figure 2: Device issues versus process node shrinks (courtesy Micron)

In principle, any ECC algorithm can be used as long as the encoder and decoder match. The algorithms most commonly used for NAND ECC are:

Hamming Code: For 1-bit correction [3]

Reed-Solomon: For up to 4 bits of correction. This is less common [4].

BCH: For 4 or more bits of correction [5].
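As a rough illustration of how single-bit correction works, here is a minimal Hamming-style scheme in Python. Real NAND controllers compute parity over 256- or 512-byte sectors (usually in hardware); this sketch operates on a short list of bits purely to show the syndrome idea:

```python
# Minimal single-error-correcting sketch in the Hamming spirit: the
# check value is the XOR of the indices of all 1-bits, plus an overall
# parity bit so a flip at index 0 (or no flip at all) can be told apart.
# Real NAND Hamming ECC covers 256/512-byte sectors; this is a toy.
def checksum(bits):
    pos, ones = 0, 0
    for i, b in enumerate(bits):
        if b:
            pos ^= i     # XOR of positions of set bits
            ones ^= 1    # overall parity of the data
    return pos, ones

def correct(bits, stored):
    """Return (corrected bits, flipped index or None). Raises on 2+ flips."""
    pos, ones = checksum(bits)
    spos, sones = stored
    if (pos, ones) == (spos, sones):
        return bits, None                 # no error detected
    if ones == sones:                     # parity unchanged: even flip count
        raise ValueError("uncorrectable: more than one bit flipped")
    flipped = pos ^ spos                  # index of the single flipped bit
    fixed = list(bits)
    fixed[flipped] ^= 1
    return fixed, flipped
```

A single flipped bit changes the position XOR by exactly its own index, so XOR-ing the stored and recomputed check values points straight at the bad bit; two flips leave the overall parity unchanged and are detected but not corrected, mirroring the 1-bit-correct/2-bit-detect behavior of classic NAND Hamming ECC.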

Extra memory (called the "spare memory area" or "spare bytes region") is provided at the end of each page in NAND to store the ECC. This area is built from the same cells as the main page and is susceptible to the same errors. For the present explanation, assume the page size is 2,048 bytes, the ECC requirement is 4 bits of correction per 512 bytes and the ECC algorithm generates 16 bytes of redundant data per 512 bytes. For a 2,048-byte page, 64 bytes of redundant data will be generated. In current Texas Instruments (TI) embedded processors, for example, the ECC data is generated for every 512 bytes and written into the spare bytes area. As ECC requirements have gone up, the size of the spare regions provided by NAND manufacturers has increased as well.
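The arithmetic above can be written out directly, using the example figures from the text (actual requirements vary by device and ECC algorithm):

```python
# Worked version of the spare-area arithmetic from the text:
# 2,048-byte page, ECC computed per 512-byte sector, 16 redundant
# bytes generated per sector. These are example figures only.
PAGE_SIZE = 2048
SECTOR = 512
ECC_BYTES_PER_SECTOR = 16

sectors = PAGE_SIZE // SECTOR                   # 4 sectors per page
ecc_per_page = sectors * ECC_BYTES_PER_SECTOR   # 64 bytes of ECC per page
```

With a 64-byte spare area, this example consumes the entire spare region for ECC, which is exactly why spare areas have grown as ECC requirements have increased.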

The manufacturers of NAND devices specify the data retention and the write/erase endurance cycles under the assumption of the specified ECC requirements being met. When insufficient ECC is used, the device’s usable lifetime is likely to be severely reduced. If more errors are detected than can be corrected, data will be unrecoverable.

Geometry detection
Before raw NAND operations can begin, the first step is to determine the NAND geometry and parameters. The following list is the minimum set of NAND parameters needed by a bootloader or other software layer to determine NAND geometry:

8-bit or 16-bit data width

Page size

Number of pages per block (block size)

Number of address cycles (usually five in current NANDs)

Raw NAND devices provide several methods that software can use to determine their geometry at run time.

4th byte ID: All raw NANDs support a READ ID operation (command 0x90 at address 0x00) which returns 5 bytes of identifier code. The first and second bytes (counting from one, i.e. "one-based") are the manufacturer and device IDs, respectively. The fourth byte (one-based) encodes the NAND parameters discussed above, which can be used by the ROM bootloader.

This 4th byte can be used to determine raw NAND geometry, yet its interpretation varies from manufacturer to manufacturer and between generations of raw NANDs. There are two noteworthy interpretations. The first is a format used by Toshiba, Fujitsu, Renesas, Numonyx, STMicroelectronics, National and Hynix, with certain bits representing the page size, data bus width, spare bytes size and number of pages per block. The second is a format particular to the latest Samsung NANDs, holding similar information to the first representation, but with different bit combinations representing different possible values. Since the 4th ID byte format isn't standardized in any way, its use for parameter detection isn't reliable.
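As a concrete (and deliberately hedged) illustration, the sketch below decodes the 4th ID byte using one commonly documented bit layout. Since the format is not standardized, a real driver must select the decode table based on the manufacturer and device ID bytes rather than assume this layout:

```python
# Sketch of decoding the 4th ID byte. The bit layout below matches one
# commonly documented interpretation; as noted in the text, the format
# is NOT standardized, so a production driver must key the decode table
# off the manufacturer and device ID bytes returned by READ ID.
def decode_id4(id4):
    page_size = 1024 << (id4 & 0x03)                 # bits 1:0 -> 1/2/4/8 KB page
    spare_per_512 = 16 if id4 & 0x04 else 8          # bit 2 -> spare bytes per 512
    block_size = (64 * 1024) << ((id4 >> 4) & 0x03)  # bits 5:4 -> 64..512 KB block
    bus_width = 16 if id4 & 0x40 else 8              # bit 6 -> x8 or x16 bus
    return {
        "page_size": page_size,
        "spare_per_512": spare_per_512,
        "block_size": block_size,
        "pages_per_block": block_size // page_size,
        "bus_width": bus_width,
    }
```

Decoding a byte such as 0x95 under this layout yields a 2,048-byte page, a 128 KB block (64 pages per block), 16 spare bytes per 512 and an 8-bit bus, which covers every parameter in the minimum-geometry list above.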

Other concerns
The physical connection between the embedded processor and the raw NAND device is also of concern. NAND devices can operate at either 3.3V or 1.8V, so it’s important to purchase NANDs with compatible voltage levels. It should be pointed out that 1.8V NAND devices are often specified with worse performance than 3.3V equivalent parts.

Another aspect that must be considered is whether asynchronous or synchronous NANDs will be used. Historically, NAND interfaces were asynchronous; the synchronous interface was introduced with the ONFI 2.0 specification to reach higher data-transfer rates using clock-synchronized data movement with DDR signaling. This type of interface is common in SSDs but not in the typical embedded processor or SoC.

Daniel and Agarwal presented this paper at ESC Silicon Valley this spring, and I asked them to put together a version for us on the designline. I think they did an outstanding job laying out the issues facing NVM technologies. Please add your comments or questions for the authors below.