A New Chapter for System Designs Using NAND Flash Memory

Jim Cooke
Senior Technical Marketing Manager
Micron Technology, Inc.
December 27, 2010

ECC Trends and Complexities

ECC trends have been on the rise since NAND Flash was first introduced. Although it's not a new issue, the ECC required to support newer multilevel cell (MLC) and three-bit-per-cell technologies is becoming increasingly difficult for system designers to keep up with.

ECC has historically been used to improve the overall data reliability of NAND subsystems. However, as NAND cells shrink, fewer electrons are stored per floating gate. To compensate for the increasing bit error rates of these smaller-geometry cells, ECC requirements have to increase dramatically to maintain the desired system reliability.

As system requirements for ECC increase, the number of gates required to implement the ECC logic also increases, as does the system complexity. For example, 24 bits of ECC requires about 200,000 gates, while 40 bits of ECC requires about 300,000 gates. It is estimated that in the future, advanced ECC algorithms will approach 1 million gates. See Figure 1.

[Figure 1: ECC Increases as Process Geometries Shrink. Axes: logic gates and ECC bits required versus process geometry (nm). Series: number of gates to implement ECC, and MLC-2 ECC required.]

Many high-performance systems require multiple channels of NAND to reach the desired performance. In these systems, each channel typically has its own ECC logic. For example, a 10-channel SSD may have 10 channels of ECC logic implemented. If 60 bits of ECC were required for each of the 10 channels, the result would be 3 million gates just for the ECC logic.
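As a rough illustration of how per-channel ECC logic adds up, the Python sketch below multiplies a per-channel gate count by the number of channels. The function name is hypothetical, and the 300,000-gate figure for a 60-bit engine is an assumption inferred from the 3 million total quoted for 10 channels, not a number stated on its own in the text.

```python
def total_ecc_gates(channels: int, gates_per_channel: int) -> int:
    """Total ECC logic gates when every channel has its own ECC engine."""
    if channels < 0 or gates_per_channel < 0:
        raise ValueError("counts must be non-negative")
    return channels * gates_per_channel


# Gate counts quoted in the text, keyed by ECC strength in bits.
QUOTED_GATES = {24: 200_000, 40: 300_000}

# Assumption: about 300,000 gates per 60-bit engine, implied by the
# 3 million total quoted for a 10-channel SSD.
ASSUMED_GATES_60_BIT = 300_000

if __name__ == "__main__":
    print("60-bit ECC, 10 channels:", total_ecc_gates(10, ASSUMED_GATES_60_BIT))
    for bits, gates in QUOTED_GATES.items():
        print(f"{bits}-bit ECC, 10 channels:", total_ecc_gates(10, gates))
```

The point of the sketch is only that the cost scales linearly with channel count, so any growth in per-engine gate count is multiplied across the whole controller.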

NAND Interface Choices

Fully Managed NAND Solutions

Fully managed NAND interfaces are typically the simplest to interface with. Examples include e·MMC memory or eSD, which use a traditional multimedia card or SD card interface. These multichip packages (MCPs) include a controller that implements the desired interface, as well as the ECC and block management required by the NAND.

An advantage of fully managed NAND solutions is that they require the least amount of effort from the host's perspective. The host software uses a straightforward block-level interface, and the controller within the MCP is responsible for the ECC, block management, and wear leveling.

The disadvantage is that they're single-threaded and based on a simple command, data, and acknowledge protocol, which limits performance. In addition, the processor used in these MCPs is typically a small 8-bit processor, which also impacts performance. These implementations are typically not very deterministic, because the internal controller can be doing block management or garbage collection activities in the background. Consequently, they can be exposed to unexpected power loss. With regard to performance, sequential READ/WRITE operations provide reasonable bandwidth for large data transfers, while random READ/WRITE operations can yield lower performance for smaller data transfers.

Legacy NAND Interface

The NAND interface has traditionally been an asynchronous interface. Although interface speeds have improved up to 50 MHz in recent years, not much else has changed on this interface.

Several years back, Micron and several other forward-thinking companies joined together to form a NAND organization that was focused on simplifying the myriad of timing and command specifications offered by the NAND industry. The Open NAND Flash Interface (ONFI) developed the first version of their specification, ONFI 1.0. While there are many advantages to the original ONFI 1.0 specification, one of the biggest is the ability for the host to electronically detect the type of NAND device that is connected, as well as other important parameters, like timing modes, page size, block size, and ECC requirements. This feature has been carried forward to all of the ONFI specifications and remains an important aspect of all ONFI standards.

Another significant accomplishment of the ONFI organization was the development of the synchronous NAND interface, also known as ONFI 2. ONFI 2.2 currently supports up to 200 mega transfers per second (200 MT/s) using a DDR, source-synchronous interface. After powering up, the device can still be used in asynchronous mode. For higher performance, however, the host interrogates the device to see if it is able to support the higher-speed synchronous interface before changing to it.

Direct NAND Solutions

Implementations that connect NAND directly to the host processor are responsible for managing the NAND. Hardware manages the ECC, while software typically performs all block management and wear-leveling operations. At first this may seem like a disadvantage. However, with today's typical embedded processors running at speeds of hundreds of megahertz and often over one gigahertz, these high-performance processors can accomplish block management much faster and can take advantage of deterministic, multithreading techniques to improve performance. In addition, with the host managing the NAND device directly, the host software can make real-time decisions that can help eliminate exposure to unexpected power failures.

As shown in Figure 2, the ONFI 2.2 specification (200 MT/s) was designed to accommodate up to 16 standard NAND loads. A typical implementation of this would be using two 8-die packages. The standard 8-die, 100-ball BGA package includes two separate buses (DQ[7:0]1 and DQ[7:0]2), with each bus having four NAND die wired together. Each four-die stack is controlled with two chip enables. A typical design would wire the two data or DQ buses together, forming a single 8-bit data bus for each package.

A maximum configuration would consist of two 100-ball BGA packages, each containing eight NAND die. Each of these standard 100-ball BGA packages requires four chip enables (CE#) to select a specific NAND die. Thus, the system controller needs to supply eight chip enables to support this configuration.
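The load and chip-enable counts for the two-package configuration can be checked with a small Python sketch. The class and function names are hypothetical; the default numbers (eight die and four CE# pins per 100-ball BGA package) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NandPackage:
    """A standard 100-ball BGA NAND package, as described in the text."""
    die: int = 8           # NAND die (bus loads) per package
    chip_enables: int = 4  # CE# pins per package

    @property
    def die_per_chip_enable(self) -> int:
        """How many die share each CE# pin."""
        return self.die // self.chip_enables


def channel_totals(packages: list[NandPackage]) -> tuple[int, int]:
    """Return (bus loads, CE# pins) for all packages sharing one channel."""
    loads = sum(p.die for p in packages)
    chip_enables = sum(p.chip_enables for p in packages)
    return loads, chip_enables


if __name__ == "__main__":
    loads, chip_enables = channel_totals([NandPackage(), NandPackage()])
    print(f"loads={loads}, chip enables={chip_enables}")
```

With two default packages the sketch reports 16 loads and 8 chip enables, matching the maximum configuration described for a single ONFI 2.2 channel.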

[Figure 2: Typical High-Capacity Implementation Using Two 8-Die, 100-Ball BGA Packages. The diagram shows a system controller with channels 0 through 31; each channel drives BGA 100 Package #1 and BGA 100 Package #2, and each package exposes Control & CE1, 2 with DQ[7:0]1 and Control & CE3, 4 with DQ[7:0]2.]

ClearNAND Solutions

Figure 3 shows two system implementations: a traditional system where the processor is interfacing directly with NAND, and a system using ClearNAND. Both implementations use the same ONFI hardware interface and similar 100-ball BGA ballouts. The ClearNAND example includes a thin controller packaged with the NAND in an MCP. The ClearNAND controller implements the ECC required by the NAND inside the MCP package. Utilizing the same ONFI asynchronous or synchronous interface enables designers to migrate easily from standard NAND to ClearNAND.

Micron offers two versions of ClearNAND: Standard and Enhanced. Standard ClearNAND, suggested mainly for consumer devices, implements the required ECC and provides a traditional asynchronous ONFI bus for easy migration. Enhanced ClearNAND manages ECC, in addition to offering several performance-enabling features that are of most value to enterprise applications. It also supports both the asynchronous and synchronous versions of the ONFI 2.2 interface and is available in densities up to 64GB.

By abstracting the ECC, both versions of ClearNAND will be able to handle the additional ECC that future versions of NAND will require. This will eliminate the need for designers to continually redesign their circuitry to keep up with NAND manufacturers' latest ECC requirements.

Enhanced ClearNAND

Figure 4 shows the Enhanced ClearNAND architecture. It supports a single ONFI 2.2 interface and up to 200 MT/s command, address, and data bus. The VDDI decoupling capacitor is common in e·MMC products and other devices that include a controller. It is required to decouple the internal voltage regulator. For backward compatibility with standard NAND devices, the VDDI connection is located on an unused pin.

The ClearNAND controller supports two internal NAND buses, one for even logical unit numbers (LUNs) and one for odd LUNs. These two independent NAND buses can operate at up to 200 MT/s. In addition, each bus has its own ECC engine and can manage simultaneous READ or WRITE operations on the two buses.

It is envisioned that future versions of the ClearNAND controller will support the ONFI 3 specification, which is targeting up to 400 MT/s.

Table 1: Enhanced ClearNAND Features and Benefits

Feature: ONFI-compatible controller and NAND device in a standard package (14 x 18mm; 100-ball BGA)
Benefits:
1. Footprint compatible with raw NAND
2. Manages and abstracts ECC
3. Improves endurance (future versions)
4. Provides platform for future enhancements

Feature: Single 200 MT/s front side bus
Benefits:
1. Reduces loading on bus
2. Allows more NAND per channel
3. Improves signal integrity

Feature: Dual 166 MT/s internal buses
Benefits:
1. Enables parallel operations
2. Local copyback offloads wear-leveling operations

Many of these basic features are covered in the following discussion of the four advanced functions offered by Enhanced ClearNAND: volume addressing, electronic data mirroring, interrupt, and internal copyback.

Volume Addressing

Volume addressing allows a single chip select or chip enable (CE#) to address up to 16 ClearNAND volumes. Each ClearNAND controller can support up to eight NAND die packaged in an MCP. The ClearNAND controller provides a buffer for the system controller access. The Enhanced ClearNAND design, shown in Figure 5, offers an eightfold improvement in density while maintaining or improving signal integrity and reducing the active number of chip enables that are required. This is because a single ClearNAND controller represents only one load to the system controller, but supports up to eight NAND die in the MCP package.

There are two aspects to the volume addressing concept. The first is establishing the volume address for each of the ClearNAND packages. The volume address is appointed only once at initialization and is maintained until the power is cycled. The second aspect is the volume select command itself. This is a new command that is followed by a single-byte (actually only 4 bits) volume address. Once the intended volume is selected, it remains selected until another volume is selected.

The chip enable pin savings can be significant. For example, with a standard NAND implementation, a single channel requires eight chip enables to control two 8-die packages. A 32-channel design would therefore require a total of 256 chip enable pins, whereas the Enhanced ClearNAND volume addressing feature can address the same amount of NAND using only 32 chip enable pins. What's more, these same 32 chip enable pins can be used to address eight times the density.

Electronic Data Mirroring

Enhanced ClearNAND supports electronic data mirroring, which allows the data bus signal order to be electronically remapped to one of two configurations. This is particularly useful for high-density designs with devices mounted on both the top and bottom of the PCB. The ClearNAND package is able to electronically detect whether it is mounted on the top or bottom of the PCB. This is accomplished using a specific initialization or reset sequence. For example, it's common practice to issue a reset or FFh command to the NAND device on power-up. To accomplish the electronic DQ mirroring, the host must follow this FFh command with the traditional READ STATUS (70h) command. The top package detects this command sequence as FFh-70h; the bottom package recognizes this same sequence as FFh-0Eh and can establish that it is the bottom package and reorder its data bus to align directly under the top package. This not only improves PCB routing but also improves signal integrity.
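The FFh-70h versus FFh-0Eh detection works because reversing the bit order of a byte leaves FFh unchanged but turns 70h into 0Eh. The Python sketch below illustrates this under the assumption that mirroring is a straight reversal of DQ7 through DQ0, which is consistent with the values in the text. The function names are hypothetical, and this is an illustration of the idea rather than the device's actual logic.

```python
RESET = 0xFF        # reset command issued on power-up
READ_STATUS = 0x70  # READ STATUS command that follows the reset


def mirror_dq(byte: int) -> int:
    """Reverse the bit order of an 8-bit value (DQ0<->DQ7, DQ1<->DQ6, ...)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    mirrored = 0
    for bit in range(8):
        if byte & (1 << bit):
            mirrored |= 1 << (7 - bit)
    return mirrored


def detect_position(first: int, second: int) -> str:
    """Classify a package from the first two command bytes it observes."""
    if (first, second) == (RESET, READ_STATUS):
        return "top"
    if (first, second) == (mirror_dq(RESET), mirror_dq(READ_STATUS)):
        return "bottom"
    return "unknown"


if __name__ == "__main__":
    seen_by_bottom = (mirror_dq(RESET), mirror_dq(READ_STATUS))
    print([f"{b:02X}h" for b in seen_by_bottom])
    print(detect_position(*seen_by_bottom))
```

Because FFh is symmetric under bit reversal, the reset is recognized in either orientation, and it is the asymmetric 70h byte that reveals which side of the PCB the package is on.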

[Figure 5: Typical High-Capacity Implementation Using up to Eight Die in 100-Ball BGA Packages. The diagram shows a system controller driving BGA 100 Package #1 through BGA 100 Package #16, each with Control & CE1 and DQ[7:0]1.]

Ready/Busy# Redefined to Interrupt

Figure 6A shows the READY/BUSY# operation of standard NAND. The READY/BUSY# function is an open-drain signal. The standard 8-die, 100-ball BGA package includes four ready/busy (R/B#) pins. Like chip enable, each of these pins is connected to two individual die. Referring to Figure 6A, R/B# 1 would be connected to Die 1 and Die 2. Because signal connections are at a premium, designers will either not use the R/B# signals, or if they do, they will wire-OR all of the R/B# signals from one package (shown at the bottom of Figure 6A). This is not ideal, because any single die that is busy pulls the entire R/B# signal LOW, preventing die that become ready from being detected. If the implementation does not use the R/B# signals, it has the option of using firmware to poll the status of each of the die in the package, which is time-consuming and wastes power. Neither case is ideal.

Enhanced ClearNAND redefines the existing R/B# pin to be an interrupt pin. As shown in Figure 6B, the interrupt# signal, which is still open-drain, provides a real-time interrupt when the volume or die becomes ready. Designers can use this signal to provide real-time status to the system controller. In larger configurations supporting multiple ClearNAND packages on a single bus, the interrupt# signals can be wired together. When the system controller detects an interrupt, it can simply interrogate each of the ClearNAND packages or volumes to learn which volume has posted the new status. The interrupt# operation can save signals on the system controller while improving the ability for the system controller to respond to NAND status updates.

Internal Copyback

The last but perhaps most noteworthy feature of Enhanced ClearNAND is the internal COPYBACK function, also known as INTERNAL DATA MOVE. This function can provide a significant advantage in SSD systems when it comes to wear-leveling or garbage collection operations; that is, the process of collecting fragments of data scattered throughout the various pages and blocks of the NAND and coalescing them in a more streamlined block or sequence of blocks. It is similar to the old hard disk defragmentation utility.
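The limitation of wired R/B# signals, and the interrogation step that follows an interrupt, can be modeled in a few lines of Python. This is a behavioral sketch only: the function names are hypothetical, and the `status_posted` mapping stands in for whatever per-volume status read the system controller actually performs.

```python
def wired_ready_busy(die_ready: list[bool]) -> bool:
    """Open-drain R/B# lines tied together: HIGH only when every die is ready.

    Any busy die pulls the shared line LOW, so a die that becomes ready
    while another is still busy produces no visible change.
    """
    return all(die_ready)


def volumes_with_new_status(status_posted: dict[int, bool]) -> list[int]:
    """After an interrupt, interrogate each volume and list those with new status."""
    return [volume for volume, posted in sorted(status_posted.items()) if posted]


if __name__ == "__main__":
    before = [False, False, True, True]  # die 0 and die 1 are busy
    after = [True, False, True, True]    # die 0 finishes, die 1 is still busy
    print(wired_ready_busy(before), wired_ready_busy(after))

    # Hypothetical snapshot of four volumes after an interrupt fires.
    print(volumes_with_new_status({0: False, 1: True, 2: False, 3: True}))
```

In the first example the shared line reads LOW both before and after die 0 finishes, which is the missed event described for wired R/B#. The second shows the controller narrowing an interrupt down to the volumes that actually posted status.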

Referring back to Figure 2, when using standard NAND, moving data fragments from one block to another typically requires the following sequence of operations:

1. The system controller issues a READ command and source address to access the source page of data.
2. The system controller inputs data from the NAND device while calculating ECC and making any necessary corrections. Any updating of data or metadata is usually accomplished after this step.
3. The system controller calculates and appends the new ECC information before issuing a new PROGRAM command, destination address, and data sequence, which will store the data in the new NAND block.

In this sequential operation, the bus is busy while moving the source and destination data from and to the NAND device. The timing of these operations can be significant. An ONFI 2.2 synchronous bus operating at 200 MT/s would require about 41µs to move the data, assuming an 8K page. Because the data has to be moved from and to the NAND device, we double this time to 82µs, which doesn't include the ECC overhead. While this sequence is being carried out, the ONFI bus is busy and cannot be used for other operations.

Enhanced ClearNAND is different in that it supports internal ECC. Using this built-in ECC enables the COPYBACK operation to be performed internal to the Enhanced ClearNAND package, assuming the source and destination of the data are within the package. The system controller is still responsible for issuing the commands and addresses as well as any modified data or metadata. The data movement is handled by the ClearNAND controller and does not tie up the external ONFI bus. If the system controller is able to keep its wear-leveling and garbage collection operations within a single ClearNAND package, it can have significant performance advantages.

Figure 7 shows an example using Enhanced ClearNAND on two ONFI channels labeled Channel 0 and Channel 1. On both SSD channels, we can see that four of the INTERNAL DATA MOVE operations are occurring simultaneously without the external ONFI bus being used for the data movement. This frees the system controller and ONFI bus to move data from one ClearNAND package to another, if necessary. Depending on the architecture, some percentage of these operations may need to go between ClearNAND packages or even between ONFI buses. Taking advantage of the INTERNAL DATA MOVE operation can provide a significant performance improvement for garbage collection and wear-leveling operations.

[Figure 6A: Ready/Busy# Operation. Figure 6B: Interrupt# Operation. Both panels show Die 1 through Die 8; in 6A the die drive R/B#, and in 6B they drive Interrupt#.]
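The 41µs and 82µs figures quoted for an external copy follow directly from the page size and the bus rate. The Python sketch below reproduces them, assuming an 8K page means 8,192 bytes with no spare area and that each transfer on the 8-bit bus carries one byte; the function name is hypothetical.

```python
def transfer_time_us(page_bytes: int, mega_transfers_per_s: float,
                     bytes_per_transfer: int = 1) -> float:
    """Time in microseconds to move one page across the bus.

    MT/s multiplied by bytes per transfer gives bytes per microsecond.
    """
    if mega_transfers_per_s <= 0 or bytes_per_transfer <= 0:
        raise ValueError("bus rate and width must be positive")
    bytes_per_us = mega_transfers_per_s * bytes_per_transfer
    return page_bytes / bytes_per_us


PAGE_BYTES = 8 * 1024  # assumption: 8K page, spare area ignored
BUS_MT_S = 200         # ONFI 2.2 synchronous rate from the text

if __name__ == "__main__":
    one_way = transfer_time_us(PAGE_BYTES, BUS_MT_S)
    round_trip = 2 * one_way  # read out of the device, then write back in
    print(f"one way: {one_way:.2f} us, read plus program: {round_trip:.2f} us")
```

Command, address, and ECC overhead are deliberately left out, as in the text, so the result is a lower bound on how long the external bus is occupied per copied page.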

Conclusion

Micron's Enhanced ClearNAND provides additional performance and features while eliminating the impact of NAND's ever-increasing ECC requirements. Because Enhanced ClearNAND supports a ballout similar to the standard 100-ball BGA NAND devices, it's possible to design your product to support both. An example would be to include enough ECC in your system controller to support SLC NAND directly and select Enhanced ClearNAND for your multilevel cell needs, where ECC can present more of a challenge.

The volume addressing feature of Enhanced ClearNAND enables higher densities to be addressed using fewer chip enables, saving potentially hundreds of pins in high-capacity implementations. Electronic data mirroring simplifies PCB design and routing while improving the signal integrity of the ONFI bus. The intelligent interrupt capability provides for real-time status updates to the system controller and minimizes the polling for firmware. The dual internal NAND buses provide improved COPYBACK operations, which in turn improve performance.

[Figure 7: Copyback Using Enhanced ClearNAND. The diagram shows Channel 0 and Channel 1.]

micron.com

Products are warranted only to meet Micron's production data sheet specifications. Products and specifications are subject to change without notice. Micron, the Micron logo, and ClearNAND are trademarks of Micron Technology, Inc. All other trademarks are the property of their respective owners. ©2010 Micron Technology, Inc. All rights reserved. 12/27/10 ENL
