Wikipedia has SSD as Solid-state drive. I have also seen it referenced as solid-state disk.
But there is no drive and no disk, so I am calling it solid-state device.

Update 2011-10

Solid-State Devices (not disks, not drives) Technology

After years of anticipation and false starts, the SSD is finally ready to
take a featured role in database server storage.
There were false starts because NAND flash is very different from hard disks
and cannot simply be dropped into a storage device and
infrastructure built around hard disk characteristics.
Too many simple(ton) people became entranced by the headline specifications
of NAND-based SSD, usually random IOPS and latency.
It is always the details in small print (or outright omitted) that are critical.
Now, enough of the supporting technologies to use NAND-based SSD
in database storage systems are in place, and more are coming soon.

Random IO performance has long been the laggard in computer system performance.
Processor performance has improved along the 40% per year rate of
Moore's law. Memory capacity has grown at around a 27% per year rate
(memory bandwidth has kept pace, but not memory latency).
Hard disk drive capacity for a while grew at 50%-plus per year.
Even HDD sequential transfer rates have increased at a healthy pace,
from around 5MB/s to 200MB/s over the last 15 years.
However, random IOPS has only tripled over the same 15-year period,
as spindle speeds went from 5,400RPM to 15K.
The wait for SSD to finally break the random IOPS stranglehold has been long,
but it is finally taking place.

We should expect three broad lines of progress in the next few years.
One is the use of SSD to supplement or replace HDD in key functions.
Second is a complete redesign of storage system architecture around
SSD capabilities with consideration that high-capacity HDD is still useful.
Third, that it is time to completely rethink the role of memory and storage
in server system architecture, and perhaps database architecture
with respect to data and log.

A quick survey of SSD products is helpful to database professionals because of
the critical dependency on storage performance.
However, it quickly becomes apparent that it is also necessary to provide
at minimum a brief explanation of the underlying NAND flash, including the
proliferation of SLC, MLC and eMLC variants.
Next are the technologies necessary to implement high-performance storage
from NAND flash.
The Open NAND Flash Interface (ONFI)
industry workgroup is important in this regard.
This progresses to the integration of SSD in storage systems,
including form factor and interface strategies.
From here we can form a picture of the SSD products available,
and develop a plan to implement SSD where appropriate.

Non-Volatile Memory

To take the place of hard drives in a computer system,
the storage technology should be non-volatile memory,
in which information is retained on power shutdown.
Of the NV-memory technologies, NAND flash is most prevalent in hard-disk
alternative/replacement storage devices.
NOR flash has special characteristics, suitable for in-place code execution.
Other non-volatile memories include
Magneto-resistive RAM,
Spin-Torque Transfer,
and Memristor.
Phase-Change Memory
has promise with its finer granularity and lower read latency.

NAND Flash

The Micron NAND
website is a good source of information on NAND.
Wikipedia has a description of
Flash Memory,
explaining the fundamentals and the difference between NAND and NOR.
The diagrams below from Cyferz
show NOR wiring on the left and NAND wiring on the right.

A key difference is that NAND has fewer bit (signal) and ground lines,
allowing for higher density, hence lower cost per bit
(well, today it does not make sense to talk about price per bit,
so price per Gbit helps eliminate the leading zeros).

Multi-Level Cell

Sometime in 1997(?), Intel published a paper on multi-level cell for NOR flash,
called StrataFlash.
At some point, MLC made its way to NAND, supporting 2 bits per cell.
There is currently a 3-bit cell in development,
but this may be more for low-performance applications.
MLC has significantly longer program (write) time than SLC.

Numonyx SLC and MLC NAND Specifications

Numonyx (now Micron) has public specification sheets for their NAND chips.

Organization (x8 parts):

Type                | Density | Page Size | Page Spare | Block Size | Block Spare
Small page SLC      | 128M-1G | 512 bytes | 16 bytes   | 16K        | 512 bytes
Large page SLC      | 1G-16G  | 2 Kbytes  | 64 bytes   | 128K       | 4K
Very large page SLC | 8G-32G  | 4 Kbytes  | 128 bytes  | 256K       | 8K(?)
Very large page MLC | 16G-64G | 4 Kbytes  | 224 bytes  | 512K       | 28K

Organization (x16 parts; not listed for the very large page types):

Type                | Density | Page Size | Page Spare | Block Size | Block Spare
Small page SLC      | 128M-1G | 256 words | 8 words    | 8K words   | 256 words
Large page SLC      | 1G-16G  | 1K words  | 32 words   | 64K words  | 2 Kwords

Timing:

Type | Density | Random Access | Page Program | Block Erase | ONFI
SLC  | 128M-1G | 12μs          | 200μs        | 2ms         | ?
SLC  | 2-16G   | 25μs          | 200μs        | 1.5ms       | 1.0
SLC  | 8-64G   | 25μs          | 500μs        | 1.5ms       | ?
MLC  | 16-64G  | 60μs          | 800μs        | 2.5ms       | ?

A time for each subsequent byte/word is cited as 25 ns, for a 40MHz clock frequency.
SLC is typically rated for 100K cycles, and MLC for 5,000 cycles.
The (older) lower capacity SLC chips have 512 byte pages.

NAND Organization

I am not sure about the exact terminology, but I understand the NAND package
is the target, comprising one or more (up to 8?) die,
with each die addressed as a logical unit (LUN).
The die is divided into planes (the die in the above pictures
have 4 or 8 planes?), which may support interleaved addressing.
Below the plane is the block, and then the page.
So the NAND organization is: target, one or more logical units (die),
planes, blocks, pages.

Block Erase, Garbage Collection and Write Amplification

After NAND became the solid-state component of choice, the industry started to learn
the many quirks and nuances of NAND SSD behavior.
NAND must be erased an entire block at a time (2,000μs?).
A write (or program) must be to an erased block.

The Wikipedia
Write Amplification
article explains in detail the additional write overhead due to garbage collection.
Write Amplification = Flash Writes / Host Writes.
Small random writes increase WA.
Write amplification can be kept to a minimum with over-provisioning.
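The arithmetic is simple enough to sketch in a few lines (the figures below
are hypothetical, chosen only to illustrate the formula):

```python
# Write Amplification = Flash Writes / Host Writes.
def write_amplification(flash_writes_gb, host_writes_gb):
    return flash_writes_gb / host_writes_gb

# Small random writes force garbage collection to relocate still-valid
# pages before a block can be erased. Suppose GC copies 3GB of valid
# pages for every 1GB the host writes (a made-up workload):
host_gb = 100.0
gc_copy_gb = 300.0
print(write_amplification(host_gb + gc_copy_gb, host_gb))  # 4.0
```

Over-provisioning attacks exactly the garbage-collection term: with more
spare blocks, GC can pick emptier victim blocks and relocate fewer valid pages.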

The block erase requirement has a significant impact on write performance.
Writes to SLC were already not fast to begin with,
and writes to MLC are much slower still (800 versus 200-500μs);
on top of this, the block erase requirement
can result in erratic write performance depending on the availability
of free blocks.
The write performance issues caused by the block erase requirement
can be solved with over-provisioning.

NAND SSD may exhibit a "bathtub" effect in read-after-write performance.
The intuitive expectation is that mixed read-write performance
should be close to a linear interpolation between the read and write performance specifications.
Without precautions, the mixed performance may be sharply lower than both
the pure read and pure write performance specifications.

Wear and MTBF

Flash NAND also has wear limits.
Originally this was 100,000 cycles for SLC and 5-10K for MLC.
The write longevity issues of MLC seem to be sufficiently solved with wear leveling
and other strategies.
SLC SSD may become relegated to a specialty market.

The fact that NAND SSD has a write-cycle limit suggests that database administration
could be adjusted to accommodate this characteristic.
If there were some means of determining that an SSD is near the write-cycle limit,
active data could be migrated off, and the SSD could be assigned to static data.
In an OLTP database, tables could be partitioned to split active and archival data.
In data warehouses, the historical data should be static.
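A back-of-envelope endurance estimate shows why wear leveling makes even
5,000-cycle MLC workable; all figures below are hypothetical:

```python
# Days until the rated program/erase cycles are consumed, assuming
# perfect wear leveling spreads writes evenly across all cells.
def endurance_days(capacity_gb, pe_cycles, host_gb_per_day, wa=1.0):
    total_writable_gb = capacity_gb * pe_cycles
    return total_writable_gb / (host_gb_per_day * wa)

# 100GB MLC drive rated for 5,000 cycles, 50GB/day of host writes,
# and a write amplification of 2:
print(endurance_days(100, 5000, 50, wa=2.0) / 365)  # ~13.7 years
```

The same arithmetic with 100,000-cycle SLC gives a lifetime twenty times
longer, which is why SLC may end up needed only for extreme write workloads.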

Flash Translation Layer

Because of NAND flash characteristics such as block erasure and wear limits,
a simple direct mapping of logical to physical pages is not feasible.
Instead there is a Flash Translation Layer (FTL) in between.
Numonyx provides a brief description
here.
The FTL is implemented in the SSD controller(?), and determines the characteristics of the SSD.
Below is a block diagram of FTL between the file system and NAND.

Another diagram from the Micron/Numonyx
NAND Flash Translation Layer (NFTL) 4.5.0 document.
This document has a detailed description of the Flash Abstract Layer, or Translation Module
which incorporates functionality for bad block management, wear leveling and garbage collection.

The strategy for writing to NAND somewhat resembles the database log,
and the NetApp Write Anywhere File Layout (WAFL),
which is an indication that perhaps a complete re-design of the
database data and log architecture could be better suited to solid-state storage.
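The out-of-place write strategy behind the FTL can be sketched as a toy
model (my own illustration, not any vendor's implementation):

```python
# Toy FTL: logical pages are never overwritten in place. Each write
# goes to the next free physical page; the old copy is marked stale,
# to be reclaimed later when garbage collection erases its block.
class ToyFTL:
    def __init__(self, num_physical_pages):
        self.map = {}                  # logical page -> physical page
        self.free = list(range(num_physical_pages))
        self.stale = set()             # invalidated physical pages

    def write(self, logical_page):
        if logical_page in self.map:
            self.stale.add(self.map[logical_page])  # invalidate old copy
        phys = self.free.pop(0)        # out-of-place write
        self.map[logical_page] = phys
        return phys

ftl = ToyFTL(num_physical_pages=8)
ftl.write(0)
ftl.write(1)
ftl.write(0)                           # rewrite logical page 0
print(ftl.map[0], sorted(ftl.stale))   # 2 [0]
```

This append-style behavior is what resembles a database log or WAFL:
updates always go to fresh locations, and cleanup is deferred.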

Error Detection and Correction

NAND is currently at 128 or 256Gbit density per die for 2-bit cells,
meaning 64G cells. This is 16GB on one die! SLC is now at 128Gbit?
(Never mind, apparently the Numonyx SLC 64Gbit product is 8 x 8Gbit die stacked.
Still very impressive at both the die and package level.)
One aspect of such high densities is that bit error rates are high.
All (high-density?) NAND storage requires sophisticated error detection and correction.
The degree of EDC varies between the enterprise and consumer markets.

High Endurance Enterprise NAND

The Micron website describes High-Endurance NAND as

"Enterprise NAND is a high-endurance NAND product family optimized for intensive enterprise applications.
Breakthrough endurance, coupled with high capacity and high reliability
(through low defect and high cycle rates),
make Enterprise NAND an ideal storage solution for transaction-intensive data servers and enterprise SSDs.

Our MLC Enterprise NAND offers an endurance rate of 30,000 WRITE/ERASE cycles,
or six times the rate of standard MLC, and SLC Enterprise NAND offers 300,000 cycles,
or three times the rate of standard SLC.
These parts also support the ONFI 2.1 synchronous interface,
which improves data transfer rates by four to five times compared to legacy NAND interfaces."

Enterprise MLC is available up to 256Gbit, and SLC to 128Gbit.
I will try to get more information on this.

eMMC?

Below is an interesting combination of SLC and MLC.

Open NAND Flash Interface

ONFI exists to "define standardized component-level interface specifications as well as
connector and module form factor specifications for NAND Flash."

In the original ONFI specification, the NAND array had parallel read
that could support 330MB/s bandwidth (8KB in 25μs) with SLC(?),
but the interface bandwidth was 40MB/s
(the slide deck mentions a 25ns clock, corresponding to 40MHz,
but the ONFI website says 1.0 is 50MB/s).
Then, accounting for Array Read plus Data Output, read time is 25 + 211μs for SLC
and 50 + 211μs for MLC, for net bandwidths of 34 and 30MB/s.
Net write bandwidth is 17MB/s and 7MB/s respectively.
Below is the single channel IO.

Device       | Planes | Data Size | Array Read | Data Output | Total Read | Data Input | Array Program | Total Write
SLC 4KB page | 2      | 8KB       | 25μs       | 211μs       | 34MB/s     | 211μs      | 250μs         | 17MB/s
MLC 4KB page | 2      | 8KB       | 50μs       | 211μs       | 30MB/s     | 211μs      | 900μs         | 7MB/s
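The Total Read and Total Write figures follow directly from the phase
times; a quick check (phase times taken from the figures quoted above):

```python
# Net bandwidth = data moved / sum of the serialized phase times.
# 1 KB per ms equals 1 MB/s, so KB and microseconds work out neatly.
def net_bw_mb_s(data_kb, *phase_times_us):
    return data_kb * 1000 / sum(phase_times_us)

print(round(net_bw_mb_s(8, 25, 211)))   # 34 -- SLC read
print(round(net_bw_mb_s(8, 50, 211)))   # 31 -- MLC read (slide says 30)
print(round(net_bw_mb_s(8, 211, 250)))  # 17 -- SLC write
print(round(net_bw_mb_s(8, 211, 900)))  # 7  -- MLC write
```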

Note that the write latency is very high relative to hard disk sequential writes,
which is what transaction log writes are. I believe the purpose of the DRAM cache
on the SSD controller is to hide this latency.

While the bandwidth and latency for NAND at the chip level is not spectacular,
both could be substantially improved at the device level
with more die per channel, more channels, or both as illustrated below.

Note:
I am puzzled by the tables below; I suspect the number-of-channels and die-per-channel axes were inadvertently switched.
If the signalling bandwidth is 40MB/s, then 4 channels are required for a maximum of 160MB/s,
but it does take multiple die per channel to reach the channel bandwidth of 40MB/s.

SLC 2-Plane Performance (MB/s): die per channel vs. # of channels

              | Read                  | Write
# of channels | 1   | 2   | 4   | 8   | 1   | 2   | 4   | 8
1 die per ch  | 34  | 40  | 40  | 40  | 19  | 38  | 40  | 40
2 die per ch  | 68  | 80  | 80  | 80  | 38  | 76  | 80  | 80
4 die per ch  | 136 | 160 | 160 | 160 | 76  | 152 | 160 | 160

MLC 2-Plane Performance (MB/s): die per channel vs. # of channels

              | Read                  | Write
# of channels | 1   | 2   | 4   | 8   | 1   | 2   | 4   | 8
1 die per ch  | 30  | 40  | 40  | 40  | 7   | 14  | 28  | 40
2 die per ch  | 60  | 80  | 80  | 80  | 14  | 28  | 56  | 80
4 die per ch  | 120 | 160 | 160 | 160 | 28  | 56  | 112 | 160

SLC could achieve near peak performance with 4 channels and 2 die per channel.
MLC could also achieve peak read performance with 4 channels and 2 die per channel,
but peak write performance requires 8 die per channel.
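The scaling model implied by the tables can be sketched as follows; the
per-die figures and the 40MB/s channel cap are from the text above, and
the sketch also shows why I suspect the axes were switched:

```python
# Device bandwidth: each die contributes its net bandwidth, but each
# ONFI 1.0 channel is capped at its 40MB/s signalling rate.
def device_bw(channels, die_per_channel, per_die_mb_s, channel_cap=40):
    return channels * min(die_per_channel * per_die_mb_s, channel_cap)

# SLC read at 34MB/s per die:
print(device_bw(1, 1, 34))  # 34
print(device_bw(1, 2, 34))  # 40  -- two die saturate one channel
print(device_bw(4, 1, 34))  # 136 -- four channels, one die each
```

The 40-40-40 saturation pattern appears along the row labeled "1 die per ch"
in the tables, but in this model it belongs to the multiple-die-per-channel
case, consistent with the axes having been switched.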

ONFI 2.x Specification

ONFI 2.0 defines a synchronous interface, improving the IO channel to 200MB/s
and allowing 16 die per channel.
Version 2.0 (2008) allowed speeds greater than 133MB/s.
Version 2.1 (2009) increased this to 166 and 200MB/s, plus
other enhancements, including in ECC.
(The current Micron NAND parts catalog lists 166MT/s as available.)
Read performance is improved for a single die and for multiple die.
Write performance did not improve much for a single die, but did for multiple die on the same channel.
Version 2.2 added other features.
ONFI 2.3 adds EZ-NAND to offload ECC responsibility from the host controller.

Almost all SSDs on the market in 2010 are ONFI 1.0.
SSDs using ONFI 2.0 are expected soon(?) with >500MB/s capability?

ONFI 3.0 Specification

The future ONFI 3.0 will increase the interface speed to 400MT/s.

Non-Volatile Memory Host Controller Interface

The existing interfaces to the storage system today were all designed
around the characteristics of disk drives, naturally
because the storage system was comprised of disk drives.
As expected, this is not the best match to the requirements and features
of non-volatile memory storage.
The Non-Volatile Memory Host Controller Interface
(NVMHCI)
specification will define
"a register interface for communication with a non-volatile memory subsystem"
and "also defines a standard command set for use with the NVM device."
The NVMHCI specification should be complete this year, with product in 2012.

A joint Intel and IDT presentation by Amber Huffman and Peter Onufryk
at Flash Memory Summit 2010 discusses Enterprise NVMHCI.
In the storage system today, there is a controller on the hard drive
(the chip on the hard drive PCB),
with an SAS or SATA interface to the HBA.

The argument is that the HBA and controller should be integrated
into a controller on the SSD, with PCI-E interface upstream.
Curiously, IDT mentions nothing about building a native PCI-E flash controller,
considering that they are a specialty silicon controller vendor.

Below is the Enterprise NVMHCI view.
The RAID controller now has PCI-E interfaces on both upstream and downstream sides.
I had previously proposed that RAID functionality should be pushed in to
the SSD itself.

SSD with PCI-E Interface

Kam Eshghi also of Integrated Device Technology
has a FMS 2010 presentation "Enterprise SSDs with Unrivaled Performance A Case for PCIe SSDs"
endorsing the PCI-E interface.
The diagrams below are useful to illustrate the form factor.
Below is a RAID PCI-E implementation using a standard RAID controller with PCI-E on the front-end
and SATA or SAS on the back-end, a Flash controller with a SATA interface, and NAND chips.

In the next example, the host provides management services, consuming host resources.

The desire to connect solid-state storage directly to the PCI-E interface is understandable.
My issue is the current standard PCI-E form factor is not suitable for easy access.
There is the Compact PCI form factor (not yet defined for PCI-E?)
where the external and PCI connections are
at opposite ends of the card, instead of at two adjacent sides.
This would be much more suitable for storage devices.
Some provision should also be made for greater flexibility in storage capacity expansion
with the available PCI-E ports.

SSD SATA/SAS Form Factor

There is a joint presentation by LSI and Seagate arguing that the SAS/SATA
interface does not limit SSD performance, and has excellent infrastructure
for module expansion and ease of access.

The current trend with SSD with SATA/SAS interfaces is the 2.5in HDD form factor.
The standard 3.5in HDD form factor is far too large for SSD.
For that matter, the 3.5in form factor has become too big for HDD as well.
The standard defined heights for 2.5in drives are 14.8mm, 9.5mm, and 7mm.
Only enterprise drives now use the 14.8mm height, as notebook drives are all 9.5mm or thinner.

(Update)
Apparently Oracle/Sun has already implemented the high-density strategy.
The F5100 implements up to 80 Flash Modules (2.5in, 7mm form factor, SATA interface)
in a 1U enclosure for 1.92TB capacity.
I suppose the Flash Modules are two deep.
A hard drive enclosure is already heavy enough with 1 rank of disks,
but 2 deep for a flash enclosure is very practical.
And to think there are still storage vendors peddling 3U 3.5in enclosures!

Gary Tressler of IBM proposes that SSD should actually adopt the 1.8in form factor.
Presumably there would be only a single SSD capacity.
The storage enclosure would have very many slots,
and we could just plug in however many we need.

SSD Controllers Today

I believe STEC is one of the component suppliers for Enterprise-grade SSD,
especially with SAS interface, while most SSDs are SATA.
EMC just announced Samsung as a second source.
SandForce seems to be a popular SSD controller source for many SSD suppliers.

SandForce SSD Processor

SandForce
makes SSD processors used by several SSD vendors.
The client SSD processor is the SF-1200.
Random Write IOPS is 30K for bursts, 10K sustained, both at 4K blocks.
The SF-1500 is the enterprise controller.
The performance numbers are similar.
Both support ONFI 50MT/s, SATA 3Gbps and can correct 24 bytes (bits?)
per 512-byte sector.
The SF-1500 is listed as also supporting eMLC,
has unrecoverable read errors of less than 1 in 10^17,
reliability MTTF of 10M operating hours,
and supports a 5-year enterprise life cycle (100% duty).
The SF-1200 has unrecoverable read errors of less than 1 in 10^16,
reliability MTTF of 2M operating hours,
and supports a 5-year consumer life cycle with 3-5K cycles.

SSD vendors with the SandForce processor include Corsair and OCZ.

SandForce SF-2500 & SF-2600 Enterprise SSD Processors

The new SandForce 2000 processor line became available in early 2011.
The SF-2000 series supports ONFI 2 at 166MT/s.
The enterprise processors are the SF-2500 & 2600 line.
SATA 6Gbps and below are supported. The SF-2500 is SATA, supporting only 512B sectors?
The SF-2600 also supports 4K sectors; it has a SATA interface,
but can work behind a SAS/SATA bridge.