Let’s look at the problem in the real world. You, me and our best friend have decided to start making SSDs. We buy up some NAND-flash and build a controller. The table below summarizes our drive’s characteristics:

Our Hypothetical SSD

Page Size      4KB
Block Size     5 Pages (20KB)
Drive Size     1 Block (20KB)
Read Speed     2 KB/s
Write Speed    1 KB/s

Through impressive marketing and your incredibly good looks we sell a drive. Our customer first goes to save a 4KB text file to his brand new SSD. The request comes down to our controller, which finds that all pages are empty, and allocates the first page to this text file.

Our SSD. The yellow boxes are empty pages

The user then goes and saves an 8KB JPEG. The request, once again, comes down to our controller, and fills the next two pages with the image.

The picture is 8KB and thus occupies two pages, which are thankfully empty

The OS reports that 60% of our drive is now full, which it is. Three of the five open pages are occupied with data and the remaining two pages are empty.

Now let’s say that the user goes back and deletes that original text file. This request never reaches our controller; as far as our controller is concerned, we’ve still got three valid pages and two empty ones.

For our final write, the user wants to save a 12KB JPEG, which requires three 4KB pages to store. The OS knows that the first LBA, the one allocated to the 4KB text file, can be overwritten, so it tells our controller to overwrite that LBA and to store the last 8KB of the image in our last available LBAs.

Now we have a problem once these requests get to our SSD controller. We’ve got three pages’ worth of write requests incoming, but only two pages free. Remember that the OS believes we have 12KB free, but on the drive only 8KB is actually free; the other 4KB is in use by an invalid page. We need to erase that page in order to complete the write request.
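The mismatch between what the OS sees and what the drive sees can be sketched in a few lines of Python. This is only a toy model of our hypothetical drive, not how any real controller or filesystem keeps its books; the names `controller_pages`, `os_live`, `write_file` and `delete_file` are made up for illustration.

```python
PAGE_KB = 4            # page size of our hypothetical SSD
PAGES_PER_BLOCK = 5    # our whole drive is one 5-page block

# Controller's view: a page is "empty" or "written". It never hears about deletes.
controller_pages = ["empty"] * PAGES_PER_BLOCK
# OS's view: which pages currently hold live file data.
os_live = [False] * PAGES_PER_BLOCK

def write_file(first_page, size_kb):
    """Write a file into consecutive pages; both the OS and the drive see it."""
    for page in range(first_page, first_page + size_kb // PAGE_KB):
        controller_pages[page] = "written"
        os_live[page] = True

def delete_file(first_page, size_kb):
    """Delete a file: only the OS bookkeeping changes, the drive is not told."""
    for page in range(first_page, first_page + size_kb // PAGE_KB):
        os_live[page] = False

write_file(0, 4)     # the 4KB text file
write_file(1, 8)     # the 8KB JPEG
delete_file(0, 4)    # the user deletes the text file

os_free_kb = os_live.count(False) * PAGE_KB
drive_free_kb = controller_pages.count("empty") * PAGE_KB
print(os_free_kb, drive_free_kb)  # 12 8
```

The OS counts 12KB free while the drive only has 8KB of empty pages, which is exactly the gap that bites us on the 12KB write.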

Uh-oh, problem. We don't have enough empty pages.

Remember back to Flash 101: even though we only need to erase one page, we can’t. You can’t erase pages, only blocks. We have to erase all of our data just to get rid of the invalid page, then write it all back again.

To do so we first read the entire block back into memory somewhere. If we’ve got a good controller we’ll just read it into an on-die cache (steps 1 and 2 below); if not, hopefully there’s some off-die memory we can use as a scratch pad. With the block read, we can modify it, removing the invalid page and replacing it with good data (steps 3 and 4). But we’ve only done that in memory; now we need to write it to flash. Since we’ve got all of our data in memory, we can erase the entire block in flash and write the new block (step 5).
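Here is a rough Python sketch of that read-modify-write cycle for our one-block drive. It is a simplification under stated assumptions: the function name `read_modify_write`, the page labels, and the use of `None` for an empty page are all invented for illustration, and a real controller does far more than this.

```python
PAGE_KB = 4  # page size of our hypothetical SSD

def read_modify_write(block, updates):
    """block: list of page contents (None = empty page).
    updates: dict mapping page index -> new data.
    Returns (new_block, kb_read, kb_written)."""
    # Steps 1 and 2: read every written page of the block into a cache.
    cache = list(block)
    kb_read = PAGE_KB * sum(page is not None for page in block)

    # Steps 3 and 4: modify the cached copy, replacing the invalid
    # page and filling the empty pages with the incoming data.
    for index, data in updates.items():
        cache[index] = data

    # Step 5: erase the whole block in flash, then program the cached
    # pages back. Pages cannot be erased individually, only blocks.
    block = [None] * len(cache)
    for index, data in enumerate(cache):
        block[index] = data
    kb_written = PAGE_KB * sum(page is not None for page in block)
    return block, kb_read, kb_written

# Drive state before the 12KB JPEG arrives: the text file is deleted
# as far as the OS is concerned, but its page still holds data.
flash = ["text (deleted)", "jpeg1-a", "jpeg1-b", None, None]
incoming = {0: "jpeg2-a", 3: "jpeg2-b", 4: "jpeg2-c"}

flash, kb_read, kb_written = read_modify_write(flash, incoming)
print(flash)
print(kb_read, kb_written)  # 12 20
```

The 12KB request ends up costing a 12KB read plus a 20KB write, because the two pages of the first JPEG have to be rewritten along with the new data.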

Now let’s think about what’s just happened. As far as the OS is concerned, we needed to write 12KB of data and it got written. Our SSD controller knows what really transpired, however. In order to write that 12KB of data we first had to read 12KB (the three pages the controller considered valid), then write an entire block, or 20KB.

Our SSD is quite slow: it can only write at 1KB/s and read at 2KB/s. Writing 12KB should have taken 12 seconds, but since we had to read 12KB (6 seconds at 2KB/s) and then write 20KB (20 seconds at 1KB/s), the whole operation took 26 seconds.

To the end user it would look like our write speed dropped from 1KB/s to 0.46KB/s, since it took us 26 seconds to write 12KB.
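The arithmetic is easy to check for yourself. A quick sketch using the numbers from our hypothetical drive (the variable names are just for illustration):

```python
READ_KB_PER_S = 2.0    # read speed of our hypothetical SSD
WRITE_KB_PER_S = 1.0   # write speed of our hypothetical SSD

requested_kb = 12      # what the OS asked us to write
read_kb = 12           # three written pages read back into cache
written_kb = 20        # the whole block gets rewritten

ideal_seconds = requested_kb / WRITE_KB_PER_S
actual_seconds = read_kb / READ_KB_PER_S + written_kb / WRITE_KB_PER_S
effective_kb_per_s = requested_kb / actual_seconds

print(ideal_seconds)                 # 12.0
print(actual_seconds)                # 26.0
print(round(effective_kb_per_s, 2))  # 0.46
```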

Are things starting to make sense now? This is why the Intel X25-M and other SSDs get slower the more you use them, and it’s also why the write speeds drop the most while the read speeds stay about the same. When writing to an empty page the SSD can write very quickly, but when writing to a page that already has data in it there’s additional overhead that must be dealt with thus reducing the write speeds.

I bought a Vertex 120GB and it is NOT working on my Nvidia chipset motherboard. Has anyone met the same problem? I tried an Intel chipset motherboard and it seems ok.
I used HDTach to test the read/write performance 4 days ago, and wow, it was amazing: 160MB/s in write. But today it felt slower, so I used HDTach to test again, and it is down to single-digit MB per second. Can I recover it or do I need to return it?

While I must admit I skipped over some of the more technical bits where SSD was explained in detail, I read the summaries and I've gotta admit this article was extremely helpful. I've been wanting to get one of these for a long time now but they've seemed too infantile in technological terms to put such a hefty investment in, until now.

After reading about OCZ's response to you and how they've stepped it up and are willing to cut unimportant statistics in favor of lower latencies, I actually decided to purchase one myself. Figured I might as well show my appreciation to OCZ by grabbing up a 60GB SSD, not to mention it looks like it's by far the best purchase I can make SSD-wise for $200.

Thanks for the awesome article, was a fun read, that's for sure.

Anand, I don't want to sound too negative in my comments. While I wouldn't call them unusable, there's no doubt that the random write performance of the JMicron SSDs sucks. I'm glad that you're actually running random I/O tests when so many other websites just run HDTune and call it a day.

Something crossed my mind when I saw the firmware-based trade-off between random writes and sequential transfer rates: couldn't that be adjusted dynamically to get the best of both worlds? Default to the current behaviour but switch into something resembling the old one when extensive sequential transfers are detected?

Of course this necessitates that the processor be able to handle the additional load and that the firmware changes don't involve permanent changes in the organization of the data.

Maybe the OCZ-Team already thought about this and maybe nobody's going to read this post, buried deep within the comments..

That was kind of strange to me too. But I assume Anand really means the desktop market, not the server storage/business market. Since it's highly doubtful that the general consumer will spend many times as much money for 15k SAS drives.

I've always been someone who wants real clarity and truth in the information on the internet. That's a problem because probably 90% of it has neither. But Anand is one man I feel a lot of trust for because of great and complete articles such as this. This is truly the first time that I feel like I really understand what goes into SSD performance and why it can be good or bad. Thank you so much for being the most insightful voice in the hardware community. And keep fighting those damn manufacturers who are scared of the facts getting in the way of their 200MB/s marketing bs.