Today I was browsing through a book on SQL Server optimization and it seemed that many of the ideas were based on a linear model of storage. As SSDs have a completely different storage model, do they change the game in any way with regard to how one thinks about database tuning or optimization?

6 Answers

Yes, they do change the game. Optimizations based on the characteristics of spinning magnetic disks (like seek time and rotational delay) may not be relevant on SSD drives. A recent paper* published in FITME 2010 presents one new query optimization algorithm based on the characteristics of SSDs.

However, these changes will probably be low-level changes (to storage and retrieval algorithms, for instance) that can be implemented effectively by database developers. They probably won't affect database users that much.

Yes - but most database optimisations already went away when we just put everything into RAM. Once 64 GB of RAM got cheaper than a SQL expert, things had already changed; I'm not sure how much SSD adds to that.
–
Martin Beckett May 4 '11 at 18:13


@Martin agreed. On the other hand, there's been a decided turn towards horizontal (cloud, etc) rather than vertical (monstrous $500k DB boxes) scaling recently. Distributed systems can get global non-linear performance improvements from this sort of local linear optimization. This can often be a better cost model as well.
–
Rein Henrichs May 4 '11 at 18:21

Performance

SSDs are performant: they do not have to seek, and their throughput is blazing. Most software dealing with disks is, to the extent it is optimized at all, optimized to reduce the number of synchronous seeks, and in doing so it introduces a host of complexities. With the advent of fast, seekless writes to persistent storage, new data storage systems will no longer require such complexities.
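To see the effect the paragraph above describes, here is a rough sketch (not a rigorous benchmark) that times the same set of 4 KiB reads issued sequentially versus in a shuffled order. The file name and sizes are arbitrary choices for illustration; on a spinning disk the shuffled pattern pays a seek per read, while on an SSD the gap largely disappears. OS page caching will also blur the numbers unless the file is larger than RAM.

```python
import os
import random
import time

PATH = "scratch.bin"
BLOCK = 4096      # read unit
BLOCKS = 2048     # 8 MiB scratch file

# Create a scratch file to read from.
with open(PATH, "wb") as f:
    f.write(os.urandom(BLOCK * BLOCKS))

def read_pattern(offsets):
    """Read one BLOCK at each offset; return elapsed seconds."""
    start = time.perf_counter()
    with open(PATH, "rb") as f:
        for off in offsets:
            f.seek(off)
            f.read(BLOCK)
    return time.perf_counter() - start

sequential = [i * BLOCK for i in range(BLOCKS)]
scattered = sequential[:]
random.shuffle(scattered)

t_seq = read_pattern(sequential)
t_rand = read_pattern(scattered)
print("sequential:", t_seq)
print("random:    ", t_rand)

os.remove(PATH)
```

The ratio between the two timings is a crude proxy for how seek-bound the storage is; much of the complexity in traditional database engines exists to push real workloads toward the sequential case.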

Durability

SSDs currently have high failure rates. Your SSD will fail. Your SSDs will fail at a much higher rate than magnetic disks. You must work around this with replication, backups, etc. This introduces its own set of complexities.

Um, what? SSDs have high failure rates? The annual failure rates for SSDs are significantly lower than for HDDs. So far, few people have managed to exhaust the available writes on SSDs, especially with more advanced controllers (LSI's SandForce, for example).
–
Mircea Chirea May 22 '12 at 10:17

The overall reduction in price of storage has much more profound effects.

Before we had SQL, we had super-optimized hierarchical and network databases where DBAs had to carefully plan out the track and cylinder placement of data.

SQL databases are much less efficient. But now that disks are cheap, huge and fast, we barely care.

NoSQL ("Document") databases can be somewhat less efficient than SQL because there isn't the same capability of logical-to-physical mapping between the SQL logical schema and the underlying physical schema of files or tablespaces or whatever. And we barely care.

The SSD performance improvements are likely to be lost in the changes caused by use of NoSQL databases to the way we architect systems overall.

The main issue with optimizing anything for SSDs has to do with how they write data. A traditional hard drive stores data in sectors of typically 512 bytes and can rewrite individual sectors in place.

SSDs have some drawbacks with regard to writes:

A minimum write unit (a page) of around 4-8 KB.

Erases can only be performed on whole blocks of typically 256 KB, i.e. many pages at a time.

Only empty (erased) pages can be written to.

A typical nightmare scenario, referred to as write amplification, is when you want to write a single byte to a location whose block already holds live data. In order to write there, the drive must first copy the entire 256 KB block into memory, erase the block, change the single byte in the copy, then write the entire modified 256 KB block back. So in order to write a single byte, there has been about half a megabyte of "traffic"!
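The arithmetic behind that "half a megabyte" can be sketched directly. The sizes below are the illustrative ones from the scenario above (4 KiB pages, 256 KiB erase blocks), not a property of any particular drive:

```python
# Sketch: read-modify-write cost of changing data inside a used NAND block.
PAGE_SIZE = 4 * 1024            # illustrative write unit
BLOCK_SIZE = 256 * 1024         # illustrative erase unit
PAGES_PER_BLOCK = BLOCK_SIZE // PAGE_SIZE  # 64 pages per block

def write_amplification(bytes_requested: int) -> float:
    """Bytes physically moved per byte the host asked to write,
    assuming the whole erase block is copied out, erased, and rewritten."""
    physical_traffic = 2 * BLOCK_SIZE  # read the block out + write it back
    return physical_traffic / bytes_requested

print(write_amplification(1))     # 524288.0 -> ~half a megabyte per byte
print(write_amplification(4096))  # 128.0 -> a full page is 4096x better
```

This is why log-structured layouts and sequential, page-aligned writes matter on flash: the amplification factor collapses as the write size approaches the erase-block size.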

There are lots of optimizations for this problem implemented at the SSD, controller and even operating system level, but undoubtedly DBMSs can benefit by tailoring these optimizations to their specific workings.

This is however not something database users (as in, using a database in their application) need to think about, since it will be highly dependent on design/implementation decisions at the DBMS level.

From what I gather from the ServerFault blog, database servers need burly hardware. The database server for the Stack Exchange sites runs on SSDs (see http://blog.serverfault.com/post/our-storage-decision/) and I would imagine that query optimization is still very much needed. CPU and memory are impacted by database queries as well as IO.

However, database performance does depend a great deal on IO, so SSDs certainly help.

I was listening to a podcast saying that large chunks of RDBMSs like Oracle, SQL Server, etc. will start being "optioned out" if the vendors can work out the separation properly: detect whether the drive is an SSD and optimise accordingly.

There is a lot of extra code built into the caching and writing of data that simply isn't required anymore.

Even more interesting is the RAMSAN and its variants: basically a hard disk drive made out of RAM chips, with a built-in X-hour UPS and the ability to write back to longer-term HDD storage in the background.