Using Solid State Disks in SQL Server Storage Solutions

With the cost of solid state disks (SSDs) going down, many organizations are considering using them. When you use SSDs, you need to change the way you think about designing storage for production SQL Server servers. SSDs use the same basic configurations for storage that you've always used, such as separate disks for data files, transaction logs, and tempdb; RAID 5 for the databases; and RAID 10 for the transaction logs and tempdb. However, you don't necessarily need to use the same design techniques when working with SSDs.

SSDs can be used in a variety of ways within the storage array, depending on the options provided by the storage array vendor. You can:

Add the SSDs as a portion of the storage within an auto-tiering solution

Use the SSDs as a second tier of cache, which is sometimes called flash cache

Use the SSDs as traditional disks

In addition to being aware of these design techniques, you need to know about a potential bottleneck and your options when configuring local SSDs for a server.

SSDs in Auto-Tiered Storage Solutions

Many storage arrays include auto-tiering technology that stores the data being accessed the most on the fastest storage available. By introducing SSDs into the auto-tiered solution, the "fastest storage possible" becomes much faster. Because of this, the storage tiers that are accessed the most will have their requests serviced much faster than if SSDs weren't being used.

Knowing how much space is needed is the most important part of working with SSDs in auto-tiered storage solutions. The easiest metric to use is to look at the daily amount of data change for all the servers that are configured to use the auto-tiered storage. As long as the amount of SSD storage is a little larger than the amount of data that's changing per day on the storage array, all of the most recently used data for that day should remain within the SSD storage.

Flash Cache

Another way that you can use SSDs in a storage array is a design technique often called flash cache. When configured as a flash cache, the SSD becomes a second layer of caching. Although it's slightly slower than the normal layer of caching within the array, it's much faster than the traditional spinning disks on which the data usually resides. This second layer of caching is great for systems that access the same blocks of storage often, then quickly change to accessing another set of blocks often, such as an OLTP database application.

Applications that don't touch the same blocks of storage over and over again might not make the best use of a flash cache. Under normal circumstances, SQL Server falls into this category. SQL Server should access a block once, then keep that block cached until the block ages out of the cache.

However, SQL Server can make good use of the flash cache if it needs to write large amounts of data at once, provided that the flash cache is enabled at the LUN level and can be used for writes. The array will cache the written data on the SSDs if the amount of data that's being written at once is larger than the amount of data that can fit within the normal cache of the storage array. SQL Server will then be told that the data has been written to disk (just like with a normal cache). The storage array will later flush the data in the SSDs to the traditional spinning disks.

When using a flash cache within a storage array, the cache miss ratio might go up quite a bit. Storage administrators will need to keep this in mind when watching the performance metrics of the storage array.

All Flash All the Time

The most expensive and least flexible way to provision SSDs within a storage array is to configure them just like you would configure traditional spinning disks. Using this method, LUNs would be created on the SSDs and presented to servers.

If some portions of your database lie idle, putting the entire database on an SSD doesn't make the most sense from a cost perspective. However, some applications might be so important to the company that they warrant the extra expense of purchasing SSDs for them.

Potential Bottleneck

When you use SSDs in a storage array, there's a potential bottleneck that needs to be addressed. Because the storage has now become that much faster, great care needs to be taken when you configure the physical servers. You need to plug the host bus adapters (HBAs) or iSCSI network cards into the correct slots to get the maximum throughput possible. As Figure 1 shows, as the PCI technology has improved, the number of gigabits per second that can be pushed through the various slots has increased.

PCI Slot Performance

When installing the physical cards in the server, you need to keep in mind that not every slot will have the maximum amount of available bandwidth. Slots on servers' motherboards often share a back-end bus and therefore the available bandwidth has to be shared across all the cards on the back-end bus.

For example, suppose your server has two PCI Express 3.0 (x1 Link) slots on the motherboard. Each slot is capable of up to 7.69 gigabits per second. However, if both slots are on the same back-end bus and you install one HBA in each slot, the total amount of bandwidth available to both cards will be only 7.69 gigabits. If you're using 4-gigabit HBAs, this won't be much of a problem. However, if you're using 8- or 16-gigabit HBAs, you could see performance problems if you push both HBAs above 50 percent at the same time because the back-end bus would be at capacity. The only way to check for problems like this is to use diagnostics provided by the motherboard vendor or by manually looking at the server's motherboard to see how things are connected and then doing the math to see if the back-end bus is at capacity.

Local SSD Configuration

There are a couple of different ways that local SSDs can be configured for a server. They can be local disks that fit within the drive cage in which the traditional spinning disks are installed, or they can be PCI cards that are plugged directly into the motherboard.

SSDs in a drive cage. SSDs that fit within a drive cage (which is usually on the front of the server) are easy to install and configure because they're installed and configured like a traditional spinning disk. Replacing an SSD in a drive cage is also easy. You just need to remove it and install a new one. If the server is hot swappable, you don't even need to take the server offline.

The biggest downside to installing the SSDs in a drive cage is the amount of bandwidth available to the disks. Typically, the drive cage on a server will have either one or two connections to the motherboard (depending on the motherboard design). These connections will typically be external SATA (eSATA), which is limited to 6 gigabits per second for all the disks that are connected to the link within the cage. Because of this limitation, it doesn't really matter how fast the drives are because the total amount of bandwidth available would be the 6-gigabit limit. Even if you have two links available and the SSDs are balanced across both links, the total bandwidth available is limited to just 12 gigabits per second (6 gigabits per second per link). Given the speed of most SSDs that are available today, maxing out the 6-gigabit link can be done with just one or two SSDs per link. (Note that this might vary, depending on the drive and storage array purchased and other variables.)

Disks that are connected through the traditional drive cage are connected to the server through the normal RAID controllers within the server. As a result, all disk redundancy functions are handled through the RAID controller, which means that the disk redundancy is configured outside of Windows and is totally invisible to the Windows OS.

PCI cards. When dealing with local disks, another option is to use PCI card–based SSDs. The PCI cards connect directly to the motherboard through PCI slots, so they have access to much more bandwidth than SSDs connected through the local drive cage.

The biggest downside to PCI slot–based SSDs is that there's currently no hardware RAID configuration available for these PCI cards. This means that each card is presented to the Windows OS as a separate disk. Thus, you need to configure a software RAID solution within Windows using disk mirroring or disk striping to provide redundancy in order to keep the system running if a drive fails.

What SSDs Can and Can't Do for You

The biggest benefit from introducing SSDs into an environment is the performance improvement seen in the server's I/O subsystem. There can be other related benefits as well. Because the I/O is so much faster, the server might be able to run with less memory and therefore have a smaller buffer pool. If you're running SQL Server Standard and you need more memory than its limit of 64GB of RAM, it might be less expensive to install SSDs than to upgrade to SQL Server Enterprise and purchase more RAM. Similarly, if you need to use the snapshot isolation level, which will push the I/O requirements of the tempdb database higher (potentially much higher, depending on the SQL Server load), SSDs might be a way to get the needed performance for the tempdb database.

Although SSDs can provide performance benefits, some people think that they'll solve all their SQL Server performance problems. Sadly, they won't. SSDs don't change the way the SQL Server's optimizer works. Bad code will still generate bad plans, which will then stress the storage subsystem. The storage subsystem will be able to respond faster to these requests, which will effectively mask the problem, but it won't make the problem go away.

When it comes to performance troubleshooting with SSDs installed in the server, all of the normal rules and techniques apply. You need to find and fix the bottleneck. The biggest difference between SQL Server with SSDs and SQL Server without SSDs is that the one with SSDs has already had the slowest part upgraded -- the storage subsystem.