
SanDisk® and DB Best Technologies collaborated to quantify the impact of high-performance storage and caching on both VM density and the cost of software licensing for hosts. This whitepaper presents two solutions, one using caching software with flash storage to accelerate the host without any changes to the existing storage infrastructure, and another that moves the database to flash storage, in comparison to a baseline configuration.

Executive Summary

Workload consolidation increases the efficiency of IT organizations as well as cloud and hosting providers, by harnessing the increasing power of modern host servers to support an increasing number of workloads. Increasing the workload density – the number of workloads running on a host server – drives the economics of consolidation, by reducing the number of host servers required to run a given number of workloads.

Reduce the number of host servers needed, and you reduce both hardware and software license costs.

The solutions detailed in this whitepaper demonstrate a 3x increase in workload density, delivering a ~65% savings totaling ~$460,000 in hardware and software license cost for the scenario tested.

One solution adds caching of “hot data” locally in the host server, requiring no changes to the existing storage environment. This solution uses FlashSoft® software for caching from SanDisk combined with Fusion ioMemory™ from SanDisk, delivering a 66% savings totaling $467,850, for a 79x ROI.

These savings figures assume the purchase of new host servers. These same solutions used in existing host servers will also deliver a 3x increase in workload density, delaying or avoiding the need to upgrade or replace existing storage solutions and host servers.

Customers’ workloads and environments vary significantly, so a careful assessment of each environment is recommended to quantify the savings and identify the most suitable solution.

SanDisk® is a global leader in flash storage solutions. For more than 27 years, SanDisk has expanded the possibilities of storage, providing trusted and innovative products that have transformed the electronics industry. SanDisk offers a range of flash-based enterprise products and systems in addition to the FlashSoft software and Fusion ioMemory products used in these solutions.

DB Best Technologies, Microsoft’s dominant database modernization, migration, and optimization partner, collaborated with SanDisk on this project, and knows these solutions well. DB Best’s DBMSYS modeling is purpose-built to assess customer environments and recommend the optimal modernization and migration outcomes.

The Problem: SQL Server Workload Consolidation

Many IT organizations strive to maximize utilization of their existing hardware investments by virtualizing workloads, and gain additional advantages of simplified management and lower operating cost. Gartner reports¹ that enterprises have virtualized over 75% of x86 server instances. The number of virtualized SQL Server workloads that can be consolidated onto a given host – “workload density” – depends on many factors, including CPU performance, the amount of system memory, etc.

A common limiting factor in workload consolidation is storage performance, as detailed in this whitepaper.

SanDisk, a leader in data center flash hardware and software solutions, and DB Best Technologies, Microsoft’s dominant database modernization, migration, and optimization partner, collaborated to quantify the impact of high-performance storage and caching on both VM density and the cost of software licensing for hosts.

The cost of a host server comes from its components and the software licenses required to run the desired workloads. Component costs include the host server itself, CPUs, system memory, network cards, local storage, etc. Software licenses are needed for the operating system and any applications that will run on the host. The focus of this whitepaper is Microsoft SQL Server, so licenses for Windows Server and SQL Server itself are required.

Software license costs are critical to consider when evaluating potential savings, as shown in Appendix B – Bill of Materials for Test System, which provides the retail cost of the host server configurations used during this testing. The software license cost dominates the total system cost, primarily the cost of SQL Server per-core licenses, which account for about 80% of the total system cost. If you can get by using fewer cores, you’ll need fewer per-core licenses, and can realize significant savings.

Workload density – the number of workloads a host can support – is a key metric for SQL Server workload consolidation scenarios. If a host can support more workloads at an acceptable performance level, you’ll need fewer hosts to run all your workloads, and fewer software licenses.
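The relationship between workload density, host count, and per-core licenses can be sketched in a few lines of Python. This is a minimal illustration, not part of the tested setup: the function names are hypothetical, the 28-core host and the densities of 7 and 21 workloads per host are taken from later sections of this whitepaper, and the sketch assumes every physical core in every host is licensed.

```python
import math

CORES_PER_HOST = 28  # physical cores in the tested 2-socket host


def hosts_needed(workloads: int, density: int) -> int:
    """Hosts required when each host supports `density` workloads."""
    if workloads < 0 or density <= 0:
        raise ValueError("workloads must be >= 0 and density must be > 0")
    return math.ceil(workloads / density)


def licensed_cores(workloads: int, density: int,
                   cores_per_host: int = CORES_PER_HOST) -> int:
    """Per-core licenses needed, assuming all cores of each host are licensed."""
    return hosts_needed(workloads, density) * cores_per_host


if __name__ == "__main__":
    for density in (7, 21):
        print(f"density={density}: "
              f"hosts={hosts_needed(21, density)}, "
              f"licensed cores={licensed_cores(21, density)}")
```

With 21 workloads, a density of 7 requires three hosts and 84 licensed cores, while a density of 21 requires one host and 28 licensed cores.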

The scenario tested used up to 25 SQL Server transactional workloads, a typical consolidation ratio for projects that aim either to increase operational efficiency or to manage the SQL Server 2005 End of Support in April 2016 by combining workload virtualization with migration and upgrade to SQL Server 2014.

This test scenario is applicable to projects migrating workloads from physical servers, virtualizing workloads and consolidating to a virtualized host, or increasing the consolidation of workloads already virtualized.

The two solutions demonstrate a 3x increase in workload density, after adding SanDisk products to the virtualized host in two different configurations.

Increasing the workload density by 3x can result in substantial savings – fewer hosts actually can get more work done, while reducing spending on host servers, infrastructure, and software licenses.

This whitepaper models only potential cost savings for the specific storage configurations and scenarios tested. For authoritative guidance for your environment, a custom assessment may be required. DB Best has performed hundreds of these assessments.

The Solutions: Do More, Save More

This whitepaper presents two solutions, one using caching software with flash storage to accelerate the host without any changes to the existing storage infrastructure, and another that moves the database to flash storage, in comparison to a baseline configuration. Each of the SanDisk solutions delivered a 3x increase in workload density compared to the baseline configuration.

Baseline Configuration

The baseline test configuration described in this whitepaper uses a mainstream 2-socket server with 28 physical cores, 256GB of system memory and 24 x 15k RPM SAS HDDs. DB Best’s experience with hundreds of customer engagements shows this is a typical production system.

The baseline configuration is described in more detail in The Testing.

Do More – FlashSoft Software using Fusion ioMemory from SanDisk

The first solution uses FlashSoft software and Fusion ioMemory to cache “hot data” in the host server, close to the CPU, where it can deliver maximum performance. This is especially beneficial for large database-driven applications on shared storage infrastructures.

Fusion ioMemory offers top storage transaction rates for high performance applications with mixed read/write workloads, and uses proprietary technology such as Adaptive Flashback for durability.

Compared to the baseline system, this caching configuration requires installing and configuring the FlashSoft software for caching, installing the Fusion ioMemory card and VSL® software, and configuring the Fusion ioMemory card. These steps typically take about 60 minutes.

The FlashSoft software is designed to be transparent and non-disruptive to the existing application structure and policies of the installed baseline system, which is what we observed in our testing – no additional steps were taken other than to install the FlashSoft software, the Fusion ioMemory card, and VSL software.

Likewise, no changes were made to the hard drive storage, which continued to host all the data and software used in the test; the Fusion ioMemory card was used only for cached data managed by the FlashSoft software.

Conclusion: We observed a 3x increase in the number of workloads the host could support using this caching solution and 2x the transactions per second delivered by the SQL workloads, compared to the baseline configuration using only hard drive storage.

This solution fits best when the data needs to remain where it is currently stored, when keeping costs down to maximize ROI is the priority, or when datasets exceed the capacity of the Fusion ioMemory cards used. This solution also extends the value of past investments in storage solutions while those investments are depreciated.

Do More – Fusion ioMemory from SanDisk

The second solution uses Fusion ioMemory PCIe add-in cards in the host server as primary storage, where it can deliver maximum performance. In this configuration, the Fusion ioMemory holds all the data and software used in the test.

Compared to the baseline system, this all-flash configuration requires installing the Fusion ioMemory card and VSL driver software, and configuring the Fusion ioMemory card. These steps typically take about 30 minutes.

The VHDX files holding the data and software originally stored on the baseline system’s hard drives were moved to the Fusion ioMemory storage.

Conclusion: We observed a 3x increase in the number of workloads the host could support using the all-flash solution and 2.6x the transactions per second delivered by the SQL workloads, compared to the baseline system.

This is a great solution when the host can accommodate enough Fusion ioMemory capacity to hold all the databases and software used by the consolidated workloads (i.e., the VHDX files), and higher performance justifies the higher cost. Fusion ioMemory cards currently are available in capacities up to 6.4TB, and most host servers can accommodate several Fusion ioMemory cards.
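A rough capacity check for the all-flash option can be sketched as follows. This is an illustrative estimate only: the 100GB database size per VM and the 6.4TB card capacity come from this whitepaper, while the per-VM overhead for the operating system and software inside each VHDX file is a hypothetical placeholder that should be replaced with measured values.

```python
import math

CARD_CAPACITY_GB = 6400   # largest Fusion ioMemory card cited in the text (6.4TB)
DB_SIZE_GB = 100          # database size per VM used in the tests
VHDX_OVERHEAD_GB = 40     # hypothetical OS/software footprint per VM (assumption)


def cards_needed(vm_count: int,
                 db_gb: int = DB_SIZE_GB,
                 overhead_gb: int = VHDX_OVERHEAD_GB,
                 card_gb: int = CARD_CAPACITY_GB) -> int:
    """Number of cards needed to hold the VHDX files of `vm_count` VMs."""
    total_gb = vm_count * (db_gb + overhead_gb)
    return math.ceil(total_gb / card_gb)


if __name__ == "__main__":
    for vms in (21, 25, 100):
        print(f"{vms} VMs -> {cards_needed(vms)} card(s)")
```

Under these assumptions, the 21 to 25 VMs used in the tests fit on a single 6.4TB card; larger consolidations would need additional cards or the caching solution.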

Save More

Our testing showed the baseline configuration with hard disk drives supported seven of our test workloads, and both the caching and all-flash configurations supported about 3x as many workloads, so our savings calculations focus on 21 VMs (7 × 3).

Appendix B – Bill of Materials for Test System shows that about 80% of the host server cost is for SQL Server per-core licenses, or $192,472 for each system.

A 3-host deployment of the baseline host server would cost about $710,625. The observed CPU utilization was about 65% in that configuration, with each host running seven VMs and delivering about 35,000 transactions per second.

Using the FlashSoft software caching solution, a single host can support all 21 VMs, and that host costs about $242,775 – a savings of $467,850. The observed CPU utilization was about 69% in that configuration, with 21 VMs delivering about 65,000 transactions per second.

Using the SanDisk Fusion ioMemory all-flash solution, a single host again can support all 21 VMs, and that host costs $257,406 – a savings of $453,219. The observed CPU utilization was about 89% in that configuration, with 21 VMs delivering about 80,000 transactions per second.

All three system configurations support the same 21 VMs.
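The savings arithmetic above can be reproduced with a short Python sketch. The dollar amounts are the ones quoted in this section; the variable and function names are illustrative only.

```python
BASELINE_THREE_HOSTS = 710_625   # three baseline hosts, 7 VMs each
CACHING_HOST = 242_775           # one host with FlashSoft caching, 21 VMs
ALL_FLASH_HOST = 257_406         # one host with Fusion ioMemory storage, 21 VMs
SQL_LICENSES_PER_HOST = 192_472  # SQL Server per-core licenses for one host


def savings(baseline_total: int, solution_total: int) -> tuple[int, float]:
    """Return (absolute savings, savings as a fraction of the baseline)."""
    saved = baseline_total - solution_total
    return saved, saved / baseline_total


if __name__ == "__main__":
    for name, cost in (("caching", CACHING_HOST), ("all-flash", ALL_FLASH_HOST)):
        saved, fraction = savings(BASELINE_THREE_HOSTS, cost)
        print(f"{name}: saves ${saved:,} ({fraction:.0%})")

    # Share of one baseline host's cost that goes to SQL Server licenses.
    baseline_host = BASELINE_THREE_HOSTS / 3
    print(f"license share: {SQL_LICENSES_PER_HOST / baseline_host:.0%}")
```

The sketch yields savings of $467,850 (about 66%) for caching and $453,219 (about 64%) for all-flash, and a SQL Server license share of roughly 81% of one baseline host.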

Both SanDisk configurations save a lot of money, deliver higher CPU utilization to get more value from per-core license investments, and provide higher workload performance.

Hardware

For this testing we used two Dell R730xd servers. One had the hard drive and FlashSoft software/Fusion ioMemory storage configurations installed; the other had only the Fusion ioMemory storage configuration installed.

SQL Server 2014 Enterprise Edition in the virtual machines was configured as follows:

Parameter name | Minimum | Maximum | Config value | Run value
Max. Degree of Parallelism | 0 | 32767 | 1 | 1
Max. Server Memory (MB) | 128 | 2147483647 | 2048 | 2048
Min. Server Memory (MB) | 0 | 2147483647 | 0 | 16
Network Packet Size (B) | 512 | 32767 | 4096 | 4096
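For readers who want to apply the same settings, the sketch below renders them as T-SQL statements using the standard sp_configure stored procedure. It is a hedged illustration: the whitepaper does not describe how the settings were applied, the function name is hypothetical, and the generated script should be reviewed before it is run against any server.

```python
# Config values from the table above, keyed by the sp_configure option name.
SETTINGS = {
    "max degree of parallelism": 1,
    "max server memory (MB)": 2048,
    "min server memory (MB)": 0,
    "network packet size (B)": 4096,
}


def render_sp_configure(settings: dict[str, int]) -> str:
    """Build a T-SQL script that applies the given sp_configure options."""
    lines = [
        # These options are advanced options, so expose them first.
        "EXEC sp_configure 'show advanced options', 1;",
        "RECONFIGURE;",
    ]
    for name, value in settings.items():
        lines.append(f"EXEC sp_configure '{name}', {int(value)};")
    lines.append("RECONFIGURE;")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_sp_configure(SETTINGS))
```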

Software Setup Recycling

To make repeated runs easier, all the test cycles were automated with respect to provisioning and cleaning up.

To account for normal variances, the tests were repeated five times for each scenario and averaged. To get a repeatable process, we needed a systematic way to reset the test environment and start over. The following steps describe this process.

Remove all VMs from the test host

Delete all VHDs on the test host storage

Create new VMs

Create 24 VMs on the all-flash Fusion ioMemory host, and 25 on the hard drive host (the difference between 24 and 25 VMs is due to a configuration anomaly, which was not researched further).
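The reset-and-repeat cycle described above might be structured as in the following sketch. It is an assumption-laden outline rather than the automation actually used: reset_environment and run_test are hypothetical callables standing in for the provisioning scripts and the benchmark run, neither of which is detailed in this whitepaper.

```python
from statistics import mean
from typing import Callable


def averaged_result(reset_environment: Callable[[], None],
                    run_test: Callable[[], float],
                    repeats: int = 5) -> float:
    """Reset the host before each run and return the mean of all runs."""
    results = []
    for _ in range(repeats):
        reset_environment()          # remove VMs, delete VHDs, create new VMs
        results.append(run_test())   # e.g. aggregate transactions per second
    return mean(results)


if __name__ == "__main__":
    # Stub values for demonstration only; these are not measured results.
    sample_runs = iter([100.0, 110.0, 90.0, 105.0, 95.0])
    average = averaged_result(lambda: None, lambda: next(sample_runs))
    print(f"average over 5 runs: {average}")
```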

The Testing

We made a number of choices for our test configuration, all intended to represent the configuration of current best-selling servers frequently used as hosts for virtualized workloads.

Additional configuration details are provided in the Hardware and Software sections, above.

We configured a system to run Windows Server 2012 R2 in the Hyper-V role, where many virtual machines could be added one after the other, each running SQL Server 2014 with a transactional load generated by HammerDB running on a separate load injector system.

Each VM was allocated 2 virtual CPUs (vCPU), 16GB of RAM, and a 100GB SQL Server 2014 database. Tests were repeated for hard drive storage, hard drives with data caching on flash storage, and with all data on flash storage.

VMs were incrementally added and ran for 40 minutes, while collecting performance metrics from SQL Server, other software components, and hardware. Almost 500 runs were made during the course of this testing.

Benchmarking Testing Scenarios

The load injector used HammerDB and AutoHammer to subject each VM on the test host to a TPC-C-like workload.

Finding the Optimal Number of HammerDB Threads

To determine the optimal workload for all subsequent testing, we identified the point where additional HammerDB threads added no additional transaction performance: on both hard drives and SanDisk storage, CPU utilization reached 98% at 22 HammerDB threads. See Figure 3 – Thread Count at 98% CPU – Hard Drives, Fusion ioMemory and Caching.
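One way to locate such a saturation point programmatically is sketched below. This is an illustration, not the procedure used in the testing: the function name and the 1% gain threshold are assumptions, and the sample data is invented purely to demonstrate the logic.

```python
def saturation_point(samples: list[tuple[int, float]],
                     min_gain: float = 0.01) -> int:
    """Return the first thread count after which throughput stops improving.

    `samples` is a list of (threads, transactions_per_second) pairs sorted by
    thread count. A step counts as an improvement only if throughput grows by
    more than `min_gain` (as a fraction) over the previous measurement.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    for (threads, tps), (_, next_tps) in zip(samples, samples[1:]):
        if next_tps < tps * (1 + min_gain):
            return threads
    return samples[-1][0]  # throughput was still rising at the last sample


if __name__ == "__main__":
    # Invented example data, not measurements from this whitepaper.
    example = [(4, 10_000), (8, 19_000), (16, 30_000),
               (22, 34_000), (28, 34_100), (32, 33_900)]
    print(saturation_point(example))
```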

With the HammerDB thread count fixed at this level, we incrementally added VMs and let them run for 40 minutes, while collecting performance metrics from SQL Server, other software components, and hardware.

Test Results: Hard Drives and Multiple VMs

This test aims to find the limitations of the underlying physical storage system where the hard drives can no longer keep up with CPU and transaction burden.

Figure 4. Hard Drives and Multi-VM Transaction Performance

In Figure 4 – Hard Drives and Multi-VM Transaction Performance, the peak aggregate transaction performance is reached at around 7 VMs, with about 32,000 transactions per second. With 8 or more VMs, each added VM reduces aggregate performance; 22 VMs yield no more work than 3 VMs.

Figure 5. Hard Drives and Multi-VM Transaction Performance and Latency

In Figure 5 – Hard Drives and Multi-VM Transaction Performance and Latency, as the number of VMs increases, IO latency from the hard drives increases (green line, in ms), reaching 25ms at 8 VMs, and continues to increase as more VMs are added and aggregate performance declines.

CPU utilization declines as latency increases, as shown in Figure 6 – Hard Drives and Multi-VM Transaction Performance, Latency and CPU Utilization by the yellow line (%). Even at peak performance around 6-7 VMs, the CPU utilization is only around 65%.

In Figure 7 – Hard Drives and Multi-VM Transaction Performance, Latency, CPU Utilization and IOPS, the light green line shows that changes in total IOPS are less pronounced, but IOPS also decline starting at 8 VMs along with the transaction rate.

After adding caching to the test host with hard drives, we observed dramatic improvement in transaction rates and CPU utilization.

Figure 8. Hard Drives with 20% Cache Transaction Performance

As seen in Figure 8 – Hard Drives with 20% Cache Transaction Performance, the caching configuration tops out around 65,000 transactions per second, almost twice the performance of the hard drive configuration alone, while supporting a 3x increase in VM density.

Figure 10 – Hard Drives with 20% Cache Performance, Utilization and Latency shows that as the number of VMs increases, the latency from the caching solution remains steady around 5ms, in contrast to the rapidly increasing latency seen with the HDDs in Figure 5 – Hard Drives and Multi-VM Transaction Performance and Latency.

To examine whether we were getting the most out of the cache, we also tested with the cache size set to 30% of total database size, and compared it to the 20% cache measurements as shown below:

As Figure 11 – 20% vs. 30% Cache Size, Transaction Performance, CPU Utilization and Latency Comparison shows, there is virtually no difference in this test scenario from increasing the cache from 20% to 30% of database size.

Test Results: Fusion ioMemory and Multiple VMs

Replacing the hard drive storage with Fusion ioMemory made it possible to achieve much higher aggregate transaction rates, and add more virtual machines with much better performance.

Figure 12. Fusion ioMemory and Multi-VM Transaction Performance

In Figure 12 – Fusion ioMemory and Multi-VM Transaction Performance, we reach an aggregate maximum transaction rate for the whole system of around 85,000 trans/sec at around 20 VMs, which is 2.6x the work and 3x the VM density of the hard drive baseline.

The measured latency throughout the testing, with 1 VM through 25 VMs, was below 1ms (reported as 0ms) and is therefore not depicted.

As seen in Figure 13 – Fusion ioMemory and Multi-VM Transaction Performance and CPU Utilization, the CPU utilization is sustained longer, dropping below 90% only when the aggregate maximum transaction performance of 85,000 is reached.

Test Results

By gradually increasing the load on our three different test configurations from 1 to 25 VMs and capturing critical metrics, we obtained the results below.

Test Results – Hard Drives (Baseline)

In the baseline system we found that hard drive storage peaks at about 32,000 trans/sec or 12,000 IOPS with 7 virtual machines; performance declines with 8 VMs or more.

Test Results – FlashSoft Software with Fusion ioMemory

Using FlashSoft software for caching and setting the cache size to 20% of the total database size, with all data residing on the same hard drives used in the baseline configuration, we achieved transaction rates up to 67,000 trans/sec or 28,000 IOPS with 18 VMs. Note that at 21 VMs, the aggregate transaction rate dropped slightly to about 64,000, probably due to write-back activity to the hard disk drive storage.

This is a 3x increase in VM density (7 vs. 21) and a 2x increase in the aggregate transaction rate (67k vs. 32k).

Test Results – all-flash Fusion ioMemory

Using Fusion ioMemory storage, we peak around 85,000 trans/sec or 53,000 IOPS with 20 VMs; this rate remains stable up to the maximum of 25 VMs measured.

This is a more than 3x increase in VM density (7 vs. 25) and a 2.6x increase in the aggregate transaction rate (85k vs. 32k).
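The density and throughput ratios quoted in these results can be recomputed from the peak figures with a small Python sketch. The numbers are those reported above; the dictionary layout and function name are illustrative. Note that the caching entry pairs the 21-VM density with the 67,000 trans/sec peak, which was measured at 18 VMs.

```python
RESULTS = {
    "baseline":  {"vms": 7,  "tps": 32_000, "iops": 12_000},
    "caching":   {"vms": 21, "tps": 67_000, "iops": 28_000},
    "all_flash": {"vms": 25, "tps": 85_000, "iops": 53_000},
}


def ratios(config: str, reference: str = "baseline") -> tuple[float, float]:
    """Return (VM density ratio, transaction rate ratio) versus the reference."""
    cfg, ref = RESULTS[config], RESULTS[reference]
    return cfg["vms"] / ref["vms"], cfg["tps"] / ref["tps"]


if __name__ == "__main__":
    for name in ("caching", "all_flash"):
        density, throughput = ratios(name)
        print(f"{name}: {density:.2f}x VM density, {throughput:.2f}x transactions")
```

The computed throughput ratios are about 2.09 for caching and about 2.66 for all-flash, which the text quotes as 2x and 2.6x.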

Conclusions

We observed significant differences in the capabilities of the same test system using different storage configurations. The results show a significant impact on the cost to deploy and operate your workload consolidation environment.

CPU Utilization with Per-Core Licensing

Appendix B – Bill of Materials for Test System shows that about 80% of the total system cost is from per-core software licensing. Maximizing the work done by each core – i.e., maximizing CPU utilization – moves the performance bottleneck from the storage subsystem to the CPU, maximizing the value realized from each per-core license.

We saw that with hard drives, CPU utilization on the 28-core test system drops quickly after 7 workloads. With 9 workloads, CPU utilization was below 50%. Due to limitations of the storage configuration, half the cores could not be used to support additional workloads.

Using the FlashSoft software for caching with Fusion ioMemory, the CPU utilization was over 90% through 10 workloads, and above 75% through 19 workloads.

And using Fusion ioMemory, the CPU utilization was over 90% through 20 workloads, and at 89% at 25 workloads.

Clearly, the two SanDisk solutions maximize CPU utilization, and the value realized from the per-core software licenses that make up about 80% of the total system cost.
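To make the link between CPU utilization and license value concrete, the sketch below computes the license cost per utilized core for one host. This is an illustrative metric introduced here, not one reported in the testing. It uses the $192,472 per-host SQL Server license cost, the 28-core host, and the utilization figures of about 65%, 69%, and 89% quoted in the Save More section.

```python
SQL_LICENSES_PER_HOST = 192_472  # per-core SQL Server licenses for one host
CORES_PER_HOST = 28

# Approximate CPU utilization from the Save More section.
UTILIZATION = {"baseline": 0.65, "caching": 0.69, "all_flash": 0.89}


def license_cost_per_busy_core(utilization: float,
                               license_cost: int = SQL_LICENSES_PER_HOST,
                               cores: int = CORES_PER_HOST) -> float:
    """License dollars per core-equivalent that is actually doing work."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return license_cost / (cores * utilization)


if __name__ == "__main__":
    for name, util in UTILIZATION.items():
        print(f"{name}: ${license_cost_per_busy_core(util):,.0f} per busy core")
```

Under this metric, higher utilization lowers the effective license cost per busy core, from roughly $10,600 for the baseline to roughly $7,700 for the all-flash configuration.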

Fast and Easy to Deploy

The FlashSoft software caching solution requires no changes to the existing environment or applications, and can be deployed in about 60 minutes, to increase workload density on that host by about 3x.

The Fusion ioMemory all-flash solution can be deployed in about 30 minutes, plus a one-time migration of the workloads’ VHDX files to Fusion ioMemory, to increase workload density on that host by about 3x.

Cost Effective

The savings from reducing the number of per-core software licenses are, by themselves, many times the cost of either SanDisk solution.

The solution using FlashSoft software and Fusion ioMemory adds caching of hot data locally in the host server, requiring no changes to the existing storage environment, and delivers a 66% savings totaling $467,850, for a 79x ROI.

The all-flash solution uses Fusion ioMemory locally in the host server to host data previously stored elsewhere, and delivers a 64% savings totaling $453,219, for a 22x ROI.
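The whitepaper does not spell out how the ROI multiples are calculated. One reading that reproduces both figures, offered here as an assumption, divides the total savings by the added cost of the solution, taken as the difference between the solution host cost and a single baseline host. The sketch below applies that reading to the costs quoted in the Save More section; the names are illustrative.

```python
BASELINE_THREE_HOSTS = 710_625
BASELINE_HOST = BASELINE_THREE_HOSTS // 3   # $236,875 per baseline host
CACHING_HOST = 242_775
ALL_FLASH_HOST = 257_406


def roi_multiple(solution_host_cost: int,
                 baseline_total: int = BASELINE_THREE_HOSTS,
                 baseline_host: int = BASELINE_HOST) -> float:
    """Savings divided by the added cost of the solution (assumed definition)."""
    total_savings = baseline_total - solution_host_cost
    added_cost = solution_host_cost - baseline_host
    return total_savings / added_cost


if __name__ == "__main__":
    print(f"caching:   {roi_multiple(CACHING_HOST):.1f}x")
    print(f"all-flash: {roi_multiple(ALL_FLASH_HOST):.1f}x")
```

Under this assumption the added costs are $5,900 for the caching solution and $20,531 for the all-flash solution, giving multiples of about 79x and 22x.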

These savings figures assume the purchase of new host servers; these same solutions will deliver a 3x increase in workload density using existing host servers. That delays or avoids the need to upgrade or replace existing storage solutions and host servers.

The FlashSoft software caching solution also extends the value of past investments in storage solutions, while those investments are depreciated.

Guidance for Choosing the Tested SanDisk Solutions

The two SanDisk solutions each increase workload density by about 3x, while meeting different needs.

Choose the FlashSoft software caching solution to:

Avoid impact on the existing environment, keeping data where it’s currently stored

Keep costs down to maximize ROI

Extend the value of past investments in storage solutions while those investments are depreciated

Accommodate a total database size that is too large for the Fusion ioMemory capacity supported in the host server

Avoid the higher cost of Fusion ioMemory when it is not justified by higher performance

Choose the Fusion ioMemory solution to:

Maximize workload density on the host

Maximize CPU utilization to get full value from per-core software license investments

Maximize workload performance

Host all data on flash when the total database size fits on the Fusion ioMemory capacity supported in the host server