White Papers

Performance Test: FlashSoft 3.7 for VMware vSphere 5.5

This paper describes the application performance and VM density gains achieved in a virtualized computing environment using FlashSoft software as a write-through cache with a synthetic workload (fio.exe).

Introduction

The objective of this paper is to present the performance and VM density gains that can be achieved in a virtualized computing environment with the application of a host-based solid-state storage cache enabled by the FlashSoft® software from SanDisk®. VMware vSphere® 5.5 was installed on a host server configured with multiple virtual machines (VMs) and FlashSoft software was used to provide a host-based write-through cache to accelerate workloads running on the VMs. Using a synthetic workload generated and measured by the benchmark program fio.exe, application performance and VM density of the accelerated VMs were compared to baseline configurations comprising all-HDD storage without caching.

RAID Controller: The following settings were used for all storage for the tests:

Strip Size: 64KB

Access Policy: Read Write

Disk Cache Policy: Default

Read Policy: Always Read Ahead

Write Policy: Always Write Back (although the RAID controller contains some onboard cache, its small size relative to storage and the external cache did not adversely affect test performance)

Patrol Read Mode: Auto

OS HDD:
Two 600GB SAS 10.6K RPM HDDs were configured as a 558GB RAID1 disk and contained the ESXi hypervisor.

Target HDD:
Eight 600GB SAS 10.6K RPM HDDs were configured as a 3.9TB RAID5 disk and provided the storage backend for all VMs used in the test.

Cache SSD:
A single 1.2TB PCIe Flash memory device was enabled for caching by FlashSoft software.

Benchmark Tests

Benchmark tests were conducted multiple times to measure and compare performance of the non-accelerated server running over the all-HDD storage backend (the “baseline configuration”) and the same server running over the same HDD backend, but accelerated using FlashSoft software to drive the flash memory as a server-tier write-through cache (the “accelerated configuration”).

The following testing procedures were used:

1. Install and configure the ESXi hypervisor.

2. Configure the storage to be tested.

3. Install and configure flash memory (SSD) and FlashSoft software.

4. Create the required number of initial VMs and install CentOS 6.5 as the guest operating system, using the settings specified for the test to be performed.

5. Install the fio.exe benchmark software on the VM(s).

6. Conduct the benchmark test and record results:
   - Ensure the benchmark tests are run concurrently on all VMs being tested.
   - Note the increase in application performance with caching enabled compared to baseline.

7. For VM density testing – clone the VM (created in step 4) and repeat the benchmark tests described in step 6.

8. For VM density testing – continue increasing the number of tested VMs until the maximum VM density with caching enabled can be determined (i.e., until the latency of the accelerated VMs equals or exceeds the latency of the baseline).
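The benchmark workload described later in this paper (70%/30% read/write ratio, 100% random distribution, 4KB aligned blocks, 100GB working set) can be expressed as a fio job file. The paper does not reproduce the actual script, so the job file below is a sketch using assumed values for the engine, runtime, and target device:

```
; benchmark.fio – illustrative job file; the exact parameters used in the
; original test were not published, so engine, runtime, and target are assumptions.
[global]
ioengine=libaio      ; native async IO on Linux guests
direct=1             ; bypass the guest page cache
bs=4k                ; 4KB aligned blocks, per the workload description
rw=randrw            ; 100% random data distribution
rwmixread=70         ; 70%/30% read/write ratio
size=100g            ; 100GB working set per VM
runtime=1800         ; run long enough for all VMs to participate
time_based

[vm-workload]
filename=/dev/sdb    ; assumed target device inside the guest
```

A job file like this would be launched with `fio benchmark.fio` in each VM, started on all VMs at the same time per step 6.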

Considerations

In order to consistently measure the performance of cached configurations and reflect the operation of a cache that has been warmed through normal use, the cache was completely flushed after each individual test and “pre-conditioned” immediately before conducting the next benchmark test using the same warmup workload.

When testing multiple VMs, performance must be measured on all VMs concurrently. Ensure the benchmark test in each VM will run long enough for all VMs to be launched and benchmark scripts to run until completion.

Workload Configuration and Testing Methodology

The workload was configured and tested in two ways. The first test set was a basic performance test to measure the increase in application performance provided by FlashSoft software. This test was limited to two VMs concurrently running 100GB workloads with a 70%/30% read/write ratio, 100% random data distribution and 4KB aligned data blocks. The 1.2TB size of the FlashSoft cache on the host server was large enough to fully contain the workloads tested in the VMs. This sped the testing process and simplified analysis of test results – the benchmark test was run in each VM until the cache hit ratio approached 100%; it could then be assumed the cache was adequately warmed and operating at its maximum potential. The measured IOPS values were summed for both VMs while the latency values were averaged across both VMs and weighted to match the 70/30 read/write ratio of the benchmark test.
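The aggregation described above can be sketched in a few lines: IOPS are summed across the VMs, while each VM's latency is weighted 70/30 to match the read/write mix before averaging. The function and the numeric values below are illustrative assumptions, not measured results from the test.

```python
# Sketch of the aggregation used for the performance test: sum IOPS across
# VMs, weight each VM's latency by the benchmark's 70/30 read/write ratio,
# then average. All numbers below are illustrative, not measured results.

def aggregate(vm_results, read_mix=0.70):
    """vm_results: list of dicts with per-VM read/write IOPS and latency (ms)."""
    total_iops = sum(r["read_iops"] + r["write_iops"] for r in vm_results)
    # Weight latency to match the benchmark's read/write mix.
    weighted = [read_mix * r["read_lat_ms"] + (1 - read_mix) * r["write_lat_ms"]
                for r in vm_results]
    avg_weighted_latency = sum(weighted) / len(weighted)
    return total_iops, avg_weighted_latency

# Illustrative values for two VMs (not taken from the paper):
vms = [
    {"read_iops": 5000, "write_iops": 2100, "read_lat_ms": 0.4, "write_lat_ms": 1.2},
    {"read_iops": 4900, "write_iops": 2000, "read_lat_ms": 0.5, "write_lat_ms": 1.1},
]
iops, lat = aggregate(vms)
```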

The second test set was designed to measure VM density improvement. A lighter weight VM configuration was used for the density test and the workload was adjusted to limit the total IOPS processed by each VM to ensure uniformity amongst all VMs used in the test and to prevent any factors other than storage IO bandwidth from becoming a limiting factor (e.g. system memory, CPU utilization, network utilization, etc.) The test was started with a single VM; IOPS and latency were measured and graphed for the baseline (non-accelerated) and accelerated configurations. The VM was cloned and the test was run again and data recorded. This process was repeated, each run adding an additional VM to the host until the average weighted latency of the accelerated VMs matched that of the single VM running in the baseline configuration. This indicates the increased number of VMs (density) that can be supported by the accelerated system while providing the same level of performance of the baseline system.
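The stopping rule of the density procedure above can be sketched as a simple loop. Here `measure_latency` is a hypothetical stand-in for a full benchmark run across `n` concurrent clones, and its linear contention model is purely illustrative:

```python
# Sketch of the VM density procedure: keep adding cloned VMs until the
# accelerated configuration's weighted latency would exceed the latency of
# a single baseline VM. measure_latency() is a hypothetical stand-in for a
# real benchmark run; its numbers are illustrative only.

def measure_latency(n_vms):
    # Placeholder model: latency stays flat, then climbs with contention.
    return 0.9 + max(0, n_vms - 5) * 0.25  # milliseconds, illustrative

def max_vm_density(baseline_single_vm_latency_ms, limit=32):
    n = 1
    while n < limit:
        if measure_latency(n + 1) > baseline_single_vm_latency_ms:
            break  # one more VM would violate the single-VM latency SLA
        n += 1
    return n

density = max_vm_density(baseline_single_vm_latency_ms=1.68)
```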

Benchmark Preconditioning

Warmup for All Tests: Prior to each benchmark test, the cache was warmed using the following fio.exe script:
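The warmup script itself does not appear in this copy of the paper. The job file below is a hypothetical reconstruction, not the script actually used: a common warmup approach is a single full pass over the working set so the cache holds the benchmark's data before measurement begins.

```
; warmup.fio – HYPOTHETICAL reconstruction; the actual warmup script was
; not included in this copy of the paper.
[warmup]
ioengine=libaio
direct=1
rw=read           ; one sequential pass over the whole data set
bs=128k           ; assumed large block size for fast cache population
size=100g         ; match the 100GB per-VM working set
filename=/dev/sdb ; assumed target device inside the guest
```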

The tests were conducted with small data sets using fio.exe, a synthetic benchmark testing tool. Although the tests were constructed to simulate the conditions typically encountered in real-world computing and to generate data that reveal application performance and VM density, the tests can only be considered a demonstration of the capability of FlashSoft software – the data should not be interpreted as the performance impact of FlashSoft software for all workload types and storage environments. The actual performance of any caching solution is highly dependent upon the workload and the computing environment in which it is used.

The application performance test was limited to two virtual machines that were allowed to run the benchmark test unencumbered. Read and write IOPS were aggregated for both virtual machines and used for direct comparison between the baseline and accelerated configurations. Latency was averaged and weighted to account for the fact that individual IOs experienced varying latency over the course of the test, and that 70% of the IO activity consisted of read operations and 30% of write operations.

The IOPS comparison shows that more than 2.4 times as many IOs were processed during the test with write-through caching enabled. The read and write latency values in Table 1 show a greater than 37-times decrease in read latency but a 3.4-times increase in write latency with caching enabled. This is typical behavior for a write-through cache, because IOs are written to both the SSD and the HDD backend before the write request is acknowledged to the application. Furthermore, a greater number of IOs were actually processed by the accelerated VMs during the test compared to the baseline. These two factors result in slightly increased write latency values with write-through caching enabled. The overall result of the test, however, in which 70% of all IOs were read requests and 30% were write requests, demonstrates a net decrease in total latency of 2.9 times.

The VM density test was handled differently from the performance test. The VMs were constructed to consume fewer system resources and to run against smaller individual workloads. This was done to prevent the host server from becoming overloaded and to ensure uniformity among all VMs as the tests were run. Baseline and accelerated weighted latencies were measured and compared. The objective of the test was to determine how many more VMs could be operated with caching enabled than in the non-accelerated baseline, within the same latency Service Level Agreement (SLA). The test showed that a single VM without caching had a latency of 1.68 milliseconds, placing the single-VM SLA at 2 to 3 milliseconds for the baseline configuration; latency then increased steadily and substantially as additional VMs were added. Furthermore, in the baseline configuration, when more than two VMs were provisioned, IOPS dropped noticeably below the upper threshold, clearly indicating an IO bottleneck. With FlashSoft caching enabled, latency remained essentially unchanged for up to five VMs and then rose gradually until eight VMs were provisioned, at which point the weighted average latency was still only 1.65 milliseconds – the same latency as a single non-accelerated VM. The IOPS of the accelerated VMs remained at the upper threshold for the first six VMs and dipped only slightly with the addition of the eighth VM, indicating alleviation of the IO bottleneck observed in the baseline configuration. Thus, the use of FlashSoft software as a write-through cache allowed an eight-times increase in VM density compared to the all-HDD baseline.