I also tested using the same 8 disks as a striped disk in Windows. I removed the volume, the vDisk, and the Storage Space, then provisioned a traditional RAID 0 striped disk in this Windows Server 2012 R2 VM. Results were slightly better:

This is still far short of the expected ~4,000 IOPS or 480 MB/s (8 disks at roughly 500 IOPS and 60 MB/s each).

I upgraded the VM to the Standard A4 tier and repeated the same tests:

A Standard A4 VM can have a maximum of 16x 1TB persistent page blob disks. I used PowerShell to provision and attach 16 disks, then created a storage space with 16 columns optimized for a 1 MB stripe:
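A minimal sketch of the approach, assuming the classic (pre-ARM) Azure Service Management cmdlets of the time; the cloud service/VM names, the host-caching setting, and the 64 KB interleave (16 columns x 64 KB = 1 MB full stripe) are my assumptions, not from the original post:

# Provision and attach 16x 1023 GB data disks (names are placeholders)
$VM = Get-AzureVM -ServiceName "MyCloudService" -Name "MyA4VM"
for ($i = 0; $i -lt 16; $i++) {
    $VM = $VM | Add-AzureDataDisk -CreateNew -DiskSizeInGB 1023 -DiskLabel "Data$i" -LUN $i -HostCaching None
}
$VM | Update-AzureVM

# Inside the VM: pool the 16 disks and create a 16-column simple space
# (16 columns x 64 KB interleave = 1 MB full stripe)
$Disks = Get-PhysicalDisk -CanPool $true
New-StoragePool -FriendlyName "Pool1" -StorageSubSystemFriendlyName "Storage Spaces*" -PhysicalDisks $Disks
New-VirtualDisk -StoragePoolFriendlyName "Pool1" -FriendlyName "vDisk1" -ResiliencySettingName Simple -NumberOfColumns 16 -Interleave 64KB -UseMaximumSize
Get-VirtualDisk -FriendlyName "vDisk1" | Get-Disk |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -DriveLetter E -UseMaximumSize |
    Format-Volume -FileSystem NTFS -Confirm:$false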

Then I benchmarked storage performance on drive E: using the same IOMeter settings as above:

Results are proportionate to the Standard A3 VM test, but they still fall far short of expectations.

The IOMeter ‘Maximum I/O Response Time’ is extremely high (26+ seconds). This has been a consistent finding in all Azure VM testing, which leads me to suspect that the disk requests are being throttled (possibly by the hypervisor).

To install the SBTools module, extract the .rar file and run install-SBTools.ps1 from the folder where you extracted it.

Test-SBDisk:

This is a function to test disk IO performance. It uses other functions in the SBTools module, such as Log and New-SBSeed.

This function tests disk IO performance by creating random files on the target disk and measuring IO performance. It leaves two files in the WorkFolder:

a log file that lists script progress, and

a CSV file that has a record for each testing cycle.

Parameters:

This function accepts 5 parameters:

Parameter WorkFolder: This is where the test and log files will be created. It must be on a local drive; UNC paths are not supported. The function will create the folder if it does not exist. The function will fail if it’s run in a security context that cannot create files and folders in the WorkFolder. Example: c:\support

Parameter MaxSpaceToUseOnDisk: Maximum space to use on disk, in bytes. Examples: 10GB, 115MB, or 1234567890

Parameter Threads: This is the maximum number of concurrent copy processes the script will spawn. Maximum is 16. Default is 1.

Parameter Cycles: The function generates random files in a subfolder under the WorkFolder. When the total WorkSubFolder size reaches 90% of MaxSpaceToUseOnDisk, the script deletes all test files and starts over. This is a cycle. Each cycle’s stats are recorded in the CSV and log files. Default value is 3.

Parameter SmallestFile: Order of magnitude of the smallest file. The function uses the following 9 orders of magnitude: 10KB, 100KB, 1MB, 10MB, 100MB, 1GB, 10GB, 100GB, and 1TB, referred to as 0..8. For example, a SmallestFile value of 4 tells the script to use a smallest file size of 100MB. The script uses a LargestFile variable, selected to be one order of magnitude below MaxSpaceToUseOnDisk. To see higher IOPS, select a high SmallestFile value. Default value is 4 (100MB). If SmallestFile is too high, the script adjusts it to be one order of magnitude below LargestFile.
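For example, a typical invocation might look like the following (the parameter names come from the help text above; the values, including the work folder path, are illustrative):

Import-Module SBTools
Test-SBDisk -WorkFolder "e:\support" -MaxSpaceToUseOnDisk 10GB -Threads 8 -Cycles 3 -SmallestFile 4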

This script tests disk IO performance by creating random files on the target disk and measuring IO performance. The script can be downloaded from the Microsoft Script Center Repository. It leaves two files in the WorkFolder:
a log file that lists script progress, and
a CSV file that has a record for each testing cycle.

Warning:

This script will delete all subfolders in the WorkFolder. Make sure to run it in a new folder.

This script generates file read/write activity to test storage performance. Set the $WorkFolder and $MaxSpaceToUseOnDisk variables at the top of the script to indicate the folder where test files will be created and the maximum amount of disk space to be used under $WorkFolder during testing, respectively.
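For example (only the variable names come from the script; the values are illustrative):

$WorkFolder = "e:\support"      # folder where test files will be created
$MaxSpaceToUseOnDisk = 20GB     # cap on disk space consumed under $WorkFolder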

The script can be run with the -verbose switch to show more information during processing, like:

.\busy2.ps1 -verbose

As the $WorkFolder size exceeds 90% of the capacity indicated by the $MaxSpaceToUseOnDisk variable, the script deletes all test files and starts a new cycle. The script can be stopped by pressing CTRL-C or by setting the registry key HKLM:\Software\Microsoft\Busy to 0.
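To stop it via the registry, something like the following should work (a sketch; it assumes the script polls the key’s default value, which the post doesn’t spell out):

# Create the key if needed, then set its default value to 0 to signal the script to stop
if (-not (Test-Path "HKLM:\Software\Microsoft\Busy")) { New-Item -Path "HKLM:\Software\Microsoft\Busy" -Force | Out-Null }
Set-ItemProperty -Path "HKLM:\Software\Microsoft\Busy" -Name "(Default)" -Value 0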

The script includes a recursive function to generate seed files of (decimal) exponential sizes to be used for file copy activities.
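The actual implementation isn’t reproduced here, but the idea can be sketched as follows (a hypothetical illustration, not the script’s code): a seed of order N is built by concatenating 10 copies of the order N-1 seed, recursing down to a 10 KB base file of random bytes.

function New-SeedFile {
    # Hypothetical sketch: a seed of order $Order is 10KB * 10^$Order bytes.
    # Fine for small orders; a real implementation would stream rather than
    # read the whole smaller seed into memory.
    param([string]$Folder, [int]$Order)
    $Path = Join-Path $Folder "Seed$Order.bin"
    if (Test-Path $Path) { return $Path }
    if ($Order -eq 0) {
        $Bytes = New-Object byte[] (10KB)            # base case: 10 KB of random bytes
        (New-Object System.Random).NextBytes($Bytes)
        [System.IO.File]::WriteAllBytes($Path, $Bytes)
    } else {
        $Smaller = New-SeedFile -Folder $Folder -Order ($Order - 1)
        $Content = [System.IO.File]::ReadAllBytes($Smaller)
        $Stream  = [System.IO.File]::OpenWrite($Path)
        1..10 | ForEach-Object { $Stream.Write($Content, 0, $Content.Length) }
        $Stream.Close()
    }
    return $Path
}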

The script generates two log files. The first has details on the files created. The second is a CSV file that has a record for every cycle the script goes through, including cycle duration, sum of generated file sizes, number of files generated in that cycle, average file size, disk throughput, IOPS,…

The error was that the script tried to create the VM too fast. The interesting thing here is the magenta error message in this screenshot. It’s from error-handling code in the Create-VM.ps1 script. The script log file Create-VM_V-2012R2-LABk2_20140701_034214PM.txt showed:

So it appears that this “k” script, working to create 8 VMs on drive “K”, sent a New-VM request at 3:43:32 (line 303 of Create-VM.ps1). That request took 11 seconds to complete, as shown in event ID 13002 above. In the meantime, the script’s subsequent error-checking lines 304 and 305 detected that the VM had not been created and aborted execution as designed.
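One way to harden the script against this race (a sketch of the idea, not the author’s actual fix) is to poll for the VM for a short grace period instead of checking immediately after New-VM returns:

# Poll for up to 60 seconds before declaring the creation failed
$VMName = "V-2012R2-LABk2"   # example name taken from the log file above
$Deadline = (Get-Date).AddSeconds(60)
while (-not (Get-VM -Name $VMName -ErrorAction SilentlyContinue)) {
    if ((Get-Date) -gt $Deadline) { throw "VM '$VMName' was not created within 60 seconds" }
    Start-Sleep -Seconds 2
}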

This set of scripts ran tasks that attempted to create 64 VMs on 8 Gridstore vLUNs. It did so by running 8 scripts in parallel, each attempting to create 8 VMs on a different vLUN. 55 VMs were created successfully. Script failures were tracked down to a busy/overwhelmed physical host.
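The parallel launch can be approximated as follows (illustrative only; the -Drive parameter and the drive letters other than K are my assumptions):

# Start one Create-VM.ps1 instance per vLUN drive letter
'D','E','F','G','H','I','J','K' | ForEach-Object {
    Start-Process powershell.exe -ArgumentList "-File .\Create-VM.ps1 -Drive $_"
}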

Similar to the last test, each of the 8 vLUNs is configured as an IOPS (2+1) LUN.

Network bandwidth utilization was completely maxed out on the testing Hyper-V host (10 Gbps Ethernet). In a future test I will use NIC teams to provide 20 Gbps of Ethernet bandwidth.

Storage nodes’ CPU utilization was at around 60% in this test, which is not a bottleneck.

This test is essentially a disk copy test as well and not a Hyper-V virtual machine performance test.

The throughput values under Gbps and GB/min are meant to compare parallel versus serial execution, not to serve as metrics for the storage array, because they include the time to create and configure virtual machines in addition to the time taken by disk operations.

The script ran serially, creating 1 VM on 1 LUN, then moving on to the next VM on the same LUN, and finally moving on to the next LUN.

Each LUN is configured as an IOPS (2+1) LUN, so every write process writes to 3 disks out of the array’s 24 total disks (whereas a read process reads from 2 disks in the LUN). Additional throughput is likely to be achieved by testing scenarios where we’re writing to all 8 LUNs simultaneously, hitting 18 of the array’s disks at the same time.

Network bandwidth utilization is at about 68.2% of the capacity of the 10 Gbps Ethernet used in this test. For the next test (in parallel), I will use NIC teams to provide 20 Gbps of Ethernet bandwidth.

Storage nodes’ CPU utilization was at around 50% in this test, which is not a bottleneck.

This test is essentially a disk copy test and not a Hyper-V virtual machine performance test. Hyper-V testing with Gridstore will be shown in a future post.

Network utilization maxed out the single 10 Gbps NIC used on both the compute and storage nodes. This suggests that the array is likely to deliver more IOPS if more network bandwidth is available. The next test will use 2 teamed NICs on the compute node, as well as 3 storage nodes with teamed 10 Gbps NICs.

CPU was maxed out on the storage nodes during the test. Storage nodes have 4 cores. This suggests that CPU may be a bottleneck on the storage nodes. It also leads me to believe that a) more processing power is needed on the storage nodes, and b) RDMA NICs are likely to enhance performance greatly. The Mellanox ConnectX-3 VPI dual-port PCIe x8 card may be just what the doctor ordered. In a perfect environment, I would have that coupled with the Mellanox InfiniBand MSX6036F-1BRR 56Gbps switch.

Disk IO performance on the storage nodes during the test showed about 240 MB/s of data transfer, or about 60 MB/s for each of the disks in the node. This corresponds to the native IO performance of the SAS disks, suggesting a minimal/negligible boost from the 550 GB PCIe flash card in the storage node.