According to SearchVirtualStorage, Atlantis Computing claims its first version of Atlantis ILIO could offload up to 70% of the writes and 90% of the reads of a VDI IO workload hitting the storage array, and that version 2.0 improved the write offload performance by an additional 20%.

I have reasoned numerous times about the impact of write IO on storage arrays for VDI deployments and How to offload Write IOs from VDI deployments. If ILIO is able to offload up to 70% of the write IO, then storage requirements would either be smaller, or storage arrays would be able to sustain more virtual desktops within the same capacity/performance threshold. Below is a simple theoretical illustration of the above. It's theoretical because there are a number of variables that could move the numbers upwards or downwards.
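To make the offload claim concrete, here is a quick back-of-the-envelope sketch in Python. The per-desktop IOPS figure and the 20/80 read/write split are illustrative assumptions, not measured values, and the offload ratios are simply the vendor's claims taken at face value:

```python
# Hypothetical back-of-the-envelope numbers -- for illustration only.
DESKTOPS = 200
IOPS_PER_DESKTOP = 10                      # assumed steady-state IOPS per desktop
READ_RATIO, WRITE_RATIO = 0.2, 0.8         # typical VDI mix: 20% reads, 80% writes
READ_OFFLOAD, WRITE_OFFLOAD = 0.9, 0.7     # vendor-claimed offload ratios

total = DESKTOPS * IOPS_PER_DESKTOP
reads, writes = total * READ_RATIO, total * WRITE_RATIO

# IOPS that still reach the array after ILIO offloads its share
array_reads = reads * (1 - READ_OFFLOAD)
array_writes = writes * (1 - WRITE_OFFLOAD)
array_total = array_reads + array_writes

print(f"Frontend IOPS:            {total:.0f}")        # 2000
print(f"Backend IOPS after ILIO:  {array_total:.0f}")  # 520
```

Under these assumptions, the same five-spindle configuration would only have to absorb roughly a quarter of the frontend IOPS.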

If you follow my blog articles you probably know that I have delved into the technology and run multiple tests to validate Atlantis' claims. Yes, I have… I will share a little bit of what I found. I will also discuss the good use cases for the technology.

I usually start my tests in my home lab and then when I have a defined testing methodology I expand the tests to another set of hardware when required.

The first validation test included VMware vSphere 5.0 and VMware View 5.0 with two Windows 7 Linked-Clone virtual desktops running in parallel with exact same configuration and resource limits.

For storage I used my Iomega IX4, which is not hardware you would want to run VMs on. The IX4 has only four SATA hard drives, and in my case they are only 5,400 RPM.

The workload was generated by IOMeter with a configuration that mimics a VDI workload. IO Size = 8K, 100% Random IO, 20% Reads, 80% Writes, and the working set file was set to 1GB.

See what happens in the video below!

This video is available in HD and I recommend watching it in HD.

Ok, now that I’ve got you excited about the technology let’s understand how ILIO works.

ILIO sits in the storage IO path and creates a VMDK on LUNs or NFS exports presented by the hypervisor. This VMDK file is then mounted by the ILIO appliance and presented as an NFS export back to the vSphere hosts. (ILIO can be deployed per host or in a top-of-rack architecture, but I will not get into deployment models here.)

IO Caching is done in RAM, precluding IOs from hitting the array when possible.

ILIO understands NTFS file systems and is able to determine whether a given IO is part of a Windows DLL, a temporary file, or the Windows swap file, and will treat each differently. As an example, a DLL that is constantly accessed may be cached for further use, whereas IO that is part of the Windows swap file may not be put in the RAM cache.

ILIO also has an IO coalescing feature that attempts to create large sequential IO blocks to allow efficient disk writes. Most intelligent storage arrays have some sort of IO coalescing mechanism. I’ll get back to that.
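Conceptually, IO coalescing merges small contiguous writes into fewer, larger ones before they reach the spindles. The minimal Python sketch below shows the core idea; it is greatly simplified – real IO schedulers also reorder, align, and bound the size of merged requests, and nothing here represents Atlantis' actual implementation:

```python
from typing import List, Tuple

def coalesce(writes: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge contiguous/overlapping (offset, length) writes into larger ones.

    Toy model of IO coalescing: sort pending writes by offset, then fold
    any write that touches the end of the previous one into a single,
    larger request. Fewer, bigger writes mean fewer disk seeks.
    """
    merged: List[Tuple[int, int]] = []
    for off, length in sorted(writes):
        if merged and off <= merged[-1][0] + merged[-1][1]:
            # Contiguous or overlapping with the previous request: extend it.
            last_off, last_len = merged[-1]
            merged[-1] = (last_off, max(last_len, off + length - last_off))
        else:
            merged.append((off, length))
    return merged

# Three 4KB writes, two of them back-to-back, become two requests.
print(coalesce([(0, 4096), (4096, 4096), (65536, 4096)]))
```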

The last key feature is inline IO de-duplication, which performs de-duplication in real-time before IO transactions reach the storage fabric. Because the de-duplication occurs before the IO hits the storage array, the load on the array storage processors and spindles is reduced. At the same time, because no duplicate data is written to disk, post-process de-duplication is not required, and its associated IOPS cost is never incurred.
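The idea behind inline de-duplication can be sketched in a few lines of Python: fingerprint each block before it is written, and let duplicate content short-circuit the disk IO entirely. This is a toy model to illustrate the principle, not Atlantis' implementation:

```python
import hashlib

class InlineDedupStore:
    """Toy inline de-duplication: hash each block *before* it is written;
    blocks whose content was seen before never generate a backend write."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}   # fingerprint -> block contents
        self.writes_to_disk = 0             # backend IOs actually issued

    def write(self, block: bytes) -> str:
        fp = hashlib.sha256(block).hexdigest()
        if fp not in self.store:
            # Only unique content costs a disk IO.
            self.store[fp] = block
            self.writes_to_disk += 1
        # The logical address maps to the fingerprint either way.
        return fp

# In a VDI farm, hundreds of desktops write largely identical OS blocks,
# so the dedup hit rate -- and the IOPS saved -- can be very high.
s = InlineDedupStore()
for _ in range(100):
    s.write(b"\x00" * 4096)   # same OS block from 100 desktops
print(s.writes_to_disk)       # 1
```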

Moving to a More Robust Test Platform

To create valid tests it is important that a baseline is always maintained. In this case, during tests, the host used was a Cisco UCS B200 with 48GB RAM and two quad-core CPUs at 2.526GHz. 200 linked-clone virtual desktops running Windows 7 with 1GB RAM each were created.

The workload was generated using LoginVSI with the default Heavy profile. LoginVSI is a workload simulator that stresses CPU and RAM utilization more than storage IO; however, it is still one of the best tools around to simulate a real VDI workload. Other tools such as View Planner and RAWC can also be used for this purpose.

ILIO was configured in a top-of-rack deployment model with a single ILIO appliance serving all hosts and all 200 virtual desktops.

The storage array used was an EMC VNX5500 with FAST Cache enabled. The array was dedicated to ILIO and no other workload was running during the tests. A RAID group/LUN was configured with only five 15K SAS disk drives, and this same set of disks served the tests both with and without ILIO.

Results

The same workload was generated for all tests, with and without the ILIO appliance. In fact, I ran the workload with ILIO multiple times in order to tune the ILIO appliance to work efficiently with the EMC VNX5500 array used during the test. The default mode used by ILIO is good for a slow SAN/NAS because the internal scheduler samples IO service time and decides how much scatter/gather should be done. But the VNX5500 with FAST Cache is definitely not a slow array – so ILIO was good at eliminating peaks when latency crept upward, but during steady state, when latency is good, the scheduler thinks it best not to perform any IO coalescing. This behavior can be observed in the graph below (ILIO).

With help from Atlantis CTO Chetan (@chetan_) I was able to tune ILIO. The tuning enabled a feature called write scatter/gather on the ILIO appliance, which allows ILIO to coalesce IO on ILIO's vScaler component, the part that provides intelligent NTFS caching and IO processing. IO coalescing can be performed at different levels in ILIO – primarily in the ILIO IO scheduler, but also on the vScaler.

When the scheduler was performing the coalescing, we found that the VNX5500 with FAST Cache did not really have latency problems (duh!), so the scheduler was being lazy and sending IOs without merging or gathering them.

With the tuning complete I started to see aggressive IO coalescing, and the number of IOs hitting the storage array drastically dropped from an average of 1,785 IOPS to 242 IOPS (ILIO_NEWCONF).
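For reference, that drop works out to roughly an 86% reduction in backend IOPS:

```python
# Average IOPS hitting the array, from the test runs above.
before, after = 1785, 242
reduction = (before - after) / before
print(f"{reduction:.1%}")   # 86.4%
```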

By enabling scatter/gather on the ILIO vScaler, all writes are forced into coalescing and get re-arranged on the vScaler before being sent to the SAN. This means that ILIO tends to keep some uncommitted IOs in memory and therefore needs to be protected by some form of non-volatile cache. Typically a Fusion-io card, NVRAM card, SLC SSD, or a good MLC SSD is used. ILIO deployments with slower SANs, or in environments with high storage latency, would not require the use of non-volatile cache devices.

For non-persistent desktops it is perhaps not that important to be crash consistent should ILIO or the underlying hardware fail, since administrators should be able to recreate the desktops. In a persistent desktop scenario, one ILIO appliance per host may provide better availability. High availability can also be provided through the use of vSphere Fault Tolerance; however, this scenario would also require additional underlying physical hardware.

The graphs below demonstrate the IO size change after IO optimization techniques. Since the majority of IO time is spent doing physical positioning and seeking on the disk, large write IO can greatly reduce disk access time.

Of all the results, the Disk Utilization % below is probably the most impressive. The blue represents AVG and MAX % utilization without ILIO; the green represents AVG and MAX % utilization with ILIO using the configuration tuned for the VNX5500 with FAST Cache. The graph below demonstrates the true power of ILIO optimizing, caching, and coalescing IOs before they hit the storage array.

Important Note: Numbers and graphs in this article represent the delta and volatile part of linked-clone virtual desktops. Replica disks were placed in a dedicated datastore outside the ILIO appliance.

Architecture

Each ILIO appliance requires 22GB RAM for approximately 65 concurrent VDI users. Additional users come at a cost of 150MB of RAM per user, up to a maximum of 200 users per appliance. A full appliance for 200 users requires approximately 42GB RAM.
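Those sizing figures can be expressed as a simple formula. This is just a sketch based on the numbers above – always validate sizing with Atlantis:

```python
def ilio_ram_gb(users: int) -> float:
    """RAM sizing per the figures above: 22GB base covers ~65 users;
    each additional user costs ~150MB, up to 200 users per appliance."""
    if users > 200:
        raise ValueError("maximum 200 users per ILIO appliance")
    base_users, base_gb, per_user_gb = 65, 22.0, 0.150
    extra_users = max(0, users - base_users)
    return base_gb + extra_users * per_user_gb

print(ilio_ram_gb(65))    # 22.0
print(ilio_ram_gb(200))   # 42.25 -- matches the ~42GB figure above
```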

For a top-of-rack architecture, multiple virtual servers may be dedicated to ILIO (commonly entire physical servers are dedicated). If an ILIO appliance per host is the chosen architecture, additional RAM per host is required, reducing the total consolidation ratio. Either way, ILIO will use a considerable amount of RAM – but not much CPU, according to my tests.

I have performed lab tests to determine ILIO's capabilities, but I have never extensively used the solution in production environments, so I am not able to speak to its reliability and stability. During my tests I did not run into any issues.

ILIO provides a synergistic complement to traditional storage arrays, offloading IOs and helping organizations leverage existing investments while allowing for higher VDI consolidation. For newer storage arrays that support features like automated storage tiering or make use of solid state drives, the decision to implement a solution like ILIO to reduce the impact of IOs can be a little more complex and comes down to $$. Overall, adding ILIO to the architecture can provide a viable alternative to reduce the number of drives and offload backend IOs. It's great to see new products launched to benefit virtual infrastructures – is it right for your architecture? The choice is up to you.

Below you can find the raw data for all the information published in this article.


Gunnar

Great post Andre! I've done some testing in the field with the ILIO appliance and thought I'd share my impressions. I didn't do nearly the benchmarking you just did – I just had a stopwatch and tested how long it would take to boot a VM on the ILIO vs. off the ILIO. My results were a 30-second boot time for a VM off the ILIO and sub-5 seconds for a VM on the ILIO. I was floored by its performance.

The way I was using ILIO was even more interesting. The customer I was working with was trying to remove the need for a SAN. Basically they had low-end storage in each host and wanted to run their VDI all off local storage. I'm not a huge fan of this type of setup due to the loss of HA, vMotion, and the ability to patch a server during the day, but I can't fault them for looking for cheaper ways of doing VDI. In any case, I set up the ILIO to run on the local storage and, as I said, I was very impressed with the results.

After getting pretty comfortable with the ILIO appliance I would say that I’d use it (recommend it) again, but I’d be more comfortable with a redundant top-of-rack design. If I did this I could put a cheap SAN in (you work at EMC so I’ll say an AX4) and have the ILIO appliances sit between the AX4 and my hosts. This would give me all the features I want and still give me the IO that an AX4 can’t provide.

Anyway, great post, just thought I’d give some feedback from a field deployment I did.

Mihai

I wonder, in an ideal world, shouldn’t all these optimizations ILIO does be done by the storage array controller (maybe by some kind of processing engines)? Doesn’t this demonstrate that EMC VNX is not a very intelligent array?

@Mihai
Yes, those should perhaps be array controller and/or hypervisor features. IMO, in the long run, virtual storage appliance companies with interesting features such as Atlantis will end up being absorbed by companies like VMware or Citrix, or a storage vendor such as EMC, NetApp or HP.

If you look at VMware's VAAI you will notice that offloading certain tasks to storage arrays is a trend. However, at the same time, they still implement hypervisor-centric features such as SIOC (Storage IO Control) because most storage arrays cannot do that today.

Great article Andre, and great timing for it too, as we're currently implementing it into a large-scale VDI deployment and your tips have given me loads of things to remember and think about. One question: you mentioned that if we wanted to use FT for redundancy it "would also require additional underlying physical hardware". What do you mean by this?

Gregg

Dan

Andre, this is a great piece. I, as well, am a huge EMC guy, working on everything from the early CLARiiON Denalis to the VNX. Lately I've been doing numerous tests with EFD, FAST, and FAST Cache. This is a great article, as it is becoming apparent that VDI solutions increase costs (especially on the storage side), and as the VM:hypervisor ratio increases I can attest this cost will only grow as densities increase. The concept of an IO shim is really the best story for keeping VDI within budget. After carefully reading this article I have two concerns with this solution. One is the amount of RAM used: 42GB for 200 VMs can essentially put me at a new price point on the VMware side (unless I reduce the number of VMs per ILIO appliance, but that seems to defeat the purpose). The other is that in order to do persistent desktops I may need an SSD device, increasing costs further. I've been looking around for stats on the ILIO solution and appreciate the information. I've also been looking at other solutions and came across Virsto… have you done any work with them? Thanks again

kepper

Gregg,
he means that if you go with an ILIO on each host (local model, as opposed to a "top-of-rack" design), each ILIO on the host requires its own reserved RAM (NOT shared with any other VM). If that ILIO device fails or has issues, then all of the desktops on that host fail, so you could use Fault Tolerance for the ILIO device on another host to keep that from happening – but you would need another host, another ILIO (the FT one), and to reserve all that RAM on the other host.

So if you had an ILIO device on a host serving 65 VMs and you wanted to FT that ILIO, you would lose 22GB of RAM on host one and another 22GB of RAM on host two for the FT ILIO.

jaron

Great write-up. You mention running the appliance with protection by FT. Is that a supported configuration? I am exploring the option of using Atlantis in combination with full clones. Do you have any data on this, or have you seen any that you could share? I'm interested in knowing what the space savings would be, how much it speeds up deployment, and what the reduction in peak IO is. Thank you!

forbsy

Nice testing! I've been looking at ILIO lately, so I'm glad to see you've done some testing. Is the datastore presented via ILIO only to be used for linked clones? How did you calculate the storage required to be able to do 200 desktops via ILIO? How large was the NFS datastore exported via the ILIO software? If I present a LUN from, say, a NetApp or even local server DAS, how large should that LUN be for ILIO? Should we just size the datastore as normal, and ILIO will be limited by the "up to 65 desktops for a 22GB default ILIO appliance"?

@forbsy
The best way to answer your questions is to point you to the VMware View Online Calculator at http://myvirtualcloud.net/?page_id=1076. Create your configuration and make sure you select the Atlantis ILIO option. I would also recommend talking directly with Atlantis.

[…] me do the talking, I would recommend taking a look at this article for a deep-dive on Atlantis: http://myvirtualcloud.net/?p=2604. I’m currently evaluating Atlantis in my employers demo lab – so far so good. I’m also […]

[…] techniques, requiring less DRAM for the same amount of desktops. A good example here is the ILIO appliance from Atlantis Computing. Note that in my calculations I have not included the benefits of […]

[…] Atlantis ILIO is using a large amount of RAM in the hypervisor host to serve disk requests, enabling a VMware server to handle most disk requests from memory rather than from disk, thus creating a 10-fold improvement in performance. […]
