Virtualization in the trenches with VMware, Part 2: Storage, networking, and blades

In the second installment of our virtualization series, Ars looks at the …

In part one of this series, we looked at selecting an enterprise virtualization platform, and at some of the benefits gained. Now we're going to look at some of the challenges involved in selecting hardware to run it on, and in the process we'll discuss storage, networking, and servers/blades.

The real challenge here is not so much using and managing the hardware that you already have, but picking new technologies to ensure that you get the appropriate price/performance ratio, the necessary support options, and the needed availability and recoverability. You must also ensure that your choices will be sustainable for at least two years, if not three or more. Finally, there's the very real consideration of power usage and heat dissipation, as the hosting industry has been moving toward charging based on power and heat instead of physical space usage for a number of years now. But first, a quick primer on storage.

When it comes to choosing the storage type, or platform, for a virtualization environment, there are five basic options, each with its own strengths, weaknesses, target usage scenarios, and price points. First, we'll cover the available types in order to introduce the major technologies, then we'll talk about several things to keep in mind with respect to availability, reliability, and recovery.

The first type of storage to look at is local storage, either in the form of higher-performing SAS or SCSI disks, or cheaper SATA disks. In short, local storage should only be considered as a last resort, or for very specific deployments. Although the price/performance ratio of local storage is excellent, employing it removes most of the benefits of virtualization, because the high-end features, which depend on shared storage, are unavailable when virtual machines live on local disks. A further drawback is that local storage introduces an additional single point of failure on every node.

The second type of storage to consider is Network File System, or NFS, a venerable networked storage system that started out in the UNIX world. It runs on top of your existing networking infrastructure, delivering solid performance for a reasonable price, depending on what hardware and software is being used to provide it. VMware ESX/vSphere is able to fully utilize NFS-based storage to enable its high-end functions, such as Distributed Resource Scheduler [DRS], High Availability [HA], and vMotion. Using multiple NFS mounts and enabling IP-based load-balancing can greatly increase throughput, as long as the underlying networking infrastructure can deliver. However, because NFS runs over a network and provides file-level access, as opposed to the block-level access offered by the other kinds of storage, availability becomes an issue. This is especially important because VMware only uses NFS over TCP, as opposed to UDP. To be more specific, if the NFS end-point suffers any kind of outage lasting more than about 30 seconds, some or all of the virtual machines residing on that mount point can crash and potentially suffer data loss. The same applies if your NFS appliance suffers an outage and fails over: the sessions need to be restarted, which can take a bit of time depending on how smooth the failover is.
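To see why multiple NFS mounts matter for IP-based load-balancing, here is a minimal Python sketch of a simplified IP-hash policy. It assumes the uplink is chosen by XORing the source and destination addresses and taking the result modulo the number of uplinks; the function name `pick_uplink` and the addresses are hypothetical, and a real virtual switch may compute its hash differently.

```python
import ipaddress

def pick_uplink(src_ip: str, dst_ip: str, n_uplinks: int) -> int:
    """Simplified IP-hash policy (an assumption, not a vendor's exact algorithm):
    XOR the two addresses and take the result modulo the number of uplinks."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % n_uplinks

# Hypothetical host VMkernel address and two NFS target addresses.
host = "10.0.0.10"
nfs_targets = ["10.0.0.20", "10.0.0.21"]

for target in nfs_targets:
    print(target, "-> uplink", pick_uplink(host, target, n_uplinks=2))
```

Under this model, every packet between one host address and one NFS address lands on the same uplink, so a single mount can never use more than one link. Mounting datastores through several target addresses is what gives the hash something to spread across.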

Fibre Channel [FC] Storage Area Networks [SANs] are the de facto enterprise storage platforms, and are the third type of storage to consider. These types of storage can scale from one or two servers with a Host Bus Adapter [HBA] card in each, to hundreds or thousands of servers concurrently accessing the same SAN through redundant HBAs, each with redundant ports and redundant fabric paths. Think of a SAN much like a regular IP-based network, except that in this case it carries just storage data through a dedicated protocol. That is to say, you can segment and firewall a SAN in very similar ways to a regular network, and you can get multiple concurrently active traffic paths.

Fibre Channel offers dedicated high-throughput storage, with excellent performance, reliability, and redundancy. However, all of that comes at a steep price. Each server must have at least one, though typically two, HBA cards to connect it to the storage switches, and each port is expensive at both the source and the destination. That does not even take into account the overhead of running fiber optic cables throughout the datacenter. Running the virtualization platform on a Fibre Channel SAN has been the de facto approach for a number of years now, though that is quickly changing as newer technologies offer nearly all of the performance and features of Fibre Channel for a fraction of the price. That being said, if the highest possible performance is required, and price is no object, then Fibre Channel is the way to go, especially when you consider that you can purchase storage arrays with 256GB (yes, gigabytes) or more of cache.

iSCSI is a relatively new technology, the fourth on our list, offering most of the benefits and features of Fibre Channel, but at a reduced price point. Instead of using dedicated FC networking and cabling, iSCSI merely piggy-backs on top of your existing gigabit and 10-gigabit networking equipment. iSCSI works by making a point-to-point connection from the client (the Initiator in iSCSI parlance) to the storage array (the Target) over IP, which means that it can be switched and routed just like any other IP traffic. That being said, properly using iSCSI as the storage back-end for your virtualization project requires quality networking equipment and skilled network administrators who can properly configure and tune it to deliver the best bang for your buck. Rigorous quality of service [QoS] configurations are needed to deliver good performance; otherwise, heavy storage traffic can choke out data traffic, and vice versa. Another important feature of iSCSI is that you can connect either through a software initiator, suffering a small performance penalty for the overhead, or through dedicated Network Interface Cards [NICs] that offer iSCSI off-loading in order to do hardware acceleration for the protocol.

The final technology to consider is Fibre Channel over Ethernet [FCoE]. This technology is currently taking the storage world by storm because it promises performance potentially higher than regular Fibre Channel, but at a reduced per-port cost. It also boasts a number of other benefits, such as less cabling, less over-provisioning, and the ability to carry both Ethernet frames and Fibre Channel frames over the same cabling and network infrastructure. FCoE uses a special type of add-in card called a Converged Network Adapter [CNA], which transports both Fibre Channel storage traffic and regular Ethernet network traffic at speeds of up to 10 gigabits per second. Although specific FCoE-friendly network gear must be acquired, its cost is easily offset by the performance provided, as well as by the built-in forklift upgrade to 10-gigabit Ethernet, which ensures that it's an investment that will last for years to come.

Key considerations

The initial key considerations for selecting any storage system are that it be highly available, redundant, and fault tolerant. It is unacceptable to have a storage-level fault grind hundreds or thousands of virtual machines to a halt. High-end storage systems support active-active availability, which provides both load distribution and instantaneous failover should one of the storage nodes suffer a catastrophic failure; this ensures that no downtime is experienced by the systems accessing the shared storage. Having some kind of high availability in your storage platform is a must for any kind of production use.

The second key consideration is whether the storage platform supports a feature called deduplication. Deduplication is a fairly recent development that's rapidly growing in popularity, especially in virtualized environments, where it brings massive cost and storage usage savings in most typical scenarios. It works by analyzing the disk blocks making up the stored data and employing advanced algorithms to efficiently remove duplicated content at the block level, leading to space savings of up to 90% in some cases. The reason this works so well is that the majority of the guest OS data in a VM will be identical to the data in every other VM of that type. Not having to store the same file hundreds of times, once for every VM, leads to huge space savings, and it also yields a performance boost due to array-level caching.
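The block-level idea can be illustrated with a short Python sketch. It is a toy model under stated assumptions, not how any particular array implements deduplication: it uses fixed 4KB blocks, identifies duplicates by SHA-256 digest, and keeps everything in memory. The names `deduplicate`, `restore`, and `savings` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real arrays vary

def deduplicate(volumes, block_size=BLOCK_SIZE):
    """Store each unique block once; keep a per-volume list of block digests."""
    store = {}    # digest -> block contents (one copy per unique block)
    recipes = {}  # volume name -> ordered list of digests
    for name, data in volumes.items():
        digests = []
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)  # only the first copy is kept
            digests.append(digest)
        recipes[name] = digests
    return store, recipes

def restore(store, recipe):
    """Rebuild a volume from its list of block digests."""
    return b"".join(store[digest] for digest in recipe)

def savings(volumes, store):
    """Fraction of raw capacity saved by storing unique blocks only."""
    raw = sum(len(data) for data in volumes.values())
    stored = sum(len(block) for block in store.values())
    return 1 - stored / raw

# Ten toy "VM disks": nine shared guest OS blocks plus one unique block each.
os_image = b"".join(bytes([i]) * BLOCK_SIZE for i in range(9))
vms = {f"vm{j}": os_image + bytes([100 + j]) * BLOCK_SIZE for j in range(10)}

store, recipes = deduplicate(vms)
print(f"unique blocks stored: {len(store)}")
print(f"space saved: {savings(vms, store):.0%}")
```

In this toy setup, the shared guest OS blocks are stored once rather than ten times, which is where the savings come from. The more VMs that share a common base image, the closer the result gets to the high percentages quoted above.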