StoneFly's iSCSI router links FC storage to LANs

Posted on November 27, 2007

Via StoneFusion OS, a specialized operating system built on the Linux kernel, StoneFly's Storage Concentrator integrates the power of an iSCSI router with extensive storage management services. As a result, the appliance extends the benefits of an existing Fibre Channel SAN to a much broader base of clients, not the least of which are virtual machines (VMs) running VMware Virtual Infrastructure (VI).

Administrators can quickly install one or more Storage Concentrators using existing Ethernet and Fibre Channel infrastructure. Once installed, administrators can then leverage the concentrator's storage-provisioning engine to provide advanced storage management, business continuity, and disaster-recovery functions. In particular, StoneFusion provides storage virtualization, synchronous and asynchronous mirroring, snapshots, and active-active clustering of concentrators. Moreover, IT can leverage the appliance's support for heterogeneous storage resources to increase capacity utilization via heterogeneous storage pooling.

The StoneFusion management GUI provides a "Discover" button, which is used to launch a process that automatically discovers new storage resources. StoneFusion also automatically discovers any HTML-based management utilities. That provided us with the ability to bring up StorView, the storage management GUI for the nStor FC-FC array, directly from within StoneFusion.

Maximizing storage resource utilization is extremely important for CIOs, who are frequently under the gun to provide a more demonstrably responsive IT infrastructure. More important, the biggest driver of IT costs is not the acquisition of resources, but rather the management of those resources. The general rule of thumb is that operating costs for managing storage on a per-gigabyte basis are 3x to 10x greater than the capital costs of storage acquisition. That's because provisioning and management tasks associated with storage resources are highly labor-intensive and often burdened by bureaucratic inefficiencies.

What makes virtualization critical for CIOs today is the ability of virtual devices to be isolated from the constraints of physical limitations. By separating function from physical implementation, administrators can manage resources as generic devices based on function. That means administrators can narrow their operations focus from a plethora of proprietary devices to a limited number of generic resource pools.

Using the StoneFusion management GUI, we provisioned logical volumes for benchmarking manually. In this way, we had complete control over the source of disk blocks from the resource pool of FC-based storage that had been created on the StoneFly i4000.

What's more, deriving the maximum benefits from server virtualization in a VI environment requires storage virtualization. The availability and mobility of both a VM and its data play an important role in everything from daily operational tasks, such as load balancing, to strategic plans, such as disaster recovery. In particular, SANs have long been the premier means of consolidating storage resources and streamlining management in large data centers. Nevertheless, storage virtualization for physical servers and commercial operating systems, such as Windows and Linux, is burdened with complexity because most commercial operating systems assume exclusive ownership of storage volumes.

Storage virtualization in a VI environment, however, is a much simpler proposition as the file system for VMware ESX, dubbed VMFS, eliminates the need for exclusive volume ownership by handling distributed file locking between systems. VMFS has a built-in distributed locking mechanism (DLM) that avoids the massive overhead that a DLM typically imposes: VMFS simply treats each disk volume as a single-file image in a way that is loosely analogous to an ISO-formatted CDROM. When a VM's OS mounts a disk, it opens a disk-image file; VMFS locks that file; and the VM's OS gains exclusive ownership of the disk volume with a plethora of files. That opens the door to using iSCSI to extend the benefits of physical and functional separation via a cost-effective lightweight SAN.
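
The file-level locking described above can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not VMware's implementation: the `ImageLockTable` class, the host names, and the image file name are all hypothetical, and a real DLM must also handle on-disk lock records, lock expiry, and host failure.

```python
class ImageLockTable:
    """Toy lock table: one exclusive lock per disk-image file."""

    def __init__(self):
        # Maps a disk-image file name to the host that currently holds it.
        self._owners = {}

    def acquire(self, image, host):
        """Grant the lock if the image is free or already held by this host."""
        owner = self._owners.setdefault(image, host)
        return owner == host

    def release(self, image, host):
        """Release the lock, but only if this host is the current owner."""
        if self._owners.get(image) == host:
            del self._owners[image]
            return True
        return False


# Hypothetical image and host names, for illustration only.
locks = ImageLockTable()
first = locks.acquire("vm-disk.vmdk", "esx-a")   # image is free
second = locks.acquire("vm-disk.vmdk", "esx-b")  # image is already locked
print(first, second)
```

The point of the sketch is the granularity: the lock covers one large image file rather than the many files inside it, so the hosts only coordinate when a VM disk is opened or closed.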

StoneFusion uses the unique ID of each iSCSI initiator on a client host as the primary means to control access to virtual volumes. With a QLogic iSCSI HBA installed on our ProLiant DL580 server, the VMware software initiator and the iSCSI HBA appeared as separately addressable hosts.

For cost-conscious IT decision-makers, StoneFly's Storage Concentrators incorporate a storage virtualization engine for provisioning and management in order to add another important advantage: the ability to cut operating costs. Administrators can use the StoneFusion management GUI to perform critical storage management tasks from virtualization to the creation of volume copies and snapshots and even the configuration of synchronous and asynchronous mirrors. As a result, a server administrator servicing an iSCSI client can directly handle the labor-intensive storage management tasks that would normally require coordination with a storage administrator.

When authorizing access to a volume, the Challenge-Handshake Authentication Protocol (CHAP) can be invoked in conjunction with the iSCSI initiator ID for added security. For our volume Win02, which contained a VM running Windows Server 2003, we granted full access to both of our ESX servers via their VMware iSCSI initiator. The VMFS DLM ensured that only one server at a time could open and start the Win02 VM image.
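
As a rough illustration of how an initiator-ID check and CHAP combine, the Python sketch below first looks up the initiator in a per-volume access list and then verifies a CHAP response computed with MD5 as defined in RFC 1994 (a hash over the identifier octet, the shared secret, and the challenge). It is not StoneFusion's code: the function names, the initiator names, and the secrets are hypothetical.

```python
import hashlib
import hmac
import os


def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """CHAP with MD5 (RFC 1994): hash of identifier octet, secret, challenge."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


def may_access(volume, initiator, identifier, challenge, response, acl, secrets):
    """Allow access only if the initiator is listed for the volume AND
    its CHAP response matches the one computed from the shared secret."""
    if initiator not in acl.get(volume, set()):
        return False
    secret = secrets.get(initiator)
    if secret is None:
        return False
    expected = chap_response(identifier, secret, challenge)
    return hmac.compare_digest(expected, response)


# Hypothetical initiator names and secrets, for illustration only.
ESX_A = "iqn.2007-11.example:esx-a"
ESX_B = "iqn.2007-11.example:esx-b"
acl = {"Win02": {ESX_A, ESX_B}}
secrets = {ESX_A: b"example-secret-a", ESX_B: b"example-secret-b"}

challenge = os.urandom(16)  # the target issues a fresh random challenge
answer = chap_response(1, secrets[ESX_A], challenge)  # computed by the initiator
print(may_access("Win02", ESX_A, 1, challenge, answer, acl, secrets))
```

Because the secret itself never crosses the wire, an eavesdropper who captures one exchange cannot simply replay it against a new challenge.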

In our second scenario, we used our initial test results for the ProLiant ML350 as a template for server consolidation. Using two four-way servers running ESX 3.0.1, openBench Labs tested iSCSI performance on an ESX host server in support of VM datastores that were hosting virtual work volumes. These tests were done in the context of replacing a ProLiant ML350 G3 server with a VM. In addition, we tested the volume copy and advanced image management functionality of StoneFusion in our VI environment. In these tests, we assessed StoneFusion as a means of enhancing the distribution of VM operating systems from templates and bolstering business continuity for disaster recovery.

Along with our StoneFly i4000 concentrator on the iSCSI side of our SAN fabric, we employed a Netgear Layer 3 managed Gigabit Ethernet switch and several QLogic 4050 iSCSI HBAs. By using the QLogic iSCSI HBAs, we were able to maximize throughput from the i4000 by eliminating all host overhead associated with iSCSI packet processing.

Using StoneFusion's management GUI, openBench Labs was able to invoke a rich collection of storage management utilities, including a number of high-availability tools to create copies and maintain mirror images of volumes. Within a small VI environment, administrators can also use these tools in conjunction with the basic VI client software to provide simple VM template management capabilities that would normally require an additional server running the VMware Virtual Center.

On the Fibre Channel side of our fabric, we used a QLogic SANbox 9200 switch, an nStor 4540 disk array, and an IBM TotalStorage DS4100 array. We chose the DS4100 as the primary array for providing back-end storage for two reasons: its high capacity and its I/O caching capability. We configured the DS4100 with 3.2TB of capacity on SATA disks. From that pool, we assigned 1.6TB to the StoneFly i4000 in bulk via a single LUN.

For our tests, rapid response to excessively high numbers of I/O operations per second (IOPS) would trump capacity. That's because our oblLoad benchmark generates high numbers of IOPS to stress all components of a SAN fabric. With respect to our analysis, the DS4100 array provides an excellent balance of capacity with I/O responsiveness through two independent controllers, each of which features a 1GB cache and dual 2Gbps Fibre Channel ports.

For sequential I/O, the bundling of requests by Linux can be leveraged into a distinct advantage using the StoneFly i4000, which can stream data at wire speed. Using the oblDisk benchmark to read very large files sequentially, the only factor that limited throughput was the client's ability to accept data coming from the StoneFly i4000.

By performing all partitioning and management functions for virtual storage volumes on the iSCSI concentrator instead of the Fibre Channel array, openBench Labs was able to leverage key capabilities of StoneFusion to reduce operating costs by enabling system administrators to carry out tasks that normally require coordination with a storage administrator. In particular, we were able to consolidate storage from multiple Fibre Channel arrays into a pool that could be managed from the StoneFly i4000.

More importantly, we were able to configure logical volumes—dubbed resource targets in the iSCSI vernacular—and export them to client systems without any regard for the sources of the blocks within the pool. Nevertheless, to maintain consistency in benchmark performance—highly dependent on the disk drive characteristics, controller caching, and RAID configuration—openBench Labs created all volumes that would be used for performance benchmarking explicitly with disk blocks imported via the 1.6TB LUN from the DS4100 array.

IOPS throughput patterns for oblLoad using the QLogic HBA and the server's embedded TOE were remarkably similar. However, we observed a very different pattern in IOPS performance on SLES. More importantly, because of the way the Linux kernel bundles I/O, SLES 10 IOPS performance is invariant with the size of I/O requests; large 64KB I/O requests provided IOPS performance similar to that of 8KB requests (see figure, directly below).

We began testing on a ProLiant ML350 G3 server running Windows Server 2003 and using both the QLogic iSCSI HBA and Microsoft's iSCSI software initiator. Like the Microsoft initiator, QLogic's iSCSI HBAs support iSNS, which enables them to discover the StoneFly i4000 automatically. What's more, the QLogic iSCSI HBA offloads all iSCSI packet processing, whereas a TOE offloads only the processing of the TCP packets that encapsulate the SCSI command packets, and thereby provides a distinct edge in processing IOPS. This is very significant for maximizing performance of the StoneFly i4000, which was able to sustain a load of 10,000 IOPS with 8KB data requests.

The IOPS throughput patterns for oblLoad using the QLogic HBA and the server's embedded TOE with the Microsoft initiator were remarkably similar; however, absolute performance measured in total IOPS was distinctly higher for the QLogic iSCSI HBA. This was especially true for small numbers of daemons, which coincides with the time when the host is most sensitive to changes in overhead. With more than 12 daemons, the number of IOPS completed differed by less than 2%.

A very different picture surfaced on Linux for transaction processing. IOPS performance for iSCSI on SLES 10, even with the QLogic iSCSI HBA, trailed iSCSI performance on Windows Server 2003 by an order of magnitude. This is a function of the way Linux bundles I/O and has nothing to do with the StoneFly i4000. However, it is a condition that the i4000 can exploit, because the StoneFusion OS is tuned for high data throughput. As a result, when we ran oblLoad with 64KB I/O requests (used in multi-dimensional business intelligence application scenarios), we measured the same level of IOPS as with 8KB requests while moving 8x more data.
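
The relationship between IOPS, request size, and data moved is simple arithmetic, sketched below in Python. The helper names are hypothetical. The 10,000 IOPS figure is the sustained 8KB load reported for the Windows test; the article gives no absolute IOPS figure for SLES, so only the ratio is computed for that case.

```python
def throughput_kb_per_s(iops: float, request_kb: float) -> float:
    """Data rate implied by a given IOPS level and request size."""
    return iops * request_kb


def data_ratio(large_kb: float, small_kb: float) -> float:
    """At equal IOPS, how much more data the larger request size moves."""
    return large_kb / small_kb


# 10,000 IOPS at 8KB per request, expressed in MB/s (1MB = 1,024KB).
windows_mb_per_s = throughput_kb_per_s(10_000, 8) / 1024
print(windows_mb_per_s)  # 78.125

# Same IOPS level with 64KB instead of 8KB requests.
print(data_ratio(64, 8))  # 8.0
```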

In terms of IOPS performance, utilizing the QLogic iSCSI HBA on ESX provided the same level of performance as measured using a physical Windows server. This was not the case using the ESX software initiator. Nonetheless, a SLES 10 VM dramatically outperformed a physical server even when ESX used its software initiator (see figure, directly below).

That ability to deliver high throughput levels is particularly important in supporting high-end multimedia applications, especially streaming video. Both Linux and Windows client systems were able to stream large multi-gigabyte files sequentially at wire speed—1Gbps—through the i4000 concentrator.

In the final phase of testing of the StoneFly i4000, openBench Labs used two quad-processor servers to run a VMware Infrastructure 3 environment. This advanced third-generation platform virtualizes an entire IT infrastructure, including servers, storage, and networks. For our test scenario, we focused on the problem of consolidating four servers along the lines of our HP ProLiant ML350 G3 system on each of our four-way servers.

The primary means by which VMware ESX Server provides access to virtual storage volumes is by encapsulating a VM disk in a VMFS datastore. The VM disk is a single large VMFS file that is presented to the VM's OS as a SCSI disk drive, which contains a file system with many individual files. The VM OS issues I/O commands to what appears to be a local SCSI drive connected to a local SCSI controller. In practice, the block read/write requests are passed to the VMkernel where a physical device driver, such as the driver for the QLogic iSCSI HBA, forwards the read/write requests and directs them to the actual physical hardware device.

With a DLM, a datastore can contain multiple VM disk files that are accessed by multiple ESX Servers. That scheme can put I/O loads on a VMFS volume that are significantly higher than the loads on a disk volume in a single-host, single-OS environment. To meet those loads, VMFS has been tuned as a high-performance file system for storing large, monolithic virtual disk files. Tuning an array for a particular application becomes irrelevant when using a VM disk file.

The alternative to using VMFS is to use a raw LUN formatted with a native file system associated with the virtual machine. Using a raw device as though it were a VMFS-hosted file requires a VMFS-hosted pointer file to redirect I/O requests from a VMFS volume to the raw LUN. This scheme is dubbed Raw Device Mapping (RDM). What drives the use of an RDM scenario is the need to share data with external physical machines.

While openBench Labs ran functionality tests of RDM volumes, we chose to use unique VMware datastores to encapsulate single virtual volumes in our benchmark tests. Given that the default block size for VMFS is 1MB, we followed two fundamental rules of thumb in provisioning back-end storage for the StoneFly i4000:

Put as many spindles into the underlying Fibre Channel array as possible; and

Make the array's stripe size as large as possible.

In particular, we used seven-drive arrays with a stripe size of 256KB—the default for high-end Unix systems—in the IBM DS4100 disk array. With the two independent disk controllers and 1GB cache, we garnered a significant boost in our IOPS performance tests by exploiting read-ahead track caching.
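
To make the interaction between the 1MB VMFS block size and the 256KB stripe size concrete, the Python sketch below counts how many stripe segments one block spans. It rests on two assumptions that the text does not confirm: that the 256KB figure is the per-drive segment size, and that six of the seven drives hold data (as in a single-parity layout). The function names are hypothetical.

```python
def segments_per_block(block_kb: int, segment_kb: int) -> int:
    """Number of per-drive segments that one file-system block spans (ceiling)."""
    return -(-block_kb // segment_kb)


def drives_touched(block_kb: int, segment_kb: int, data_drives: int) -> int:
    """Drives involved in reading one aligned block: capped by data drives."""
    return min(segments_per_block(block_kb, segment_kb), data_drives)


VMFS_BLOCK_KB = 1024  # default VMFS block size cited in the article (1MB)
DATA_DRIVES = 6       # ASSUMPTION: seven drives, one drive's worth of parity

print(segments_per_block(VMFS_BLOCK_KB, 256))           # 4
print(drives_touched(VMFS_BLOCK_KB, 256, DATA_DRIVES))  # 4
print(drives_touched(VMFS_BLOCK_KB, 64, DATA_DRIVES))   # 6
```

Under these assumptions, a larger segment means each 1MB block is served by fewer, longer per-drive transfers, which leaves the remaining spindles free for other requests.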

Adding a mirror image to a volume is a relatively trivial task with the StoneFusion Management GUI. To create a clone of our VM-Win02 volume, we only needed to identify the volume and determine the number of mirrors to create. Once that was done, it was just as easy to detach the newly created mirror and promote the new image as VM-Win03 to create a new independent, stand-alone volume.

We began testing iSCSI performance on a VMware ESX Server with virtual machines running Windows Server 2003 SP2. With a 50GB datastore mounted via the QLogic HBA, the number of IOPS completed by oblLoad was virtually identical to the number completed on our base ProLiant ML350 server running Windows Server 2003 SP2. Without the iSCSI HBA, peak IOPS performance fell by about 40%.

The most extraordinary results, however, occurred when we ran SuSE Linux Enterprise Server (SLES) 10 SP1 within a VM. In this case, IOPS performance with both the QLogic iSCSI HBA and the VMware iSCSI initiator was propelled well beyond what we had measured when running SLES on a physical server. While the basic pattern for IOPS throughput remained the same, performance often increased on the order of 200% to 250% for any given number of oblLoad disk daemons.

While the StoneFly i4000 provided exceptional performance, it was the added provisioning features of StoneFusion that made the biggest impact in managing a VI environment. Since the prime goal of server virtualization is to maximize resource utilization, multiple VMs will be running on a host server at any instant. To avoid the overhead of installing multiple instances of an OS, VMware supports the concept of creating an OS installation template and then cloning that template the next time that the OS is to be installed.

In a VI environment, the creation of templates is handled by the VMware Virtual Center software, which requires a separate system running Windows Server along with a commercial database, such as SQL Server or Oracle, to keep track of all disk images. Similar functionality can be leveraged using the i4000 Storage Concentrator through the StoneFusion image-management functions for volumes. While best practices call for maintaining offline template volumes for this task, we were able to use any volume at any time, provided that we were able to take that volume offline.

To clone a volume image, we first needed to shut down all VMs running on that virtual volume and close any iSCSI sessions that were open for that volume with any ESX servers. Once this was done, we could begin the simple process of adding a mirror image to the volume, which is normally done to provide for high availability in either a disaster-recovery or backup scenario.

The creation of a mirror is a fast and efficient process under StoneFusion. We monitored the Fibre Channel switch port that was connected to the StoneFly i4000 during the process. Read-and-write data throughput remained fully synchronized during the process—each at a pace of 45MBps. At that rate, the process of generating an OS clone complete with any additional software applications was merely a matter of minutes.
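
A quick Python calculation shows what a 45MBps copy rate implies for clone times. The `copy_minutes` helper is hypothetical, and the 10GB size is an assumed example because the article does not state the size of the cloned volume; 50GB matches the datastore size used in the benchmark tests. Sizes are decimal (1GB = 1,000MB).

```python
def copy_minutes(size_gb: float, rate_mb_per_s: float = 45.0) -> float:
    """Minutes needed to copy size_gb at a steady rate_mb_per_s."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_gb * 1000 / rate_mb_per_s / 60


# ASSUMPTION: a 10GB OS image; the article does not give the clone's size.
print(round(copy_minutes(10), 1))  # 3.7

# A volume the size of the 50GB datastore used in the benchmarks.
print(round(copy_minutes(50), 1))  # 18.5
```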

We could then add the cloned VM to the pool of virtual machines on each ESX server. On powering on the new VM for the first time, the ESX server would recognize that this VM had an existing identifier and would ask us to confirm whether it should retain that ID or create a new one for this VM. Once that was completed, we were done with the process of creating a new VM.

By initially provisioning bulk storage to the StoneFly i4000 Storage Concentrator, as in the openBench Labs test scenario, ESX system administrators can address all of the iSCSI issues, including data security, which would normally require interaction with storage administrators. What's more, ESX system administrators can leverage the high-availability functions of the StoneFusion OS, such as snapshots and mirroring, to help create and maintain OS templates, as well as distribute data files as VMs are migrated within a VI environment. In this way, the StoneFly i4000 can help open the door to all of the advanced features of a VI environment, while reducing the costs of operations management.
