Posts Tagged ‘VSA’

Our open storage partner, Nexenta Systems Inc., hit a milestone this month by releasing NexentaStor 4.0.1 for general availability. This release is significant mainly because it is the first commercial release of NexentaStor based on the Open Source Illumos kernel and not Oracle’s OpenSolaris (now closed source). With this move, NexentaStor’s adhering to the company’s promise of “open source technology” that enables hardware independence and targeted flexibility.

Some highlights in 4.0.1:

Faster Install times

Better HA Cluster failover times and “easier” cluster manageability

Support for large memory host configurations – up to 512GB of DRAM per head/controller

Note, the 18TB Community Edition EULA is still hampered by the “non-commercial” language, restricting it’s use to home, education and academic (ie. training, testing, lab, etc.) targets. However, the “total amount of Storage Space” license for Community is a deviation from the Enterprise licensing (typical “raw” storage entitlement)

2.2 If You have acquired a Community Edition license, the total amount of Storage Space is limited as specified on the Site and is subject to change without notice. The Community Edition may ONLY be used for educational, academic and other non-commercial purposes expressly excluding any commercial usage. The Trial Edition licenses may ONLY be used for the sole purposes of evaluating the suitability of the Product for licensing of the Enterprise Edition for a fee. If You have obtained the Product under discounted educational pricing, You are only permitted to use the Product for educational and academic purposes only and such license expressly excludes any commercial purposes.

– NexentaStor EULA, Version 4.0; Last updated: March 18, 2014

For those who operate under the Community license, this means your total physical storage is UNLIMITED, provided your space “IN USE” falls short of 18TB (18,432 GB) at all times. Where this is important is in constructing useful arrays with “currently available” disks (SATA, SAS, etc.) Let’s say you needed 16TB of AVAILABLE space using “modern” 3TB disks. The fact that your spinning disks are individually larger than 600GB indicates that array rebuild times might run afoul of failure PRIOR to the completion of the rebuild (encountering data loss) and mirror or raidz2/raidz3 would be your best bet for array configuration.

SOLORI Note: Richard Elling made this concept exceedingly clear back in 2010, and his “ZFS data protection comparison” of 2, 3 and 4-way mirrors to raidz, raidz2 and raidz3 is still a great reference on the topic.

By “raw” licensing standards, the 3-way mirror would require a 76TB license while the raidz2 volume would require a 51TB license – a difference of 25TB in licensing (around $5,300 retail). However, under the Community License, the “cost” is exactly the same, allowing for a considerable amount of flexibility in array loadout and configuration.

Why do I need 54TiB in disk to make 16TB of “AVAILABLE” storage in Raidz2?

The RAID grouping we’ve chosen is 6-disk raidz2 – that’s akin to 4 data and 2 parity disks in RAID6 (without the fixed stripe requirement or the “write hole penalty.”) This means, on average, one third of the space consumed on-disk will be in the form of parity information. Therefore, right of the top, we’re losing 33% of the disk capacity. Likewise, disk manufacturers make TiB not TB disks, so we lose 7% of “capacity” in the conversion from TiB to TB. Additionally, we like to have a healthy amount of space reserved for new block allocation and recommend 30% unused space as a target. All combined, a 6-disk raidz array is, at best, 43% efficient in terms of capacity (by contrast, 3-way mirror is only 22% space efficient). For an array based on 3TiB disks, we therefore get only 1.3TB of usable storage – per disk – with 6-disk raidz (by contrast, 10-disk raidz nets only 160GB additional “usable” space per disk.)

SOLORI’s Take: If you’re running 3.x in production, 4.0.1 is not suitable for in-place upgrades (yet) so testing and waiting for the “non-disruptive” maintenance release is your best option. For new installations – especially inside a VM or hypervisor environment as a Virtual Storage Appliance (VSA) – version 4.0.1 presents a better option over it’s 3.x siblings. If you’re familiar with 3.x, there’s not much new on the NMV side outside better tunables and snappier response.

In my last post, I mentioned a much complained about “idle” CPU utilization quirk with NexentaStor when running as a virtual machine. After reading many supposed remedies on forum postings (some reference in the last blog, none worked) I went pit-bull on the problem… and got lucky.

For me, the problem cropped-up while running storage benchmarks on some virtual storage appliances for a client. These VSA’s are bound to a dedicated LSI 9211-8i SAS/6G controller using VMware’s PCI pass-through (Host Configuration, Hardware, Advanced Settings). The VSA uses the LSI controller to access SAS/6G disks and SSDs in a connected JBOD – this approach allows for many permutations on storage HA and avoids physical RDMs and VMDKs. Using a JBOD allows for attachments to PCIe-equipped blades, dense rack servers, etc. and has little impact on VM CPU utilization (in theory).

So I was very surprised to find idle CPU utilization (according to ESXi’s performance charting) hovering around 50% from a fresh installation. This runs contrary to my experience with NexentaStor, but I’ve seen reports of such issues on the forums and even on my own blog. I’ve never been able to reproduce more than a 15-20% per vCPU bias between what’s reported in the VM and what ESXi/vCenter sees. I’ve always attributed this difference to vSMP and virtual hardware (disk activity) which is not seen by the OS but is handled by the VMM.

CPU record of idle and IOzone testing of SAS-attached VSA

During the testing phase, I’m primarily looking at the disk throughput, but I notice a persistent CPU utilization of 50% – even when idle. Regardless, the 4 vCPU VSA appears to perform well (about 725MB/sec 2-process throughput on initial write) despite the CPU deficit (3 vCPU test pictured above, about 600MB/sec write). However, after writing my last blog entry, the 50% CPU leach just kept bothering me.

After wasting several hours researching and tweaking with very little (positive) effect, a client e-mail prompted a NMV walk through with resulted in an unexpected consequence: the act of powering-off the VSA from web GUI (NMV) resulted is significantly reduced idle CPU utilization.

Getting lucky: noticing a trend after using NMV to reboot for a client walk-through of the GUI.

Working with the 3 vCPU VSA over the previous several hours, I had consistently used the NMC (CLI) to reboot and power-off the VM. The fact of simply using the NMV to shutdown the VSA couldn’t have anything to do with idle CPU consumption, could it? Remembering that these were fresh installations I wondered if this was specific to a fresh installation or could it show up in an upgrade. According to the forums, this only hampered VMs, not hardware.

I grabbed a NexentaStor 3.1.0 VM out of the library (one that had been progressively upgraded from 3.0.1) and set about the upgrade process. The result was unexpected: no difference in idle CPU from the previous version; this problem was NOT specific to 3.1.2, but specific to the installation/setup process itself (at least that was the prevailing hypothesis.)

Regression into my legacy VSA library, upgrading from 3.1.1 to 3.1.2 to check if the problem follows the NexentaStor version.

If anything, the upgraded VSA exhibited slightly less idle CPU utilization than its previous version. Noteworthy, however, was the extremely high CPU utilization as the VSA sat waiting for a yes/no response (NMC/CLI) to the “would you like to reboot now” question at the end of the upgrade process (see chart above). Once “no” was selected, CPU dropped immediately to normal levels.

Now it seemed apparent that perhaps an vestige of the web-based setup process (completed by a series of “wizard” pages) must be lingering around (much like the yes/no CPU glutton.) Fortunately, I had another freshly installed VSA to test with – exactly configured and processed as the first one. I fired-up the NMV and shutdown the VSA…

Confirming the impact of the "fix" on a second fresh installed NexentaStor VSA

After powering-on the VM from the vSphere Client it was obvious. This VSA had been running idle for some time, so it’s idle performance baseline – established prior across several reboots from CLI – was well recorded by the ESXi host (see above.) The resulting drop in idle CPU was nothing short of astounding: the 3 vCPU configuration has dropped from a 50% average utilization to 23% idle utilization. Naturally, these findings (still anecdotal) have been forwarded on to engineers at Nexenta. Unfortunately, now I have to go back and re-run my storage benchmarks; hopefully clearing the underlying bug has reduced the needed vCPU count…

Now VMware Tools has been installed and you’re ready to add more virtual disks and build ZFS storage pools. If you get a warning about HGFS not loading properly at boot time:

HGFS module mismatch warning.

it is not usually a big deal, but the VMware Host-Guest File System (HGFS) has been known to cause issues in some installations. SInce the NexentaStor appliance is not a general purpose operating system, you should customize the install to not use HGFS at all. To disable it, perform the following:

Edit “/kernel/drv/vmhgfs.conf”

Change: ﻿ name=”vmhgfs” parent=”pseudo” instance=0;

To: #name=”vmhgfs” parent=”pseudo” instance=0;

Re-boot the VSA

Upon reboot, there will be no complaint about the offending HGFS module. Remember that, after updating VMware Tools at a future date, the HGFS configuration file will need to be adjusted again. By the way, this process works just as well on the NexentaStor Commercial edition, however you might want to check with technical support prior to making such changes to a licensed/supported deployment.

In this segment, Part 5, we will create a VMware Virtual Center (vCenter) virtual machine and place the ESX and ESXi machines under management. Using this vCenter instance, we will complete the configuration of ESX and ESXi using some of the new features available in vCenter.

Part 5, Managing our ESX Cluster-in-a-Box

With our VSA and ESX servers purring along in the virtual lab, the only thing stopping us from moving forward with vMotion is the absence of a working vCenter to control the process. Once we have vCenter installed, we have 60-days to evaluate and test vSphere before the trial license expires.

Prepping for vCenter Server for vSphere

We are going to install Microsoft Windows Server 2003 STD for the vCenter Server operating system. We chose Server 2003 STD since we have limited CPU and memory resources to commit to the management of the lab and because our vCenter has no need of 64-bit resources in this use case.

Since one of our goals is to have a fully functional vMotion lab with reasonable performance, we want to create a vCenter virtual machine with at least the minimum requirements satisfied. In our 24GB lab server, we have committed 20GB to ESX, ESXi and the VSA (8GB, 8GB and 4GB, respectively). Our base ESXi instance consumes 2GB, leaving only 2GB for vCenter – or does it?

Memory Use in ESXi

VMware ESX (and ESXi) does a good job of conserving resources by limiting commitments for memory and CPU. This is not unlike any virtual memory capable system that puts a premium on “real” memory by moving less frequently used pages to disk. With a lot of idle virtual machines, this ability alone can create significant over-subscription possibilities for VMware; this is why it could be possible to run 32GB worth of VM’s to run on a 16-24GB host.

Do we really want this memory paging to take place? The answer – for the consolidation use cases – is usually “yes.” This is because consolidation is born out of the need to aggregate underutilized systems in a more resource efficient way. Put another way, administrators tend to provision systems based on worst case versus average use, leaving 70-80% of those resources idle in off-peak times. Under ESX’s control those underutilized resources can be re-tasked to another VM without impacting the performance of either one.

On the other hand, our ESX and VSA virtual machines are not the typical use case. We intend to fully utilized their resources and let them determine how to share them in turn. Imagine a good number of virtual machines running on our virtualized ESX hosts: will they perform well with the added hardship of memory paging? Also, when begin to use vMotion those CPU and memory resources will appear on BOTH virtualized ESX servers at the same time.

It is pretty clear that if all of our lab storage is committed to the VSA, we do not want to page its memory. Remember that any additional memory not in use by the SAN OS in our VSA is employed as ARC cache for ZFS to increase read performance. Paging memory that is assumed to be “high performance” by NexentaStor would result in poor storage throughput. The key to “recursive computing” is knowing how to anticipate resource bottlenecks and deploy around them.

This brings the question: how much memory is left after reserving 4GB for the VSA? To figure that out, let’s look at what NexentaStor uses at idle with 4GB provisioned:

NexentaStor's RAM footprint with 4GB provisioned, at idle.

As you can see, we have specified a 4GB reservation which appears as “4233 MB” of Host Memory consumed (4096MB+137MB). Looking at the “Active” memory we see that – at idle – the NexentaStor is using about 2GB of host RAM for OS and to support the couple of file systems mounted on the host ESXi server (recursively).

Additionally, we need to remember that each VM has a memory overhead to consider that increases with the vCPU count. For the four vCPU ESX/ESXi servers, the overhead is about 220MB each; the NexentaStor VSA consumes an additional 140MB with its two vCPU’s. Totaling-up the memory plus overhead identifies a commitment of at least 21,828MB of memory to run the VSA and both ESX guests – that leaves a little under 1.5GB for vCenter if we used a 100% reservation model.

Memory Over Commitment

The same concerns about memory hold true for our ESX and ESXi hosts – albeit in a less obvious way. We obviously want to “reserve” memory for required by the VMM – about 2.8GB and 2GB for ESX and ESXi respectively. Additionally, we want to avoid over subscription of memory on the host ESXi instance – if at all possible – since it will already be working running our virtual ESX and ESXi machines.

In this segment, Part 4, we will create an ESXi instance on NFS along with an ESX instance on iSCSI, and, using writable snapshots, turn both of these installations into quick-deploy templates. We’ll then mount our large iSCSI target (created in Part 3) and NFS-based ISO images to all ESX/ESXi hosts (physical and virtual), and get ready to install our vCenter virtual machine.

Part 4, Making an ESX Cluster-in-a-Box

With a lot of things behind us in Parts 1 through 3, we are going to pick-up the pace a bit. Although ZFS snapshots are immediately available in a hidden “.zfs” folder for each snapshotted file system, we are going to use cloning and mount the cloned file systems instead.

Cloning allows us to re-use a file system as a template for a copy-on-write variant of the source. By using the clone instead of the original, we can conserve storage because only the differences between the two file systems (the clone and the source) are stored to disk. This process allows us to save time as well, leveraging “clean installations” as starting points (templates) along with their associate storage (much like VMware’s linked-clone technology for VDI.) While VMware’s “template” capability allows us save time by using a VM as a “starting point” it does so by copying storage, not cloning it, and therefore conserves no storage.

Using clones in NexentaStor to conserve storage and aid rapid deployment and testing. Only the differences between the source and the clone require additional storage on the NexentaStor appliance.

While the ESX and ESXi use cases might not seem the “perfect candidates” for cloning in a “production” environment, in the lab it allows for an abundance of possibilities in regression and isolation testing. In production you might find that NFS and iSCSI boot capabilities could make cloned hosts just as effective for deployment and backup as they are in the lab (but that’s another blog).

Here’s the process we will continue with for this part in the lab series:

Create NFS folder in NexentaStor for the ESXi template and share via NFS;

Modify the NFS folder properties in NexentaStor to:

limit access to the hardware ESXi host only;

grant the hardware ESXi host “root” access;

Create a folder in NexentaStor for the ESX template and create a Zvol;

From VI Client’s “Add Storage…” function, we’ll add the new NFS and iSCSI volumes to the Datastore;

Create ESX and ESXi clean installations in these “template” volumes as a cloning source;

Unmount the “template” volumes using the VI Client and unshare them in NexentaStore;

Clone the “template” Zvol and NFS file systems using NexentaStore;

Mount the clones with VI Client and complete the ESX and ESXi installations;

Mount the main Zvol and ISO storage to ESX and ESXi as primary shared storage;

There are many features in vSphere worth exploring but to do so requires committing time, effort, testing, training and hardware resources. In this feature, we’ll investigate a way – using your existing VMware facilities – to reduce the time, effort and hardware needed to test and train-up on vSphere’s ESXi, ESX and vCenter components. We’ll start with a single hardware server running VMware ESXi free as the “lab mule” and install everything we need on top of that system.

Part 1, Getting Started

To get started, here are the major hardware and software items you will need to follow along:

Recommended Lab Hardware Components

One 2P, 6-core AMD “Istanbul” Opteron system

Two 500-1,500GB Hard Drives

24GB DDR2/800 Memory

Four 1Gbps Ethernet Ports (4×1, 2×2 or 1×4)

One 4GB SanDisk “Cruiser” USB Flash Drive

Either of the following:

One CD-ROM with VMware-VMvisor-Installer-4.0.0-164009.x86_64.iso burned to it

An IP/KVM management card to export ISO images to the lab system from the network

For the hardware items to work, you’ll need to check your system components against the VMware HCL and community supported hardware lists. For best results, always disable (in BIOS) or physically remove all unsupported or unused hardware- this includes communication ports, USB, software RAID, etc. Doing so will reduce potential hardware conflicts from unsupported devices.

The Lab Setup

We’re first going to install VMware ESXi 4.0 on the “test mule” and configure the local storage for maximum use. Next, we’ll create three (3) machines two create our “virtual testing lab” – deploying ESX, ESXi and NexentaStor running directly on top of our ESXi “test mule.” All subsequent tests VMs will be running in either of the virtualized ESX platforms from shared storage provided by the NexentaStor VSA.

Today we’re looking at the Xtravirt Virtual SAN Appliance (VSA) solution for use with VMware ESX Server 3. It is designed to be a simple to deploy, redundant (DRBD synchronization), high-availability iSCSI SAN between two ESX servers. We are installing it on two ESXi servers, each with local storage, running the latest patch update (3.5.0 build 143129).

Initial Installation

XVS requires a virtual machine import: either using Converter or manual process. We follow the manual process. We used the command line to convert the imported disk into a ESXi-compliant format:

After conversion, you have a 2GB virtual machine (times two) ready for configuration. We removed the legacy ethernet and hard disk that came with the inventory import. Then add the “existing” disk and new Ethernet (flex) controller.

We then added a 120GB virtual disk to each node using the local storage controllers: LSI 1068SAS (RAID1) for node 1 and NVidia MCP55Pro (RAID1) for node 2. Node 1 and 2 are using the same 250GB Seagate ES.2 (RAID edition) drives. Read the rest of this entry ?

In Medio Stat Veritas

SOLORI's Take and Quick Take posts express my personal opinion unless explicitly attributed to other sources. Where possible, supporting facts are presented to properly frame and ground these opinions, however they are presented "AS-IS" without regard to warranty or promise: expressed or implied.

Comments are open to all registered users and may be edited for decorum. Spam is deleted with prejudice.