What do you get when you combine four independent servers, lots of memory, standard SATA disks and SSD, 10Gb networking, and custom software in a single box? In this instance, the answer would be a Nutanix NX-3000. Pigeonholing the Nutanix product into a traditional category is another riddle altogether. While the company refers to each unit it sells as an "appliance," it really is a clustered combination of four individual servers and direct-attached storage that brings shared storage right into the box, eliminating the need for a back-end SAN or NAS.

I was recently given the opportunity to go hands-on with a Nutanix NX-3000, the four nodes of which were running version 3.5.1 of the Nutanix operating system. It's important to point out that the Nutanix platform handles clustering and file replication independently of any hosted virtualization system. Thus, a Nutanix cluster will automatically handle node, disk, and network failures while providing I/O at the speed of local disk -- and using local SSD to accelerate access to the most frequently used data. Nutanix systems support the VMware vSphere and Microsoft Hyper-V hypervisors, as well as KVM for Linux-based workloads.

Nutanix was founded by experienced data center architects and engineers from the likes of Google, Facebook, and Yahoo. That background brings with it a keen sense of what makes a good distributed system and what software pieces are necessary to build a scalable, high-performance product. A heavy dose of innovation and ingenuity shows up in a sophisticated set of distributed cluster management services, which eliminate any single point of failure, and in features like disk block fingerprinting, which leverages a special Intel instruction set (for computing an SHA-1 hash) to perform data deduplication and to ensure data integrity and redundancy.
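To make the fingerprinting idea concrete, here is a minimal Python sketch. It is not Nutanix's implementation: the `BlockStore` class, the 4KB block size, and the in-memory dictionary are all assumptions chosen for illustration. It simply shows how a SHA-1 hash of each block can serve both as a deduplication key and as an integrity check on read.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


class BlockStore:
    """Toy content-addressed store (hypothetical): identical blocks are kept once."""

    def __init__(self):
        self.blocks = {}  # SHA-1 hex digest -> block bytes

    def write(self, data):
        """Store data block by block and return the list of fingerprints."""
        fingerprints = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha1(block).hexdigest()
            # A block whose fingerprint is already known is not stored again.
            self.blocks.setdefault(digest, block)
            fingerprints.append(digest)
        return fingerprints

    def read(self, fingerprints):
        """Reassemble data, verifying each block against its fingerprint."""
        out = bytearray()
        for digest in fingerprints:
            block = self.blocks[digest]
            if hashlib.sha1(block).hexdigest() != digest:
                raise IOError("block %s failed its integrity check" % digest)
            out += block
        return bytes(out)
```

Writing three blocks of which two are identical leaves only two entries in `blocks`, while the returned fingerprint list still has three entries, so the original data can be rebuilt in order.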

A Nutanix cluster starts at one appliance (technically three nodes, allowing for the failure of one node) and scales out to any number of nodes. The NDFS (Nutanix Distributed File System) provides a single store for all of your VMs, handling all disk and I/O load balancing and eliminating the need to use virtualization platform features like VMware's Storage DRS. Otherwise, you manage your VMs no differently than you would on any other infrastructure, using VMware's or Microsoft's native management tools.

Nutanix architecture

The hardware behind the NX-3000 comes from SuperMicro. Apart from the fact that it squeezes four dual-processor server blades inside one 2U box, it isn't anything special. All of the magic is in the software. Nutanix uses a combination of open source software, such as Apache Cassandra and ZooKeeper, plus a bevy of in-house developed tools. Nutanix built cluster configuration management services on ZooKeeper and heavily modified Cassandra for use as the primary object store for the cluster.

The combination of hardware nodes plus special software makes up the Nutanix Distributed File System. At the heart of each cluster node is the Nutanix Controller Virtual Machine. This hypervisor-specific virtual machine -- Nutanix offers different versions tuned for vSphere, Hyper-V, or KVM -- handles all communication between server nodes and all of the services running as a part of NDFS. In other words, the Controller VM both manages the cluster and serves as the central data store for the hypervisor and its guest VMs.

Figure 1: The Nutanix Virtual Computing Platform architecture

Figure 1 above shows the interconnections between some of the key software pieces in the Controller VM. Like node, disk, and network failures, controller failures are detected automatically. NDFS handles controller outages by redirecting I/Os to other Controller VMs in the cluster.
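The redirection behavior can be pictured with a few lines of Python. This is a sketch under stated assumptions, not NDFS logic: the function name `pick_controller`, the health map, and the "first healthy peer" policy are all hypothetical. The only point is that I/O stays with the local Controller VM while it is healthy and moves to another one when it is not.

```python
def pick_controller(local, controllers, healthy):
    """Return the Controller VM that should serve I/O for a host.

    local       -- name of the Controller VM on the same node
    controllers -- ordered list of all Controller VM names in the cluster
    healthy     -- dict mapping a name to True (up) or False (down)
    """
    # Prefer the local controller so I/O stays on the node.
    if healthy.get(local, False):
        return local
    # Otherwise fall back to the first healthy peer (assumed policy).
    for name in controllers:
        if name != local and healthy.get(name, False):
            return name
    raise RuntimeError("no healthy Controller VM available")
```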

At the center is the Curator, a MapReduce-based cluster management application that handles the distribution of tasks (disk balancing, proactive scrubbing, and so on) throughout the cluster. It's controlled by an elected Curator Master, which serves as the task and job delegation manager.

Stargate is the primary data I/O manager. It communicates using NFS, iSCSI, or SMB and handles all the storage requests from the hypervisor. Medusa is a distributed metadata store based on Apache Cassandra that utilizes the Paxos algorithm to enforce strict consistency across all nodes.

Prism is the management gateway for configuring and monitoring the entire Nutanix cluster. It elects a leader in a similar fashion to the other components. Access to the management system is available via an HTML5-based Web interface, a console-like CLI, and a REST-based API.

Zeus is a cluster configuration manager based on Apache ZooKeeper. Responsibilities of the leader node include the receiving and forwarding of all requests for configuration changes. Should the leader fail, the Zeus services running on the other nodes will elect a new one.

Other components include Chronos for job and task scheduling, Cerebro for handling replication and disaster recovery, and Pithos for managing virtual disk configuration data.

All writes to disk are synchronously replicated before being acknowledged, to guard against any disk or node failures. The majority of disk write operations funnel through the SSD-based OpLog, which records each disk operation as a log entry. In effect, the OpLog serves as a very fast persistent store for all disk write operations. For read operations, there's a Content Cache located in local memory and on the SSD. If a specific disk fragment can't be found in the Content Cache, it will be located and retrieved from disk.
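A toy model of that write and read path might look like the following. Everything here is an assumption made for illustration: the `Node` class, the single replica, and the explicit `drain_oplog` step are hypothetical and stand in for behavior the real system handles internally. The sketch also skips reads of data that is still sitting in the log, which a real system would have to serve.

```python
class Node:
    """Hypothetical stand-in for one cluster node's storage layers."""

    def __init__(self, name):
        self.name = name
        self.oplog = []          # stands in for the SSD-based write log
        self.content_cache = {}  # stands in for the memory/SSD read cache
        self.disk = {}           # stands in for the rotating-disk tier

    def append_oplog(self, key, value):
        self.oplog.append((key, value))
        return True

    def drain_oplog(self):
        """Apply logged writes to disk (assumed background step)."""
        for key, value in self.oplog:
            self.disk[key] = value
        self.oplog.clear()


def replicated_write(local, replica, key, value):
    """Acknowledge only after the local node and a replica have both logged it."""
    ok_local = local.append_oplog(key, value)
    ok_replica = replica.append_oplog(key, value)
    return ok_local and ok_replica


def read(node, key):
    """Serve from the Content Cache if possible, otherwise from disk."""
    if key in node.content_cache:
        return node.content_cache[key], "cache"
    value = node.disk[key]
    node.content_cache[key] = value  # populate the cache for next time
    return value, "disk"
```

The first read of a block after it reaches disk reports `"disk"`; a second read of the same block reports `"cache"`.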

Virtual machines running on individual nodes use the resources of that node exclusively, although disk write operations get distributed across the cluster. Guest VMs see the local Controller VM as the central data store for virtual disks; as VMs migrate from node to node, the I/O moves from one Controller VM to another. Thus as VMware's Distributed Resource Scheduler or Microsoft's System Center tools distribute the VM load across the cluster, the storage load is balanced across the Controller VMs. All internode communication takes place over a 10Gb Ethernet network, which means you'll need a 10GbE switch to connect the nodes together.

Nutanix defines a Storage Pool as a group of physical storage devices that may include PCIe SSD, SSD, and rotating disk. Naturally, a Storage Pool may span multiple nodes and will expand when the cluster scales out to include new nodes. A Container is defined as a group of VMs or files and is a logical subset of a Storage Pool. Each Container typically corresponds to a single data store in a VMware environment, for example.
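The relationship between the two terms is easy to model. The sketch below uses Python dataclasses with hypothetical names (`Device`, `StoragePool`, `Container`) and made-up capacities; it is only meant to show a pool spanning nodes, growing when a node is added, and a container sitting on top of it as a logical subset.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    node: str         # node that physically holds the device
    kind: str         # "pcie-ssd", "ssd", or "hdd"
    capacity_gb: int  # illustrative figure, not a real spec


@dataclass
class StoragePool:
    name: str
    devices: list = field(default_factory=list)

    def add_node(self, devices):
        """Scaling out: a new node's devices join the existing pool."""
        self.devices.extend(devices)

    @property
    def capacity_gb(self):
        return sum(d.capacity_gb for d in self.devices)

    def nodes(self):
        return sorted({d.node for d in self.devices})


@dataclass
class Container:
    name: str
    pool: StoragePool  # a container is a logical subset of one pool
    vms: list = field(default_factory=list)
```

In a VMware environment, each `Container` here would play the role of a single data store.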

Nutanix management

The beauty of the Nutanix architecture is that day-to-day operation of the appliance requires little to no management intervention. Once the system has been configured, it should run without any operator input until you need to expand capacity. This typically happens when you add another appliance to the network and need to expand the number of nodes in the cluster. For this scenario, a menu option on the management home page labeled Expand Cluster will lead you through the process of bringing the new system online. Other operator actions might include managing available storage by creating containers and storage pools.

Primary management of a Nutanix appliance takes place from a Web browser. You can also use SSH to open a terminal session on any node and run scripts or manually start and stop services. Many of the settings that control how often different processes run or what triggers specific events reside in configuration parameters called GFlags, which you can set using the browser interface.

Nutanix provides advanced management and monitoring features in addition to the standard HTML5 management pages. These include direct insight into the individual functions, such as the Curator and Stargate. To access these pages, simply type in the URL of the Nutanix Controller VM and add the path to the specific service. Each of the primary functions mentioned earlier has its own port number or exclusive URL. Figure 2 below shows a typical dashboard screen with a default layout of informational widgets. This page is user configurable to include virtually any system-level detail you could think of.

Figure 2: The Nutanix dashboard

You'll find plenty of detailed information about the inner workings of NDFS if you dig deep enough. NDFS makes heavy use of logging, and those logs provide insight into key performance parameters. If you don't see the information you're interested in on the dashboard, you only have to add a new chart from a long list of options.

Nutanix uses a RESTful API for its management interface along with plug-ins for VMware vCenter Server and other virtualization management tools. If you're really ambitious, you can write your own code using a language like Python. Using this approach takes about 10 lines of code to get a ton of information about currently running VMs. Inquiring minds can browse the REST API through the main portal page to manually explore the interface, although you probably wouldn't want to do that on a production machine.
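As a rough idea of what such a script could look like, here is a sketch using only the Python standard library. The endpoint path (`/api/vms`) and the JSON field names (`entities`, `name`, `powerState`) are placeholders, not the documented Nutanix API; look up the real paths and response shape in the REST API browser before using anything like this. The use of HTTP basic authentication is likewise an assumption.

```python
import base64
import json
import urllib.request

VMS_PATH = "/api/vms"  # hypothetical path; check the API browser for the real one


def build_request(base_url, user, password, path=VMS_PATH):
    """Build a GET request with a basic-auth header (assumed auth scheme)."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    req = urllib.request.Request(base_url.rstrip("/") + path)
    req.add_header("Authorization", "Basic " + token)
    req.add_header("Accept", "application/json")
    return req


def summarize_vms(payload):
    """Return (name, power state) pairs from an assumed JSON shape."""
    return [(vm.get("name"), vm.get("powerState"))
            for vm in payload.get("entities", [])]


def fetch_vms(base_url, user, password):
    """Call the cluster and summarize the VMs it reports."""
    with urllib.request.urlopen(build_request(base_url, user, password)) as resp:
        return summarize_vms(json.load(resp))
```

Keeping the parsing in `summarize_vms` separate from the network call makes it easy to try the script against a saved response before pointing it at a live cluster.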

Nutanix performance

All Nutanix products were designed from the ground up with performance and scale as the two driving principles. Data typically passes through the local OpLog with a copy sent over the network to another node in the cluster for redundancy. Sequential writes skip the OpLog and go directly to disk, and they may optionally skip the SSD tier entirely for specific use cases. This reduces the amount of storage needed on the SSD tier while taking advantage of the suitability for sequential writes of HDDs.
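The routing decision described above can be sketched as a small Python class. How NDFS actually detects a sequential stream is not covered here, so the rule used below (a write counts as sequential when it starts exactly where the previous one ended) is an assumption, as are the `WriteRouter` name and the `"oplog"`, `"ssd"`, and `"hdd"` labels.

```python
class WriteRouter:
    """Hypothetical router that decides where a write lands."""

    def __init__(self, bypass_ssd_for_sequential=False):
        # Optional setting: send sequential writes straight to HDD.
        self.bypass_ssd = bypass_ssd_for_sequential
        self.next_offset = None  # where a sequential stream would continue

    def route(self, offset, length):
        # Assumed heuristic: contiguous with the previous write = sequential.
        sequential = offset == self.next_offset
        self.next_offset = offset + length
        if not sequential:
            return "oplog"  # random writes go through the SSD-based log
        return "hdd" if self.bypass_ssd else "ssd"
```

With the default settings, a write that follows directly on from the previous one is routed to `"ssd"`; with `bypass_ssd_for_sequential=True` the same write is routed to `"hdd"`.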

Measuring performance on a Nutanix box is something Nutanix does as a matter of course. Since the movement of data to and from the underlying storage is completely controlled by NDFS, it's also possible to monitor and track the moving parts in order to identify any bottlenecks. The vDisk status page shows the different types of performance measurements available.

Nutanix provides a diagnostics tool that will provision a VM per node with six virtual disks attached. Once the VM has been fully provisioned, the tool launches diskperf and fiotool to measure various performance parameters under known loads. Once complete, the results are aggregated to determine the overall cluster performance. A typical user won't run these tools, but they are available for the Nutanix System Engineers to use as a part of the post-installation process.

The Nutanix NX-3000 series of products provides a unique solution for virtualization deployments. Measuring this product against any other competition would be problematic as there really isn't anything like it. It's even tougher if you evaluate strictly on cost, which at a base price of $144,000 per appliance is significant. But the benefits -- high availability, high performance, all the advantages of centralized storage without the overhead -- are compelling. At the end of the day it's an ideal solution for high-end data centers looking to pack as much virtualization capacity as possible into as little space as possible.