
With so many organisations looking to increase their ability to react to business change and continually do more with ever reducing resources, automation is the only way to solve the challenge. Nutanix has pioneered the simplification of traditional datacentre infrastructure from compute, to storage and virtualisation but many customers I speak to ask about the network.

The Dynamic Duo

Network automation appears to be the “last mile” in their journey to a fully automated datacentre, and with the SDN marketplace rather fragmented it’s tough for organisations to pick a solution which completes the loop.

Many organisations are also embracing a DevOps methodology to improve the processes around development and release management of new and existing applications, ultimately driving their innovation goals – with that comes the requirement to provision infrastructure in rapid time.

The public cloud has provided a great benchmark for what can be achieved through automation. Let’s face it, before AWS came along how long did it take to deploy a virtual machine on a new network within a new datacentre? A long time. You’d spend a huge amount of time just ensuring that you had compatible kit, let alone deploying hypervisors and their supporting management infrastructure, provisioning and connecting storage environments and so on. With public cloud that is all abstracted away, which enables businesses to move faster.

Nutanix aims to solve the rapid deployment challenge and on-going scaling requirements whilst ensuring that “day 2” operations are also streamlined, just like in the public cloud where the infrastructure building blocks are invisible. To aid in this journey Nutanix have partnered with Mellanox to provide the automation and simplification of “day 2” operations for common network tasks to complete the loop.

Mellanox are a leading supplier of end-to-end Ethernet and InfiniBand intelligent interconnect solutions and services for servers, storage, and hyper-converged infrastructure.

Mellanox’s NEO management platform exposes a REST-based API which enables tasks such as VLAN provisioning and trunking on the switch ports used by the Nutanix nodes. Consumers of the Nutanix Enterprise Cloud Platform can forget about VLAN provisioning requests, as VLANs are automatically set up and migrated as VMs move within the Nutanix infrastructure, ensuring that applications get access to the networks they need to communicate. Developers and operations teams can then concentrate on delivering real business value and get on with developing the next business-defining application!

Here are a couple of videos walking through the integration. In the first example a VM is migrated from Node A to Node B. Because we automate the configuration of the VLAN on the Mellanox switches, VLANs are only configured as required, in real time, rather than trunking all existing VLANs on all ports.

In the second example we create a new VM within the Nutanix Prism console. Just like the previous example, the combination of Prism and NEO takes care of the VLAN provisioning task, ensuring that the consumer of the Enterprise Cloud Platform can get on with doing just that – consuming it, just like in the public cloud.
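To make the "only configure VLANs where they're needed" idea concrete, here is a minimal Python sketch of the bookkeeping involved. It is not the Prism or NEO code and it makes no API calls. The VM names, node names, port names and both helper functions are hypothetical, invented purely for illustration. It works out which VLANs each switch port needs for a given VM placement, then diffs two placements to see what would have to change when a VM moves.

```python
from collections import defaultdict


def required_vlans(vm_vlan, vm_node, node_port):
    """Return {switch_port: set of VLAN IDs} needed for the current VM placement."""
    needed = defaultdict(set)
    for vm, node in vm_node.items():
        needed[node_port[node]].add(vm_vlan[vm])
    return dict(needed)


def vlan_changes(before, after):
    """Diff two port-to-VLAN maps into the VLANs to add and remove per port."""
    changes = {}
    for port in sorted(set(before) | set(after)):
        old, new = before.get(port, set()), after.get(port, set())
        add, remove = new - old, old - new
        if add or remove:
            changes[port] = {"add": sorted(add), "remove": sorted(remove)}
    return changes


# Hypothetical inventory: two VMs, two nodes, one switch port per node.
vm_vlan = {"web01": 10, "db01": 20}
node_port = {"node-a": "Eth1/1", "node-b": "Eth1/2"}

# db01 migrates from node-a to node-b; web01 stays put.
before = required_vlans(vm_vlan, {"web01": "node-a", "db01": "node-a"}, node_port)
after = required_vlans(vm_vlan, {"web01": "node-a", "db01": "node-b"}, node_port)

changes = vlan_changes(before, after)
print(changes)
# {'Eth1/1': {'add': [], 'remove': [20]}, 'Eth1/2': {'add': [20], 'remove': []}}
```

In a real deployment the resulting diff is the sort of thing that would be handed to the switch management API. The point here is only that each port ends up carrying the VLANs its VMs actually use and nothing more.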

Want to see how to install, deploy and update 99 desktops on an Intel NUC running CE while playing a game and getting education on how long a proper cup of tea takes to brew? Of course!

One of the best things I was involved in while at Citrix was seeing the evolution of the flagship product. From MetaFrame 1.8 when I started to XenDesktop 7 when I left, we always drove towards simplicity for the end user and eventually the admin. It’s this simple idea of taking away complexity and either replacing it with something easy and intuitive or just making once-manual tasks automated and invisible.

With XenDesktop 7 Citrix made great steps with Machine Creation Services and right from the start I’ve been a vocal supporter because it fit the beliefs of Citrix so well. Nutanix brings this simplicity to another level by ensuring that not only is MCS easy to deploy but it’s also predictable and scalable – something that it has struggled with – much in the same way as linked clones did with Horizon View.

Over the last week or so I’ve been playing with my new Intel NUC and seeing what our free Community Edition can do. I’ve completed some simple provisioning tests because I was naturally curious as to how quickly a little home lab system can spin up desktops. The speed, as you’ll see below, is rather impressive but in this post I’m going to show you how easy it is to integrate XenDesktop and any Nutanix deployment running our own hypervisor AHV. The steps you see below are identical to how a full production Nutanix cluster would work so let me take you from zero to hero in 16 minutes. You’ll see what components need installing on the broker and how to set up a connection to, in this example, a single Nutanix Community Edition node.

In case you’re interested my NUC is a Skull Canyon model with two SSDs and 32GB RAM and was a lovely present from Intel for Nutanix being so bloody awesome.

For a while now the metric most infrastructures, including Nutanix, have been benchmarked against is IOps (input/output operations per second) – effectively how many read or write requests from an application or VM the storage layer can service each second. Dating back to the (re)birth of SANs when they began running virtual machines and T1 applications this has been the standard for filling out the shit vs excellent spreadsheet that dictates where to spend all your money.
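A quick bit of arithmetic shows why a bare IOps figure says so little on its own. The Python sketch below uses made-up block sizes and latency, not measurements from any platform, and the function names are mine. It converts an IOps rate into throughput for two block sizes and uses Little's Law to estimate how many requests must be in flight to sustain that rate.

```python
def throughput_mb_s(iops, block_size_kb):
    """Throughput in MB/s for a given IOps rate and block size (1 MB = 1024 KB)."""
    return iops * block_size_kb / 1024


def outstanding_ios(iops, latency_ms):
    """Little's Law: average requests in flight = arrival rate x time in system."""
    return iops * latency_ms / 1000


# Illustrative numbers only.
small = throughput_mb_s(100_000, 4)    # 4 KB blocks
large = throughput_mb_s(100_000, 64)   # 64 KB blocks
queue = outstanding_ios(100_000, 1)    # 1 ms average latency

print(small)  # 390.625
print(large)  # 6250.0
print(queue)  # 100.0
```

The same headline 100,000 IOps is roughly 390 MB/s or over 6 GB/s depending on block size, and it only holds if the workload keeps about a hundred requests outstanding at 1 ms latency.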

Recently, thanks to some education and a bit of online pressure from peers in the industry, synthetic testing with tools like IOmeter has generally been displaced in favour of real-world testing platforms and methodology. Even smarter tools such as Jetstress don’t give real-world results because they focus on storage and not the entire solution. Recording and replaying operations to generate genuine load and behaviour is far better. Seeing the impact from the application and platform means our plucky hero admin can produce a recommendation based on fact rather than fantasy.

Synthetic testing is basically like stuffing a pair of socks down your pants; it gets a lot of attention from superficial types but it’s only a precursor to disappointment later down the line when things get serious.

In this entry I want to drop into your conscious mind the idea that very soon performance stats will be irrelevant to everyone in the infrastructure business. Everyone. You, me, them, him, her, all of us will look like foolish dinosaurs if we sell our solutions based on thousands of IOps, bandwidth capacity or low latency figures.

“My God tell me more,” I hear (one of) you (mumble with a shrug). Well consider what’s happened in hardware in the last 5ish years just in storage. We’ve gone from caring about how fast disks spin, to what the caching tier runs on, to tiering hot data in SSD and now the wonders of all-flash. All in 5 or so years. Spot a trend? Bit of Moore’s Law happening? You bet, and it’s only going to get quicker, bigger and cheaper. Up next new storage mediums like NVMe and Intel’s 3D XPoint will move the raw performance game on even further, well beyond what 99% of VMs will need. Nutanix’s resident performance secret agent Michael Webster (NPX007) wrote a wonderful blog about the upcoming performance impacts this new hardware will have on networking so I’d encourage you to read it. The grammar is infinitely better for starters.

So when we get to a point, sooner than you think, when a single node could rip through >100,000 IOps with existing generations of Intel CPUs and RAM where does that leave us when evaluating platforms? Not synthetic statistics that’s for sure.

Oooow IO!

By taking away the uncertainty of application performance almost overnight we can start to reframe the entire conversation around a handful of areas:

Simplicity

Scalability

Predictability

Insightfulness

Openness

Delight

Over the next few weeks (maybe longer as I’m on annual leave soon) I’m going to try to tackle each one of these in turn because for me the way systems are evaluated is changing and it will only benefit the consumer and the end customer when the industry players take note.

Without outlandish numbers those vendors who prefer their Speedos with extra padding will quickly be exposed.

A few weeks ago I was given the lovely task of attending a meeting at the last minute with no preparation time and a 3 hour drive just after I got back from annual leave. The meeting was only for an hour so I decided to record a short 10 minute video in the morning to take them through what they’d actually be doing on a Nutanix cluster from day to day. Knowing the type of customer I knew there would be no internet connection let alone a 4G signal.

I could have just given a normal PowerPoint pitch and sent them back to sleep on a beach (which is where I still wanted to be) but I wanted to keep them awake and also elevate the conversation away from dull stuff like hardware and storage. Usability, simplicity and time to value were the intention here so click below and leave a comment if it made sense to you. No voice over as I’m too cheap to buy a program for my Mac that’ll do it 🙂

Over the last 18 months I’ve seen some amazing innovations come into the Nutanix platform but I’ve only personally seen half of the story. Before I joined we made some staggering strides and I’d like to take you through those today.

Below are some abbreviated entries from all of the release notes dating back to NOS 2.6 in January 2013. I’ve highlighted some of the ones I consider to be important milestones in bold but these are open for discussion and I’m probably wrong anyway 🙂

In this short plagiarised post I wanted to illustrate what can be achieved when approaching a problem with a software first mentality and riding the wave of Moore’s Law. While we’ve brought on new hardware models, ditched Fusion-IO cards for SSDs and partnered with Intel to make it all sing a sweet tune the biggest strides have been made in our famous non-disruptive rolling software upgrades. Whether you bought a node this year or two years ago all of these features should be available to you.

The next time your SAN vendor (or any vendor) claims they’re constantly adding value to their customers get them to put together something like this post because it’s only when you look back do you appreciate how much you’ve already accomplished.

NOS 2.6 (January 2013)

Genesis, a new management framework that replaces scripts run from the vMA, which is no longer required.

Support for 2nd-generation Fusion-io cards.

NOS 2.6.3

Nutanix Complete Cluster 2.6.3 supports vSphere 5.1.

NOS 2.6.4

Support for Intel PCIe-SSD cards is available as a factory-installed option.

NOS 3.0 (September 2013)

VM-centric backup and replication

Local and remote backup of VMs.

Bidirectional replication.

Planned and emergency failover from one site to another.

Consistency groups of multiple VMs that have snapshots made at the same time.

Scheduling, retention, and expiration policies for snapshots.

Compression

Inline compression of containers.

Post-process compression of containers with a configurable delay.

Support for NX-3000

Dual 10 GbE network interfaces.

Higher maximum memory configuration.

Intel Sandy Bridge CPUs.

Improved hardware replacement procedures.

CentOS for Controller VM.

Adherence to requirements specified in the U.S. Defense Information Systems Agency (DISA) Security Technical Implementation Guides (STIGs).

NOS 3.1 (January 2014)

New entry NX-1000 series platform

New deep storage NX-6000 series platform

New higher performance model in the NX-3050 series.

ESX 5.1 support

Mixed nodes in a cluster

NOS 3.5 (December 2014)

New HTML5 based administration interface

Active Directory/LDAP authentication

Introduction of RESTful API

User-configurable policies for frequency and alert-generating events

Expanded alert messages

Support for user-provided SSL certificates

User-manageable SSH keys and Controller VM lock down

SNMPv3 support and Nutanix MIB

Faster display of real-time data

More intuitive nCLI command syntax and enhanced output

Deduplication of guest VM data on the hot tiers (RAM/Flash)

Optimization of linked clones

Container and vDisk space reservations

Compression of remote replication network traffic

Automatically add new disks to single storage pool clusters

Storage Replication Adapter (SRA) for VMware Site Recovery Manager

General availability of KVM hypervisor

Technology preview of Hyper-V

Automated metadata drive replacement

Improved resiliency in cases of node or metadata drive failure

Field installation of replacement nodes

NOS 3.5.1

Support for the new NX-7000 (GPU platform), NX-6020, NX-6060, NX-6080, NX-3060, and NX-3061 models

Support for vSphere 5.5

Analysis dashboard expanded list of monitored metrics

DR dashboard expanded protection domain details

Storage dashboard deduplication summary

Application consistent snapshots

NOS 3.5.2

Support for Windows Server 2012 R2 Hyper-V

Support for application consistent snapshots in a protection domain

Virtual IP address, a single IP address for external access to a cluster

Certificate-based client authentication

Customised banner message in Prism

Enhancements to the Nutanix Prism web console

Expanded alert messages

NOS 3.5.3

Role-based access control using LDAP and Active Directory

Support for hypervisor lock down

Automatic reattachment to the Cassandra ring for replaced nodes

Improvements to the Stargate health monitor to minimize I/O timeouts during rolling upgrades, balance the load among nodes during failover, and facilitate high availability enhancements

Removal of the Avahi software dependency

The Nutanix SRA for VMware SRM supports vSphere 5.1 and 5.5 and SRM 5.1 and 5.5

NCC release 0.4.1

NOS 3.5.4

New entry NX-1020 platform

Volume Shadow Copy Service (VSS) support for Hyper-V hosts

NOS 4.0 (April 2014)

Feature based licensing introduced (Starter, Pro, Ultimate)

Disaster recovery support for Windows Server 2012 R2 Hyper-V

Prism Central introduced to manage and monitor multiple global clusters from one GUI

Over the last couple of months I’ve had my first experiences with Acropolis in the field. The two were quite different but they highlighted two important design goals in the product: simplicity of management and machine migration.

Before I begin I want to take you back a few months to talk about Acropolis itself. If you know all about that you can do two things:

I knew you couldn’t resist a bit of Grover but now you’re back I’ll continue.

Over the summer Acropolis gained a lot of happy customers both new and old. In fact some huge customers were already using it since January thanks to a cunning soft release and that continues into our Community Edition too.

The main purpose of Acropolis was to remove the complexity and unnecessary management modern hypervisors have developed and to let customers take a step back and simply ask “what am I trying to achieve?”

It’s an interesting question and one that is often posed when too deeply lost down the rabbit hole. For someone like me who used to spend far too long looking at problems with a proverbial microscope there’s a blossoming clarity in the way we approached these six words. The journey inside Nutanix to Acropolis was achieved by asking our own question:

“For hypervisors, if you had to start again, what would you do better and what would you address first?”

Our goal was to make deploying an entire virtual environment, regardless of your background and skill set, intuitive and consumable. Our underlying goal for everything we do is simplicity, and while we achieved this with storage many years ago (which we call our ‘distributed storage fabric’) the hypervisor was the next logical area to improve.

Developing our own management layer and beginning its work on top of our own hypervisor was a logical step and that’s what brought us to where we are today with the Acropolis Hypervisor. You can see a great video walk through of the experience of setting up VMs and virtual networks in this video.

Anyway on to my first customer story.

Back in summer I spent time working with a manufacturing company on their first virtualisation project. They were an entirely physical setup using some reasonably modern servers and storage but for many reasons they’d put off moving to a virtual platform for years. One of the most glaring reasons was one I hear a lot here as well as in my previous role at Citrix: “it worked yesterday just fine so why change?” While this is true, I could still be walking two miles to the local river to beat my clothes against rocks to clean them. But I choose to throw them in a basket and (probably by magic) they get cleaned. If my girlfriend is reading this, it could be my last blog…

Part of the resistance is related to human apathy but their main concern was having to relearn new skills, which takes focus and resources away from their business, and it simply being too time consuming. I completely agreed. They wanted simplicity. They needed Acropolis.

Now, I could have done what many would and given a presentation, a demo and a closing Q&A, but I chose to handle our meeting slightly differently. To allay their fears I let them work out how to create a network and create a new VM. As we went I took them through concepts such as what a vCPU was and how it related to what they wanted to achieve for the business. If someone with no virtualisation experience can use Acropolis without any training there can’t be any better sign-off on its simplicity. We were in somewhat of a competitive situation as well, where ‘the others’ were pushing vCenter for all the management. The comparison between the two was quite clear, and while I’ll freely admit that feature for feature vSphere has many more strings to its bow, that wasn’t what the customer needed and isn’t the approach we are taking with the development of Acropolis. We had no wish to just make a better horse and cart and the customer was extremely grateful for that.

One happy customer done, one to go…

Our second customer, dear reader (because there is only one of you), was already a virtualisation veteran and had been using ESXi for a few years before they decided to renew their rather old hardware and hopefully do something different with their infrastructure. Their existing partner, who’d previously been implementing traditional three-tier platforms, chose to put Nutanix in front of them to see if we could ease their burden on management overhead, performance and operating expenditure.

While the simplicity of Acropolis was a great win for them and made up most of their decision it was how we migrated their ESXi VMs on to Acropolis that really struck me most and that’s what I’m going to summarise now.

This was my first V2V migration so I needed something simple as much as the customer and partner did and wow did we deliver. Here is everything we needed to do to migrate:

1. Set up the Nutanix cluster and first container

2. Whitelist the vSphere hosts in Prism

3. Mount the Nutanix container on the existing vSphere hosts

4. Copy the VM to the Nutanix container

5. Create a new VM in Prism and select Clone from NDFS, then pick the cloned disk from step 4

6. Start the VM and connect to the console

7. Strip out the VMware tools

8. Install the VirtIO drivers

9. Go to 4 until all other VMs are done

Now of course doing a V2V also has a few extra parts such as ensuring any interdependent services are migrated as a group but really that’s all you need to do.
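The "migrate interdependent services as a group" part is easy to plan before you start copying anything. Here is a small Python sketch, assuming you can write down which VMs depend on which. The VM names and the `migration_groups` helper are hypothetical and nothing here talks to Prism or vSphere. It simply groups VMs into batches so that anything linked by a dependency lands in the same batch.

```python
def migration_groups(vms, dependencies):
    """Group VMs so that interdependent ones land in the same migration batch.

    dependencies is a list of (vm_a, vm_b) pairs; direction does not matter.
    """
    parent = {vm: vm for vm in vms}

    def find(vm):
        # Walk up to the representative of this VM's group (with path halving).
        while parent[vm] != vm:
            parent[vm] = parent[parent[vm]]
            vm = parent[vm]
        return vm

    for a, b in dependencies:
        parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for vm in vms:
        groups.setdefault(find(vm), []).append(vm)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical estate: a three-tier app plus a standalone print server.
vms = ["web01", "app01", "db01", "print01"]
deps = [("web01", "app01"), ("app01", "db01")]

batches = migration_groups(vms, deps)
print(batches)
# [['app01', 'db01', 'web01'], ['print01']]
```

Each batch would then go through the copy, clone, boot and driver steps together, so a web server never comes up on the new platform while its database is still on the old one.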

The clever bit is the Image Service. This is a rather smart subset of tools that convert disks like the vmdk in this example to ones used by Acropolis. There’s no requirement for any other steps or management to get a VM across and the customer had their entire estate completed in an afternoon. To me, that’s pretty damn impressive.

I’m really pleased with what engineering have done in such a short period of time and to think where this can go is quite amazing.

And now we come to the point explaining why I said this stuff was “idiot proof.” I can only describe what happened as an organic fault in the system also known as a cock-up on my part. I hold my hands up and say I was a dumb-dumb. As HR don’t read this, and to be honest it’s just you and I anyway, I should be ok.

While we were preparing the cluster for the VM migrations I decided to upgrade the Nutanix software to the latest version and while this was progressing smoothly node by node I somehow managed to…erm…hmm…well……I sort of sent a ctrl+alt+del to the IPMI console. Call it brain fade. This obviously rebooted the very host it was upgrading right in the middle of the operation. After a lot of muttering and baritone swearing I waited for the node to come back up to see what mess I had created…

Here’s where engineering and all our architects need a huge pat on the back. All I had to do was restart genesis on the node and the upgrade continued. What makes this even more amazing is that while I was mashing the keyboard to self destruction the partner was already migrating VMs – during my screw up the migration was already in progress! If I’d done this to any other non-Nutanix system on the planet it would have been nothing short of catastrophic. However, in this case there was no disruption or downtime, and if I hadn’t let off a few choice words at myself nobody would have known. That is frankly amazing to me and shows just how well we’ve designed our architecture.

So how can I summarise Acropolis? It (and Nutanix) isn’t just a consumer-grade infrastructure, it’s also idiot proof and I for one am very grateful for it 🙂

About NutanixNoob.com

This site is written by David Gaunt, senior systems engineer at Nutanix.
David has been in the virtualisation arena since 2001 and worked at Citrix for 13 years and although Nutanix pays the bills all content on this site only contains his thoughts and ideas.