In this post, I’ll compare and contrast the management and provisioning path architectures of Citrix on Nutanix AHV using Machine Creation Services (MCS) and two leading VMware Horizon options. While there are always numerous options within deployments, the examples here are based on the leading alternatives. I’ve prepared 5,000 and 25,000 user examples to illustrate how a commonly sized environment would look versus one at a larger scale. This will show how each design scales and whether complexity increases or remains low.

The reason to look at this is to help understand how failures, patching, upgrades and human error might affect the resiliency of the provisioning path and management interface. If the control plane for the underlying hypervisor is down, the VDI broker layer will not be able to provision or manage the desktop VMs. This can have serious implications for users: if they are disconnected or log off, there may not be enough available desktops when they return because of a control plane issue.

On the operations side, this is also an important discussion because organizations demand simplicity in architectures. They do not want solutions that are complex to set up and maintain. So I will also look at how many management interfaces the alternatives impose on admins and point out any areas of concern.

Citrix + AHV 5,000 User Example

In the first example, we are looking at 5,000 XenDesktop users deployed on the Nutanix AHV hypervisor. XenDesktop communicates directly with the AHV cluster via the Prism cluster IP address and uses API calls to perform actions. Prism is the distributed management interface and runs as a service in the Nutanix controller VM (CVM) on each node. This means that Prism remains available during upgrades, and should a node, CVM or service fail, one of the other nodes will accept incoming connections to Prism and API calls.

In the sample diagram below I’m showing XenDesktop connecting to a single AHV cluster running all 5,000 desktop VMs. This is to showcase the power and flexibility that AHV and Prism provide. AHV does not impose a maximum cluster size limit the way legacy hypervisors do. With Prism running on every node in the cluster, the management and provisioning operations for VMs and the cluster scale out linearly with the cluster. This means that there is no difference in the performance of provisioning or management operations whether a cluster is 3 nodes or 80 nodes. This allows architects to design for large clusters when applicable without any concerns over imposed cluster size limitations.

Should there be valid reasons, the 5,000 desktops could be split into more than one cluster. Reasons for doing so might be workloads that don’t mix well or that adversely affect desktop density, or the desire to divide the environment into distinct failure domains.

Pros:

No Single Point of Failure (SPOF) for provisioning or management

Node or VM counts do not limit cluster sizes

Linear performance of control plane

Highly available control plane and provisioning path

Simple architecture that is easy to deploy, manage and operate

Cons:

VMware Horizon does not support AHV

VMware Horizon 5,000 user example

In this first VMware Horizon example, we are looking at the classic way of deploying vCenter Server. For this scenario it does not matter whether you deploy the Windows or the appliance variation of vCenter. In this classic method vCenter is a single point of failure (SPOF). This means that the environment can be severely impacted during upgrades and failures that take vCenter offline for more than a few minutes.

Another significant constraint to call out is that VMware does not recommend building blocks of infrastructure that host more than 2,000 desktops. Each block consists of a vCenter server and one or more vSphere clusters. In our 5,000 user example, this architecture forces us to have 3 vCenters; the number of clusters below them is up to the architect and the design requirements. By limiting the scale of each vCenter, VMware keeps performance and responsiveness within acceptable limits. But when scaled, this approach becomes inefficient because you are using additional resources, and the number of items to manage and update keeps growing as you add users.

Pros:

Fairly simple to deploy and well understood after VMware’s long history

Widely supported by applications

Cons:

vCenter is a single point of failure (SPOF)

vCenter is the limiting factor, at 2,000 desktops per vCenter

VMware Composer is a SPOF for linked clone provisioning

VMware Horizon 5,000 user example w/vCenter HA

This example is just an alternative to the previous one in that I’ve inserted the new vCenter High Availability (HA) option that was released in vSphere 6.5 recently. The vCenter Server Appliance (vCSA) must be utilized if you want to use this HA option. The sizing and architectures are the same. The primary difference is the availability of vCenter in this alternative. To deploy the vCenter HA config you are required to deploy 3 vCSA VMs for each vCenter that you want to be highly available. There will be an active, passive and witness VM in each deployment. Multiply this out with the three blocks required to deliver 5,000 users and we now have nine vCenter appliances to deploy, manage and upgrade.

This adds a lot of complexity to the architecture for the benefit of increasing the resiliency of the provisioning path and management plane.

vCenter is still the limiting factor, at 2,000 desktop VMs per vCenter

As the design scales, complexity increases because there are so many management points

Citrix + AHV 25,000 User Example

In this and the following examples, I have scaled the number of users to 25,000 to see what effects this has on the different architectures and management experience. For the Citrix and AHV architecture, nothing changes here other than the number of users. Citrix can accommodate the large number of users within a single deployment. On the AHV cluster side of things, I have elected to evenly divide the users between four different clusters. I could have chosen a single cluster but that felt extreme; architects can also choose more clusters if that meshes with their requirements. Within Citrix Studio, each AHV cluster will be configured as an endpoint that can be provisioned against.

The point is that in this architecture organizations can accommodate large numbers of users with a small number of clusters, all of which benefit from highly available provisioning and management controls. Each AHV cluster can be managed via the Prism interface built into the cluster, or Prism Central can be deployed to allow for global management and reporting. An important thing to note is that Prism Central is not in the provisioning path, so it does not have any effect on the architecture explained earlier.

Pros:

No cluster size limits provides flexibility to account for budget savings and ability to meet requirements.

Highly available architecture at all levels with simplicity baked in.

Small number of clusters reduces node counts by saving on the number of HA nodes for additional clusters.

VMware Horizon 25,000 User Example

Now taking a look at the expanded user environment with VMware Horizon architecture you can see that I’m showing the vCenter HA alternative. I think that if you have the option for a highly available control plane most will select that option so I’m not showing the classic single vCenter option.

The architecture is the same but you will notice a few things now that the user count has been scaled up to 25,000. We can no longer deliver that many users from a single Horizon installation (Pod). The maximum number of users within a pod is 10,000, so we now require three Horizon installs to meet our user count. To be honest, having three Horizon pods does affect the broker management experience, but in this scenario it really has no bearing on the cluster count or design.

Following the 2,000 users per vCenter rule we will need 13 vCenters to meet our 25,000 user requirement. To keep things clean the diagram shows just a single cluster attached to each vCenter but the 2,000 users could be split between a few clusters under each vCenter if that made sense.

You can see from the diagram that deploying 13 vCenters in HA configuration requires 39 vCenter appliances to be deployed and configured. Yes that’s right, thirty-nine!! Just think about the complexity this adds to troubleshooting and upgrades. Each one of those appliances must be upgraded individually and within a short window to not break functionality or support. Upgrades now may force you to upgrade Horizon, Horizon agents, clients, vCenter and vSphere all within a single weekend. That’s a lot of work. The best you could do is upgrade one pod per weekend, and now you’re exposing your staff to three weeks of overtime and the loss of their weekends.
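The counts in these Horizon examples follow from simple ceiling arithmetic. Here is a minimal sketch that reproduces them, assuming the limits quoted in this post (2,000 desktops per vCenter, 10,000 users per pod, three appliances per vCenter HA deployment). The function and parameter names are my own and purely illustrative.

```python
import math

def horizon_management_footprint(users, desktops_per_vcenter=2000,
                                 users_per_pod=10000, vcenter_ha=True):
    """Count Horizon pods, vCenters and vCenter appliances for a user count.

    The default limits are the figures quoted in this post, not values
    read from any VMware tool or API.
    """
    pods = math.ceil(users / users_per_pod)
    vcenters = math.ceil(users / desktops_per_vcenter)
    # vCenter HA deploys an active, passive and witness appliance per vCenter.
    appliances = vcenters * 3 if vcenter_ha else vcenters
    return {"pods": pods, "vcenters": vcenters, "vcenter_appliances": appliances}

for users in (5000, 25000):
    print(users, horizon_management_footprint(users))
```

For 5,000 users this gives 1 pod, 3 vCenters and 9 appliances; for 25,000 users it gives 3 pods, 13 vCenters and 39 appliances, matching the examples.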

vCenter is still the limiting factor, at 2,000 desktop VMs per vCenter

Three vCenter Linked Mode views to see the entire infrastructure

Three different Horizon management consoles to configure and control users

Composer is an SPOF for linked clone provisioning per Horizon Pod

VMware Horizon 25,000 users on VxRail

In this last example, we are going to adjust the previous example and look at what would change if it was deployed on VxRail appliances that use vSAN for storage. The Horizon and vCenter / vSphere architecture would be the same; the only thing to highlight is what is added by VxRail.

Each of the clusters that provide resources for a 2,000 user block would be a VxRail cluster. Each cluster runs a VxRail virtual appliance VM that is used for appliance management and upgrades. At this scale, each of the 13 clusters has its own dedicated VxRail manager, and there is no global management function like the one Prism Central offers. VxRail manager is not in the provisioning path, but it does add to the complexity of managing this type of deployment and should be considered before selecting.

Pros:

Same as previous example

Cons:

Same as previous example

13 different VxRail managers add unneeded complexity

VxRail is an SPOF as a single VM running on each cluster for management operations

Conclusion

To wrap up my thoughts and examples: whether you’re designing a small or large scale VDI environment, it’s important to understand how the management and provisioning structures will function. These determine how highly available the solution is and what level of effort will be required to support it from day 2 onward. The resiliency and simplicity that Citrix offers when connected to Nutanix AHV cannot be rivaled by any other alternative today.


The use of LoginVSI as a VDI performance testing or validation tool has increased over the last several years. It’s really the only tool to offer these services from an independent party, so by default it’s the de facto option for vendors to showcase their solutions. Vendors use LoginVSI on a regular basis to show how their solution meets a common set of tests, which makes them a candidate to be considered for your VDI projects.

To learn the basics on how to understand the results from LoginVSI tests you can refer to a post on their blog here. It’s an older post but still pretty valid since the data points have not changed that much over the versions. The danger enters when you are looking to take testing results from multiple vendors and compare them. You simply cannot take results from different tests and compare the data points without understanding how the testing was done, what was tested and how the differences in the tests affect the results. Also what should you be aware of that might affect the results?

So in this post, I will lay out a number of items to help educate on how to better understand, compare and interpret these LoginVSI results that are published. Because while someone may be publishing a very low VSIbase number and/or high densities you need to be able to determine whether that means anything to your environment and if it’s really valid to anyone.

VDI Brokers

I think most people understand that comparing results from a Citrix test to a VMware Horizon test is not apples to apples. There is a certain amount of overlap that can be accounted for, but to be fair you should be comparing tests with the same data points. Then there are the different types of desktops, the whole persistent versus non-persistent discussion and how these apply to each other. VMware and Citrix each offer two different types of non-persistent provisioning options within their products now, so comparing results gets even more fuzzy. I don’t see many vendors running LoginVSI testing using persistent desktops, so that should not be much of a concern. But there are significant differences in how the different non-persistent provisioning options work that you should be aware of when interpreting results.

The version of the Citrix or VMware broker should be the same or very close in the different tests that you are comparing. Along with the many provisioning options explained below the version of the broker could affect the test results if one revision provided performance improvements that another did not provide.

Citrix

Citrix offers two different provisioning methods for non-persistent desktops, which will be the focus for the majority of tests that you will encounter. Some vendors may provide results for both. The different options are Machine Creation Services (MCS) and Provisioning Services (PVS). In short, MCS is a storage-based architecture while PVS is heavily network-based and uses centralized caching points. Each option is explored a bit deeper in the following sections.

PVS

The PVS architecture is unique to Citrix and typically uses multiple PVS servers that are load balanced. The golden image for the VDI pool is loaded onto the PVS servers and presented as read-only. These PVS servers use memory within the server OS as a caching layer that allows commonly accessed blocks to be quickly returned to guests, improving performance. Each VDI/SBC virtual machine is referred to as a PVS target and is a VM with no persistent data or OS installed. These PVS targets boot the golden image via a network connection to one of the PVS servers.

The writes for each PVS target can be cached in a number of different ways. Each method has its pros and cons, and comparing results is invalid if the tests were run using different PVS write caching methods.

Cache in device RAM – Each PVS target (VDI VM) will be assigned additional memory above what your OS/image requirements are and this is memory from the physical host supplying resources to all VMs running on it. The writes for each VM will go into the assigned RAM for that target and be persisted until the session is finished.

Cache on device hard disk – In this option, the writes for each VM are stored on a local hard disk for each VM and this is typically the same storage that is being used for running all VDI virtual machines.

Cache in device RAM with overflow on hard disk – This last option is a combination of the two previous methods. It typically is configured to provide a smaller amount of RAM for caching and if writes exceed this amount during a session it will begin to use hard disk for the overflow.

** When interpreting the results you cannot fairly compare a test that uses PVS with RAM cache versus a test that uses PVS with disk cache.

** It would also be incorrect to compare a test that uses PVS with any caching method to a test that uses MCS with any caching method.

** A point to question is why any modern hybrid or all-flash storage vendor would utilize PVS with RAM cache to showcase their storage solution. This virtually removes the storage solution from the testing and does not validate their solution. PVS is a legacy solution that was designed to hide the poor performance of legacy storage arrays.

** If a vendor tested with PVS and does not specifically explain how the write cache was set up and configured, you should be suspicious and request further details. If there are performance charts, look for write performance; if it’s low or near zero they are using RAM cache.

MCS

The MCS architecture takes a storage-focused approach to provisioning non-persistent desktops. The golden image is a shared VM that all of the virtual desktops in a pool will boot from. This golden image is read-only and all read requests are serviced by the storage that it’s sitting on. This is different from PVS in that there are no PVS servers servicing read requests from a memory cache.

Until XenDesktop 7.9 all reads and writes from these desktop virtual machines were serviced by the storage they were running on. In 7.9 Citrix introduced the Cache in device RAM and Cache in RAM with overflow options for MCS also, that will use the host memory as a caching layer for writes. This now allows the same write caching options between PVS and MCS with the main difference being where the reads are serviced for the golden image.

** When interpreting the results you cannot fairly compare a test that uses MCS with RAM cache versus a test that uses MCS with disk cache.

** It would also be incorrect to compare a test that uses MCS with any caching method to a test that uses PVS with any caching method.

** A point to question is why any modern hybrid or all-flash storage vendor would utilize MCS with RAM cache to showcase their storage solution. This removes the storage solution from servicing all or most of the write traffic and does not validate their solution.

VMware

VMware also offers two different provisioning methods for non-persistent desktops, which will be the focus for the majority of tests that you will encounter. The different options are Linked Clones and Instant Clones. As of 2016, I don’t think you will see any vendors test anything but linked clones as instant clones are a new technology that is still maturing. In the future, I would expect that many vendors will begin to provide results for both. In short, linked clones is a storage based architecture while instant clones is a new method that removes several of the large storage spikes. Each option is explored a bit deeper in the following sections.

Linked Clones

The linked clone provisioning method from VMware is very similar to the MCS option explained above from Citrix, but without the different write caching options. Linked clones use a golden image or replica that all read operations for the desktop pool are serviced from. Each desktop virtual machine has a delta disk and this is where all write operations are performed. This makes linked clones a provisioning method that is heavily affected by the performance of the storage platform used in your design.

There is one caching alternative available for linked clones that is called the View storage accelerator or Content Based Read Cache (CBRC). This can utilize up to 2GB of host memory to cache commonly accessed bits from the replica image for read operations.

** Take note of whether testing was done using the storage accelerator, as it removes some of the read operations from the storage system and cannot be fairly compared to another test that does not use the same feature.

** Likewise, if you do compare results and a vendor that does not use the storage accelerator provides better results than a vendor that does use it, that is something to be aware of.

Instant Clones

The instant clone architecture is somewhat like a modernized version of linked clones. The philosophy is similar, but rather than each pool using a single replica image and pulling all reads from that single image on storage, instant clones create a replica VM for the pool image on every host. The replica on each host has the OS booted and is then placed in a stunned state. This allows each new virtual desktop, when created, to use the state of the replica as its starting point without the need to initially boot up the OS. This approach saves time and reduces storage peaks during provisioning and image update procedures.

For these reasons, it would not be fair to compare a test that used instant clones to one that used linked clones. The different provisioning methods can dramatically affect the provisioning times and I/O behavior during steady states.

Windows Versions

The version of Windows is important when comparing test results. While Windows version may not be as impactful as some of the other points discussed in this post, it is still something that must be taken seriously. I think that most will agree that Windows 7 or 10 are the primary Client OS versions that are already deployed or being deployed today. Deploying Windows 10 will result in about a 10-20% reduction in user density.

Office Versions

This is important when you are sizing your own environment, but it is also very relevant to this discussion of LoginVSI testing. Different versions of MS Office can dramatically affect the performance and density of tests. You can read more about the effects of different Office versions on RDSH/VDI user densities here; to save you the time reading, I will summarize. Office 2010 currently offers the best user density of the Office versions that are still widely deployed (although it is no longer in mainstream support). Using Office 2013 will result in a 20% reduction in density when compared to 2010. Office 2016 reduces density a further 5%, or 25% less than 2010.

As you can see, with the effects Office can have on user densities it would not be accurate to compare tests that used different Office versions. Let’s look at the following scenario: the vendor you prefer has published a report that meets all of your requirements.

Vendor fulfills all your requirements

VSIbase is attractive

Density is 20% lower than other tests

But vendor is using Office 2016 in their testing

Same OS versions, same provisioning methods used

Based on these points, and given that all other testing points match, I would be comfortable that the lower density can be accounted for by the different Office versions.
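To make that reasoning concrete, here is a small sketch that normalizes a published density to a different Office version using only the percentages summarized above. The 150 users per host figure is a made-up example, and the helper names are hypothetical.

```python
# Relative density by Office version, taken from the percentages quoted in
# this post (Office 2010 is the baseline). These are rough rules of thumb.
OFFICE_DENSITY_FACTOR = {"2010": 1.00, "2013": 0.80, "2016": 0.75}

def equivalent_density(users_per_host, tested_office, target_office):
    """Estimate the density a test would have shown with another Office version."""
    baseline = users_per_host / OFFICE_DENSITY_FACTOR[tested_office]
    return baseline * OFFICE_DENSITY_FACTOR[target_office]

def expected_gap(office_a, office_b):
    """Fractional density difference expected from the Office version alone."""
    return 1 - OFFICE_DENSITY_FACTOR[office_a] / OFFICE_DENSITY_FACTOR[office_b]

# Hypothetical published result: 150 users per host, tested with Office 2016.
print(round(equivalent_density(150, "2016", "2010")))
print(f"{expected_gap('2016', '2010'):.0%}")
```

In the scenario above, a result that is 20% lower than tests run with Office 2010 sits within the 25% gap that Office 2016 alone could explain.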

CPU Generations

Just like it would not be correct to compare the towing capacities of a truck with a v6 engine to that of a v8 engine, comparing results from tests that use different Intel CPUs is also not apples to apples. In general, you should be comparing test results that showcase the same CPU generation, if that is not a possibility then you can look at results but there would be no way to account for potential differences in results.

Intel Ivy Bridge processors

Intel Haswell v3 processors

Intel Broadwell v4 processors

Each of these CPU generations offers a performance increase over the previous version that affects both consolidation ratios and overall performance. These performance benefits are obvious for the virtual desktops, but if you are running a software-defined storage or hyperconverged (HCI) solution, the storage layer will also benefit from these CPU improvements.

Memory

Memory can have an impact on your environment. Specific DIMM combinations can raise or, worse, lower the memory speed that is available (1866 vs 2133MHz is a 20% density difference, for example). A drop in memory speed is typically the result of configuring a server with too many DIMM slots populated. If a configuration you are considering uses more than 512GB of host memory, you should check the vendor documentation to understand what will happen to the memory speed for the proposed configuration.

LoginVSI Versions

Like any software vendor LoginVSI makes changes to their software on a regular basis and these changes can affect the testing results. Different versions of the testing software could be using different applications, different testing methodologies or other factors. For these reasons, it is important to make sure that when comparing test results from different tests that they are at least on the same major version release. It would be unfair to compare a testing run using LoginVSI 3.5 to one using version 4.0. It is more acceptable to compare tests run on 4.0 and 4.5 as long as all other factors explained in this post are in alignment.

VM Sizes

When it comes to LoginVSI testing or just any type of VDI testing most people are seeking two primary data points. The first is storage performance which historically was the major pain point in past deployments. The second data point is user density or the number of virtual desktops per host. The reality is that LoginVSI testing is valuable but can in no way be used to tell you exactly what your environment should be sized like.

To size your environment you will need to understand your use cases and their requirements, then combine those details with performance results that were collected from your actual environment. This will provide you with actual data points that can be used for sizing and the LoginVSI results along with a skilled EUC architect can then provide customized sizing for your environment.

What to watch for

What I’ve seen all too often is vendors sizing the VMs used for their LoginVSI tests at the bare minimum needed to pass the test. By providing the minimal amount of CPU and memory to each VM they can try to show a higher density of users per host while still passing the test. The danger here is that vendors that do this create a false sense of user density that confuses customers and architects. Saying that you can achieve 300 users per host while using a configuration that is not likely to be deployed into production by 99% of customers is worthless. In doing so they are either trying a bait and switch or proposing that you dramatically undersize your design. Either of these approaches would get you thrown out of my office if I was the customer.

So when looking at published test results, pay close attention to the size of the desktops used during the testing. A Windows 7 desktop with 1 vCPU and 1.5GB of memory may get you to pass the test, but for the vast majority of use cases it is not going to provide a delightful user experience.

Today’s VM averages

With the above discussion on what to look out for in desktop sizing, I thought it would be helpful to level set on what is acceptable sizing in 2016. Since the release of Windows 7, and even more so when moving to Windows 8/10, 2 vCPU for each virtual desktop is the new normal. Today when sizing you should default to 2 vCPU and only move down to 1 vCPU or increase past 2 vCPU when you have valid testing data to support the decision.

The default starting point for modern Windows versions should not go below 2GB of memory unless valid testing has been done to support the request. While 2GB is the starting point, I took an informal survey of several EUC experts and the results showed that their current default sizing for VDI is 2 vCPU and 3-4GB of memory. In the end, the amount of memory will depend on the use case requirements and the applications they are using, but these data points help provide some guidance on what is acceptable and what is not when it comes to test results.
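To see why minimal VM sizes inflate density numbers, here is a rough sketch that looks at memory only. It takes the 300 users per host and 1.5GB figures mentioned in this post and asks how many desktops the same amount of memory supports at more realistic sizes. It assumes no memory overcommit and ignores hypervisor overhead and CPU limits, so treat the output as an illustration, not a sizing result.

```python
def memory_needed_gb(desktops, vm_memory_gb):
    """Total desktop memory for a given count and per-VM size."""
    return desktops * vm_memory_gb

def memory_bound_density(host_memory_gb, vm_memory_gb):
    """Desktops per host if memory were the only limit (assumption:
    no overcommit, no hypervisor or storage controller overhead)."""
    return int(host_memory_gb // vm_memory_gb)

# 300 desktops at 1.5 GB each is the kind of claim discussed in this post.
claimed = memory_needed_gb(300, 1.5)
for vm_gb in (1.5, 2, 3, 4):
    print(f"{vm_gb} GB per desktop: {memory_bound_density(claimed, vm_gb)} desktops")
```

The same memory that holds 300 desktops at 1.5GB holds 150 at 3GB and 112 at 4GB, which is why the VM size behind a published density matters so much.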

Friends don’t let friends deploy 1 vCPU desktops

Use Cases / Scenarios

So far I’ve covered a bunch of configuration points and hardware details that can affect test results. One of the last things to be aware of is to ensure that the tests you are comparing are using the same type of use case or scenario. These are commonly referred to as knowledge worker, task worker, developer, etc. These determine the applications and how demanding the workload will be. Obviously a developer use case is far more demanding than a task worker that commonly uses 1 or 2 simple applications. Most tests are focused on the knowledge worker use case.
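The factors covered so far can be collected into a simple checklist. The sketch below is one hypothetical way to record the settings behind two published results and list where they differ. The class, field names and sample values are my own and not part of any LoginVSI tooling.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class VsiRun:
    """Hypothetical record of the settings behind one published result."""
    broker: str          # e.g. "Citrix" or "Horizon"
    provisioning: str    # e.g. "MCS", "PVS", "linked clones", "instant clones"
    write_cache: str     # e.g. "disk", "RAM", "RAM with overflow", "none"
    windows: str
    office: str
    cpu_generation: str
    loginvsi_major: int  # compare at least at the major version level
    vcpu: int
    memory_gb: float
    workload: str        # e.g. "knowledge worker", "task worker"

def differing_factors(a, b):
    """Return the names of the settings that differ between two runs."""
    return [f.name for f in fields(VsiRun)
            if getattr(a, f.name) != getattr(b, f.name)]

# Two made-up vendor results for illustration only.
run_a = VsiRun("Citrix", "MCS", "disk", "Windows 10", "2016",
               "Broadwell v4", 4, 2, 4.0, "knowledge worker")
run_b = VsiRun("Citrix", "PVS", "RAM", "Windows 10", "2010",
               "Broadwell v4", 4, 2, 4.0, "knowledge worker")

diffs = differing_factors(run_a, run_b)
print("Comparable" if not diffs
      else f"Not apples to apples, differs in: {', '.join(diffs)}")
```

An empty list does not prove two tests are equivalent; it only means none of the factors discussed in this post rule out the comparison.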

Scaling out designs

Typically you will see vendors testing in a range of 200 to 1,000 desktops in a test run. There may be a few tests that use larger quantities, but they are not as common. The main thing to look for here is whether the vendor provides a detailed explanation of how you would scale from the tested amount to the amount that your end state design is projected to be. As an example, if I am looking at a test for 1,000 desktops, I will need to understand how this vendor would scale and how my design would look for the 20,000 desktops that my environment is projected to be.

The default answer of most vendors will probably be it’s just a cookie cutter approach and you can just stamp out the same build as what was tested. This is not good enough and you should press them harder for real answers.

Questions to understand

These are several data points that you should understand when looking at larger designs or how you will scale from a starting amount to your future desired state.

What are the cluster sizing limitations?

For example, if I can only create clusters of 500 or 1,000 users, I need 20-40 clusters to reach 20,000 desktops.

As you scale, how does this affect the management story?
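The cluster sizing question is easy to quantify. Here is a minimal sketch for the 20,000 desktop example, which also counts the spare nodes that add up as the cluster count grows. The one spare (N+1) node per cluster and the 5,000 desktop limit are assumptions for illustration; use your own availability design and the vendor's real limits.

```python
import math

def clusters_needed(total_desktops, max_desktops_per_cluster):
    """Number of clusters required under a per-cluster desktop limit."""
    return math.ceil(total_desktops / max_desktops_per_cluster)

def spare_nodes(clusters, spares_per_cluster=1):
    """Spare nodes across all clusters (assumption: N+1 per cluster)."""
    return clusters * spares_per_cluster

# 500 and 1,000 come from the example above; 5,000 is a hypothetical
# larger limit added for contrast.
for limit in (500, 1000, 5000):
    n = clusters_needed(20000, limit)
    print(f"{limit} desktops per cluster: {n} clusters, {spare_nodes(n)} spare nodes")
```

Every extra cluster is also another thing to patch, monitor and upgrade, which is where the management story comes in.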

Conclusion

If you are evaluating platforms for your current or future VDI/EUC environment then you have probably been looking at LoginVSI results. When going through your normal solution evaluation process be sure that you consider all of the points explained in this post, especially when trying to make sense of testing results. These will help you better understand how things were tested and also whether someone is trying to spin things unfairly in their favor.


There has been plenty of guessing and leaked stories over the past month about Nutanix acquiring PernixData, and today all of these stories can be forgotten and a bunch of new ones can start. These new stories will range from “these are great moves to extend the features and lead that Nutanix has” to “oh, this is terrible for PernixData”. All I can say is that I’m personally excited to see the PernixData team joining Nutanix; some people have left already, and we wish them the best. I look forward to the many new teammates and working with them on exciting new projects. Our TME team is growing with an addition from the PernixData family as well as many others in different roles.

PernixData

I as well as many others have been a fan of PernixData for several years now. In my previous role, I had a chance to work with the product from its Beta period and as it matured as a solution. I’m personally excited to watch how both teams will merge into one and use the collective brain power to improve the performance of Nutanix platform and FVP. Something that is probably more exciting for me is going to be watching how the Architect analytics solution that Pernix built is used. Being a big fan of great analytics I have lots of ideas on how this could be used with existing and future capabilities being built.

In the press briefing, Nutanix CEO Dheeraj Pandey spoke about some details around what attracted him to PernixData and some thoughts on what is planned.

Server-side storage technologies are going to become 100x faster with Intel’s push towards NVMe and storage-class memory. While we bring applications and data together in same servers, we have to hustle hard to get even closer to the CPU, and yet remain hypervisor-agnostic. SAN arrays, sitting over networks, will look archaic with the mass introduction of 3D-XPoint technologies.

PernixData helps us hug the application even more so. Hyperconvergence remains hypervisor-agnostic, as we place acceleration and data services strategically on the server.

Over the last 4 years, Pernix has built a very strong muscle memory around storage-class memory better than any other software startup that we know of. More importantly, they “see” every IO without compromising on data consistency. That vantage point gives them a unique advantage to pull off online application migration, the 1-click delight that Nutanix has always espoused for. The two teams will work hard to make the acceleration engine and their datacenter analytics product work across multiple hypervisors, making our enterprise cloud operating system portable to multiple customer environments. And that deliberate parity between VMware ESXi and our very own AHV hypervisor will keep us authentic.

Calm.io

I personally have had very little time to dig into all that Calm has to offer, and I missed an internal briefing due to some PTO time. But from exploring their website and watching some videos, I’m excited about the possibilities they bring for extending our cloud operations functions for workflows, deployments, scaling and many other important functions as organizations move towards cloud-like operations.

The enterprise cloud operating system is not complete without an equal emphasis on 1-click automation and orchestration. Developers increasingly want to meld with ITOps, and want to think top-down about applications rather than infrastructure. While we’ve successfully sold to ITOps and application administrators in the last 4 years, it is time for us to go higher up the value chain, even closer to end-users, i.e., DevOps.

Calm was a Sequoia Capital-funded startup that was thinking deep about applications and visual design of app workflows. Everything in Nutanix has been about top-down design — Prism’s design has humanized infrastructure, the way we’ve thought of extremely mundane things in IT. That is exactly what we found in the Calm team — democratizing orchestration without writing too much code. Best part, its vision is extremely aligned with ours on the hybrid nature of future IT. Their pane of glass helps design and orchestrate applications across AWS, Azure, and on-prem web-scale infrastructure. The lightbulb moment came when we saw how elegantly they had integrated Nutanix into their story. It was more than 8-9 months of courtship, when we went deep to understand Aaditya, Jasnoor, and their brainchild called Calm.io.

With Calm integrated, customers will finally be able to choose the right cloud for the right workload, achieve seamless application mobility while experiencing the same simple, delightful, and consistent experience they have come to respect about Nutanix.

Press Release

Here is the official press release:

Nutanix Announces Two Strategic Acquisitions

PernixData and Calm.io Augment Data and Control Fabrics of the Nutanix Enterprise Cloud Platform

San Jose, CALIFORNIA – August 29, 2016 – Nutanix, a leader in enterprise cloud computing, today announced that PernixData and Calm.io will join the Nutanix family.

Nutanix has executed a definitive agreement to acquire PernixData, a pioneer in scale-out data acceleration and analytics. The transaction is subject to customary closing conditions. In addition, Nutanix has closed the acquisition of Calm.io, an innovator in DevOps automation.

By adding world-class technology, products and engineering talent, Nutanix can accelerate the delivery of an Enterprise Cloud Platform that rivals the agility, automation and consumer-grade simplicity of the public cloud but with the control, security and attractive long-term economics of on-premises infrastructure. These additions will enable Nutanix to pioneer new software stacks for storage-class memory systems, enhance its Application Mobility Fabric (AMF) with cross-cloud workload migration and bring rich, cloud-inspired orchestration and workflow automation to its Prism management software.

The New Data Fabric for the post-Flash Era

Nutanix and PernixData share an architectural design philosophy that next-generation datacenter fabrics must keep data and applications close in order to drive the fastest possible performance and to deliver flexible, cost-effective infrastructure scaling. With this common vision, the two companies will develop an advanced data stack to replace traditional storage silos and high-latency networks with newer storage-class memory and advanced interconnects. These planned strategic investments in new server and storage technologies will provide customers with a re-imagined data fabric for a post-flash era of enterprise computing.

The combined teams will also focus on reducing the inertia of application data that inhibits workload mobility across virtual and cloud environments. Planned enhancements to Nutanix App Mobility Fabric (AMF) will deliver the flexibility to run any application in any environment, without business-critical data being held hostage to a legacy infrastructure.

“PernixData software has helped hundreds of customers virtualize their applications without compromising performance and visibility,” said Poojan Kumar, CEO and co-founder, PernixData. “With highly aligned cultures, ambition and talent, we are genuinely excited to join the Nutanix team. And, with our common devotion to 100% software-driven solutions, will look forward to helping customers accelerate their journey to the Nutanix Enterprise Cloud Platform.”

Unleash DevOps in the Enterprise Cloud

The Calm.io and Nutanix teams will work to bring an application-first approach to choosing, managing and consuming IT infrastructure – enabling customers to pick the right cloud for each application. Nutanix plans to add cloud automation and management capabilities to its existing software stack to deliver application and service orchestration, runtime lifecycle management, policy-based governance, comprehensive reporting and auditing services to support all application environments, including virtual machines, containers and microservices. Together, Calm.io and Nutanix plan to bring together clouds, platforms and people, on an elegantly simple pane-of-glass.

“We have shared a similar vision as Nutanix since day one – datacenter infrastructure must be fully automated, simple to deploy and easy-to-use,” said Aaditya Sood, Calm.io CEO and founder, “We are excited to join the Nutanix team to work together to eliminate the daunting complexity of legacy datacenters by taking a radical, application-centric view of IT infrastructure.”

“Today is a very special day in Nutanix’s history,” said Dheeraj Pandey, Founder, CEO and Chairman, Nutanix. “PernixData and Calm.io both have exceptional technology, solid engineering teams, and visionary leaders with the ‘Founder’s Mentality’; they have dreamt big and persevered against great odds to build phenomenal products. We are honored to welcome them into the Nutanix family, and build the next generation of innovative products and truly helping our customers realize the vision of the Enterprise Cloud.”

PernixData and Calm.io customers can expect further communication in the following weeks.

Today marks a proud day for Nutanix and our customers. As we further extend our lead in the hyperconverged space, deploying the Nutanix platform on Cisco UCS servers is now fully supported. Customers now have an additional hardware option to choose from. The current options are NX on Supermicro hardware, XC on Dell hardware and HX on Lenovo hardware, the latter two through OEM relationships. Beyond these, Nutanix currently offers software-only deployments on Crystal ruggedized hardware, Open Compute Project (OCP) hardware and now Cisco UCS.

The hottest platform in the world, Nutanix on UCS!

Offering the stability, reliability and performance of the Nutanix platform on Cisco UCS has been a regular request from many of our large customers and partners. Customers no longer have to accept other hardware platforms if they are heavily invested in UCS or deploy half-baked or immature HCI solutions that were previously available on UCS.

Nutanix on Cisco UCS

Starting today, customers can deploy Nutanix on UCS through a meet-in-the-field process. This allows customers to purchase Cisco UCS servers through their normal channels and maintain their Cisco UCS relationships. The hardware and software will be deployed at the customer’s location using the standard Nutanix procedures. The Foundation process has been updated to support UCS hardware.

In this initial phase of UCS support, Nutanix will be supporting the C220 and C240 rack mount servers. There will be two models of the C240 to allow for the use of 2.5″ or 3.5″ drives. We also support deploying with or without Fabric Interconnects (FI), which allows maximum flexibility. These models and the configure-to-order flexibility will cover the vast majority of existing use cases. Nutanix will take first call on all support issues and, if the problem is determined to be a hardware issue, can open a support case with Cisco for the customer via TSAnet. The same applies to hardware alerts: we can open Cisco TAC cases via TSAnet for customers.

When deployed with Fabric Interconnects, the Foundation process will automatically create the necessary identity pools, service profile templates and templates to allow for the normal automated Nutanix deployment process that has been available for years on other hardware platforms.

Misc. FAQs

Here are several more details about the release that I won’t dive into at this time.

Hypervisor support: ESXi 6.0/5.5, AHV and Hyper-V

Regular and self-encrypting drives supported on C240 with 3.5″ drives

Haswell and Broadwell CPUs supported

Recently one of my lab switches began to fail. Since it was the one that did most of the routing in my setup, it was time to reevaluate my home networking design. I could just pick up another layer 3 switch, drop it in and continue to do the same thing as before. But I’m always looking to do things better, and my current setup was using gear from multiple vendors.

I was using Meraki for my firewall and Access Points (AP), HP for my 1GbE networking and routing, and Quanta for 10GbE networking. This setup worked fine, but obviously there were many different touch points. I would have loved to replace the HP switch with one from Meraki, but they are pretty expensive, so that was out of the question. Also, I don’t like paying the yearly licensing costs to Meraki, but I had been doing so for a few years because I really liked the features.

So this led me to take another look at Ubiquiti for networking gear. I have seen lots of others express their happiness with the products after using them. So rather than paying for more Meraki licenses in 6 months, I chose to invest that future money and a little more to replace most of my network with Ubiquiti gear. I ended up replacing everything but the Quanta switch that does 10GbE networking.

The new network now uses the Security Gateway (SG) as my edge firewall and router for all traffic. The SG connects to the new 1GbE network switch, which is PoE capable, so it also powers the new AP that was deployed. I use 1GbE for older lab servers and some IPMI connections, and I have a trunked connection to my Quanta switch that newer lab hosts connect to. With this setup I can now control all networking except the Quanta from the single Ubiquiti controller that I deployed on a Windows VM in the lab.

While I’m losing a few features that Meraki offered and that I used, they are things I can live without. It’s only been a short period of time, but so far I’m pretty happy with the Ubiquiti products and hope they live up to their high praise.

Lessons Learned

I had never used Ubiquiti gear before, so there were a few things that I learned while setting up and fighting through some issues in the beginning. The first would be to just go ahead and install the UniFi controller software in a VM or on an old laptop that will always be on and connected. Installing the controller on your everyday laptop is not a great idea if you are not always home and online. The devices hold their configuration, but it cannot be changed if the controller is not present. You also cannot access the reporting if the controller is not around.

The APs are all PoE capable, which is nice if you do not have power outlets close to where you want to deploy them. They come with an AC adapter or can be powered by a PoE-capable network switch like the one I purchased. By default the UBNT switch has PoE+ enabled on all ports, but when I plugged in the AP it would not power up. I tried different cables and nothing worked until I used the AC adapter. After talking to support I found out that you must change the switch port that the AP is connected to from PoE+ to 24V passive. I’m not sure why this matters, but it did the trick. It seems weird that an all-Ubiquiti deployment would not power up the APs with default settings.

The last weird thing I encountered was that performance was not great when using my MacBook. It was not obvious when using a browser or even streaming video, but it was very obvious when I would RDP to servers in the lab. There would be lots of pauses when clicking between tabs and apps in the RDP session. If I kept a ping running to different IPs in the lab, I would see random latency spikes of 15-300ms and a dropped ping about every 20-30 packets. What was weird is that the same operations worked flawlessly from a PC. So off I went to search the intertubes, and I saw that Mac performance on Ubiquiti has been an on-and-off problem with different firmware versions. There were a bunch of forum posts about the problem, and after reading a bunch of them I saw that people were having good luck running previous firmware versions on their APs.

So I left the controller and switch on the latest firmware versions but downgraded the AP to 3.4.18 and it fixed the Mac performance issues. Immediately after the older firmware was installed and the AP rebooted I had a completely normal experience when performing the same RDP functions.
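If you want to compare ping behavior before and after a firmware change with something more than eyeballing the terminal, a small script can summarize the samples. This is only a rough sketch: the function name `summarize_ping`, the 15ms spike threshold and the sample values are my own illustrative assumptions, not measurements from my lab, and you would need to collect the round-trip times yourself (a dropped packet is recorded as `None`).

```python
from statistics import median


def summarize_ping(samples_ms, spike_threshold_ms=15.0):
    """Summarize ping round-trip times in milliseconds.

    samples_ms: list of floats, with None marking a dropped packet.
    spike_threshold_ms: hypothetical cutoff for counting a reply as a spike.
    """
    replies = [s for s in samples_ms if s is not None]
    dropped = len(samples_ms) - len(replies)
    spikes = [s for s in replies if s >= spike_threshold_ms]
    return {
        "sent": len(samples_ms),
        "dropped": dropped,
        "loss_pct": 100.0 * dropped / len(samples_ms) if samples_ms else 0.0,
        "median_ms": median(replies) if replies else None,
        "max_ms": max(replies) if replies else None,
        "spikes": len(spikes),
    }


# Made-up sample data for illustration only.
samples = [2.1, 1.9, 2.3, 180.0, 2.0, None, 2.2, 15.5, 2.1, 2.0]
report = summarize_ping(samples)
print(report)
```

Running the same summary against samples captured on each firmware version gives a simple side-by-side of loss percentage and spike count.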

It’s only been a few days, but after working through the issues I’m now pretty happy with my decision to make the switch to Ubiquiti. Now I wish they offered cost-effective 10GbE switches that I could deploy to replace my Quanta, and then the setup would be ideal.

It’s been a long journey over the past year, but I’m proud to announce that the Architecting EUC Solutions book is finally available. The book focuses on helping you develop your design for modern EUC solutions. It also touches briefly on the strategy and roadmap phases of these projects. Each chapter covers a different topic, ranging across the EUC solutions themselves, operations, infrastructure and the other parts required for a design.

The content in the book is very vendor neutral and is not a blueprint on how to write a VMware Horizon or Citrix XenDesktop design. Instead, it takes an approach of educating architects on what questions to ask and how to evaluate alternatives. Then apply these to the solution of your choice.

I would like to thank Sean Massey for helping by contributing some content for the book and Kees Baggerman for stepping up as a technical reviewer for the book. I hope that if you read the book, that you enjoy it and it’s able to help you on your design journey. If you are not into EUC but are looking for design related content, you may still find some helpful chapters as there are not that many books on IT architecture.

When managing virtual machines on a Nutanix Acropolis Hypervisor (AHV) cluster, there will come a point when you need to monitor a VM, whether to troubleshoot or just out of curiosity. This post is going to focus on explaining what data and charts are available to help admins understand the health and performance of VMs on AHV.

Much like the managing VMs post in the series, the monitoring of VMs will be focused on the VM-based view within Prism. I have chosen the table view, and a sample is shown in the image below. The table lists the VMs 10 at a time; you can page through them or use the search field to quickly find the VM you are looking for. The table provides basic details such as the VM name, the host the VM is running on and the IP address. It also shows the CPU and memory assigned to each VM, along with a number of storage-related metrics. These stats are all available for each VM running on the cluster. The lower portion of this view provides a number of charts that I will go into next.