This is a short overview of the software powering the Dell XC hardware. Since it was a Tech Field Day I wanted to cover some technical features, so I landed on Erasure Coding (EC-X) and VM Flash Mode. One thing I also like to stress when going into HCI is management. The fact that you are splitting up a traditional storage array into multiple individual parts should not be lost on people. Of all the things the Acropolis software delivers, it’s the management that will allow Nutanix to compete with the cloud providers of the world. In the land of high availability, we humans are the most dangerous thing that happens to hardware, not the components themselves.

It was my first time attending Tech Field Day as a presenter. I had been an attendee twice and am a long-time follower of the event. I’ve always thought the independent (as much as one can be) guests that Tech Field Day assembles are some of the best in the industry. This event was no different: Forbes, storage architects, virtualization experts, Exchange gurus, and the list goes on. There really isn’t much that gets past this diverse crew.

Lewie Newcomb, Executive Director, Storage Product Group at Dell, started the show for Dell. Lewie does a great job of explaining why hardware matters and how an OEM relationship forges an appliance like the Dell XC that brings more than a software-only play. Lewie goes on to comment that it’s one of the most successful products he has dealt with at Dell.

Watch the below segment on Dell’s SDS strategy and how the journey started with Nutanix.

When I hear about companies essentially betting against Moore’s Law, it makes me think of crazy thoughts, like giving up peanut butter or beer. Too crazy, right?! Take a gaze at what Intel is currently doing for storage vendors below:

Some of the more interesting ones in the hyper-converged space are detailed below.

Increased number of execution buffers and execution units:
More parallel instructions and more threads, all leading to more throughput.

CRC instructions:
For data protection; every enterprise vendor should be verifying data consistency and protecting against bit rot.

Intel Advanced Vector Extensions for wide integer vector operations (XOR, P+Q)
Nutanix uses this ability to calculate XOR for EC-X (erasure coding). It uses subcycles of a CPU on a bit of data, which really helps as Nutanix parallelizes the operation across all of the CPUs in the cluster. Other vendors could use this for RAID 6.

Intel Secure Hash Algorithm Extensions
Nutanix uses these extensions to accelerate SHA-1 fingerprinting for inline and post-process dedupe. The Intel SHA Extensions are designed to provide a performance increase over single-buffer software implementations that use general-purpose instructions.
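The idea behind fingerprint-based dedupe can be sketched in a few lines: chunk the data, SHA-1 each chunk, and store only one copy per fingerprint. The 4 KB chunk size and the in-memory store here are assumptions for illustration, not Nutanix's actual layout:

```python
import hashlib

CHUNK = 4096  # hypothetical fixed chunk size for this sketch

def fingerprint_store(data: bytes):
    """Chunk the data, SHA-1 each chunk, keep one copy per fingerprint."""
    store = {}   # fingerprint -> unique chunk
    refs = []    # per-chunk references into the store
    for off in range(0, len(data), CHUNK):
        chunk = data[off:off + CHUNK]
        fp = hashlib.sha1(chunk).hexdigest()
        store.setdefault(fp, chunk)  # duplicate chunks share one copy
        refs.append(fp)
    return store, refs

# Two identical 4 KB chunks plus one unique chunk:
payload = b"x" * 8192 + b"y" * 4096
store, refs = fingerprint_store(payload)
assert len(refs) == 3 and len(store) == 2  # 3 references, 2 unique chunks
```

The hashing is the hot path in this scheme, which is why a hardware-accelerated SHA-1 matters for inline dedupe.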

Reduced VM Entry / Exit Latency
This helps the virtual machine, in this case the virtual storage controller: it never really has to exit into the virtual machine manager due to a shadowing process. The end result is low latency, and the user-space vs. kernel penalty is taken off the table. This also happens to be one of the reasons why you can virtualize giant Oracle databases.

Intel VT-d
Improved reliability and security through device isolation using hardware-assisted remapping, and improved I/O performance and availability through direct assignment of devices. Nutanix directly maps SSDs and HDDs, removing the yo-yo effect of going through the hypervisor.

Intel Quick Assist

Intel has a Quick Assist card that will do full offload for compression and SHA-1 hashes. Guess what? The features on this card will be moving onto the CPU in the future. Nutanix could use this card today but chooses not to for serviceability reasons, but you can bet your bottom dollar that we’ll use it once the feature is baked onto the CPUs.

To top everything above, the next Intel CPUs will deliver 44 cores per two-socket server and 88 threads with hyper-threading!

If you want a full breakdown of all the features, you can watch this video with Intel at Tech Field Day.

Every Nutanix virtual storage controller has local cache to serve the workloads running directly on the node. A question that comes up is whether the local cache should be increased. No one ever complained about having too much cache, but being a hyper-converged appliance we want to keep RAM available for the running workloads if needed. I would never just recommend giving every controller virtual machine (CVM) 50 GB or 80 GB of RAM and seeing where that gets you.

The cache on the CVM is automatically adjusted when the RAM of the CVM is increased. I recommend increasing the CVM memory in 2 GB increments and tracking the effectiveness of the change. Even starting at 16 GB in a system with 256 GB of RAM consumes only ~6% of the available RAM resources.

Nutanix CVM resource starting points:

| Parameter          | Base              | Inline Dedupe     |
|--------------------|-------------------|-------------------|
| Memory Size        | Increase to 16 GB | Increase to 24 GB |
| Memory Reservation | Increase to 16 GB | Increase to 24 GB |

Base (Non-Dedupe)

Go to any CVM IP address, check the Stargate diagnostics page on port 2009, and use the guidelines below before increasing the RAM on the CVM. You may need to allow access to port 2009 if you’re accessing the page from a different subnet. This is covered in the setup guide.

| Amount of CVM RAM | Extent Cache Hits | Extent Cache Usage | Recommendation            |
|-------------------|-------------------|--------------------|---------------------------|
| 16 GB             | 70% – 95%         | > 3200 MB          | Increase CVM RAM to 18 GB |
| 18 GB             | 70% – 95%         | > 4000 MB          | Increase CVM RAM to 20 GB |
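Those guidelines boil down to a simple check: if the hit rate is in the healthy band but the cache is full beyond the trigger point, add 2 GB. The helper below is just an illustration of that logic with the thresholds from the guidelines above, not a Nutanix tool:

```python
def recommend_cvm_ram(current_gb, hit_rate_pct, usage_mb):
    """Suggest a CVM RAM size from extent cache stats (illustrative only)."""
    # current RAM (GB) -> extent cache usage trigger (MB)
    triggers = {16: 3200, 18: 4000}
    trigger = triggers.get(current_gb)
    if trigger is None:
        return current_gb  # outside the guideline range; leave as-is
    if 70 <= hit_rate_pct <= 95 and usage_mb > trigger:
        return current_gb + 2  # grow in 2 GB increments
    return current_gb

assert recommend_cvm_ram(16, 85, 3500) == 18  # busy, healthy hit rate
assert recommend_cvm_ram(18, 90, 4200) == 20
assert recommend_cvm_ram(16, 50, 3500) == 16  # low hit rate: investigate first
```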

NOTE: Going higher than 20 GB of RAM on a CVM will automatically start allowing RAM to be used for dedupe. If you don’t enable dedupe, going past 20 GB of RAM will waste RAM resources. You can prevent this by using gflags. It’s best to contact support on how to limit the RAM used for dedupe if you feel your workload won’t benefit from it.

Dedupe
Using the Prism UI you can assess whether more RAM will help the hit rate ratio. The cache used for dedupe is referred to as the content cache. The content cache spans RAM and flash, so it is possible to have a high hit rate ratio with little being served from RAM.

In the Analysis section of the UI, check how much physical RAM is making up the content cache and what your return on it is.

If the memory being saved is over 50% of the physical memory being used and the hit rate ratio is above 90%, you can bump up the CVM memory.

NOTE: For both the extent cache and the content cache it is possible to have a low hit rate ratio and high resource usage and still benefit from more RAM. In a really busy system the working set may be too large and may be cycled through the cache before it can get a second hit. It’s our recommendation to increase the CVM memory if you know your maximum limit for CPU on the host. Available memory can help the running workload instead of sitting idle.

Hopefully this helps in giving some context before changing settings in your environment.

Sometimes scale gets construed as huge, some quantity of capacity in IT that few shops will ever reach. I think all companies, big or small, need the ability to scale. The ability to scale allows customers to buy what they need, when they need it, and most importantly use it right away. It can be 6 TB or 60 PB; it’s all relative.

The prized gem behind Nutanix and its ability to scale revolves around Apache Cassandra (NoSQL) and Paxos. Nutanix stores its metadata in Apache Cassandra. There is a good write-up on how Paxos works with NoSQL on Nutanix in the Nutanix Bible. I really enjoyed the ending of a recent article, “Next gen NoSQL: The demise of eventual consistency?”

The next generation of commercial distributed databases with strong consistency won’t be as easy to build, but they will be much more powerful than their predecessors. Like the first generation, they will have true shared-nothing distributed architectures, fault tolerance and scalability.

Why did I enjoy it? Because this is what Apache Cassandra (NoSQL) and Paxos give Nutanix today. NoSQL is a powerful tool for responding to change, and combined with Paxos the consistency worries go away. NoSQL’s ability to work without a strict schema allows Nutanix to respond to change very efficiently in terms of:
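The core idea that Paxos-style protocols bring is majority quorum: a write only commits once a majority of replicas acknowledge it, so any later majority read must overlap with at least one up-to-date copy. A toy sketch of that rule (not the actual Nutanix or Paxos implementation):

```python
# Toy majority-quorum write: commit only when a majority of replicas ack.
# A strawman to show why a cluster stays consistent through node failures.

def quorum_write(replicas, key, value):
    """Apply the write to reachable replicas; commit only on majority ack."""
    acks = 0
    for replica in replicas:
        if replica.get("up", True):
            replica.setdefault("data", {})[key] = value
            acks += 1
    return acks > len(replicas) // 2  # strict majority required

nodes = [{"up": True}, {"up": True}, {"up": False}]
assert quorum_write(nodes, "vdisk-1", {"owner": "node-a"})      # 2 of 3 ack
nodes[1]["up"] = False
assert not quorum_write(nodes, "vdisk-2", {"owner": "node-b"})  # 1 of 3: no commit
```

With 3 metadata copies, one node can be down and writes still commit; lose a majority and the system refuses the write rather than diverge, which is the "strong consistency" the quoted article is talking about.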

Failures – Nutanix Cassandra has self-healing of the ring as of 3.5, where the metadata is evenly distributed. If the Cassandra process on a node is down for more than 5 minutes, Medusa will trigger the process of detaching the node from the Cassandra ring. Once the node is detached from the ring, we are ready to take another failure and still remain available.
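The detach behavior described above amounts to a simple timer rule. The sketch below illustrates it with a hypothetical ring structure; it is not Medusa's actual logic:

```python
DETACH_AFTER = 5 * 60  # seconds a node may be down before detaching

def check_ring(ring, now):
    """Detach any node whose Cassandra process has been down > 5 minutes.

    `ring` is a hypothetical list of node dicts for illustration; once a
    node is detached its metadata shares re-replicate to the remaining
    nodes, so the cluster can absorb another failure.
    """
    for node in list(ring):
        down_since = node.get("down_since")
        if down_since is not None and now - down_since > DETACH_AFTER:
            ring.remove(node)
    return ring

ring = [{"name": "A"}, {"name": "B", "down_since": 0}]
check_ring(ring, now=200)   # only 200 s elapsed: node B stays in the ring
assert len(ring) == 2
check_ring(ring, now=400)   # 400 s > 5 minutes: node B is detached
assert len(ring) == 1
```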

Upgrades – The only constant is change! Nutanix is rapidly adding features, and our customers can’t afford downtime. Just a couple of days ago I read about a company adding dedupe to their product line where the upgrade needed planned downtime. NoSQL allowed Nutanix to add SHA-1 hashes to the metadata and carry on providing inline dedupe without downtime.
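The upgrade point is easiest to see with schema-less rows: new writes can carry a new attribute while old rows stay untouched, and the code simply handles both shapes. The field names below are made up for illustration:

```python
# Schema-less metadata rows: a new attribute appears on new writes with no
# migration of existing rows. Field names are hypothetical, for illustration.

old_row = {"extent_id": "e1", "size": 1048576}                      # pre-upgrade
new_row = {"extent_id": "e2", "size": 1048576, "sha1": "da39a3ee"}  # post-upgrade

def dedupe_key(row):
    """Return the fingerprint if present; older rows simply lack it."""
    return row.get("sha1")  # None for rows written before the upgrade

assert dedupe_key(old_row) is None       # old rows keep working as-is
assert dedupe_key(new_row) == "da39a3ee" # new rows participate in dedupe
```

In a rigid-schema store the same change would mean an ALTER-style migration across every row, which is where the planned downtime tends to come from.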

Scaling – Nutanix can scale compute and storage at different rates with a variety of different nodes. The process is the same for all the nodes: hit the expand cluster button, enter three IPs, and add the compute node to vCenter. You also have the ability to automate the whole thing! Keep in mind this process is the same for ESXi, Hyper-V and KVM. Scaling is the ability to respond to business change.

Quest Software today announced vRanger 6.0, adding the ability to back up and restore physical Windows servers along with virtual ones. While the product isn’t GA yet, it’s good to see one piece of software for both worlds.

Nutanix is an all-in-one solution building block for virtualization. It allows you to virtualize your workloads without requiring a SAN. This approach brings many benefits, such as buying what you need when you need it and a reduction in complexity around architecture and operations. I see Nutanix as a perfect fit for VDI and cloud workloads. Where there is uncertainty in the workload and large scale is needed, Nutanix can be a great fit.

Below is how the storage works inside their 2U building blocks that contain 4 separate nodes. The operating system Nutanix runs is called HOT (Heat-Optimized Tiering). The controller VM is the magic sauce of the operation; all the I/O flows through it. Data is written to the Fusion-io card and then serialized. When the data is cold, it is laid out to disk in a nice clean format.

The SATA SSD is for ESXi, the controller VM and VM swap; nothing else gets to live here.

I am not a converged infrastructure hater. I agree with the value proposition that converged infrastructure brings: Vblock from VCE, FlexPod from the NetApp & Cisco reference architecture, HP with Cloud Matrix. The aforementioned converged infrastructure solutions are marketing machines. The big players in this space have strong relationships with their current customer base and are using converged infrastructure as a one-stop shopping menu for their gear.

Over the last couple of days at Storage Field Day the topic of Flash vs. Cache has come up multiple times. It is an interesting question for VDI: do you want to put your whole workload into flash, or use flash as a cache and balance the workload with traditional hard drives?

For the purpose of this article I am only listing the vendors that were at Storage Field Day.

Below is a list of vendors that are using pure flash for their storage arrays:

The vendors offering an end-to-end flash solution are trying to bring down the cost of flash with techniques like deduplication, commodity hardware and building their own drives, and they will talk about power savings. The flash-as-cache camp talks about overall cheaper cost per GB, the need for cheap disk, and the fact that sequential I/O is still better on spinning disk.

If you’re after a clear winner for Flash vs. Cache, it’s just not that simple. The feature sets between the different vendors vary quite a lot and carry different value propositions. I think it’s important to break down what you need for a VDI solution and make your decision based on that.

Replication – You need the ability to get user data and golden desktop images offsite and protected. This doesn’t have to be on fast disk at all.

Need for Speed – Your replicas and linked clones need to be fast. Today’s end users are getting SSDs in their laptops; the days of comparing people’s 5-year-old computers to VDI are coming to a close. Your virtual desktop needs to deliver the best performance, consistently.

User Data – Profile data, user documents, shortcuts and other user errata. This doesn’t need to be on fast disk unless you’re making use of redirection. If you’re copying data onto the desktop from a repository, you don’t want this to be the bottleneck.

The Trash – Page files, swap files and temp files. They take up lots of space, so you either need lots of disk or a way to dedupe the data.

Applications – An array providing an SMB/CIFS share can go a long way toward distributing your applications to the desktops. This data/IO will land on the linked clones for the most part, but an active non-persistent environment can put a heavy load on your distribution method of choice.

Over the three days at Storage Field Day I came really close to changing my stance on which makes the best option. Both Pure Storage and Nimbus have good products, but I still think you need disk. If you were only going to go with one array vendor for VDI, I would have to go with the flash-as-cache option. Having only one array vendor in your overall solution can go a long way in troubleshooting and managing your environment.

User data is going to continue to grow, and I believe more unstructured, hard-to-dedupe data will be a part of that makeup. Lots of data will also be at rest and never touched after it’s created; I believe this lends itself well to a flash-as-cache scenario. Having disk in the system also helps with replication if you want to use the standby array for other purposes during the day: the replicated data can sit on the disk while other systems use the flash.

All of the full-SSD vendors have their own unique value propositions, like Nimbus with their ultra-low-cost drives and full feature set, and Pure with their never-misaligned virtual machines and upfront dedupe, but I still think you need the spinning rust.

I am lucky to have been invited back to another Tech Field Day event, Storage Field Day #1. When I was trying to get time off work, either by way of holiday time or getting work to cover the days, my boss asked me: what would the company get out of it? It was kind of a deer-in-the-headlights question because of my love of technology. It was like someone offering you an iPad 3; your response is yes right away. Storage being a very important part of the overall IT infrastructure, I knew I wanted to go. Below is my official response.

This three-day event will showcase the latest in storage architecture, provide answers for selecting the right storage based on business requirements, and offer a chance to pick the brains of the top independent experts. Running storage from companies like EMC and NetApp is great, but usually these industry juggernauts are not able to adjust to change as quickly as their start-up brothers. Newer companies and start-ups can provide great insight into the future and direction of the storage market.

The Solid State Storage Symposium on the first day of the event will help address whether we should use solid-state storage as a cache or as a high-end tier of storage with our current vendor in the data centre. It will also give insight into an emerging field that is littered with new companies all stating they‘re the best thing since sliced bread. Not all things are created equal, and therefore many advantages and drawbacks need to be considered before implementation.

For myself, Day 2 and Day 3 of the event are about seeing what can help me drive virtualization at our company and speed up deployment of our VDI environment. Users want a reliable system; they don’t care what company is running the backend, but they want the same performance day in and day out, if not faster. What techniques can we take away to help our own time to market with a solution, or help reduce our risk footprint? I hope to find ways to augment our current environment without breaking the bank.

Storage Field Day will also provide networking with industry experts and veterans, giving a chance to see what other people are doing in their respective industries: what they’re having success with and what to avoid. Whitepapers are great but seldom offer the one point that will make or break a solution.