Month: September 2017

Looking at the IT infrastructure at several production sites within my customer’s organization, we quickly noticed components (mainly compute and storage related) that were not up to par from an availability and performance perspective. The production sites all run local, business-critical ERP workloads that are vital to the business processes. After a lot of research and discussion, I proposed a new blueprint to my customer. The blueprint consists of a new compute and storage baseline for the site-local datacenters. The idea was to create a platform that allows for higher availability and better performance while reducing costs.

We researched the possibility of stepping away from traditional storage arrays and moving towards a Hyper-Converged Infrastructure (HCI) solution. Because IT is not the main business of the company, we tried to keep things as simple as possible. We defined several ‘flavors’ to suit the needs of each production location. For example, the small sites will be equipped with a ROBO (Remote Office/Branch Office) setup, the medium sites with a single datacenter cluster, and the large factories with a stretched cluster solution. For their most important applications that do not offer in-application clustering/resiliency, a stretched cluster setup will allow the large factories to adhere to the stated availability SLA in the event of a large-scale outage at the plant.

Benefits

Since my customer is running VMware solutions in all of its datacenters, VMware vSAN was the perfect fit. It allows the customer to lean on the VMware knowledge it already has in-house while needing fewer FTEs to manage the storage backend. Implementing stretched clusters on multiple sites using storage arrays can be a daunting task. Although there are prerequisites, implementing VMware vSAN is fairly easy, even if you opt for a stretched cluster configuration. This allowed for a very short time between receiving the hardware and having a fully operational vSphere and vSAN cluster. Because the customer is in the process of renewing its IT infrastructure for a number of sites, it really helps to be able to tell the business we can deliver within weeks rather than months.

Using VMware vSAN ready nodes allowed us to exceed the storage capacity and performance requirements while being more cost-efficient than traditional storage arrays. As management loves lower costs, both capex and opex, HCI was the way to go. From a manageability point of view, it is a big plus that all VMware datacenters and (vSAN) clusters are managed from a centralized VMware vCenter UI. Another plus was the savings in rack units, as those are scarce in some site-local datacenters.

This is a short write-up about why you should consider a certain network topology when adopting scale-out storage technologies in a multi-rack environment. Without going into too much detail, I want to stress the need to follow the scalable distributed storage model when it comes to designing your Ethernet storage network. To be honest, it is probably the other way around. The networking experts of this world have been building scalable network architectures that maintain consistent and predictable latency for a long time now. The storage world is just catching up.

Today, we have the ability to create highly scalable distributed storage infrastructures, following Hyper-Converged Infrastructure (HCI) innovations. Because the storage layer is distributed across ESXi hosts, a lot of point-to-point Ethernet connections between ESXi hosts will be utilized for storage I/O. Typically, when a distributed storage solution (like VMware vSAN) is adopted, we tend to create a pretty basic layer-2 network, preferably using NICs of 10GbE or faster and line-rate capable components in a non-blocking network architecture with enough ports to support our current hosts. But once we scale to a large number of ESXi hosts and racks, we face two challenges: how to provide enough switch ports to connect all ESXi hosts, and how to connect the multiple Top of Rack (ToR) switches to each other. That is where the so-called spine and leaf network architecture comes into play.

Spine-Leaf

In a spine-leaf network architecture, each leaf switch connects to every spine switch in the fabric. Using this topology, a connection between two ESXi hosts in different racks will always traverse the same number of network hops, no matter which racks the hosts are in. Such a network topology provides predictable latency, and thus consistent performance, even as you keep scaling out your virtual datacenter. It is this consistency in performance that makes the spine-leaf network architecture so suitable for distributed storage solutions.
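To make the equal-hop property a bit more tangible, here is a minimal Python sketch. It is only an illustration under my own assumptions: the fabric sizes, the node names (`spine0`, `leaf0`, `esxi-0-0`) and the helper functions are hypothetical, and a "hop" is counted here as one link traversed. It builds a small spine-leaf fabric and checks the shortest path length between every pair of ESXi hosts that sit in different racks.

```python
from collections import deque
from itertools import combinations


def build_fabric(spines, leaves, hosts_per_leaf):
    """Return an adjacency map for a hypothetical spine-leaf fabric.

    Every leaf connects to every spine; every host connects to one leaf.
    """
    links = {}

    def connect(a, b):
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)

    for l in range(leaves):
        leaf = f"leaf{l}"
        for s in range(spines):
            connect(leaf, f"spine{s}")
        for h in range(hosts_per_leaf):
            connect(leaf, f"esxi-{l}-{h}")
    return links


def hop_count(links, src, dst):
    """Breadth-first search: number of links on the shortest path."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in links[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"no path between {src} and {dst}")


def cross_rack_hop_counts(links):
    """Collect the distinct hop counts between hosts in different racks."""
    hosts = sorted(n for n in links if n.startswith("esxi-"))
    counts = set()
    for a, b in combinations(hosts, 2):
        # The second name component is the rack (leaf) index.
        if a.split("-")[1] != b.split("-")[1]:
            counts.add(hop_count(links, a, b))
    return counts


fabric = build_fabric(spines=2, leaves=4, hosts_per_leaf=3)
print(cross_rack_hop_counts(fabric))  # {4}: host, leaf, spine, leaf, host
```

Growing the fabric (more leaves, more spines, more hosts per rack) does not change the result in this model: every cross-rack path stays at host, leaf, spine, leaf, host. Real-world latency obviously also depends on link speeds, buffering and oversubscription, which this sketch ignores.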

An exemplary logical spine-leaf network architecture is shown in the following diagram:

Modern physical NICs (pNICs) have several offloading capabilities. If you are running VMware NSX, which uses VXLAN, you could benefit from the VXLAN offloading feature. VXLAN offloading allows you to keep using offloading mechanisms like TCP Segmentation Offload (TSO) and Checksum Offload (CSO), because the pNIC is able to ‘look into’ encapsulated VXLAN packets. That results in lower CPU utilization and a possible performance gain. But how do you determine what is actually supported by your pNIC and the driver used in ESXi?

It is recommended to follow these three steps to fully verify that the VXLAN offload feature you are looking for is supported and enabled.

Step 1: Check the support of the pNIC chipset
Step 2: Check the support of the driver module
Step 3: Check if the driver module needs configuration

The first step is to check the vendor information about the supported features on their pNIC product. Let’s take the combination of a 10GbE Broadcom QLogic 57810 NIC and the VXLAN offload feature as an example. Looking at the datasheet of the QLogic 57810 NIC, it clearly states that VXLAN offloading is supported.