Category Archives: VSAN

After verifying the network connectivity, we can now enable VSAN on the cluster. This is pretty simple: from the vSphere Client, select the cluster and click Configure as shown below.

Select the options as needed. Enabling Deduplication and Compression is a good idea, as it reduces the used space on the vsanDatastore and in turn reduces the data that needs to be copied should there be a failure. To configure a 2 Node Direct Connect VSAN, select 'Configure two host vSAN Cluster' and click Next.

Now ensure the VMkernel validation passes; this is absolutely necessary to proceed further. Click Next.

The next screen allows you to select the disks that you want to contribute to Capacity and Cache. In an All Flash configuration, the disks are automatically claimed and VSAN makes the intelligent selection for you. Usually the smaller disks are used for cache and the others for capacity. You can set disks to Do Not Claim if you want to keep them as spares in case of failure, but it is good to use all of them, as we will have redundancy on the second node.

Now select the witness host and click Next.

In the case of a physical witness host, depending on the size of your witness deployment, you would need to select the disks for Capacity and Cache. Since we are using the appliance, we get two disks sized 15GB and 10GB. Select the 10GB disk for cache and the 15GB disk for capacity. Click Next.

Review the settings and click Finish.

In the Recent Tasks pane, you can see the disk groups being created and disks being added to the disk groups.

Also, depending on the version of VSAN you are on, you should see the vsanDatastore being formatted. Pre-6.6 will have On-Disk Format version 3, while 6.6 and above will have version 5. Version 5 of the On-Disk Format enables many features like Encryption and Per Site Policies for Stretched Clusters.

Now that we have completed the ESXi and witness host installations, it's time to test the network connectivity, and this is also where we do the Witness Traffic Separation. To begin with, enable SSH on all the ESXi hosts and connect to them with PuTTY.

Ensure the ports below are open bidirectionally before proceeding further.

Use esxcli vsan network list to display the current vsan setup. As you can see below, we have one VMkernel interface on each host configured to handle VSAN Traffic, as we saw earlier.

Below is the vsan network configuration on the Witness host.

The next step is to configure a VMkernel interface on each ESXi host to use for Witness Traffic. This can only be done using the command line, so on each host run the command below. Any VMkernel interface other than the one configured for VSAN Traffic can be used for this purpose; the Management Network VMkernel interface works as well.

esxcli vsan network ipv4 add -i vmkX -T=witness

In my case, I am using vmk1 on the esx1 host and vmk2 on the esx2 host; both are also handling vMotion Traffic. Now check the network configuration using esxcli vsan network list and note that the witness traffic now appears in the list.
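Assuming the same interface layout as my lab (vmk1 on esx1, vmk2 on esx2; adjust the vmk numbers for your hosts), the tagging and verification steps look roughly like this:

```shell
# On esx1: tag vmk1 for witness traffic (vmk1 also carries vMotion here)
esxcli vsan network ipv4 add -i vmk1 -T=witness

# On esx2: tag vmk2 the same way
esxcli vsan network ipv4 add -i vmk2 -T=witness

# Verify on each host: both the vsan and witness traffic types should be listed
esxcli vsan network list

# If the wrong interface was tagged by mistake, untag it and re-add the right one
esxcli vsan network remove -i vmk1
```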

After all the hard work, it's now time to check the connectivity between the data nodes and also from the data nodes to the witness. Use vmkping to exclusively test the connectivity from the VMkernel interfaces we configured.

vmkping -I vmkX <target IP>
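A couple of example invocations (the interface name and target IP below are placeholders from my lab layout, not fixed values):

```shell
# Basic reachability from the vSAN-tagged interface on one data node to the other
vmkping -I vmk2 172.10.10.12

# If jumbo frames are in use, also verify the full MTU end to end:
# -d disables fragmentation, and 8972 = 9000 minus the IP/ICMP header overhead
vmkping -I vmk2 -d -s 8972 172.10.10.12
```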

Also test the connectivity to the data nodes from the Witness host. You can see that all the networking is fine in my case. Since I have all the hosts, including the Witness, in the same subnet, I didn't have to add static routes for the hosts to reach the VSAN Traffic network. To add static routes to the hosts, use the commands below. In a 2 Node Direct Connect setup, you would need to add a static route to the Witness host on each data node, and two routes on the Witness host to reach the data nodes.
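If your witness sits in a different subnet, the static route syntax is esxcli's route namespace; the networks and gateways below are placeholders for whatever your environment actually uses:

```shell
# On each data node: a route to the witness network
esxcli network ip route ipv4 add -n 192.168.110.0/24 -g 172.10.10.253

# On the witness host: a route back to the data nodes' vSAN network
esxcli network ip route ipv4 add -n 172.10.10.0/24 -g 192.168.110.253

# Confirm the routing table on each host
esxcli network ip route ipv4 list
```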

Download the Witness Appliance. Deploy the OVF like any other VM; as per my lab setup, I will be using esx3.vrobo.lab as the host to deploy the Witness. Note that esx3 is not part of the VSAN cluster, as you can see from the earlier post.

Below is the summary page of my Witness VM deployment.

Power on the Witness VM, open the console, configure the management network using the DCUI, and ensure connectivity is fine.

Add the host to vCenter, making sure it is not added to the VSAN cluster. If you deploy the Witness as a VM like I did, you should see the witness host in blue, distinguishing it from the other hosts. Not sure if I mentioned it already, but a Witness host cannot be used to run any other VMs; it is a specially packaged OVF designed to handle only Witness Traffic in Stretched Clusters.

Below is the view from my vSphere Client.

As you can see below, the Witness Appliance comes with a port group named witnessPg, whose VMkernel interface is configured to handle VSAN Traffic by default. You may need to change the IP address of this interface depending on your network; by default it uses a DHCP address, but it is preferable to change it to static.
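Switching that interface to a static address can also be done from the command line; vmk1 is the witnessPg VMkernel interface on my appliance, and the IP and netmask below are placeholders for your environment:

```shell
# Set a static IPv4 address on the witnessPg VMkernel interface
esxcli network ip interface ipv4 set -i vmk1 -I 192.168.110.100 -N 255.255.255.0 -t static

# Confirm the new address took effect
esxcli network ip interface ipv4 get -i vmk1
```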

Also observe that the MAC addresses on the NICs of the Witness VM and the MAC addresses of its VMkernel interfaces are the same. This is on purpose: it ensures the traffic is not dropped and removes the need to enable promiscuous mode.

This finishes the Witness VM deployment. Hope this was informative. Thanks!

As with any other vSphere setup, I am using VMware Workstation to run my entire VSAN lab. I have installed VCSA 6.5 on my workstation as an appliance and have three ESXi 6.5 hosts installed as Workstation VMs. I have connected two ESXi hosts directly using a LAN Segment to mimic a 2 Node VSAN cluster setup. I have also added additional NICs as needed to enable vMotion and to configure redundancy for the Management Network. Any configuration works as long as there is proper connectivity and the VMkernel interfaces are enabled with VSAN Traffic.

As discussed earlier, the Witness Host cannot be part of the same VSAN cluster, and cannot be part of another Stretched or 2 Node VSAN cluster. Special configurations are supported by VMware via RPQ, and you may need to get in touch with support for further details.

To keep it simple, below is how I have set up my lab.

Add the ESXi hosts to vCenter and configure the networking as shown below, ensuring there are VMkernel interfaces enabled with VSAN Traffic on each host. Since we are doing a 2 Node Direct Connect setup, configure the VMkernel interfaces on a different subnet. In my case, I am using the 172.10.10.0 network.

Things have changed a lot with vSphere 6.5 Update 1, especially with VSAN. You can quickly go through the changes in my What's New with vCenter, VSAN 6.5 and VSAN 6.6.1 posts. Many organizations are now running their management clusters on VSAN, and VMware very recently passed the 10K customer mark. With all the useful features coming out of the box, the number of customers planning VSAN clusters is increasing every day.

Many smaller organizations now have plans to convert their remote branch offices to 2 Node ROBO clusters, enabling them to run their infrastructure on legacy x86 machines while also making the workloads highly available with the traditional vSphere features like HA, DRS and vMotion. I already did a series of posts when VSAN 6.2 was new, and now I wanted to write some blog posts on the 2 Node VSAN implementation. VMware has very detailed documentation of the implementation, and I would strongly recommend reading it for a better understanding of the technology as a whole.

Hope you have gone through the What's New sections so that you are up to speed with the features in 6.6.1. Before jumping into the 2 Node setup, it's a good idea to go through Stretched Clusters, so here are a few key points I want to mention with respect to the Stretched Cluster and 2 Node configurations.

Stretched Clusters are clusters with two active/active sites contributing an equal number of hosts at each site, plus a third witness host (physical or appliance) in a different site than the other two sites.

Stretched Clusters require vSphere 6 Update 1 or higher.

Each site can have a maximum of 15 hosts, so the stretched cluster has at most 30 hosts acting as data nodes in total. The Witness host does not contribute any storage and does not take data IO.

The minimum configuration in a Stretched Cluster is 1+1+1.

A VM in a stretched cluster has its VM objects present at both sites; should a host fail, the data on the other site is used to run the VM, and HA is responsible for this.

Stretched Clusters support both All Flash and Hybrid configurations.

Each site acts as a Fault Domain in a Stretched Cluster, so there are 3 Fault Domains: Preferred, Secondary and Witness.

The Number of Failures to Tolerate in VSAN prior to 6.6 is 1, due to there being only 3 Fault Domains.

In 6.6, VSAN includes site-level tolerance as Secondary Failures to Tolerate; this can be achieved using the Per Site Policies.

2 Node Direct Connect is an implementation of a Stretched Cluster with the two nodes residing in a single site and the witness in a third site: 2+1.

A 2 Node deployment can be done with the nodes connected to a 10G switch, or you can use a crossover cable to connect both nodes, which also saves the cost of deploying a 10G switch.

Witness Traffic Separation was introduced in 6.5, enabling you to separate Witness Traffic from the VMkernel interface configured for VSAN Traffic. This is very helpful for 2 Node configurations, as the VSAN traffic runs over the crossover cable while the Witness Traffic can be configured over the Management or another VMkernel interface. Witness Traffic Separation can currently be done only using the command line.

Witness Traffic Separation is not supported in Stretched Clusters

For Stretched Clusters, L2 is recommended for vSAN communication between data sites and L3 for communication between Witness and Data Sites.

VSAN Traffic between data sites is multicast and Witness Traffic is unicast. In 6.6 everything is unicast; the only requirement is that all the nodes in the cluster are upgraded to VSAN 6.6.

For every 1000 components, the link to the Witness needs at least 2Mbps of bandwidth. Each VM has a minimum of 9 components, and for a vmdk larger than 255GB, each 255GB chunk is a separate component.

In a Stretched Cluster, hosts in the Preferred Site and Secondary Site exchange heartbeats every second, and missing heartbeats for 5 seconds is considered a host failure.
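The 255GB component rule above is easy to sketch with a quick back-of-the-envelope calculation; the 800GB disk size here is just an illustrative example:

```shell
# A vmdk is split into components of at most 255 GB each,
# so the component count is the ceiling of size/255.
vmdk_gb=800
components=$(( (vmdk_gb + 254) / 255 ))   # integer ceiling division
echo "${vmdk_gb}GB vmdk -> ${components} components"
```

So an 800GB vmdk ends up as 4 chunks, before any mirroring or striping multiplies the count further.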

The sixth generation of vSAN, vSAN 6.6, brings in a lot of new features contributing to higher security, lower costs and improved performance. vSAN storage devices are fully integrated with Photon platform API management. Let's now go through all the new features in this release.

Native Encryption

vSAN now has optional data-at-rest encryption to improve security. vSAN encryption uses the AES-256 cipher and is hardware independent, so it doesn't stop you from using the encryption feature on existing hardware, and no self-encrypting devices are needed. However, you need a Key Management Server (KMS) to enable the encryption.

A vSAN datastore comprises both a capacity and a cache tier, and encryption is performed at the datastore level; thus all the vSAN objects are encrypted before they are written to persistent disks, and all this awesomeness can be done from the Web Client.

Enabling encryption can be done on vsanDatastores with or without VMs running; however, a rolling reformat is required for this to complete, and depending on the amount of data present this may take a while. Disabling encryption also needs a rolling reformat, and takes more time than enabling it did. A point to keep in mind.

VUM integration

Older versions of vSAN needed a bit of research, and sometimes a call to VMware support, to check hardware compatibility with a new release, and vSphere updates had to be done manually. Now all this can be done using our favourite Update Manager. Health Checks take care of checking the hardware compatibility and also provide suggestions if any vSAN upgrades are available. There is no downtime needed: VMs are migrated off each host, and the host is placed in maintenance mode during the updates.

Health Checks Improvements

Older versions of vSAN required vCenter and the Web Client service to be running to check the health of the vSAN cluster. Now, with version 6.6, we can monitor vSAN health using the native Host Client, and any host in the cluster can be used for this.

Health Checks now include verifying the storage device controllers, queue depths and much more, and alerting has been enabled for encryption, disk health, networking and disk balancing.

vROPS Pack for vSAN

vSAN now seamlessly integrates with vROPS, and hence more insights and recommendations are made by vROPS for the vSAN workloads.

Easy replacing of Witness Host for 2 Node and Stretched Clusters

In case of a failure of the Witness Host (or the host running the witness appliance), the amount of time the vSAN cluster has to run without a Witness Host is greatly reduced: a new host can be selected using the Change Witness Host option in the Fault Domains section.

Pre-Checks

vSAN now includes pre-checks, which are very useful when evacuating hosts or removing disk groups. This functionality is built into the maintenance mode operation and greatly reduces risk.

Unicast All Over

The multicast dependency has been removed in this version of vSAN; however, when upgrading an existing cluster, multicast is still needed until the whole cluster is on 6.6, after which vSAN switches to unicast. If the on-disk format is not upgraded to version 5 and an older-version host is added to the cluster, multicast is used. If the on-disk format is 5 and an older-version host is added, the newly added host still uses multicast and is logically separated.

Site Affinity

For workloads running on Stretched Clusters that have built-in replication capability, the VM can be configured with Site Affinity, reducing the storage used on the other site. This has to be configured using a storage policy.

Local Failure Protection

In case of a local failure, we can now configure workloads with a storage policy enabling RAID 1 mirroring or RAID 5/6 erasure coding within a site. The options for this are Primary level of failures to tolerate (across sites) and Secondary level of failures to tolerate (within a site), and the Fault Tolerance method determines RAID 1 or RAID 5/6.

Intelligent Component Rebuild

In cases where vSAN has to copy data, either to create a secondary copy of objects or after a failure, the VSAN network is used, and this can cause throttling. To avoid this, in version 6.6 the rebuild waits for an hour before actually copying the data. After those 60 minutes, pre-6.6 versions used to copy the entire data even if the absent components came back online; with 6.6, data is copied only if the components actually need an update. Resync operations in prior versions of vSAN were entirely controlled by vSAN; administrators can now control them by adjusting the throughput.

vSAN 6.5 was announced along with vSphere 6.5, and it also comes with many enhancements; this post runs you through the most important of them. vSAN is growing day by day, with around 5500 customers now, and that's awesome. Well, to begin with, Virtual SAN is now vSAN; although you still see some VMware press releases mentioning Virtual SAN, remember that's gonna go away soon. Let's now review those 'new' things.