Handy VSAN VOBs for creating vCenter Alarms

There have been quite a few questions lately about vCenter Server Alarms for VSAN; one in particular that I have noticed concerns individual disk failures. Outside of the generic default datastore alarms, there seem to be only two VSAN-specific alarms.

I figured there must be other useful alarms we could create, especially after I showed how to create a vCenter Server Alarm that monitors the VSAN component count threshold using a particular VSAN VOB (VMkernel Observation). I took a look around and found the following VSAN-specific VOBs that could be useful for creating additional vCenter Alarms.

VOB ID                                        VOB Description
esx.audit.vsan.clustering.enabled             VSAN clustering services have been enabled.
esx.clear.vob.vsan.pdl.online                 VSAN device has come online.
esx.clear.vsan.clustering.enabled             VSAN clustering services have now been enabled.
esx.clear.vsan.vsan.network.available         VSAN now has at least one active network configuration.
esx.clear.vsan.vsan.vmknic.ready              A previously reported vmknic now has a valid IP.
esx.problem.vob.vsan.lsom.componentthreshold  VSAN Node: Near node component count limit.
esx.problem.vob.vsan.lsom.diskerror           VSAN device is under permanent error.
esx.problem.vob.vsan.lsom.diskgrouplimit      Failed to create a new disk group.
esx.problem.vob.vsan.lsom.disklimit           Failed to add disk to disk group.
esx.problem.vob.vsan.pdl.offline              VSAN device has gone offline.
esx.problem.vsan.clustering.disabled          VSAN clustering services have been disabled.
esx.problem.vsan.lsom.congestionthreshold     VSAN device Memory/SSD congestion has changed.
esx.problem.vsan.net.not.ready                A vmknic added to VSAN network configuration doesn't have valid IP. Network is not ready.
esx.problem.vsan.net.redundancy.lost          VSAN doesn't have any redundancy in its network configuration.
esx.problem.vsan.net.redundancy.reduced       VSAN is operating on reduced network redundancy.
esx.problem.vsan.no.network.connectivity      VSAN doesn't have any networking configuration for use.
esx.problem.vsan.vmknic.not.ready             A vmknic added to VSAN network configuration doesn't have valid IP. It will not be in use.
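If you want to reuse the table in your own scripts, here is a minimal Python sketch that stores the VOB IDs above in a dictionary and groups them by the prefix each ID starts with. Going by the names, esx.problem.* IDs report a fault and esx.clear.* IDs report that a condition has gone away; the grouping relies only on that naming pattern and does not call any vSphere API. The helper names category() and by_category() are hypothetical.

```python
# A minimal sketch: the VSAN VOB IDs and descriptions from the table above,
# grouped by the category prefix each ID starts with (esx.audit/clear/problem).
VSAN_VOBS = {
    "esx.audit.vsan.clustering.enabled": "VSAN clustering services have been enabled.",
    "esx.clear.vob.vsan.pdl.online": "VSAN device has come online.",
    "esx.clear.vsan.clustering.enabled": "VSAN clustering services have now been enabled.",
    "esx.clear.vsan.vsan.network.available": "VSAN now has at least one active network configuration.",
    "esx.clear.vsan.vsan.vmknic.ready": "A previously reported vmknic now has a valid IP.",
    "esx.problem.vob.vsan.lsom.componentthreshold": "VSAN Node: Near node component count limit.",
    "esx.problem.vob.vsan.lsom.diskerror": "VSAN device is under permanent error.",
    "esx.problem.vob.vsan.lsom.diskgrouplimit": "Failed to create a new disk group.",
    "esx.problem.vob.vsan.lsom.disklimit": "Failed to add disk to disk group.",
    "esx.problem.vob.vsan.pdl.offline": "VSAN device has gone offline.",
    "esx.problem.vsan.clustering.disabled": "VSAN clustering services have been disabled.",
    "esx.problem.vsan.lsom.congestionthreshold": "VSAN device Memory/SSD congestion has changed.",
    "esx.problem.vsan.net.not.ready": "A vmknic added to VSAN network configuration doesn't have valid IP. Network is not ready.",
    "esx.problem.vsan.net.redundancy.lost": "VSAN doesn't have any redundancy in its network configuration.",
    "esx.problem.vsan.net.redundancy.reduced": "VSAN is operating on reduced network redundancy.",
    "esx.problem.vsan.no.network.connectivity": "VSAN doesn't have any networking configuration for use.",
    "esx.problem.vsan.vmknic.not.ready": "A vmknic added to VSAN network configuration doesn't have valid IP. It will not be in use.",
}


def category(vob_id: str) -> str:
    """Return the first two dot-separated parts of a VOB ID, e.g. 'esx.problem'."""
    return ".".join(vob_id.split(".")[:2])


def by_category(vobs: dict) -> dict:
    """Group the (sorted) VOB IDs under their category prefix."""
    groups: dict = {}
    for vob_id in sorted(vobs):
        groups.setdefault(category(vob_id), []).append(vob_id)
    return groups


if __name__ == "__main__":
    for prefix, ids in by_category(VSAN_VOBS).items():
        print(f"{prefix}: {len(ids)} VOBs")
    # esx.audit: 1 VOBs
    # esx.clear: 4 VOBs
    # esx.problem: 12 VOBs
```

Filtering on the esx.problem prefix is a quick way to shortlist candidates for alarm triggers, as done for disk failures next.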

Looking at the list above, the following two VOBs look useful for alerting on a disk failure:

esx.problem.vob.vsan.lsom.diskerror

esx.problem.vob.vsan.pdl.offline
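The table also lists esx.clear.vob.vsan.pdl.online ("VSAN device has come online"), which appears to be the counterpart of pdl.offline, while lsom.diskerror has no clearing VOB in the list. The following hypothetical Python sketch (not something vCenter does internally) replays (host, VOB ID) events in order and reports which disk-failure VOBs are still open on each host. Treating pdl.online as the clear for pdl.offline is an assumption based on the descriptions, and the host names are made up.

```python
from typing import Dict, Iterable, Set, Tuple

# The two VOBs that look useful for detecting a VSAN disk failure.
DISK_FAILURE_VOBS = {
    "esx.problem.vob.vsan.lsom.diskerror",
    "esx.problem.vob.vsan.pdl.offline",
}

# Assumption: "device has come online" clears "device has gone offline".
# No clearing VOB for lsom.diskerror appears in the table.
CLEARING_VOBS = {
    "esx.clear.vob.vsan.pdl.online": "esx.problem.vob.vsan.pdl.offline",
}


def open_disk_problems(events: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Replay (host, vob_id) events in order; return open disk-failure VOBs per host."""
    open_problems: Dict[str, Set[str]] = {}
    for host, vob_id in events:
        if vob_id in DISK_FAILURE_VOBS:
            open_problems.setdefault(host, set()).add(vob_id)
        elif vob_id in CLEARING_VOBS:
            # Discard the matching problem VOB if this host has it open.
            open_problems.get(host, set()).discard(CLEARING_VOBS[vob_id])
    # Only report hosts that still have at least one open problem.
    return {host: ids for host, ids in open_problems.items() if ids}


if __name__ == "__main__":
    events = [  # hypothetical host names
        ("esxi-01", "esx.problem.vob.vsan.pdl.offline"),
        ("esxi-02", "esx.problem.vob.vsan.lsom.diskerror"),
        ("esxi-01", "esx.clear.vob.vsan.pdl.online"),
    ]
    print(open_disk_problems(events))
    # {'esxi-02': {'esx.problem.vob.vsan.lsom.diskerror'}}
```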

Disclaimer: There is no guarantee that a disk error or failure will trigger these VOBs, because there is no telling how a disk may fail, especially if it fails intermittently.

Even though we cannot easily simulate an error on a physical disk, we can still do some magic using a Nested VSAN environment. The worst-case scenario you could run into is that one of the disks goes completely offline. We can simulate similar behavior in a Nested ESXi environment by removing one of the virtual disks from the Virtual Machine (without deleting it).

To demonstrate this scenario, here are the steps to create a vCenter Alarm for these two VOBs:

Step 1 - Create a new vCenter Alarm and give it a name. Select “Hosts” under Monitor and “Specific event occurring …” under Monitor for.

Step 2 - Add the two VOBs listed above as Event triggers.

Step 3 - Remove one of the virtual disks (SSD or MD) from the Nested ESXi VM.

Step 4 - There are two ways to trigger the alarm: either create a new Virtual Machine whose writes will land on the Nested ESXi VM from which you removed the virtual disk, or rescan the storage adapter on that Nested ESXi VM. In my environment, I had a VM running on an NFS datastore, and I performed a Storage vMotion of it onto my VSAN Datastore using the default FTT=1 (Failures To Tolerate) policy on a three-node VSAN Cluster. This immediately triggered the alarm, as seen in the screenshots below.
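The alarm built in the steps above boils down to two event triggers on Host objects, either of which fires the alarm. Here is a minimal Python sketch of that definition. The field names eventType, eventTypeId, objectType and status are borrowed from the vSphere API's EventAlarmExpression data object, and treating VOBs as EventEx events is an assumption; the classes are plain Python stand-ins, not pyVmomi objects, and EventTrigger, AlarmDefinition and matches() are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class EventTrigger:
    # Field names mirror vSphere's EventAlarmExpression; the defaults are assumptions.
    eventTypeId: str
    eventType: str = "EventEx"      # assumption: VOBs surface as EventEx events
    objectType: str = "HostSystem"  # "Hosts" under Monitor in Step 1
    status: str = "red"             # alarm state to set when the event arrives


@dataclass
class AlarmDefinition:
    name: str
    triggers: list[EventTrigger] = field(default_factory=list)

    def matches(self, event_type_id: str, object_type: str = "HostSystem") -> str | None:
        """OR semantics: return the status of the first matching trigger, else None."""
        for trigger in self.triggers:
            if trigger.eventTypeId == event_type_id and trigger.objectType == object_type:
                return trigger.status
        return None


if __name__ == "__main__":
    alarm = AlarmDefinition(
        name="VSAN Disk Failure",  # hypothetical alarm name
        triggers=[
            EventTrigger("esx.problem.vob.vsan.lsom.diskerror"),
            EventTrigger("esx.problem.vob.vsan.pdl.offline"),
        ],
    )
    print(alarm.matches("esx.problem.vob.vsan.pdl.offline"))  # red
    print(alarm.matches("esx.clear.vob.vsan.pdl.online"))     # None
```

Keeping one trigger per VOB, rather than one combined condition, matches how Step 2 adds each VOB as its own Event trigger.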

Correction: for changes to a VM, you’ll only get the generic “VmReconfiguredEvent” event. However, the event contains a configSpec that tells you exactly what changed. It’s not pretty and requires some parsing, but it is possible. In fact, fellow Automation community member Luc Dekens has a nice PowerCLI script that does exactly this: http://www.lucd.info/2009/12/18/events-part-3-auditing-vm-device-changes/
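To give an idea of the parsing involved, here is a minimal Python sketch (not Luc's script) that walks a configSpec's deviceChange list and reports removed virtual disks. It assumes the configSpec has already been converted into plain dictionaries; that shape, including the "type" key for the device class, is hypothetical. The operation value "remove" and the fileOperation value "destroy" mirror the vSphere API's VirtualDeviceConfigSpec, which is how a detached disk (as in Step 3) can be told apart from a deleted one.

```python
from typing import Any, Dict, List, Tuple


def removed_disks(config_spec: Dict[str, Any]) -> List[Tuple[str, bool]]:
    """Return (label, deleted_from_datastore) for each VirtualDisk removed.

    `config_spec` is a plain-dict stand-in (hypothetical shape) for the
    VirtualMachineConfigSpec carried in a VmReconfiguredEvent.
    """
    results = []
    for change in config_spec.get("deviceChange", []):
        device = change.get("device", {})
        # Only device removals of virtual disks are of interest here.
        if change.get("operation") != "remove" or device.get("type") != "VirtualDisk":
            continue
        label = device.get("deviceInfo", {}).get("label", "unknown disk")
        # fileOperation "destroy" means the backing file was deleted too;
        # without it, the disk was only detached from the VM.
        deleted = change.get("fileOperation") == "destroy"
        results.append((label, deleted))
    return results


if __name__ == "__main__":
    spec = {
        "deviceChange": [
            {"operation": "remove",
             "device": {"type": "VirtualDisk", "deviceInfo": {"label": "Hard disk 3"}}},
            {"operation": "edit",
             "device": {"type": "VirtualE1000", "deviceInfo": {"label": "Network adapter 1"}}},
        ]
    }
    print(removed_disks(spec))  # [('Hard disk 3', False)]
```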


Author

William Lam is a Staff Solutions Architect working in the VMware Cloud on AWS team within the Cloud Platform Business Unit (CPBU) at VMware. He focuses on Automation, Integration and Operation of the VMware Software Defined Datacenter (SDDC).