Link-state tracking, VMware ESX and You

This post could also be titled “How to build a healthy, long-lasting relationship with your system administration team”. One of the most important (and most overlooked) pieces of deploying VMware ESX in a network is handling an upstream network failure. Because larger organizations have segregated network and system administration teams, the switchport tends to be the demarcation of responsibility. Where this division particularly breaks down is in how each team perceives the reaction to a network component failure, be it an upstream switch or router.

With the increased push towards server consolidation and deployment of VMware, the “routed is better” mantra has become muted by the layer 2 requirements of virtual machine mobility. A virtualized server can also present cable density issues, with each server possibly needing 6 NICs (2 x Production, 2 x VMkernel, 1 x Backup, 1 x iLO). From a network design perspective, a VMware deployment screams for a top-of-rack switching model. Top-of-rack switching and VMware ESX physical NIC (pNIC) failure detection methods can present some interesting challenges.

VMware ESX allows for two options to detect an upstream network failure: Beacon Probing and Link Status. Here is a summary of both methods:

Basically, beacon probing is pretty awful if you’re a network admin. It will send broadcasts out each physical interface of the ESX server for EACH VLAN configured (if you’re using dot1q tagging, which you should be). So that is:

(p physical servers) x (n pNICs per server) x (v VLANs) = broadcast storm

For example, a modest rack of 20 hosts with 2 production pNICs each, trunking 30 VLANs, works out to 1,200 beacon broadcasts flooding the network every probe interval.

Link status is the preferred failure detection method, but it only tracks the state of the local link (between the ESX server and the switch). This tells the ESX server nothing about the switch’s ability to forward frames. This is where link-state tracking comes in. Link-state tracking conveys the switch’s upstream link state to the local link of the ESX server by creating a logic gate between upstream and downstream links: if all upstream links in a group go down, the downstream links are forced down as well.

Suppose you have the following loop-free network topology deployed in your data center:

The failure detection method configured on the ESX server is link status. Most likely your ESX server is sending frames out both interfaces due to the particular load balancing configuration, but in this case we are only interested in frames sent to the switch on the left. In the event the left switch’s uplink fails, we will experience a black-hole situation for some of our traffic leaving the ESX server:

By utilizing link status as our ESX failure detection method, the ESX server merely tracks physical link state at layer 1, and the ability of the upstream switch to forward frames is not taken into account:

Link-state tracking configured on the switch will convey this uplink failure to the link directly connected to the ESX server. Let’s get our switch configured correctly (which is stupidly simple):

First, define your link state group globally:

Switch(config)#link state track 1

Then define your upstream links within the link state group:

interface GigabitEthernet1/0/1

link state group 1 upstream

Lastly, define your downstream links:

interface GigabitEthernet1/0/2

link state group 1 downstream
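
Putting the three steps together, here is a minimal sketch of the whole configuration. The interface descriptions are illustrative assumptions; GigabitEthernet1/0/1 and 1/0/2 match the uplink and ESX-facing port used above, and you would repeat the downstream command on each additional ESX-facing port:

link state track 1
!
interface GigabitEthernet1/0/1
 description Uplink to distribution switch
 link state group 1 upstream
!
interface GigabitEthernet1/0/2
 description ESX host pNIC
 link state group 1 downstream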

Now the upstream link state will be conveyed to the downstream links, which will cause the link to the ESX server to be shut down in the event the upstream switch link goes down. Interfaces are coupled as follows:

Once the upstream link failure occurs and the interface is marked as down, the resulting action created by link state tracking is to bring down all downstream interfaces:
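
On Catalyst platforms that support the feature, you can verify which interfaces are coupled into the group and its current up/down status with:

Switch#show link state group detail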

By bringing down the physical state of the interfaces to the ESX servers, ESX’s link status detection will initiate a pNIC failover event:

This will in turn create a long and happy relationship between network and system administrators and eliminate another instance of finger pointing when redundancy fails to function correctly.