Information About High Availability

The purpose of High Availability (HA) is to limit the impact of failures, both hardware and software, within a system. The Cisco NX-OS operating system is designed for high availability at the network, system, and service levels.

The following Cisco NX-OS features minimize or prevent traffic disruption in the event of a failure:

•Redundancy: Redundancy in every aspect of the software architecture.

•Isolation of processes: Isolation between software components prevents a failure within one process from disrupting other processes.

•Restartability: Most system functions and services are isolated so that they can be restarted independently after a failure while other services continue to run. In addition, most system services can perform stateful restarts, which allow the service to resume operations transparently to other services.

•Supervisor stateful switchover: An active/standby dual supervisor configuration. State and configuration remain constantly synchronized between two Virtual Supervisor Modules (VSMs) to provide a seamless and stateful switchover in the event of a VSM failure.
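The stateful-restart idea under Restartability can be illustrated with a toy Python model. This is a minimal sketch, not how NX-OS implements it: `CheckpointStore`, `Service`, and the JSON checkpoint file are hypothetical stand-ins for the system's internal state store. The service persists every change, so a restarted instance resumes from its last checkpoint instead of starting empty.

```python
import json
import os
import tempfile


class CheckpointStore:
    """Hypothetical persistent store standing in for the system's state repository."""

    def __init__(self, path: str) -> None:
        self.path = path

    def save(self, state: dict) -> None:
        # Write to a temporary file, then rename it over the old checkpoint so a
        # crash mid-write does not corrupt the previous checkpoint.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, self.path)

    def load(self) -> dict:
        # No checkpoint yet means the service starts with empty state.
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)


class Service:
    """Toy service that checkpoints every state change so it can restart statefully."""

    def __init__(self, store: CheckpointStore) -> None:
        self.store = store
        # Stateful restart: resume from the last checkpoint rather than from scratch.
        self.state = store.load()

    def set(self, key: str, value: str) -> None:
        self.state[key] = value
        self.store.save(self.state)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "service-state.json")
    Service(CheckpointStore(path)).set("vlan", "100")
    # Simulate a crash and restart: a fresh instance recovers the saved state.
    restarted = Service(CheckpointStore(path))
    print(restarted.state)  # {'vlan': '100'}
```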

The Cisco Nexus 1000V system is made up of the following:

•Virtual Ethernet Modules (VEMs) running within virtualization servers. These are represented as modules within the VSM.

•A remote management component, for example, VMware vCenter Server.

•One or two VSMs running within Virtual Machines (VMs).

System-Level High Availability

The Cisco Nexus 1000V supports redundant VSM virtual machines, a primary and a secondary, running as an HA pair. Dual VSMs operate in an active/standby capacity in which only one of the VSMs is active at any given time, while the other acts as a standby backup. The state and configuration remain constantly synchronized between the two VSMs to provide a stateful switchover if the active VSM fails.

Single or Dual Supervisors

When two VSMs are deployed, their responsibilities are divided as follows:

•The active VSM runs all the system applications and controls the system.

•On the standby VSM, the applications are started and initialized in standby mode. They are also synchronized and kept up to date with the active VSM in order to maintain the runtime context of "ready to run."

•On a switchover, the standby VSM takes over for the active VSM.
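The active/standby division of labor can be modeled as a small state machine. The sketch below is a hypothetical Python illustration: `Vsm`, `HaPair`, `apply`, and `switchover` are invented names, and a real VSM pair synchronizes far more than a key/value dictionary. It shows the key property: because every change on the active VSM is mirrored to the standby, the standby can take over with current state.

```python
from dataclasses import dataclass, field


@dataclass
class Vsm:
    name: str
    role: str = "standby"  # "active" or "standby"
    state: dict = field(default_factory=dict)
    alive: bool = True


class HaPair:
    """Toy active/standby pair; only the active member processes changes."""

    def __init__(self, first: Vsm, second: Vsm) -> None:
        self.active, self.standby = first, second
        self.active.role, self.standby.role = "active", "standby"

    def apply(self, key: str, value: str) -> None:
        # The active VSM applies the change and mirrors it to the standby,
        # which keeps the standby in a "ready to run" state.
        self.active.state[key] = value
        if self.standby.alive:
            self.standby.state[key] = value

    def switchover(self) -> None:
        # The active VSM has failed: the standby, already in sync, takes over.
        if not self.standby.alive:
            raise RuntimeError("no healthy standby available for switchover")
        self.active.alive = False
        self.active, self.standby = self.standby, self.active
        self.active.role, self.standby.role = "active", "standby"


if __name__ == "__main__":
    pair = HaPair(Vsm("vsm-a"), Vsm("vsm-b"))
    pair.apply("port-profile", "web")
    pair.switchover()
    print(pair.active.name, pair.active.state)  # vsm-b {'port-profile': 'web'}
```

In this model the failed VSM stays in the standby slot with `alive=False`, so a second switchover raises an error until that VSM is repaired; the pair is not redundant again until the standby is back.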

Network-Level High Availability

At the network level, Cisco Nexus 1000V HA includes port channels and the Link Aggregation Control Protocol (LACP). A port channel bundles physical links into a channel group to create a single logical link that provides the aggregate bandwidth of up to eight physical links. If a member port within a port channel fails, the traffic previously carried over the failed link switches to the remaining member ports within the port channel.

Additionally, LACP lets you configure up to 16 interfaces into a port channel. A maximum of eight interfaces can be active, and a maximum of eight interfaces can be placed in a standby state.
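The port channel behavior can be approximated in Python. This is a minimal sketch under stated assumptions: `PortChannel`, `link_down`, and `pick_link` are hypothetical names; the CRC32 flow hash stands in for whatever load-balancing method the switch is actually configured with; and making the first eight members active and promoting the first standby link when an active link fails are assumed policies, not documented behavior. The sketch enforces the LACP limits from the text: up to 16 member interfaces, at most 8 of them active.

```python
import zlib
from typing import List


class PortChannel:
    """Toy LACP-style port channel: at most 8 active members, up to 8 more on standby."""

    MAX_ACTIVE = 8
    MAX_MEMBERS = 16

    def __init__(self, members: List[str]) -> None:
        if len(members) > self.MAX_MEMBERS:
            raise ValueError("a port channel supports at most 16 interfaces")
        # Assumed policy: the first eight members are active, the rest standby.
        self.active = members[: self.MAX_ACTIVE]
        self.standby = members[self.MAX_ACTIVE :]

    def link_down(self, port: str) -> None:
        # Drop the failed link; if it was active, promote a standby member.
        if port in self.active:
            self.active.remove(port)
            if self.standby:
                self.active.append(self.standby.pop(0))
        elif port in self.standby:
            self.standby.remove(port)

    def pick_link(self, flow: str) -> str:
        # Hash each flow onto one active member so a flow stays on a single link;
        # after a failure, flows are spread over the remaining active members.
        if not self.active:
            raise RuntimeError("port channel is down")
        return self.active[zlib.crc32(flow.encode()) % len(self.active)]


if __name__ == "__main__":
    pc = PortChannel([f"eth{i}" for i in range(1, 11)])  # 8 active, 2 standby
    pc.link_down("eth3")
    print(len(pc.active), pc.standby)  # 8 ['eth10']
```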

Problems with High Availability

•Check the role of the two VSMs using the show system redundancy status command.

1. Confirm that the roles are primary and secondary, respectively.

2. If needed, use the system redundancy role command to correct the roles.

3. If you changed the roles, save the configuration.

Possible cause: Network connectivity problems.

•Check the control and management VLAN connectivity between the VSMs on both the upstream switches and the virtual switches.

If network problems exist:

1. From the vSphere client, shut down the VSM, which should be in standby mode.

2. From the vSphere client, bring up the standby VSM after network connectivity is restored.

Symptom: The active VSM does not complete synchronization with the standby VSM.

Possible cause: Version mismatch between VSMs.

•Use the show version command to check that the primary and secondary VSMs are running the same image version.

If the active and standby VSM software versions differ, reinstall the secondary VSM with the same version used on the primary.
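The role and version checks in this section can be collected into a small preflight helper. This Python sketch is hypothetical: `VsmInfo` and `ha_preflight` are invented names, and the values are assumed to be copied by hand from the show system redundancy status and show version output of each VSM; the code does not talk to the switch or parse CLI output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VsmInfo:
    """Hypothetical record of values read by hand from each VSM's CLI output."""

    name: str
    role: str     # as reported by "show system redundancy status"
    version: str  # as reported by "show version"


def ha_preflight(a: VsmInfo, b: VsmInfo) -> List[str]:
    """Return the role and version problems found; an empty list means both pass."""
    problems = []
    # One VSM must be primary and the other secondary.
    if sorted([a.role, b.role]) != ["primary", "secondary"]:
        problems.append(
            f"roles are {a.role}/{b.role}; expected one primary and one secondary"
        )
    # Both VSMs must run the same image version.
    if a.version != b.version:
        problems.append(f"version mismatch: {a.name}={a.version}, {b.name}={b.version}")
    return problems


if __name__ == "__main__":
    # Placeholder version strings, not real release numbers; two problems are reported.
    print(ha_preflight(VsmInfo("vsm-1", "primary", "v1"), VsmInfo("vsm-2", "primary", "v2")))
```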

Possible cause: Fatal errors during the gsync process.

•Check the gsyncctrl log using the show system internal log sysmgr gsyncctrl command and look for fatal errors.

Reload the standby VSM using the reload module module-number command, where module-number is the module number of the standby VSM.

Symptom: The standby VSM reboots periodically.

Possible cause: The VSM has connectivity only through the management interface.

•When a VSM is able to communicate through the management interface, but not through the control interface, the active VSM detects the situation and resets the standby VSM to prevent the two VSMs from being in HA mode and out of sync.

•Check the output of the show system internal redundancy info command and verify whether the degraded_mode flag is set to true.

Check control VLAN connectivity between the primary and secondary VSMs.
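The detection rule in the first bullet of this cause reduces to a simple predicate. This Python sketch is hypothetical: `PeerReachability` and `should_reset_standby` are invented names, and it assumes reachability over each interface has already been determined (for example, by heartbeats), which the VSM does internally and this code does not model. It encodes only the documented case and says nothing about what the VSM does when both interfaces are down.

```python
from dataclasses import dataclass


@dataclass
class PeerReachability:
    control: bool     # peer reachable over the control interface
    management: bool  # peer reachable over the management interface


def should_reset_standby(peer: PeerReachability) -> bool:
    # Degraded mode: the peer answers on management but not on control, so the
    # pair could appear to be in HA mode while staying out of sync.
    return peer.management and not peer.control


if __name__ == "__main__":
    for control, management in [(True, True), (False, True), (False, False)]:
        peer = PeerReachability(control=control, management=management)
        print(f"control={control} management={management} -> reset={should_reset_standby(peer)}")
```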

Possible cause: The VSMs are running different versions.

Enter the debug system internal sysmgr all command and look for the active_verctrl entry that indicates a version mismatch, as the following output shows: