Information About High Availability

The purpose of High Availability (HA) is to limit the impact of failures, both hardware and software, within a system. The Cisco NX-OS operating system is designed for high availability at the network, system, and service levels.

The following Cisco NX-OS features minimize or prevent traffic disruption in the event of a failure:

Redundancy: Redundancy in every aspect of the software architecture.

Isolation of processes: Isolation between software components to prevent a failure within one process from disrupting other processes.

Restartability: Most system functions and services are isolated so that they can be restarted independently after a failure while other services continue to run. In addition, most system services can perform stateful restarts, which allow the service to resume operations transparently to other services.

Supervisor stateful switchover: Active/standby dual supervisor configuration. State and configuration remain constantly synchronized between two Virtual Supervisor Modules (VSMs) to provide a seamless and stateful switchover in the event of a VSM failure.

The Cisco Nexus 1000V system is made up of the following:

Virtual Ethernet Modules (VEMs) running within virtualization servers. These are represented as modules within the VSM.

A remote management component, for example, VMware vCenter Server.

One or two VSMs running within Virtual Machines (VMs).

System-Level High Availability

The Cisco Nexus 1000V supports redundant VSM virtual machines, a primary and a secondary, running as an HA pair. Dual VSMs operate in an active/standby capacity in which only one of the VSMs is active at any given time, while the other acts as a standby backup. The state and configuration remain constantly synchronized between the two VSMs to provide a stateful switchover if the active VSM fails.

Network-Level High Availability

The Cisco Nexus 1000V HA at the network level includes port channels and Link Aggregation Control Protocol (LACP). A port channel bundles physical links into a channel group to create a single logical link that provides the aggregate bandwidth of up to eight physical links. If a member port within a port channel fails, the traffic previously carried over the failed link switches to the remaining member ports within the port channel.

Additionally, LACP lets you configure up to 16 interfaces into a port channel. A maximum of eight interfaces can be active, and a maximum of eight interfaces can be placed in a standby state.
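The limits described above (up to 16 configured interfaces, at most eight active, the rest in standby) can be illustrated with a minimal Python sketch. The function names and the selection order are assumptions made for illustration only: real LACP selects active links by port priority, and the promotion of a standby link after a failure is modeled here in the simplest possible way.

```python
MAX_CONFIGURED = 16  # interfaces LACP lets you configure into one port channel
MAX_ACTIVE = 8       # interfaces that can be active at the same time


def split_members(interfaces):
    """Return (active, standby) lists for the configured interfaces.

    Assumption: earlier entries are preferred. Real LACP uses port priority.
    """
    if len(interfaces) > MAX_CONFIGURED:
        raise ValueError("a port channel accepts at most 16 interfaces")
    return list(interfaces[:MAX_ACTIVE]), list(interfaces[MAX_ACTIVE:])


def fail_member(active, standby, failed):
    """Drop a failed active link and promote one standby link, if available."""
    if failed not in active:
        return list(active), list(standby)
    remaining = [port for port in active if port != failed]
    waiting = list(standby)
    if waiting:
        remaining.append(waiting.pop(0))
    return remaining, waiting
```

With ten configured interfaces, the first eight become active and the last two wait in standby. When an active member fails, traffic continues over the remaining members and one standby link takes the vacant slot.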

Use the show version command to check the software version in both VSMs.

Install the image matching the active VSM on the standby VSM.

Active-Active detected and resolved

When control and management connectivity between the active and the standby goes down for 6 seconds, the standby VSM transitions to the active state.

Upon restoration of control and management connectivity, both VSMs detect an active-active condition.

1. Once the system detects active-active VSMs, one of the VSMs is automatically reloaded based on parameters such as attached VEMs, vCenter connectivity, last configuration time, and last active time.

2. To see any configuration changes that were made on the rebooted VSM during the active-active condition, run the show system internal active-active remote accounting logs command on the active VSM.
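The timing behavior described above can be sketched in Python as follows. This is a simplified model, not the VSM implementation: the function names are hypothetical, and the logic that chooses which VSM to reload is deliberately left out because the text does not state how the listed parameters are weighted.

```python
LOSS_TIMEOUT_S = 6  # seconds without control and management connectivity (from the text)


def standby_next_state(seconds_without_contact):
    """State of the standby VSM after the given period of lost connectivity."""
    if seconds_without_contact >= LOSS_TIMEOUT_S:
        return "active"
    return "standby"


def is_active_active(state_a, state_b):
    """True when both VSMs report the active state once connectivity returns."""
    return state_a == "active" and state_b == "active"
```

For example, a standby VSM that has been cut off for 7 seconds moves to the active state. When the link is restored and the original active VSM is still running, is_active_active returns True for the pair.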

VSM Role Collision

If another VSM is configured/provisioned with the same role (primary or secondary) in the system, the new VSM collides with the existing VSM.

The show system redundancy status command displays the MAC addresses of the VSM(s) that collide with the working VSM.

If the problem exists:

1. Run the show system redundancy status command on the VSM console.

2. Identify the VSM(s) that own the MAC addresses that are displayed in the output of the show system redundancy status command.

3. Move the identified VSM(s) out of the system to stop role collision.
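The idea of a role collision can be sketched in Python as a duplicate check over the VSMs in one system. The input format, the function name, and the MAC addresses in the usage note are hypothetical. The sketch does not parse any command output.

```python
from collections import defaultdict


def find_role_collisions(vsms):
    """Group VSM MAC addresses by role and keep roles claimed more than once.

    vsms: iterable of (mac_address, role) tuples, role being "primary"
    or "secondary". Returns a dict mapping each colliding role to its MACs.
    """
    by_role = defaultdict(list)
    for mac, role in vsms:
        by_role[role].append(mac)
    return {role: macs for role, macs in by_role.items() if len(macs) > 1}
```

Given two VSMs that both claim the secondary role, the result lists both MAC addresses under "secondary". These identify the VSMs from which one must be moved out of the system.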

Both VSMs are in active mode.

Network connectivity problems.

Check control and management VLAN connectivity between the VSMs at the upstream and virtual switches.

When the VSMs cannot communicate through either of these two interfaces, both try to become active.

If network problems exist:

1. From the vSphere client, shut down the VSM that should be in standby mode.

2. From the vSphere client, bring up the standby VSM after network connectivity is restored.

Different domain IDs in the two VSMs

Check the domain value using the show system internal redundancy info command.

If needed, update the domain ID and save it to the startup configuration.

Updating the domain ID in a dual VSM system must follow this procedure:

– Isolate the VSM with the incorrect domain ID so that it cannot communicate with the other VSM.

– Change the domain ID in the isolated VSM, save the configuration, and power off the VSM.

– Reconnect the isolated VSM and power it on.
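The procedure above can be summarized as a small Python sketch that, given the expected domain ID and the IDs found on each VSM, returns the ordered steps for any VSM that needs correcting. The function name, the VSM names, and the step wording are illustrative assumptions. The sketch only produces a checklist and does not talk to a VSM.

```python
def plan_domain_id_fix(expected_id, domain_ids):
    """Return ordered steps for each VSM whose domain ID differs from expected_id.

    domain_ids: dict mapping a VSM name to the domain ID configured on it.
    """
    steps = []
    for vsm, domain_id in domain_ids.items():
        if domain_id != expected_id:
            steps.append("isolate %s from the other VSM" % vsm)
            steps.append(
                "set domain ID %s on %s, save the configuration, power off"
                % (expected_id, vsm)
            )
            steps.append("reconnect %s and power it on" % vsm)
    return steps
```

If both VSMs already share the expected domain ID, the returned list is empty and no action is needed.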

High Availability Troubleshooting Commands

This section lists commands that can be used to troubleshoot problems related to High Availability.