Multicast is not SSO-aware and restarts after switchover; therefore, multicast tables and data structures are cleared upon switchover.

Configuration Mode Restrictions

The configuration registers on both RPs must be set the same for the networking device to behave the same when either RP is rebooted.

During the startup (bulk) synchronization, configuration changes are not allowed. Before making any configuration changes, wait for a message similar to the following:

%HA-5-MODE:Operating mode is sso, configured mode is sso.

Switchover Process Restrictions

If any changes to the fabric configuration happen simultaneously with an RP switchover, the chassis is reset and all line cards are reset.

If the switch is configured for SSO mode, and the active RP fails before the standby is ready to switch over, the switch will recover through a full system reset.

During SSO synchronization between the active and standby RPs, the configured mode will be RPR. After the synchronization is complete, the operating mode will be SSO. If a switchover occurs before the synchronization is complete, the switchover will be in RPR mode.

If a switchover occurs before the bulk synchronization step is complete, the new active RP may be in inconsistent states. The switch will be reloaded in this case.

Switchovers in SSO mode will not cause the reset of any line cards.

Interfaces on the RP itself are not stateful and will experience a reset across switchovers. In particular, the GE interfaces on the RPs are reset across switchovers and do not support SSO.

Any line cards that are not online at the time of a switchover (line cards not in Cisco IOS running state) are reset and reloaded on a switchover.

Information About SSO

SSO Overview

The switch is supports fault resistance by allowing a redundant supervisor engine to take over if the primary supervisor engine fails. Cisco SSO (frequently used with NSF) minimizes the time a network is unavailable to its users following a switchover while continuing to forward IP packets. The switch is supports route processor redundancy (RPR). For more information, see Chapter8, “Route Processor Redundancy (RPR)”

SSO is particularly useful at the network edge. Traditionally, core routers protect against network faults using router redundancy and mesh connections that allow traffic to bypass failed network elements. SSO provides protection for network edge devices with dual Route Processors (RPs) that represent a single point of failure in the network design, and where an outage might result in loss of service for customers.

SSO has many benefits. Because the SSO feature maintains stateful feature information, user session information is maintained during a switchover, and line cards continue to forward network traffic with no loss of sessions, providing improved network availability. SSO provides a faster switchover than RPR by fully initializing and fully configuring the standby RP, and by synchronizing state information, which can reduce the time required for routing protocols to converge. Network stability may be improved with the reduction in the number of route flaps had been created when routers in the network failed and lost their routing tables.

Figure 6-1 illustrates how SSO is typically deployed in service provider networks. In this example, Cisco NSF with SSO is primarily at the access layer (edge) of the service provider network. A fault at this point could result in loss of service for enterprise customers requiring access to the service provider network.

For Cisco NSF protocols that require neighboring devices to participate in Cisco NSF, Cisco NSF-aware software images must be installed on those neighboring distribution layer devices. Additional network availability benefits might be achieved by applying Cisco NSF and SSO features at the core layer of your network; however, consult your network design engineers to evaluate your specific site requirements.

Additional levels of availability may be gained by deploying Cisco NSF with SSO at other points in the network where a single point of failure exists. Figure 6-2 illustrates an optional deployment strategy that applies Cisco NSF with SSO at the enterprise network access layer. In this example, each access point in the enterprise network represents another single point of failure in the network design. In the event of a switchover or a planned software upgrade, enterprise customer sessions would continue uninterrupted through the network.

Figure 6-2 Cisco NSF with SSO Network Deployment: Enterprise Networks

SSO Operation

SSO establishes one of the RPs as the active processor while the other RP is designated as the standby processor. SSO fully initializes the standby RP, and then synchronizes critical state information between the active and standby RP.

During an SSO switchover, the line cards are not reset, which provides faster switchover between the processors. The following events cause a switchover:

A hardware failure on the active supervisor engine

Clock synchronization failure between supervisor engines

A manual switchover or shutdown

An SSO switchover does not interrupt Layer 2 traffic. An SSO switchover preserves FIB and adjacency entries and can forward Layer 3 traffic after a switchover. SSO switchover duration is between 0 and 3 seconds.

Route Processor Synchronization

Synchronization Overview

In networking devices running SSO, both RPs must be running the same configuration so that the standby RP is always ready to assume control if the active RP fails. SSO synchronizes the configuration information from the active RP to the standby RP at startup and whenever changes to the active RP configuration occur. This synchronization occurs in two separate phases:

While the standby RP is booting, the configuration information is synchronized in bulk from the active RP to the standby RP.

When configuration or state changes occur, an incremental synchronization is conducted from the active RP to the standby RP.

Bulk Synchronization During Initialization

When a system with SSO is initialized, the active RP performs a chassis discovery (discovery of the number and type of line cards and fabric cards, if available, in the system) and parses the startup configuration file.

The active RP then synchronizes this data to the standby RP and instructs the standby RP to complete its initialization. This method ensures that both RPs contain the same configuration information.

Even though the standby RP is fully initialized, it interacts only with the active RP to receive incremental changes to the configuration files as they occur. Executing CLI commands on the standby RP is not supported.

Synchronization of Startup Configuration

During system startup, the startup configuration file is copied from the active RP to the standby RP. Any existing startup configuration file on the standby RP is overwritten.

The startup configuration is a text file stored in the NVRAM of the RP. It is synchronized whenever you perform the following operations:

CLI command copy system:running-config nvram:startup-config is used.

CLI command copy running-config startup-config is used.

CLI command write memory is used.

CLI command copy filename nvram:startup-config is used.

SNMP SET of MIB variable ccCopyEntry in CISCO_CONFIG_COPY MIB is used.

System configuration is saved using the reload command.

System configuration is saved following entry of a forced switchover CLI command.

Incremental Synchronization

Incremental Synchronization Overview

After both RPs are fully initialized, any further changes to the running configuration or active RP states are synchronized to the standby RP as they occur. Active RP states are updated as a result of processing feature information, external events (such as the interface becoming up or down), or user configuration commands (using CLI commands or Simple Network Management Protocol [SNMP]) or other internal events.

CLI Commands

CLI changes to the running configuration are synchronized from the active RP to the standby RP. In effect, the CLI command is run on both the active and the standby RP.

SNMP SET Commands

Configuration changes caused by an SNMP set operation are synchronized on a case-by-case basis. Currently only two SNMP configuration set operations are supported:

shut and no-shut (of an interface)

link up/down trap enable/disable

Routing and Forwarding Information

Routing and forwarding information is synchronized to the standby RP:

State changes for SSO-aware features (for example, SNMP) are synchronized to the standby RP.

Cisco Express Forwarding updates to the Forwarding Information Base (FIB) are synchronized to the standby RP.

Chassis State

Changes to the chassis state due to line card insertion or removal are synchronized to the standby RP.

Line Card State

Changes to the line card states are synchronized to the standby RP. Line card state information is initially obtained during bulk synchronization of the standby RP. Following bulk synchronization, line card events, such as whether the interface is up or down, received at the active processor are synchronized to the standby RP.

Counters and Statistics

The various counters and statistics maintained in the active RP are not synchronized because they may change often and because the degree of synchronization they require is substantial. The volume of information associated with statistics makes synchronizing them impractical.

Note Not synchronizing counters and statistics between RPs may create problems for external network management systems that monitor this information.

SSO Operation

SSO Conditions

An automatic or manual switchover may occur under the following conditions:

A fault condition that causes the active RP to crash or reboot—automatic switchover

The active RP is declared dead (not responding)—automatic switchover

The CLI is invoked—manual switchover

The user can force the switchover from the active RP to the standby RP by using a CLI command. This manual procedure allows for a “graceful” or controlled shutdown of the active RP and switchover to the standby RP. This graceful shutdown allows critical cleanup to occur.

Note This procedure should not be confused with the graceful shutdown procedure for routing protocols in core routers—they are separate mechanisms.

Caution The SSO feature introduces a number of new command and command changes, including commands to manually cause a switchover. The
reload command does not cause a switchover. The
reload command causes a full reload of the box, removing all table entries, resetting all line cards, and interrupting nonstop forwarding.

Switchover Time

The time required by the device to switch over from the active RP to the standby RP is between zero and three seconds.

Although the newly active processor takes over almost immediately following a switchover, the time required for the device to begin operating again in full redundancy (SSO) mode can be several minutes, depending on the platform. The length of time can be due to a number of factors including the time needed for the previously active processor to obtain crash information, load code and microcode, and synchronize configurations between processors.

On DFC-equipped switching modules, forwarding information is distributed, and packets forwarded from the same line card should have little to no forwarding delay; however, forwarding packets between line cards requires interaction with the RP, meaning that packet forwarding might have to wait for the switchover time.

Online Removal of the Active RP

Online removal of the active RP automatically forces a stateful switchover to the standby RP.

Fast Software Upgrade

You can use Fast Software Upgrade (FSU) to reduce planned downtime. With FSU, you can configure the system to switch over to a standby RP that is preloaded with an upgraded Cisco IOS software image. FSU reduces outage time during a software upgrade by transferring functions to the standby RP that has the upgraded Cisco IOS software preinstalled. You can also use FSU to downgrade a system to an older version of Cisco OS or have a backup system loaded for downgrading to a previous image immediately after an upgrade.

SSO must be configured on the networking device before performing FSU.

Note During the upgrade process, different images will be loaded on the RPs for a short period of time. During this time, the device will operate in RPR mode.

Core Dump Operation

In networking devices that support SSO, the newly active primary processor runs the core dump operation after the switchover has taken place. Not having to wait for dump operations effectively decreases the switchover time between processors.

Following the switchover, the newly active RP will wait for a period of time for the core dump to complete before attempting to reload the formerly active RP. The time period is configurable. For example, on some platforms an hour or more may be required for the formerly active RP to perform a coredump, and it might not be site policy to wait that much time before resetting and reloading the formerly active RP. In the event that the core dump does not complete within the time period provided, the standby is reset and reloaded regardless of whether it is still performing a core dump.

The core dump process adds the slot number to the core dump file to identify which processor generated the file content.

Note Core dumps are generally useful only to your technical support representative. The core dump file, which is a very large binary file, must be transferred using the TFTP, FTP, or remote copy protocol (rcp) server and subsequently interpreted by a Cisco Technical Assistance Center (TAC) representative that has access to source code and detailed memory maps.

SSO-Aware Features

A feature is SSO-aware if it maintains, either partially or completely, undisturbed operation through an RP switchover. State information for SSO-aware features is synchronized from active to standby to achieve stateful switchover for those features.

The dynamically created state of SSO-unaware features is lost on switchover and must be reinitialized and restarted on switchover.

Default Settings for SSO

None.

How to Configure SSO

Note See “Fast Software Upgrade,” for information about how to copy images onto the switch. During the upgrade process, different images will be loaded on the RPs for a very short period of time. If a switchover occurs during this time, the device will recover in RPR mode.

Either the SSO or RPR redundancy mode is always configured. The SSO redundancy mode is configured by default. To revert to the default SSO redundancy mode from the RPR redundancy mode, perform this task:

Command

Purpose

Step 1

Router> enable

Enables privileged EXEC mode (enter your password if prompted).

Step 2

Router# configure terminal

Enters global configuration mode.

Step 3

Router(config)# redundancy

Enters redundancy configuration mode.

Step 4

Router(config)# mode sso

Sets the redundancy configuration mode to SSO on both the active and standby RP.

Troubleshooting SSO

Possible SSO Problem Situations

The standby RP was reset, but there are no messages describing what happened—To display a log of SSO events and clues as to why a switchover or other event occurred, enter the show redundancy history command on the newly active RP:

Router# show redundancy history

The show redundancy states command shows an operating mode that is different than what is configured on the networking device—On certain platforms the output of the show redundancy states command displays the actual operating redundancy mode running on the device, and not the configured mode as set by the platform. The operating mode of the system can change depending on system events. For example, SSO requires that both RPs on the networking device be running the same software image; if the images are different, the device will not operate in SSO mode, regardless of its configuration.

For example, during the upgrade process different images will be loaded on the RPs for a short period of time. If a switchover occurs during this time, the device will recover in RPR mode.

Reloading the device disrupts SSO operation—The SSO feature introduces a number of commands, including commands to manually cause a switchover. The reload command is not an SSO command. This command causes a full reload of the box, removing all table entries, resetting all line cards, and thereby interrupting network traffic forwarding. To avoid reloading the box unintentionally, use the redundancy force-switchover command.

During a software upgrade, the networking device appears to be in a mode other than SSO—During the software upgrade process, the show redundancy command indicates that the device is running in a mode other than SSO.

This is normal behavior. Until the FSU procedure is complete, each RP will be running a different software version. While the RPs are running different software versions, the mode will change to either RPR. The device will change to SSO mode once the upgrade has completed.

The previously active processor is being reset and reloaded before the core dump completes—Use the crashdump-timeout command to set the maximum time that the newly active processor waits before resetting and reloading the previously active processor.

Issuing a “send break” does not cause a system switchover—This is normal operation. Using “send break” to break or pause the system is not recommended and may cause unpredictable results. To initiate a manual switchover, use the redundancy force-switchover command.

In Cisco IOS software, you can enter ROM monitor mode by restarting the switch and then pressing the Break key or issuing a “send break” command from a telnet session during the first 60 seconds of startup.The send break function can be useful for experienced users or for users under the direction of a Cisco Technical Assistance Center (TAC) representative to recover from certain system problems or to evaluate the cause of system problems.

SSO Troubleshooting

The following commands may be used as needed to troubleshoot the SSO feature. These commands do not have to be entered in any particular order.

Command

Purpose

Router(config-red)# crashdump-timeout [ mm | hh : mm ]

Sets the longest time that the newly active RP will wait before reloading the formerly active RP.