Monitoring and Maintaining Multicast HA Operations

Last Updated: November 28, 2012

This module describes IPv4 and IPv6 multicast high availability (HA) support and the concepts and tasks necessary to monitor and maintain multicast HA operations.

Multicast HA capabilities provide Cisco nonstop forwarding (NSF) with stateful switchover (SSO) support for IPv4 and IPv6 multicast, and In Service Software Upgrade (ISSU) support for Protocol Independent Multicast (PIM). Following a Route Processor (RP) switchover, NSF with SSO reduces the reconvergence time of the multicast control plane to a level that is transparent to most multicast-based applications.

Finding Feature Information

Your software release may not support all the features documented in this module. For the latest feature information and caveats, see the release notes for your platform and software release. To find information about the features documented in this module, and to see a list of the releases in which each feature is supported, see the Feature Information Table at the end of this document.

Use Cisco Feature Navigator to find information about platform support and Cisco software image support. To access Cisco Feature Navigator, go to www.cisco.com/go/cfn. An account on Cisco.com is not required.

Prerequisites for Monitoring and Maintaining HA Operations

This module assumes that your device is configured for IP multicast and is participating in an IP multicast network. For more information about configuring IP multicast using PIM sparse mode (PIM-SM), Source Specific Multicast (PIM-SSM), or bidirectional PIM (bidir-PIM), see the "Configuring a Basic IP Multicast Network" module.

SSO must be configured and working properly. If you do not have SSO enabled, see the "Stateful Switchover" module.

This module assumes that you are familiar with NSF concepts. For more information about NSF, see the "Cisco Nonstop Forwarding" module.

This module assumes that you are familiar with the ISSU process.

Restrictions for Monitoring and Maintaining HA Operations

IPv6 multicast SSO is supported only for PIM-SSM mode and PIM sparse mode using static RP configuration. SSO for bidir-PIM is not supported for IPv6 multicast.

Information About Monitoring and Maintaining Multicast HA Operations

Multicast HA Support Differences from Other Routing Protocols

Multicast HA support is different from HA support for other routing protocols because multicast routing (mroute) state is dynamic; that is, mroute state depends on the presence of sources and receivers. At the beginning of SSO, multicast state information known by downstream PIM neighbors is refreshed by the control plane. In addition, mroute state creation can be triggered by data-driven events (DDEs) in the following cases:

Mroute state creation triggered on the first hop designated router (DR) as a result of active source traffic.

Shortest path tree (SPT) switchovers on the last hop DR; this occurs when traffic on the shared tree is detected on the last hop router.

Mroute states created in these data driven event cases are not learned from PIM join and prune messages from PIM neighbors.

Multicast Graceful Restart Overview

Multicast Graceful Restart (GR) is achieved with a combination of the NSF/SSO--IPv4 Multicast feature, the NSF/SSO--IPv6 feature, and the PIM Triggered Joins feature.

During normal operation (steady state), the software dynamically synchronizes information corresponding to events that modify the multicast forwarding state on the standby RP. Instead of performing periodic bulk synchronization updates, the software sends updates only for modified entities within internal databases. These updates are triggered by events that cause internal database changes related to the multicast forwarding state.

Note

This functionality applies only to the dynamic synchronization on the standby RP for updates to the multicast forwarding state that occur during steady state operation. Bulk synchronization updates, however, are required whenever a standby RP is inserted, reloaded, or reset.
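The difference between event-driven synchronization in steady state and bulk synchronization when a standby RP is inserted can be sketched as follows. This is a minimal Python illustration of the idea only and does not represent the actual software; the class and method names (ActiveRP, StandbyRP, update, bulk_sync) are hypothetical.

```python
class StandbyRP:
    """Hypothetical standby RP that mirrors the active RP's databases."""

    def __init__(self):
        self.db = {}
        self.messages_received = 0

    def apply(self, key, value):
        # One synchronization message carries one modified entity.
        self.messages_received += 1
        if value is None:
            self.db.pop(key, None)
        else:
            self.db[key] = value


class ActiveRP:
    """Hypothetical active RP that sends only changed entries in steady state."""

    def __init__(self):
        self.db = {}
        self.standby = None

    def update(self, key, value):
        # Event-driven path: nothing is sent unless the entry actually changed.
        if self.db.get(key) == value:
            return
        if value is None:
            self.db.pop(key, None)
        else:
            self.db[key] = value
        if self.standby is not None:
            self.standby.apply(key, value)

    def bulk_sync(self, standby):
        # Bulk path: needed when a standby RP is inserted, reloaded, or reset.
        self.standby = standby
        for key, value in self.db.items():
            standby.apply(key, value)
```

In this sketch, calling update() with an unchanged value sends nothing, whereas bulk_sync() replays every entry once to a newly attached standby.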

In steady state, the following internal multicast forwarding databases are dynamically synchronized on the standby RP:

MCAC Reservation--Internal database that stores the identity of IPv6 (S, G) multicast routes for which an MCAC cost is currently accrued for each interface on the active RP (IPv6 only).

MFIB Interactions on the Active and Standby RPs Before an RP Switchover

Before an RP switchover, each Multicast Forwarding Information Base (MFIB) instance keeps a permanent record of the DDEs it generated that are passed through the Multicast Routing Information Base (MRIB) on the active RP to the MFIB on the standby RP.

The figure illustrates the multicast NSF/SSO interactions between the MFIB components on the active and standby RPs before a switchover.

Figure 1: MFIB Interactions on the Active and Standby RPs Before an RP Switchover

Unicast and Multicast NSF and SSO Hold-Off Period

Following an RP failure, data plane forwarding information is retained despite the fact that the new primary RP does not have a complete set of control plane information. The retention of this information enables forwarding to continue during unicast and multicast routing protocol reconvergence. While unicast and multicast routing protocol reconvergence is in progress, a hold-off period is observed during which no multicast forwarding updates are sent from the multicast routing protocol layer to the data plane layer. The hold-off period ends after unicast and multicast protocol convergence has completed.

Unicast routing protocol convergence begins before multicast protocol convergence. Multicast routing protocol (PIM) convergence does not begin until the multicast protocol layer receives explicit signaling that unicast routing protocol convergence has completed. Unicast protocols that are not SSO-aware are not covered by this signal and are not taken into account when waiting for convergence.

Note

Some SSO-aware routing protocols (for example, Border Gateway Protocol (BGP)) may generate logging messages indicating that the initial convergence has completed (based on an internal timer) before full convergence has occurred. PIM, however, does not provide any explicit indication of reconvergence.

The hold-off period may terminate before full convergence of unicast routing protocols, which will result in null RPF interfaces for any affected IP addresses. As additional unicast routing updates are received, the affected multicast routes are updated as needed. This is expected and acceptable behavior for SSO-aware routing protocols that are slower in converging.

Note

An RP switchover occurring on a system operating with unicast protocols that are not SSO-aware will cause undesirably long convergence times--but no routing loops--for multicast routes.

At the end of the hold-off period, the multicast data plane layer marks any existing data plane information as stale. That information is subsequently flushed if it is not refreshed through the downloading of the current reconverged control plane information.
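The mark-as-stale and flush behavior described above can be sketched as follows. This is a simplified Python model written for illustration; the ForwardingTable class and its method names are hypothetical and do not correspond to any actual software interface.

```python
class ForwardingTable:
    """Hypothetical data plane table that retains entries across a switchover."""

    def __init__(self, entries):
        # Map each retained entry to a "stale" flag.
        self.entries = {entry: False for entry in entries}

    def mark_all_stale(self):
        # Done when the hold-off period ends.
        for entry in self.entries:
            self.entries[entry] = True

    def download(self, entry):
        # A refreshed (or newly learned) entry from the control plane is current.
        self.entries[entry] = False

    def flush_stale(self):
        # Remove whatever the reconverged control plane did not refresh.
        flushed = sorted(e for e, stale in self.entries.items() if stale)
        for entry in flushed:
            del self.entries[entry]
        return flushed
```

Entries that the reconverged control plane downloads again survive the flush; entries that are never refreshed are removed.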

MFIB Interactions During an RP Switchover

During an RP switchover, while the routing protocols are reconverging, no changes to the multicast tables will occur. All MFIB instances will enter NSF mode, as illustrated in the figure.

Figure 2: MFIB Interactions During an RP Switchover

Unicast and Multicast NSF and SSO Events That Occur Following an RP Switchover

In the event of an RP switchover, even with the continuous synchronization of unicast and multicast routing information from the primary to the standby RP, it is not possible to guarantee that the information most recently updated on the primary RP can be synchronized to the standby RP before a failure occurs on the primary RP. For this reason, following an RP switchover, both unicast and multicast routing protocols trigger the retransmission of routing information from neighboring routers to ensure that the unicast and multicast routing information is current.

For multicast protocol retransmission, the software triggers a refresh of all multicast routing information available from PIM neighbors using the PIM GenID capability described in RFC 4601. GenID support enables fast mroute reconvergence after a switchover. A GenID is a randomly generated 32-bit value regenerated each time PIM forwarding is started or restarted on an interface. In the event of a switchover, the GenID value is used as a mechanism to trigger adjacent PIM neighbors on an interface to send PIM join messages for all (*, G) and (S, G) mroutes that use that interface as an RPF interface, immediately reestablishing those states on the new primary RP. Internet Group Management Protocol (IGMP) for IPv4 multicast and Multicast Listener Discovery (MLD) group membership information for IPv6 multicast is restored by executing IGMP/MLD queries on all IGMP/MLD interfaces.
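The way a changed GenID value prompts a neighbor to resend joins can be sketched as follows. This is a minimal Python illustration under stated assumptions: the PimNeighborView class, its fields, and the interface and neighbor names are hypothetical, and the sketch handles only the GenID-change case described above, not the full hello processing defined in RFC 4601.

```python
import secrets


def new_genid():
    """Return a random 32-bit GenID, as regenerated on each PIM (re)start."""
    return secrets.randbits(32)


class PimNeighborView:
    """Hypothetical adjacent router tracking the GenID of each PIM neighbor."""

    def __init__(self, mroutes):
        # mroutes maps (source, group) -> name of the RPF interface.
        self.mroutes = mroutes
        self.seen_genid = {}  # (interface, neighbor) -> last GenID seen

    def on_hello(self, interface, neighbor, genid):
        """Process a hello; return the mroutes for which joins are resent."""
        key = (interface, neighbor)
        previous = self.seen_genid.get(key)
        self.seen_genid[key] = genid
        if previous is None or previous == genid:
            return []  # first hello from this neighbor, or no restart detected
        # GenID changed: resend joins for every mroute whose RPF interface
        # is the interface on which the restarted neighbor was heard.
        return sorted(sg for sg, rpf in self.mroutes.items() if rpf == interface)
```

A hello carrying the same GenID returns an empty list; a hello with a different GenID returns the (*, G) and (S, G) entries that use that interface as their RPF interface.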

The following multicast NSF/SSO events occur in parallel following an RP switchover:

The software empties the queue containing unprocessed synchronization messages for multicast sent by the previous primary RP and starts a unicast IGP convergence fail-safe timer to handle the possibility that unicast Interior Gateway Protocol (IGP) convergence never completes.

As interfaces come up on the new primary RP, unicast routing protocol reconvergence processing proceeds.

As each PIM-enabled interface comes up, PIM hello messages are sent out using a new GenID value for the interface. The modified GenID value triggers PIM join and prune messages from all adjacent PIM neighbors on the network to which the interface is attached. As these messages are received, information about mroute states that were missing on the new primary RP is restored, except for last hop SPT (S, G) routes and mroutes associated with directly connected hosts with no other intermediate routers. Because this routing information begins to arrive before unicast IGP convergence has occurred, mroutes may initially have NULL RPF ingress interfaces. As this state information is learned, the multicast protocol layer sends the corresponding update messages to the MRIB.

IGMP/MLD group membership information is restored by the execution of IGMP/MLD queries on all IGMP/MLD interfaces.

Following IGMP/MLD reporting, the control plane then sends out requests for the MFIB replay of DDEs to retrigger multicast route information that cannot be obtained from PIM neighbors or directly connected hosts.

After DDE replay, the hold-off period ends. At the end of the hold-off period, the multicast data plane layer marks any existing data plane information as stale and that information is subsequently flushed if it is not refreshed via the downloading of the current reconverged control plane information.
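The overall progression of these events can be sketched as a small state machine. The status strings below are the ones shown for the "Multicast IPv4 HA state machine status" field in the examples in this module; the event names and the transition table are hypothetical simplifications. In particular, PIM convergence and IGMP/MLD reporting are folded into a single event, and the fail-safe timer is not modeled.

```python
from enum import Enum


class HaState(Enum):
    IDLE = "Idle"
    UNICAST_CONVERGING = "Unicast converging"
    DDE_REPLAYING = "DDE replaying"
    FLUSH_PENDING = "Flush pending"


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (HaState.IDLE, "rp_switchover"): HaState.UNICAST_CONVERGING,
    (HaState.UNICAST_CONVERGING, "routes_and_reports_ready"): HaState.DDE_REPLAYING,
    (HaState.DDE_REPLAYING, "dde_replay_done"): HaState.FLUSH_PENDING,
    (HaState.FLUSH_PENDING, "stale_state_flushed"): HaState.IDLE,
}


def run(events, state=HaState.IDLE):
    """Apply events in order, ignoring events not valid in the current state."""
    history = [state]
    for event in events:
        next_state = TRANSITIONS.get((state, event))
        if next_state is not None:
            state = next_state
            history.append(state)
    return history
```

Passing the four events in order walks the model from "Idle" through each phase and back to "Idle"; an event that arrives out of order leaves the state unchanged.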

PIM and MFIB Interactions Following an RP Switchover to Replay DDEs

The underlying components that make up the MFIB infrastructure coordinate to ensure successful multicast NSF/SSO operations. In particular, the internal exchange of instructions between PIM and the MFIB, as illustrated in the figure, ensures error-free operation and the successful replay of DDEs.

Figure 3: PIM and MFIB Interactions Following an RP Switchover

Operation After the RP Switchover

After it is repaired, rebooted, or reinstalled, the previous active RP (the RP that went down) operates as the standby RP, as shown in the figure.

Figure 4: PIM and MFIB Interactions Following an RP Switchover

ISSU Support for IP Multicast

In most networks, planned software upgrades are a significant cause of downtime. The ISSU process allows software to be updated or otherwise modified while packet forwarding continues, which increases network availability and reduces the downtime caused by planned software upgrades.

To provide the required ISSU and SSO support necessary for IP multicast, a PIM ISSU client is introduced. The PIM ISSU client resides on both the primary and the standby RPs and enables PIM synchronization message transmission between two RPs using different versions of software. The PIM ISSU client performs transformation of PIM dynamic state synchronization messages sent from or received by the RP having the most recent software version. If synchronization messages are sent to an RP not using the most recent software version, the messages are translated to the older format used by this RP. If messages are received from this RP, the messages are translated to the newer format used by the receiving RP before being passed to the PIM HA software for processing.
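The translation rule described above, in which the RP running the newer software converts messages in both directions, can be sketched as follows. This Python example is illustrative only: the message fields (version, group, rp, priority) and the function names are hypothetical and do not reflect the actual synchronization message formats.

```python
def downgrade(msg):
    """Translate a hypothetical v2 message to v1 by dropping the field v1 lacks."""
    return {"version": 1, "group": msg["group"], "rp": msg["rp"]}


def upgrade(msg):
    """Translate a hypothetical v1 message to v2, defaulting the newer field."""
    return {"version": 2, "group": msg["group"], "rp": msg["rp"], "priority": 0}


def send(msg, peer_version):
    """Run on the RP with the newer software before transmitting."""
    return downgrade(msg) if peer_version < msg["version"] else msg


def receive(msg, local_version):
    """Run on the RP with the newer software after receiving."""
    return upgrade(msg) if msg["version"] < local_version else msg
```

In this sketch the RP running the older software never translates anything; all conversion work falls on the RP with the most recent version, which matches the division of responsibility described above.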

The debug ip multicast redundancy command logs events that are important in verifying NSF/SSO operation for IP multicast. The classes of events logged by this command include SSO events during an RP switchover and dynamic synchronization events that occur during steady state operation.

Use the optional verbose keyword to log events that may occur frequently during normal operation, but that may be useful for tracking in short intervals.

The following is output from the debug ip multicast redundancy command. The output displays the logging message that is displayed when the standby RP is recovered after a standby RP transition:

Use the show ip pim neighbor command to display the PIM neighbors discovered by PIMv1 router query messages or PIMv2 hello messages, and to check which of those neighbors support the GenID capability.

The output of the show ip pim neighbor command displays the "G" flag to indicate GenID support status for each PIM neighbor. The "G" flag is displayed only if the neighbor supports the GenID capabilities provided by PIM.

GenID support enables fast mroute reconvergence after a switchover. A GenID is a randomly generated 32-bit value regenerated each time PIM forwarding is started or restarted on an interface. In the event of a switchover, the GenID value is used as a mechanism to trigger adjacent PIM neighbors on an interface to send PIM join messages for all (*, G) and (S, G) mroutes that use that interface as an RPF interface, immediately reestablishing those states on the newly active RP.

A summary statistic showing the current number of synchronization messages awaiting transmission from the active RP to the standby RP. (This count is summed across all synchronization database types.)

A summary statistic showing the current number of synchronization messages that have been sent from the active RP to the standby RP, but for which the active RP has not yet received acknowledgment from the standby for successful reception. (This count is summed across all synchronization database types.)

The last two statistics, displaying the count of messages awaiting transmission or acknowledgment, provide a way to measure the load on the internal synchronization message-sending mechanism.
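The relationship between the per-database counts and the two summary statistics can be sketched as follows. This is a hypothetical Python model, not the actual accounting used by the software; the SyncStats class and its method names are assumptions, and "other" stands in for any additional database type.

```python
from collections import Counter


class SyncStats:
    """Hypothetical per-database counters for synchronization messages."""

    def __init__(self):
        self.awaiting_tx = Counter()   # queued, not yet sent to the standby RP
        self.awaiting_ack = Counter()  # sent, not yet acknowledged by the standby RP

    def queued(self, db_type):
        self.awaiting_tx[db_type] += 1

    def sent(self, db_type):
        self.awaiting_tx[db_type] -= 1
        self.awaiting_ack[db_type] += 1

    def acked(self, db_type):
        self.awaiting_ack[db_type] -= 1

    def summary(self):
        # Each summary value is summed across all synchronization database types.
        return sum(self.awaiting_tx.values()), sum(self.awaiting_ack.values())
```

Summary values that stay high over time would suggest that the synchronization mechanism is under load, which is how the two statistics are intended to be read.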

Perform this optional task to configure an additional timeout period before stale forwarding plane mroute information is flushed. This timeout period is added on to the default NSF route flush time as a delay between the downloading of refreshed multicast control plane route information to the forwarding plane and the flushing of "stale" NSF forwarding plane information retained from SSO before the RP switchover.

Caution

It is not recommended that you configure this additional delay unless it is specifically required for your topology because it could increase the risk of routing loops during NSF.

Note

You would need to perform this task only if you have a routing protocol that requires additional time to populate routing information after the signaling of unicast routing convergence (for example, BGP in a configuration with a large number of VPN routing and forwarding (VRF) instances). The need to configure this timeout period may be determined during predeployment SSO stress testing.

SUMMARY STEPS

1. enable

2. configure terminal

3. ip multicast redundancy routeflush maxtime seconds

4. end

5. show ip multicast redundancy state

DETAILED STEPS

Command or Action

Purpose

Step 1

enable

Example:

Router> enable

Enables privileged EXEC mode.

Enter your password if prompted.

Step 2

configure terminal

Example:

Router# configure terminal

Enters global configuration mode.

Step 3

ip multicast redundancy routeflush maxtime seconds

Example:

Router(config)# ip multicast redundancy routeflush maxtime 900

Configures an additional timeout period before stale forwarding plane mroute information is flushed following an RP switchover.

The range is from 0 to 3600 seconds. The default is 30 seconds.

Step 4

end

Example:

Router(config)# end

Ends the current configuration session and returns to privileged EXEC mode.

Step 5

show ip multicast redundancy state

Example:

Router# show ip multicast redundancy state

Displays the current redundancy state for IP multicast.

Use this command to confirm the stale NSF state flush timeout period being used. The "Stale NSF state flush timeout" output field will display the timeout period setting.
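For scripted deployments, the command sequence in this task can be generated with a small helper that enforces the documented range before anything is sent to a device. The following Python sketch is illustrative; the function name routeflush_commands is hypothetical, and the sketch only builds the command strings without connecting to a device.

```python
def routeflush_commands(seconds):
    """Return the CLI lines for this task for a given delay in seconds."""
    # The documented range for the additional timeout is 0 to 3600 seconds.
    if not isinstance(seconds, int) or not 0 <= seconds <= 3600:
        raise ValueError("seconds must be an integer from 0 to 3600")
    return [
        "enable",
        "configure terminal",
        f"ip multicast redundancy routeflush maxtime {seconds}",
        "end",
        "show ip multicast redundancy state",
    ]
```

For example, routeflush_commands(900) produces the same configuration line as the example in Step 3, and a value such as 3601 is rejected before it reaches the device.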

The following example shows how to monitor IP multicast NSF/SSO events during an RP switchover using the debug ip multicast redundancy command. The example shows IP multicast events occurring as a standby RP assumes the role of active RP during an SSO switchover. The events labeled "MCAST-HA" are logged by the IP multicast SSO debug facility.

The following output is from the debug ip multicast redundancy command. As interfaces come up on the new active RP, unicast convergence occurs in parallel with multicast route refresh from PIM neighbors. Unicast convergence is followed by RPF adjustments to the refreshed mroute information.

IGMP Queries, DDE Replay, Termination of the NSF Hold-Off Period, and Flushing of Stale Forwarding Information

The following output is from the debug ip multicast redundancy command. After the processing of unicast and multicast route convergence, time is allowed for IGMP reporting. Following IGMP reporting, the control plane then sends out requests for the MFIB replay of DDEs to retrigger multicast route information that cannot be obtained from PIM neighbors or directly connected hosts. After this processing completes, the control plane waits for the NSF hold-off time period to terminate. The refreshed multicast control plane information is then downloaded to the forwarding plane and when this is completed, the stale multicast forwarding plane information is subsequently flushed.

Standby RP Bringup

The following is sample output from the debug ip multicast redundancy command. This output shows events related to the reloading of the standby RP; in particular, events related to ISSU negotiation between the active and standby RP and events related to the synchronization of dynamic multicast forwarding information from the active RP to the standby RP. Synchronization events are also logged in steady state when events occur that affect dynamic group-to-RP mapping information or dynamic tunnel state.

Example Monitoring the Transition from Standby RP to Active RP Following a Switchover

The following example shows how to monitor the transition from standby RP to active RP and confirm the IP multicast redundancy state and the status on the standby RP after it has resynchronized with the new active RP.

Note

In this example scenario, a router is configured for IPv4 multicast routing operation, but not for IPv6 multicast. As a result, some of the output fields that are specific to IPv6 multicast will indicate status such as "Not enabled" or "Idle" in the example outputs.

Initial State on Standby RP Before Switchover

The following output is from the show ip multicast redundancy state command on a standby RP before an active RP goes down. In the sample output, notice that the "Current sync state" field displays "Not synching," indicating that the standby RP is not synchronizing data to the active RP. The standby RP serves only as a passive recipient of synchronization updates and does not initiate synchronization updates to the active RP.

The following output is unconditionally logged by the Redundancy Facility (RF) software when the standby RP detects that it has become the active RP due to a failure of the original active RP. The output shows the message used to indicate that an RP switchover has occurred:

Standby RP Transition to Active RP After an RP Switchover

The following output is from the show ip multicast redundancy state command on the standby RP during its transition from standby RP to active RP. Notice that the "Multicast IPv4 HA state machine status" field displays "Unicast converging," indicating that unicast convergence on the new active RP has begun. At this point in the RP switchover, the standby RP is waiting for unicast convergence.

The following output from the debug ip multicast redundancy state command shows messages indicating that the interfaces on the new active RP are coming up. As interfaces come up on the new active RP, unicast convergence occurs in parallel with multicast route refresh from PIM neighbors. Unicast convergence is followed by RPF adjustments to the refreshed mroute information.

00:00:51: %LINK-3-UPDOWN: Interface Null0, changed state to up
00:00:51: %LINK-3-UPDOWN: Interface Loopback0, changed state to up
00:00:51: %LINK-3-UPDOWN: Interface Loopback1, changed state to up
00:00:51: %LINK-3-UPDOWN: Interface Tunnel0, changed state to up
00:00:51: %LINK-3-UPDOWN: Interface Tunnel1, changed state to up
00:00:51: %LINK-3-UPDOWN: Interface Tunnel2, changed state to up
00:00:51: %LINK-3-UPDOWN: Interface Tunnel3, changed state to up
00:00:51: %LINK-3-UPDOWN: Interface Tunnel4, changed state to up
00:00:51: %LINK-3-UPDOWN: Interface Tunnel5, changed state to up
00:00:51: %LINK-3-UPDOWN: Interface Tunnel6, changed state to up
00:00:51: %LINK-3-UPDOWN: Interface Tunnel7, changed state to up
00:00:51: %LINK-3-UPDOWN: Interface Tunnel8, changed state to up
00:00:51: %LINK-3-UPDOWN: Interface Tunnel9, changed state to up
00:00:51: %LINK-3-UPDOWN: Interface Tunnel10, changed state to up
00:00:51: %LINK-3-UPDOWN: Interface Tunnel11, changed state to up
00:00:51: %LINK-3-UPDOWN: Interface Tunnel12, changed state to up
00:00:51: %LINK-3-UPDOWN: Interface Tunnel13, changed state to up
00:00:51: %LINK-3-UPDOWN: Interface Tunnel14, changed state to up
00:00:51: %LINK-3-UPDOWN: Interface Tunnel15, changed state to up
00:00:51: %LINK-5-CHANGED: Interface GigabitEthernet0/0, changed state to administratively down
00:00:51: %LINK-5-CHANGED: Interface GigabitEthernet0/1, changed state to administratively down
00:00:51: %LINK-5-CHANGED: Interface GigabitEthernet0/2, changed state to administratively down
00:00:51: %LINK-5-CHANGED: Interface GigabitEthernet0/3, changed state to administratively down
00:00:51: %LINK-5-CHANGED: Interface GigabitEthernet1/0, changed state to administratively down
00:00:51: %LINK-5-CHANGED: Interface GigabitEthernet1/1, changed state to administratively down
00:00:51: %LINK-5-CHANGED: Interface GigabitEthernet1/2, changed state to administratively down
00:00:51: %LINK-5-CHANGED: Interface GigabitEthernet1/3, changed state to administratively down
00:00:51: %LINK-5-CHANGED: Interface Serial2/0, changed state to administratively down
00:00:51: %LINK-5-CHANGED: Interface Serial2/1, changed state to administratively down
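When a switchover produces a long run of link messages like those shown above, it can help to group the interfaces by reported state. The following Python sketch parses lines in the format shown in this example only; the regular expression and the summarize function are illustrative assumptions and are not guaranteed to match every form of these messages.

```python
import re

# Matches lines such as:
# 00:00:51: %LINK-3-UPDOWN: Interface Null0, changed state to up
LINK_RE = re.compile(
    r"^(?P<time>\d\d:\d\d:\d\d): %LINK-(?P<sev>\d)-(?P<mnemonic>[A-Z]+): "
    r"Interface (?P<ifname>\S+), changed state to (?P<state>.+)$"
)


def summarize(lines):
    """Group interface names by the state reported in %LINK messages."""
    by_state = {}
    for line in lines:
        match = LINK_RE.match(line.strip())
        if match:
            by_state.setdefault(match["state"], []).append(match["ifname"])
    return by_state
```

Applied to the output above, the result would list the loopback, null, and tunnel interfaces under "up" and the GigabitEthernet and Serial interfaces under "administratively down".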

The following is output from the show ip multicast redundancy state command during the transition from the standby RP to the new active RP. Notice that the "Multicast IPv4 HA state machine status" field displays "DDE replaying," indicating that the MFIB is replaying DDEs. After the processing of unicast and multicast route convergence, time is allowed for IGMP reporting. Following IGMP reporting, the control plane then sends out requests for the MFIB replay of DDEs to retrigger multicast route information that cannot be obtained from PIM neighbors or directly connected hosts.

After this processing completes, the control plane terminates the NSF hold-off or, if the platform multicast driver software requests an extension to the hold-off period, allows additional time for the platform multicast driver software to release the NSF hold-off extension.

The refreshed multicast control plane information is then downloaded to the forwarding plane. Although reconvergence is considered complete at this point, additional "refresh" updates may occur after this point in time. An additional time interval is provided for any remaining updates before stale multicast forwarding plane information is subsequently flushed.

The following is output from the show ip multicast redundancy state command. Notice that the "Multicast IPv4 HA state machine status" field displays "Flush pending," indicating that stale NSF data plane state is still being temporarily retained to allow for any additional refreshed multicast control plane information to be downloaded to the forwarding plane.

The following is output from the show ip multicast redundancy state command after the refreshed multicast control plane information has been downloaded to the forwarding plane and the stale multicast forwarding plane information has been flushed. Notice that at this stage in the RP switchover the "Multicast IPv4 HA state machine status" field displays "Idle" because multicast IPv4 HA state machine operations have completed.

The following is output from the show ip multicast redundancy state command after the standby RP has completed resynchronization with the new active RP. Notice that the "Multicast IPv4 Redundancy Mode" field displays "SSO," indicating that all information between the standby RP and active RP has been synchronized. Also, notice that the "Current sync state" field displays "Synched," indicating that the standby has resynchronized with the new active RP.

The following table provides release information about the feature or features described in this module. This table lists only the software release that introduced support for a given feature in a given software release train. Unless noted otherwise, subsequent releases of that software release train also support that feature.

This feature extends NSF/SSO functionality to IPv4 Multicast protocols. Multicast NSF ensures uninterrupted flow of multicast traffic during an RP failure. Multicast SSO ensures that necessary information such as RP information, data driven events, and other multicast information is checkpointed to ensure the seamless takeover of the standby RP after an RP failover.

The following commands were introduced or modified: clear ip multicast redundancy statistics, debug ip multicast redundancy, ip multicast redundancy routeflush maxtime, show ip multicast redundancy state, show ip multicast redundancy statistics, show ip pim neighbor.
