G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs

G06F9/44—Arrangements for executing specific programs

G06F9/4401—Bootstrapping

Abstract

Disclosed is a technique for facilitating software upgrades for a switching system comprising a first management processor, a second management processor, and a set of one or more line processors. The technique comprises receiving a signal to perform a software upgrade for a line processor from the set of line processors, and performing the software upgrade for that line processor without substantially affecting packet switching performed by the switching system.

Description

CROSS-REFERENCES TO RELATED APPLICATIONS

This application claims priority from U.S. Provisional Application No. 60/754,932, filed Dec. 28, 2005, and U.S. Provisional Application No. 60/762,283, filed Jan. 25, 2006, both of which are incorporated by reference in their entirety for all purposes.

This application is related to U.S. Provisional Application No. 60/754,739 filed Dec. 28, 2005 and to co-pending U.S. application Ser. No. 11/586,991 filed Oct. 25, 2006, both of which are incorporated by reference in their entirety for all purposes.

BACKGROUND OF THE INVENTION

The present invention relates generally to computer networks and specifically to failover or redundancy in network equipment.

The OSI model developed by the International Organization for Standardization (ISO) serves as a guideline for developing standards for data communication. Network equipment that conforms to these standards can be interconnected with other conforming equipment.

The OSI, or Open System Interconnection, model defines a networking framework for implementing protocols in seven layers. Control is passed from one layer to the next, starting at the application layer in one station, proceeding to the bottom layer, over the channel to the next station and back up the hierarchy. The seven layers (L1 to L7) are briefly summarized as follows:

a. Application (Layer 7)—This layer supports application and end-user processes. Communication partners are identified, quality of service is identified, user authentication and privacy are considered, and any constraints on data syntax are identified. This layer provides application services for file transfers, e-mail, and other network software services.

b. Presentation (Layer 6)—This layer provides independence from differences in data representation (e.g., encryption) by translating from application to network format, and vice versa. The presentation layer works to transform data into the form that the application layer can accept.

c. Session (Layer 5)—This layer establishes, manages, and terminates connections between applications. The session layer sets up, coordinates, and terminates conversations, exchanges, and dialogues between the applications at each end.

d. Transport (Layer 4)—This layer provides transparent transfer of data between end systems, or hosts, and is responsible for end-to-end error recovery and flow control. It ensures complete data transfer.

e. Network (Layer 3)—This layer provides switching and routing technologies, creating logical paths, known as virtual circuits, for transmitting data from node to node. Routing and forwarding are functions of this layer, as well as addressing, internetworking, error handling, congestion control and packet sequencing.

f. Data Link (Layer 2)—At this layer, data packets are encoded and decoded into bits. It furnishes transmission protocol knowledge and management and handles errors in the physical layer, flow control and frame synchronization. The data link layer is divided into two sublayers: The Media Access Control (MAC) layer and the Logical Link Control (LLC) layer. The MAC sublayer controls how a computer on the network gains access to the data and permission to transmit it. The LLC layer controls frame synchronization, flow control and error checking.

g. Physical (Layer 1)—This layer conveys the bit stream—electrical impulse, light or radio signal—through the network at the electrical and mechanical level. It provides the hardware means of sending and receiving data on a carrier, including defining cables, cards and physical aspects. Fast Ethernet, RS232, and ATM are protocols with physical layer components.

Network data switching equipment, such as equipment used for switching or routing information packets between network devices, handles data at the lower layers of the OSI model, while application-level programs handle data at the higher OSI layers. It is desirable for network switching equipment to remain operational for long, uninterrupted periods of time.

A common administrative activity is the installation of new software or software modules. In most installations, it is desirable to avoid, or at least minimize, the impact of bringing down the system for such tasks. Halting a higher-level application for a software upgrade typically affects only the relatively limited number of users of that application. By comparison, the downtime that may result from performing software updates to network switching equipment can affect a much larger community of users, possibly the entire enterprise.

BRIEF SUMMARY OF THE INVENTION

Embodiments of the present invention provide techniques for facilitating software upgrade for a system such as a switching system or router.

According to an embodiment of the present invention, techniques are provided for facilitating software upgrade for a switching system in a hitless manner, i.e., without substantially affecting the packet switching performed by the switching system.

According to an embodiment of the present invention, techniques are provided for facilitating software upgrade for a switching system comprising a first management processor and a second management processor, the techniques comprising receiving a signal to perform a software upgrade for the first management processor, and performing a software upgrade for the first management processor without substantially affecting packet switching performed by the switching system.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a high level block diagram of a network device according to an embodiment of the present invention;

FIG. 2 shows further detail of a linecard module;

FIG. 3A illustrates sync processing of linecard state information according to the present invention;

FIG. 3B illustrates switchover processing according to the present invention;

FIG. 4 illustrates linecard processing during switchover processing;

FIG. 5 is a schematic representation of the software relation between the active MP and a standby MP;

FIG. 6 illustrates sync processing using a “dirty bit”;

FIG. 7 illustrates CLI synchronization;

FIG. 8 illustrates forwarding identifier synchronization;

FIG. 9 illustrates MAC synchronization;

FIG. 10 illustrates protocol synchronization according to the present invention using the STP protocol as an example;

FIGS. 11A and 11B illustrate high level flows for software upgrades in the MP according to the present invention;

FIG. 12 illustrates a high level flow for software upgrades in the LP according to the present invention;

FIG. 13 illustrates various events in the LP during a software upgrade according to the present invention; and

FIG. 14 shows a memory map in accordance with an illustrative embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

In the descriptions to follow, specific details for the purposes of explanation are set forth in order to provide a thorough understanding of the invention. However, it will be apparent that the invention may be practiced without these specific details. For example, the embodiment described below makes reference to the BigIron® Series product from Foundry Networks, Inc. However, this is not intended to limit the scope of the present invention. The teachings of the present invention are also applicable to other boxes, devices, routers, switching systems, and data processing systems.

FIG. 1 shows a high level block diagram of a network switching device 100 according to the present invention. In this embodiment, the network switching device 100 provides processing of data packets in Layer 1, Layer 2, and Layer 3 of the OSI model.

The network switching device 100 includes a processor 102a designated as the active management processor (active MP), or active management module. It will be appreciated that the processor 102a includes functional components such as a data processing unit, various memory components, control logic, driver circuits for interfacing with other elements of the network switching device 100, and interface circuits for remote access (e.g., by an administrator).

A user, typically an authorized system administrator, can interact with the network switching device 100 via the active management module 102a. The user can configure the other components of the network switching device 100, or otherwise inspect various data structures and machine states of the device, via the active management module 102a. The user can also perform software update operations on the various components described below.

The active management module 102a is configured to provide a suitable user interface (e.g., command line interface, CLI) that allows the user to interact with the management module. The user can log onto the active management module 102a by way of a terminal that is connected to a maintenance port on the active management module, or some similar kind of port. Alternatively, the active management module 102a may be configured to provide the user with telnet access. It will be appreciated that the “user” can be a machine user; for example, a higher level management machine might be provided to manage a large pool of network switching devices 100. In such a configuration, a suitable machine interface can be provided.

FIG. 1 shows that the network switching device 100 includes one or more additional processors 102b designated as standby management processors (standby MP), or standby management modules. It is understood that the network switching device may provide a single standby management module, or a plurality of such modules as illustrated in FIG. 1. As will be explained in greater detail below, any one of the standby management module(s) 102b can take over the operation of the active management module 102a if the latter becomes inactive for some reason. The management modules may be collectively referred to herein by the reference numeral 102 for convenience, and will be understood to refer to the active management module 102a and the standby management module(s) 102b.

FIG. 1 shows a plurality of linecard modules 108a-108d which may be collectively referred to herein by the reference numeral 108. Although the figure shows four such line cards, it is understood that fewer or more line cards can be provided, depending on the particular configuration of a given network switching device. Each linecard module 108a-108d is configured to be connected to one or more devices (not shown) to receive incoming data packets from devices connected to it and to deliver outgoing data packets to another device connected to the linecard module or connected to another linecard module.

A connection plane referred to herein as control plane crosspoint 104 serves to interconnect the linecard modules 108 via their respective control lines, identified in the figure generally by the reference numeral 124. Control lines 124 from the management modules 102 also connect to the control plane 104. The control plane 104 routes control traffic among the linecard modules 108 and the management modules 102 in order to maintain proper route destinations among the linecard modules, ensure network convergence, and so on. In accordance with the present invention, both the active management module 102a and the standby management module(s) 102b receive control traffic.

A connection plane referred to herein as data plane crosspoint 106 serves to interconnect the linecard modules 108 via their respective data paths, identified in the figure generally by the reference numeral 126. The data plane 106 allows data packet traffic received from a source connected to one linecard module to be routed to a destination that is connected to another linecard module. Of course, the source and the destination of the data traffic may be connected to the same linecard module.

Refer now to FIG. 2 for a discussion of additional details of the linecard module 108a shown in FIG. 1. Elements common to FIGS. 1 and 2 are identified by the same reference numerals. The linecard module 108a includes some number of packet processors 202a-202d, collectively referred to herein by the reference numeral 202. Each packet processor is configured with a physical port connector, schematically represented in FIG. 2 by the double arrows, for incoming and outgoing data packets. The physical port connector provides connectivity to an end station (e.g., a personal computer) or to another piece of network equipment (e.g., router, bridge, etc.). The physical port connector can be a single connector (full duplex communication) or a pair of connectors (input and output).

The linecard module 108a further includes a data adapter 206 that is configured for connection to the packet processors 202 via data paths between the packet processors and the data adapter. The data adapter 206 is further configured for connection to the data plane 106. In this way, the linecard modules 108 can be connected to each other via corresponding data paths 126 to provide interconnectivity among the linecard modules. For example, in a typical implementation, a chassis can house some number of linecard modules. The chassis includes a backplane which has a plurality of connectors into which the linecard modules can be plugged. In this configuration, the data paths 126 would include the connection between data pins on the adapter 206 of one of the linecard modules and one of the backplane connectors.

The linecard module 108a further includes a linecard processor (LP) 204. The linecard processor 204 is connected to the packet processors 202 via corresponding control lines. The linecard processor 204 is also connected to the control plane 104 via the control line 124. The other linecard modules 108b-108d are similarly connected to the control plane 104. This allows for control/status information that is generated by the linecard modules 108 to be transmitted to the active management module 102a. Conversely, the active management module 102a can communicate control/status information it receives from the linecard modules 108 to any one or more of the other linecard modules.

Various processes and tasks execute as programs on the linecard processor 204 and in the packet processors 202, some of which will be discussed below. These processors comprise components such as a data processor or microcontroller, memory (RAM, ROM), data storage devices, and suitable support logic in order to store, load, and execute these programs.

In accordance with the present invention, the standby management module(s) 102b are connected to the control plane 104. This connection allows for the active management module 102a to transmit suitable information to the standby management module(s) 102b, and for the linecard modules to transmit suitable information to the standby management module(s) during operation of the system.

Following is a discussion of the operation of the management processors 102 and linecard processors 204 in accordance with the present invention when software upgrades are made. Software upgrade processing is incorporated at system start up. Upgrade processing functionality is similar to the failover switchover sequencing disclosed in the commonly owned application identified above (“hitless failover management”).

The acronym MP (management processor) will be used to refer to the management modules 102 shown in FIGS. 1 and 2. To avoid awkward sentence structures, the following discussion will assume a system having one standby management module 102b, with the understanding that the same operations described for the one standby management module would be performed by each such module in a configuration having multiple standby management modules. The term “active MP” will be understood to refer to the active management processor 102a, and “standby MP” will be understood to refer to the standby management processor 102b. Similarly, the acronym LP (linecard processor) will be used to refer to one or more linecard processors, depending on context.

First, a brief discussion of the start up sequence in the MPs will be given. This is followed by a brief description of switchover processing when the active MP experiences a failure. This will provide a backdrop against which to discuss aspects of the present invention.

A. Start-Up Sequence

1. Arbitration

When the network switching device 100 boots up (i.e., at power up, after a reset, etc.), MP active/standby arbitration is first performed in the monitor. Both processors begin executing; one becomes the active MP and the other becomes the standby MP. MP active/standby arbitration is the method by which this determination is made. Typically, it is implemented using known hardware semaphore techniques, in which each processor attempts to acquire the hardware semaphore. Only one processor succeeds; that processor designates itself as the active MP, and the other processor designates itself as the standby MP. Alternatively, the user can configure each processor to be the active MP or a standby MP.
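A minimal sketch of semaphore-based arbitration, assuming test-and-set semantics for the hardware semaphore. Python's non-blocking `threading.Lock.acquire(blocking=False)` stands in for the hardware semaphore, and two threads stand in for the two management processors; the class and function names are hypothetical.

```python
import threading


class HardwareSemaphore:
    """Stand-in for a hardware test-and-set semaphore (assumption: modeled
    with a non-blocking lock, so exactly one caller can ever win it)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        # Returns True for the first caller only; later callers get False.
        return self._lock.acquire(blocking=False)


def arbitrate(sem: HardwareSemaphore, mp_name: str, roles: dict) -> None:
    """Run by each MP at boot: the winner designates itself active."""
    roles[mp_name] = "active" if sem.try_acquire() else "standby"


def boot_both_mps() -> dict:
    sem = HardwareSemaphore()
    roles: dict = {}
    mps = [threading.Thread(target=arbitrate, args=(sem, name, roles))
           for name in ("MP1", "MP2")]
    for t in mps:
        t.start()
    for t in mps:
        t.join()
    return roles


print(boot_both_mps())  # which MP wins depends on timing
```

Which processor wins is deliberately left to timing; the only guarantee the sketch models is that exactly one becomes active.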

After MP active/standby arbitration is completed, the standby MP synchronizes its flash memory and boot images to the active MP (discussed below). After the synchronization, the standby MP loads the appropriate application image based on an instruction from the active MP (it may reset itself first if a new monitor or boot image is synced).

When the standby MP is ready to start its applications, MP active/standby arbitration is performed again. This second arbitration guards against the situation where the active MP resets or is removed after the first arbitration but before the standby MP has had a chance to install its interrupt service routine (ISR). If the active MP is no longer available, the standby MP becomes the active MP. If the standby MP remains standby after this second arbitration, it installs its ISR and functions as a standby MP until an MP switchover interrupt occurs.
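The standby start-up sequence, including the second arbitration, can be sketched as follows. This is a simplified model, assuming a single `semaphore_held` flag represents whether the active MP still owns the arbitration semaphore; the class and log strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StandbyStartup:
    """Hypothetical model of the standby MP between the first arbitration
    and normal standby operation. `semaphore_held` is True while the active
    MP still owns the arbitration semaphore (an assumed simplification)."""

    semaphore_held: bool = True
    role: str = "standby"
    log: list = field(default_factory=list)

    def run(self) -> str:
        self.log.append("sync flash and boot images with active MP")
        self.log.append("load application image named by active MP")
        # Second arbitration: if the active MP reset or was removed after the
        # first arbitration, the semaphore is free and no switchover interrupt
        # will ever arrive, so the standby must take over now.
        if not self.semaphore_held:
            self.role = "active"
            self.log.append("second arbitration won: become active MP")
        else:
            self.log.append("install switchover ISR; wait as standby")
        return self.role
```

For example, `StandbyStartup(semaphore_held=False).run()` returns `"active"`, which is exactly the race the second arbitration exists to catch.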

2. Active MP Operation

The active MP performs flash and boot image synchronization through a task, executing on the active MP, referred to as scp_task. The active MP reads in a startup configuration file and parses it. The configuration file can be stored in some form of programmable non-volatile memory or on a disk drive. The active MP then synchronizes the standby MP with the configuration information (the running configuration) by sending the running configuration to the standby MP, so that both MPs have the same running configuration. The scp_task will not reply to the standby MP until the startup configuration file has been parsed by the active MP.

While the process of synchronizing the running configuration is in progress, CLI (command line interface), WEB, and SNMP inputs on the active MP are disabled until the standby MP is ready. This is to prevent the configuration state of the active MP from getting too far ahead of the standby MP due to configuration update information that can be received from users or linecard modules during the synchronization activity. When the running configuration is synchronized to the standby MP (i.e., the standby MP has the same configuration as the active MP), then the CLI, WEB, and SNMP interfaces can be executed on the active MP. Alternatively, it might be possible to allow these interfaces to run on the active MP, but simply queue up any input to be subsequently synchronized to the standby MP.
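Both policies described above (disabling input during the sync versus queuing it for later synchronization) can be sketched with a small gate object. The class and method names are hypothetical, and configuration commands are modeled as plain strings.

```python
from collections import deque


class ConfigInputGate:
    """Hypothetical gate for CLI, WEB, and SNMP configuration input on the
    active MP while the running configuration is synced to the standby MP."""

    def __init__(self, queue_during_sync: bool = False) -> None:
        self.syncing = False
        self.queue_during_sync = queue_during_sync
        self.running_config: list = []
        self.pending: deque = deque()

    def begin_sync(self) -> None:
        self.syncing = True

    def submit(self, command: str) -> bool:
        """Apply, queue, or reject one configuration command."""
        if not self.syncing:
            self.running_config.append(command)
            return True
        if self.queue_during_sync:
            # Alternative policy: accept input but hold it back so the
            # active MP does not run ahead of the standby MP.
            self.pending.append(command)
            return True
        return False  # default policy: input disabled until standby is ready

    def end_sync(self) -> list:
        """Standby now matches; apply queued commands and return them so they
        can be sent to the standby MP as an incremental update."""
        self.syncing = False
        applied = list(self.pending)
        self.pending.clear()
        self.running_config.extend(applied)
        return applied
```

The queuing variant keeps the interfaces responsive, at the cost of having to sync the queued commands once the baseline transfer completes.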

3. Standby MP Operation

After the two arbitrations, the standby MP first starts a timer task called timer_task and a listener task called mp_rx_task. During normal startup processing, the standby MP then starts a task called redundancy_task, which performs a First Phase Software States synchronization operation. The First Phase Software States are the software states that are to be synchronized with the active MP before any other tasks are started. These states constitute the baseline software state in the standby MP. They include, for example, the running configuration, the CLI session modes, and in general the software states of any tasks that will execute on the standby MP.

After this baseline synchronization completes, the standby MP starts all other tasks. The synchronized running configuration (i.e., the running configuration sent over from the active MP) will be parsed when a task called the console_task is started. The standby MP will not initialize any of the hardware such as the linecard modules 108, and during operation of the system will drop all outgoing IPC packets it receives.

The scp_task that runs on the standby MP views all slots as empty. This prevents the standby MP from running the card state machine and the port state machine. On the active MP, these state machines keep track of the status of the linecard modules and their ports during operation of the system. Consequently, state changes in the linecard modules 108 and in their ports do not trigger an update by the standby MP, whereas they do trigger updates in the active MP, where the card and port state machines execute.
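The standby start-up order and its "all slots empty" view can be summarized in a short sketch. The task names come from the text; the dictionary of software states, the slot numbering, and the function itself are illustrative assumptions.

```python
import copy


def start_standby(active_states: dict, num_slots: int) -> dict:
    """Hypothetical model of standby MP start-up after both arbitrations.
    Task names are taken from the text; everything else is illustrative."""
    tasks = ["timer_task", "mp_rx_task"]      # timer and listener come first
    tasks.append("redundancy_task")           # First Phase Software States sync
    # Deep copy so the standby's baseline cannot alias the active MP's data.
    baseline = copy.deepcopy(active_states)
    tasks.append("console_task")              # parses the synced running config
    # scp_task on the standby sees every slot as empty, so the card and port
    # state machines have nothing to act on and never touch linecard hardware.
    slot_view = {slot: "empty" for slot in range(1, num_slots + 1)}
    return {"tasks": tasks, "baseline": baseline, "slot_view": slot_view}
```

The essential ordering constraint is that the baseline copy completes before any task that depends on it, such as console_task, is started.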

However, in order to support upper layer components that may require the correct view of card and/or port states, the card and port states are synchronized between active and standby MPs. Thus, with reference to FIG. 3A, when a linecard module 108 experiences a change in its state or a change in state in one of its ports (step 312), the linecard module informs the active MP (step 314) of the state change. The active MP updates its internal data structures (i.e., the card or port state machine) to reflect the state change in the linecard, step 316. The active MP also syncs that state information to the standby MP (i.e., sends the state information to the standby MP), step 318. Thus, even though the standby MP is not actively communicating with the linecard modules, it is kept apprised of their state changes via the active MP.
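The FIG. 3A flow can be sketched as follows, assuming each MP keeps its card and port states in simple dictionaries. The call on the standby object stands in for the sync message of step 318; the names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MgmtProcessor:
    """Hypothetical holder for the card and port state tables on one MP."""
    name: str
    card_states: dict = field(default_factory=dict)  # slot -> "up"/"down"
    port_states: dict = field(default_factory=dict)  # (slot, port) -> "up"/"down"

    def apply(self, kind: str, key, state: str) -> None:
        table = self.card_states if kind == "card" else self.port_states
        table[key] = state


def on_linecard_state_change(active: MgmtProcessor, standby: MgmtProcessor,
                             kind: str, key, state: str) -> None:
    """FIG. 3A: the linecard reports only to the active MP (step 314)."""
    active.apply(kind, key, state)   # step 316: active MP updates its tables
    standby.apply(kind, key, state)  # step 318: active MP syncs the change
```

The linecard never talks to the standby directly; the standby's view is always one hop behind the active MP's.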

B. Switchover Behavior

When the active MP resets (e.g., because it crashed, the processor locked up and a watchdog timer reset the processor, a user initiated the reset, etc.) an MP switchover happens and the standby MP will become the “newly active MP” upon completion of the MP switchover process.

Referring to FIG. 3B, the switchover process begins when the active MP resets and generates an interrupt, step 302. The interrupt is a common example of a mechanism that allows the standby MP to detect the occurrence of a reset in the active MP. It will be appreciated that other detection mechanisms can be employed. The ISR of the standby MP is entered and executed as a result of receiving the interrupt, step 304. The ISR sends an ITC (inter task communication) message to the standby MP's redundancy_task, step 306. The redundancy_task in turn performs the necessary hardware access to send out an “MP Switchover” event before the task exits.

The scp_task in the standby MP registers for the “MP Switchover” event and, upon detecting the event, runs the card state machine to perform hot-switchover, step 308. If the standby MP was synchronized with the active MP before the latter crashed, this action has no effect on the linecard modules 108. However, if the active MP crashed before it had a chance to synchronize the current card states and port states to the standby MP, the MPs would be out of sync with respect to the linecard module states at the time the active MP reset. Running the card state machine in this case guarantees that the linecard modules 108 are in the state that the newly active MP expects them to be in.
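The interrupt-to-event chain of FIG. 3B can be sketched with a message queue and a simple publish/subscribe registry. This assumes `queue.Queue` as the ITC mechanism and treats `physical_slots` as the card states synced from the active MP; the class and method names are hypothetical.

```python
import queue


class StandbyMP:
    """Hypothetical model of the FIG. 3B chain on the standby MP:
    interrupt -> ISR -> ITC message -> redundancy_task -> "MP Switchover"
    event -> scp_task. `physical_slots` stands for the card states synced
    from the active MP (assumed)."""

    def __init__(self, physical_slots: dict) -> None:
        self.itc: queue.Queue = queue.Queue()   # inter-task message queue
        self.subscribers: dict = {}             # event name -> callbacks
        self.slots = dict(physical_slots)
        self.role = "standby"
        self.subscribe("MP Switchover", self.scp_on_switchover)

    def subscribe(self, event: str, callback) -> None:
        self.subscribers.setdefault(event, []).append(callback)

    def publish(self, event: str) -> None:
        for callback in self.subscribers.get(event, []):
            callback()

    def isr(self) -> None:
        # Steps 304-306: keep the ISR short and hand the work to a task.
        self.itc.put("active MP reset")

    def redundancy_task_step(self) -> None:
        msg = self.itc.get()  # blocks until the ISR has posted a message
        if msg == "active MP reset":
            self.role = "active"
            self.publish("MP Switchover")

    def scp_on_switchover(self) -> None:
        # Step 308: every occupied, powered slot enters "Recovery".
        for slot, state in self.slots.items():
            if state not in ("empty", "powered-off"):
                self.slots[slot] = "Recovery"
```

Deferring the work from the ISR to redundancy_task keeps interrupt latency low; the heavier card state machine runs in task context.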

1. Card and Port State Machines During MP Switchover

As mentioned before, the standby MP views all slots as empty, which prevents the card and port state machines from running. When the scp_task receives the “MP Switchover” event, it changes all non-empty and non-powered-off slots to a “Recovery” state. FIG. 4 illustrates the processing that takes place in the linecard modules 108. The following steps can be performed by suitably configured program code executing in the linecard processor 204 of the linecard module receiving the information.

In the “Recovery” state, the standby MP sends card configurations to each of the linecard modules 108 with an indication that they are due to MP switchover, step 402. If a given linecard module is in an “Up” state, it compares the received configurations against its cached configurations, steps 404, 406. If anything is missing from its cached configurations, it re-applies it, step 408. The linecard module then sends its card operational information to the standby MP, step 410. If the linecard module is not in the “Up” state, it resets itself, step 412. The transition from the “Recovery” state to the “Up” state bypasses switch fabric programming and thus incurs no traffic hits. When a linecard module reaches the “Up” state, its ports kick off the port state machine.
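The linecard side of FIG. 4 reduces to a compare-and-reapply step. In this sketch, configurations are modeled as sets of strings and the return value stands in for the message sent back to the MP; both choices are assumptions made for illustration.

```python
def handle_recovery_config(lc_state: str, cached: set, received: set) -> dict:
    """Linecard-side handling of the configuration sent in the "Recovery"
    state (FIG. 4). Configurations are modeled as sets of strings, and
    `cached` is updated in place (both are assumptions)."""
    if lc_state != "Up":
        return {"action": "reset"}                 # step 412: not Up, reset
    missing = received - cached                    # steps 404-406: compare
    cached |= missing                              # step 408: re-apply missing
    # Step 410: report card operational info to the (newly active) MP; no
    # switch fabric reprogramming, so forwarding continues undisturbed.
    return {"action": "report", "reapplied": sorted(missing)}
```

Only the configuration delta is applied, which is what lets an already-running linecard pass through recovery without a traffic hit.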

Some upper layer components may want to run on the standby MP in exactly the same way as they did on the active MP. In this case, these upper layer components may require the correct view of cards and ports, as well as card and port up/down events. To facilitate this requirement, the up/down state of the linecard modules and their respective ports are synchronized to the standby MP, and a new set of card and port up/down events are provided. The set of events which the upper layer components can register for include:

EVENT_ID_MP_RED_CARD_UP = 23,

EVENT_ID_MP_RED_CARD_DOWN = 24,

EVENT_ID_MP_RED_PORT_UP = 25,

EVENT_ID_MP_RED_PORT_DOWN = 26,

EVENT_ID_MP_RED_PORT_DOWN_COMPLETE = 27,

EVENT_ID_MP_RED_PORTS_UP = 28,

EVENT_ID_MP_RED_PORTS_DOWN = 29,

EVENT_ID_MP_RED_PORTS_DOWN_COMPLETE
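The event identifiers listed above can be mirrored as an `IntEnum` with a minimal registration table. The text gives no value for the last identifier, so 30 is an assumption based on C's implicit increment rule; the registry functions are hypothetical.

```python
from collections import defaultdict
from enum import IntEnum


class MpRedEvent(IntEnum):
    EVENT_ID_MP_RED_CARD_UP = 23
    EVENT_ID_MP_RED_CARD_DOWN = 24
    EVENT_ID_MP_RED_PORT_UP = 25
    EVENT_ID_MP_RED_PORT_DOWN = 26
    EVENT_ID_MP_RED_PORT_DOWN_COMPLETE = 27
    EVENT_ID_MP_RED_PORTS_UP = 28
    EVENT_ID_MP_RED_PORTS_DOWN = 29
    # The text gives no value; 30 assumes C's implicit "previous + 1" rule.
    EVENT_ID_MP_RED_PORTS_DOWN_COMPLETE = 30


_registry = defaultdict(list)  # event -> callbacks (hypothetical registry)


def register(event: MpRedEvent, callback) -> None:
    _registry[event].append(callback)


def dispatch(event: MpRedEvent, *args) -> int:
    """Deliver an event to every registered callback; return how many ran."""
    for callback in _registry[event]:
        callback(*args)
    return len(_registry[event])
```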

Upper layer components can register for these events.

2. Port State Ready Event

After MP switchover completes, the linecard modules 108 report their card and port states through the “Recovery” state machine mechanism discussed above. These states may conflict with those synchronized before the switchover; as a result, “Down” events could be missed (by default, all port states are down). To avoid this, after the “Recovery” state machine completes and all ports have reached their final states, a new event, “Port State Ready”, is sent. Thus, referring to FIG. 4, step 410 would include sending the “Port State Ready” event. All upper layer components registering for the new set of card/port up/down events can also register for this one. Upon receiving a registered event, an upper layer component can check the card and port states using suitable HAL (hardware abstraction layer) APIs; the HAL is used on the management console to abstract the underlying system hardware.
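One way an upper layer component might use “Port State Ready” is to compare its pre-switchover view with the final reported states and recover any missed “Down” transitions. In this sketch, plain dictionaries stand in for the synced states and for the HAL queries; the function name is hypothetical.

```python
def missed_down_ports(synced_before: dict, reported_after: dict) -> list:
    """Run by an upper layer component on "Port State Ready". Compares port
    states synced before the switchover with the final states reported
    through Recovery (in the real system these would come from HAL APIs;
    here both are plain dicts, an assumption). Ports missing from
    `reported_after` are treated as down, the default port state."""
    missed = []
    for port, before in synced_before.items():
        after = reported_after.get(port, "down")
        if before == "up" and after == "down":
            missed.append(port)  # deliver the "Down" handling now
    return sorted(missed)
```

Waiting for “Port State Ready” before running this comparison avoids acting on transient states while recovery is still in progress.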

C. Execution Flow for Software Upgrades

FIGS. 11A and 11B illustrate two embodiments of the present invention showing the basic flow of events when a software upgrade must be performed on a “live” system. In the first embodiment, shown in FIG. 11A, the software upgrade procedure starts from the CLI (command line interface), where a user, typically an administrator, downloads (step 1102) one or more new software components (commonly referred to as “binary images” or simply “images”) to the specified MP and/or LP destinations, depending on which components are being upgraded: just the MPs, a single LP, or some of the LPs. Typically all LPs are upgraded, but the present invention does not require this and allows for some LPs to be upgraded and not others. For example, LPs with different hardware might be upgraded at different times with different software.

If the image download is successful, the next phase (step 1104), i.e., the hitless upgrade phase, starts by sending an ITC (inter task communication) request message to the scp_task in the active MP. From then until the upgrade is completed or aborted, the CLI is not accessible to users. The scp_task runs a state machine to monitor the upgrade process. For purposes of this description, assume there are two MPs, MP1 and MP2. When the upgrade process starts, MP1 is the active MP and MP2 is the standby MP.

First, the active MP (MP1) sends the upgrade information to the standby MP (MP2) in an upgrade request message, step 1106. Upon receiving the ACK from the standby MP, the active MP (MP1) performs a switchover process, boots with the image specified by MP_BOOT_SRC, and becomes the new standby MP. Note that this switchover differs from the switchover discussed above in that the new standby MP (MP1) does not synchronize its image with that of the active MP (MP2), because the active MP is still running the old image. Continuing with FIG. 11A, the previous standby MP (MP2) now becomes the active MP. Upon noticing that the new standby MP (MP1) is ready, the newly active MP (MP2) starts a timer to wait for an additional amount of time (e.g., 1 minute) to allow the new standby MP (MP1) to stabilize its protocols. Upon the timeout, the new active MP (MP2) sends the upgrade information to the new standby MP (MP1), step 1108. Upon receiving the ACK, the active MP (MP2) performs another switchover (step 1110), boots with the image specified in MP_BOOT_SRC, and once again becomes the standby MP, while the standby MP (MP1) becomes the active MP. After these two switchovers, both MPs are running the upgraded images. The second switchover (step 1110) includes performing failover processing; in particular, the failover processing according to the invention disclosed in the commonly owned patent application identified above is performed.
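The double-switchover flow of FIG. 11A can be traced with a short simulation. It assumes every ACK arrives and every stabilization timer expires normally; `images` maps each MP name to its running image, and the function name and log strings are hypothetical.

```python
def upgrade_fig_11a(images: dict, active: str, standby: str, new_image: str):
    """Hypothetical model of FIG. 11A. `images` maps MP name -> running image
    and is updated in place. Returns the final (active, standby) pair and a
    log. ACKs and the stabilization timer are assumed to succeed."""
    log = []
    for step in (1106, 1108):  # two upgrade requests, two switchovers
        if step == 1108:
            log.append("wait for new standby to stabilize (e.g., 1 minute)")
        log.append(f"step {step}: {active} -> {standby}: upgrade request, ACK")
        # The active MP reboots from MP_BOOT_SRC with the new image; the old
        # standby takes over, and the rebooted MP rejoins as standby without
        # syncing its image from the (older) active MP.
        images[active] = new_image
        active, standby = standby, active
        log.append(f"{standby} reboots with {new_image} and rejoins as standby")
    return active, standby, log
```

After two passes each MP has rebooted exactly once, and MP1 is active again, so the roles end where they started while both run the new image.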

FIG. 11B shows an alternative embodiment. As in FIG. 11A, the administrator downloads the image to the active MP in step 1102. Assuming the image is successfully downloaded, the active MP (MP1) sends the upgrade request to the standby MP (MP2), step 1104. Upon receiving the request, the standby MP (MP2) reboots with the new image indicated in the request, step 1106′. The standby MP (MP2) comes up as standby without synchronizing to the active MP's running image. After the standby MP (MP2) reboots and comes up again as standby, the active MP (MP1) waits for an amount of time (e.g., 1 minute) for the standby MP (MP2) to stabilize its protocols, step 1108′. The active MP (MP1) then reboots itself with the new images (step 1110) and performs MP failover processing. It then comes up as the active MP. Both MPs are now running the new images. The failover processing in step 1110 is performed according to the invention disclosed in the commonly owned patent application identified above.
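For comparison, the FIG. 11B variant needs only one standby reboot and one failover. This sketch assumes ordinary failover semantics, in which the standby (MP2) takes over while MP1 reboots; the function name and log strings are hypothetical.

```python
def upgrade_fig_11b(images: dict, active: str, standby: str, new_image: str):
    """Hypothetical model of FIG. 11B: one standby reboot, one failover.
    Assumes the failover hands the active role to the former standby."""
    log = [f"{active} -> {standby}: upgrade request"]           # step 1104
    images[standby] = new_image                                   # step 1106'
    log.append(f"{standby} reboots with {new_image}; stays standby, no image sync")
    log.append("active waits for standby protocols to stabilize")  # step 1108'
    images[active] = new_image                                    # step 1110
    active, standby = standby, active                             # failover
    log.append(f"{standby} reboots with {new_image}; {active} takes over")
    return active, standby, log
```

Compared with FIG. 11A, this flow avoids a second planned switchover and relies on failover processing to move the active role.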

A discussion of failover processing is provided herein for completeness and is presented below in connection with FIGS. 4 to 10. The discussion which immediately follows, however, will turn to processing in the LP with regard to software upgrades per the present invention.

FIGS. 11A and 11B, discussed above, illustrate the basic flow for software upgrades in the MP. However, the LPs also include software components that occasionally require upgrading. When both the MPs and the LPs need to be upgraded, the processing according to FIG. 11A or FIG. 11B is performed first; that is, the MPs are upgraded first, and then the LPs are upgraded as discussed below. If only the MPs need a software upgrade, the discussion ends here. If only the LPs need a software upgrade, the processing outlined in FIGS. 11A and 11B is not performed, and only the processing discussed next is performed.
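The ordering rule above amounts to a small planning function. The phase strings and the function name are illustrative assumptions.

```python
def upgrade_plan(mp_needs_upgrade: bool, lps_to_upgrade: list) -> list:
    """Order the upgrade phases: MPs first (FIG. 11A or 11B), then only the
    LPs that need new software (FIG. 12). Phase strings are illustrative."""
    plan = []
    if mp_needs_upgrade:
        plan.append("MP upgrade")
    if lps_to_upgrade:
        plan.append("LP upgrade: " + ", ".join(lps_to_upgrade))
    return plan
```

For example, `upgrade_plan(False, ["LP2"])` skips the MP phase entirely and yields only the LP phase.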

Referring to FIG. 12, an overview of the LP upgrade process will now be discussed. The LP upgrade procedure starts with the user, typically via the CLI, downloading one or more LP images to the active MP, step 1202. In a step 1204, the active MP sends an upgrade message, which includes the image, to the LPs to be upgraded. Each LP that receives the upgrade message performs a core reset, step 1206. This is a reset that affects only the processor core.

When the LP boot starts, it skips the memory controller initialization. When the monitor is loaded, it skips the backplane gigalink initialization. The LP then boots normally until the LP application is loaded. The MP skips the flash image synchronization check. When the LP application is loaded, it populates its software structures while blocking access to the hardware components during LP initialization, module configuration, and port configuration.

The MP does not send card down and port down events when LPs perform a core reset. When the LPs boot up with the upgraded images, the MP sends out card up and port up events, step 1208. Upon receiving card up and port up events, other software components should discard their previous knowledge about the LP state (i.e., that the LP is up from their point of view), behave as if the LP had been hot-inserted, and populate the corresponding LP data structures.

1. Linecard Upgrade Events

The system code provides a number of events associated with the LP (linecard) software upgrade that various tasks on the MP (management module) can register to receive.

a. LP Upgrade Start

This event is triggered when the LP upgrade command is received from the user. It allows all registered entities to save any important information that needs to be preserved across the LP reset. The LP reset is sent to the linecard only after applications have finished handling this event. Once this event is received, every registered entity should prepare for the connection to the LP being down. For L2 protocols, this could mean that the timers are frozen (e.g., to prevent reconvergence of the protocol).

b. LP Upgrade Done

Once the application code on the linecard is fully operational, an LP upgrade done event is generated. This will allow all registered entities to start operating normally.
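The ordering constraint on these two events (the reset is sent only after every registered handler for LP Upgrade Start has returned) can be sketched as a simple synchronous event registry. This is a minimal sketch; the `UpgradeEventBus` class, the event name constants and the `send_reset` and `wait_until_operational` callbacks are hypothetical stand-ins, not the system code's actual interface.

```python
from typing import Callable, Dict, List

LP_UPGRADE_START = "LP_UPGRADE_START"
LP_UPGRADE_DONE = "LP_UPGRADE_DONE"


class UpgradeEventBus:
    """Hypothetical registry of MP tasks interested in LP upgrade events."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[int], None]]] = {
            LP_UPGRADE_START: [], LP_UPGRADE_DONE: []}

    def register(self, event: str, handler: Callable[[int], None]) -> None:
        self._handlers[event].append(handler)

    def publish(self, event: str, slot: int) -> None:
        # Handlers run synchronously, so publish() returns only after every
        # registered entity has finished handling the event.
        for handler in self._handlers[event]:
            handler(slot)


def upgrade_linecard(bus: UpgradeEventBus, slot: int,
                     send_reset: Callable[[int], None],
                     wait_until_operational: Callable[[int], None]) -> None:
    bus.publish(LP_UPGRADE_START, slot)  # tasks save state, freeze timers
    send_reset(slot)                     # only now is the LP reset sent
    wait_until_operational(slot)         # application code on the LP is up
    bus.publish(LP_UPGRADE_DONE, slot)   # tasks resume normal operation
```

Running the handlers synchronously is the simplest way to guarantee the "only after applications are done" rule; an asynchronous design would need an explicit completion count instead.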

2. MAC Table

Before the LP upgrade begins, the hardware MAC table information is stored into a protected portion of memory. After the LP core reset, the MP synchronizes the software MAC table (exactly in the same way that the synchronization takes place on a normal reset). After this is completed, the MAC entry to hardware index association is recreated. Until this process is completed, all hardware accesses may be disabled.

3. VLAN Table and Configuration

VLAN configuration is synchronized to the LP on card up event. After an LP reset, the VLAN hardware accesses are disabled by the underlying system software. Once the LP application is up, hardware accesses are allowed. It is possible, e.g., due to a double failure, that there is a mismatch between the LP hardware configuration and the LP software configuration. The current implementation will verify the hardware and software configuration and fix any mismatch that is found. Other techniques may also be provided for recovering gracefully from double failures.

4. Protocols Configuration and Operation

The protocol configuration is synchronized to the LP when the card up event is detected on the management module. During the upgrade process, however, protocol packets need to be sent and received in order to maintain protocol stability. How critical this is depends on the length of the reset process and the sensitivity of the protocol. For example, STP (Spanning Tree Protocol), the least sensitive of the supported protocols, will reconverge if no BPDU is received for the max-age time (20 seconds). RSTP (Rapid Spanning Tree Protocol), on the other hand, will reconverge after 6 seconds. More sensitive protocols such as VSRP (Virtual Switch Redundancy Protocol) and MRP (Metro Ring Protocol), both Layer 2 protocols proprietary to the assignee of the present invention, will reconverge in about 800 ms and 300 ms, respectively.
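Using the figures above, one can check which protocols would reconverge if the LP stopped sending protocol packets for a given period. This is a minimal sketch; the table simply restates the quoted tolerances (the VSRP and MRP values are approximate), and `protocols_at_risk` is a hypothetical helper.

```python
from typing import List

# Silence each protocol tolerates before reconverging, in ms (from the text).
RECONVERGENCE_MS = {
    "STP": 20_000,  # max-age without a BPDU
    "RSTP": 6_000,
    "VSRP": 800,    # approximate
    "MRP": 300,     # approximate
}


def protocols_at_risk(silent_ms: int) -> List[str]:
    """Protocols that would reconverge if no protocol packet were sent for
    silent_ms milliseconds, most sensitive first."""
    by_sensitivity = sorted(RECONVERGENCE_MS.items(), key=lambda kv: kv[1])
    return [name for name, limit in by_sensitivity if silent_ms >= limit]


print(protocols_at_risk(12_000))
```

A silent period of several seconds already exceeds the MRP, VSRP and RSTP tolerances, which is why protocol packets must keep flowing during the reset.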

Protocols register for all LP Upgrade Events. This section details the actions taken by protocols when each of the LP Upgrade Events is received.

a. LP Upgrade Start Event:

When this event is received the following actions are performed by application entities on the MP:

i. Packets that need to be transmitted out from the LP should be sent to the LP. The application then stores these packets in a protected portion of memory that is accessible to both the kernel and the application.

ii. The system API will accept the packet buffer, the destination VLAN, the destination port and the transmit interval. These are stored in the protected portion in memory that is accessible to both application and kernel.

iii. All timers associated with L2 protocol convergence or transitions should be frozen. This will prevent the master down timer for VSRP or the dead timer for MRP from expiring. Most other transitions can be continued.

On the LP, when the reset is done the following actions are performed.

i. When the kernel comes up after reset, it will read the packets from the protected memory and start transmitting these packets based on the programmed interval. The minimum interval is about 50 ms.

ii. Once the LP application is loaded, the application will take over from the kernel and start transmitting these packets. It will continue to do this until the application is fully operational and ready to start transmitting packets that have been generated by the MP.

iii. Once packets are received from the MP, the packet transmission is disabled.
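The stored records and their periodic replay can be sketched as follows. This is a minimal sketch, assuming each record holds the four items the system API accepts (packet buffer, VLAN, port, interval) and that transmission is driven by a fixed tick; the `SavedPacket` layout, `save_protocol_packet` and `replay` are hypothetical names, and the real kernel works on protected memory rather than a Python list.

```python
from dataclasses import dataclass
from typing import List, Tuple

MIN_INTERVAL_MS = 50  # minimum transmit interval quoted above


@dataclass
class SavedPacket:
    """One record in the protected memory region (layout is hypothetical)."""
    payload: bytes
    vlan: int
    port: int
    interval_ms: int


def save_protocol_packet(region: List[SavedPacket], payload: bytes, vlan: int,
                         port: int, interval_ms: int) -> None:
    """Sketch of the system API: packet buffer, VLAN, port, interval."""
    if interval_ms < MIN_INTERVAL_MS:
        raise ValueError(f"interval must be at least {MIN_INTERVAL_MS} ms")
    region.append(SavedPacket(payload, vlan, port, interval_ms))


def replay(region: List[SavedPacket], start_ms: int, stop_ms: int,
           tick_ms: int = MIN_INTERVAL_MS) -> List[Tuple[int, int, int]]:
    """Transmissions (time_ms, port, vlan) made from the saved records between
    the LP reset (start_ms) and the arrival of MP-generated packets (stop_ms),
    whether the kernel or, later, the application is doing the sending."""
    next_due = [start_ms] * len(region)
    sent = []
    for now in range(start_ms, stop_ms, tick_ms):
        for i, pkt in enumerate(region):
            if now >= next_due[i]:
                sent.append((now, pkt.port, pkt.vlan))
                next_due[i] += pkt.interval_ms
    return sent
```

Tracking a per-record due time, rather than testing `now % interval`, keeps intervals that are not multiples of the tick from being skipped.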

b. LP Upgrade Done Event

On receiving the LP upgrade done event, the protocol timers will be unfrozen and the protocol operation returns to pre-upgrade conditions.
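Freezing a protocol timer on LP Upgrade Start and unfreezing it on LP Upgrade Done amounts to excluding the frozen period from the elapsed time. This is a minimal sketch with time passed in explicitly; `FreezableTimer` is a hypothetical class, not the protocols' actual timer implementation.

```python
from typing import Optional


class FreezableTimer:
    """Protocol timer (e.g., VSRP master-down or MRP dead timer) that can be
    frozen on LP Upgrade Start and unfrozen on LP Upgrade Done.
    Times are passed in explicitly, in milliseconds."""

    def __init__(self, timeout_ms: int, now_ms: int) -> None:
        self.timeout_ms = timeout_ms
        self.started_ms = now_ms
        self.frozen_at: Optional[int] = None  # when the timer was frozen
        self.frozen_total = 0                 # ms spent frozen so far

    def freeze(self, now_ms: int) -> None:
        if self.frozen_at is None:
            self.frozen_at = now_ms

    def unfreeze(self, now_ms: int) -> None:
        if self.frozen_at is not None:
            self.frozen_total += now_ms - self.frozen_at
            self.frozen_at = None

    def expired(self, now_ms: int) -> bool:
        if self.frozen_at is not None:
            return False  # a frozen timer never expires
        elapsed = now_ms - self.started_ms - self.frozen_total
        return elapsed >= self.timeout_ms
```

For example, an 800 ms timer frozen at 100 ms and unfrozen at 5000 ms has consumed only 600 ms by 5500 ms, so the protocol does not reconverge during the upgrade.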

D. The User Interface

Next is a discussion of the portions of the command line interface (CLI) relevant to the present invention. The software upgrade CLI command has the following syntax:

sw-upgrade tftp <ip_addr> upgrade_script

sw-upgrade slot1/slot2 upgrade_script

sw-upgrade flash upgrade_script

Where upgrade_script is a text file containing upgrade instructions. The format of this text file uses the following keywords:

MP_BOOT_SRC:pri/sec;

LP_BOOT_SRC:all:pri/sec;

LP_BOOT_SRC:<slot#>[[,-]<slot#>]:pri/sec;

The above keywords tell the system which line card(s) need to be upgraded and which application images to use. In general, a typical upgrade script instructs the system where to download the new images from, and which images to load when performing a hitless upgrade. The following is a typical upgrade script:

#

# SW Upgrade using hot redundancy images

#

#SRC:tftp:192.168.138.2;

MP_MON:rmb02201b1.bin;

MP_APP:pri:rmpr02201b1.bin;

LP_MON:all:rlb02201b1.bin;

LP_APP:pri:all:rlp02201b1.bin;

MP_BOOT_SRC:pri;

LP_BOOT_SRC:pri:all;

The above script instructs the system to download an image called rmb02201b1.bin to the MP's monitor, an image called rmpr02201b1.bin to the MP's primary, an image called rlb02201b1.bin to all of the LPs' monitors, and an image called rlp02201b1.bin to all of the LPs' primaries. The script then instructs the system to initiate the upgrade of the MP to run from its primary image, and of all LPs to run from their primary images.
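The keyword format lends itself to a small line-oriented parser. This is a minimal sketch, assuming one `KEYWORD:field:...;` entry per line and that lines starting with `#` (including `#SRC:...` in the example) are comments; the function names are hypothetical. Because the description shows both `LP_BOOT_SRC:all:pri/sec` and `LP_BOOT_SRC:pri:all`, `split_boot_src` accepts either field order.

```python
from typing import List, Set, Tuple

SCRIPT = """\
#
# SW Upgrade using hot redundancy images
#
#SRC:tftp:192.168.138.2;
MP_MON:rmb02201b1.bin;
MP_APP:pri:rmpr02201b1.bin;
LP_MON:all:rlb02201b1.bin;
LP_APP:pri:all:rlp02201b1.bin;
MP_BOOT_SRC:pri;
LP_BOOT_SRC:pri:all;
"""


def parse_upgrade_script(text: str) -> List[Tuple[str, List[str]]]:
    """Return (keyword, fields) pairs; lines starting with '#' are comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        keyword, *fields = line.rstrip(";").split(":")
        entries.append((keyword, fields))
    return entries


def split_boot_src(fields: List[str]) -> Tuple[str, str]:
    """Return (image, slot_spec) whichever order the two fields appear in."""
    image = next(f for f in fields if f in ("pri", "sec"))
    slot_spec = next((f for f in fields if f not in ("pri", "sec")), "all")
    return image, slot_spec


def parse_slots(spec: str, all_slots: range) -> Set[int]:
    """Expand 'all' or a list such as '1,3-5' into slot numbers."""
    if spec == "all":
        return set(all_slots)
    slots: Set[int] = set()
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        slots.update(range(int(lo), int(hi or lo) + 1))
    return slots


entries = dict(parse_upgrade_script(SCRIPT))
print(entries["LP_APP"], split_boot_src(entries["LP_BOOT_SRC"]))
```

Converting the pairs to a `dict`, as in the usage line, assumes each keyword appears once; a script with several `LP_BOOT_SRC` lines for different slots would keep the list form.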

Notes:

a. If the keyword SRC is not specified, the source from which the script was downloaded is used for image downloading.

b. If the keyword MP_BOOT_SRC is not specified, the MP will not be upgraded (but the images are still downloaded if MP_MON or MP_APP are specified).

c. If the keyword LP_BOOT_SRC is not specified, the LP will not be upgraded (but the images are still downloaded if LP_MON or LP_APP are specified).

d. If no standby MP exists, or it is not in ready state, the MP upgrade is aborted.

e. If an LP is not in up state, that particular LP will not be upgraded.
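Notes a to e can be combined into a single planning step. This is a minimal sketch, assuming the script has already been parsed into a keyword-to-fields mapping with one entry per keyword; `UpgradePlan` and `plan_upgrade` are hypothetical names, the slot list accepts only `all` or comma-separated slot numbers, and image downloads are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class UpgradePlan:
    image_source: str
    upgrade_mp: bool
    lp_slots: Set[int] = field(default_factory=set)


def plan_upgrade(entries: Dict[str, List[str]], script_source: str,
                 standby_ready: bool, lp_state: Dict[int, str]) -> UpgradePlan:
    """Apply notes a-e to parsed keywords (one entry per keyword assumed).

    lp_state maps slot number -> state, e.g. {3: "up", 4: "down"}.
    Image downloads (MP_MON, MP_APP, LP_MON, LP_APP) are not modeled here.
    """
    # a. Without SRC, images come from wherever the script came from.
    source = ":".join(entries["SRC"]) if "SRC" in entries else script_source
    # b., d. Upgrade the MP only if requested and a ready standby MP exists.
    upgrade_mp = "MP_BOOT_SRC" in entries and standby_ready
    # c., e. Upgrade only requested LPs, and only those in the up state.
    slots: Set[int] = set()
    if "LP_BOOT_SRC" in entries:
        spec = next((f for f in entries["LP_BOOT_SRC"]
                     if f not in ("pri", "sec")), "all")
        requested = (set(lp_state) if spec == "all"
                     else {int(s) for s in spec.split(",")})
        slots = {s for s in requested if lp_state.get(s) == "up"}
    return UpgradePlan(source, upgrade_mp, slots)
```

For example, with `LP_BOOT_SRC:pri:all` and slot 2 down, only the slots reported as up are selected, and the MP is skipped entirely when no ready standby exists.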

In addition to the script-based user interface for hitless upgrade, a simplified CLI command may be provided. The script-based interface is very flexible and performs image download and upgrade in one script. However, the simplified command below assumes the images have already been downloaded and that both MP and LP images need to be upgraded:

hitless-reload mp [primary|secondary] lp [primary|secondary].

In addition to the standard CLI, it is noted that other suitable interfaces can be provided to implement the foregoing functions. For example, the “user” can be a machine user where the interface is some form of machine interface, such as a protocol-driven interface. This may facilitate an automated upgrading system that automatically performs upgrades in large installations.

E. In-Service Software Upgrade in the LP

1. LP Processing

Upgrade processing in the MP was discussed above. Turn now to FIG. 13, which shows the communication sequence between an LP and the active MP, and the internal communication within the LP, during software upgrade processing in the LP in accordance with the present invention. The timeline of events progresses downward, and the events are identified by circled reference numbers. The figure shows a single packet processor in the LP, though it will be appreciated from FIG. 2 that each LP contains a number of packet processors. Though not shown in FIG. 13, the packet processor is connected to a device.

During normal operation, the active MP sends messages to an LP; these are either handled by the LP internally or result in the LP sending data packet(s) to the packet processor for transmission to a device connected to the LP. These events are identified by 1a and 1b. The LP application always sends packets out, at a rate of one packet every 100 ms, in order to maintain a “live” connection with the devices connected to the LP. The LP application either sends data packets for processing L1-L7 (OSI model) requests or, absent any data packets, sends so-called “keep-alive” packets to maintain the data connection between the LP and the connected device. If the data connection is disrupted due to idleness, a reconnection sequence with the connected device may have to be performed to re-establish the data connection.
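The choice between a data packet and a keep-alive can be expressed as an idle check per tick. This is a minimal sketch, assuming a keep-alive is due once nothing has been sent for 100 ms; `LinkKeeper` and the `"keep-alive"` placeholder payload are hypothetical.

```python
from typing import Optional


class LinkKeeper:
    """Chooses what the LP sends toward a connected device on each tick:
    a pending data packet if there is one, otherwise a keep-alive once the
    link has been idle for interval_ms."""

    def __init__(self, interval_ms: int = 100) -> None:
        self.interval_ms = interval_ms
        self.last_tx_ms = 0

    def on_tick(self, now_ms: int, data: Optional[str] = None) -> Optional[str]:
        if data is not None:
            self.last_tx_ms = now_ms
            return data
        if now_ms - self.last_tx_ms >= self.interval_ms:
            self.last_tx_ms = now_ms
            return "keep-alive"  # placeholder payload
        return None


keeper = LinkKeeper()
sent = [keeper.on_tick(t, "data" if t == 150 else None) for t in range(0, 400, 50)]
print(sent)
```

A data packet resets the idle clock, so keep-alives are only generated when no L1-L7 traffic is flowing.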

An upgrade event 2a may begin with a user entering a software upgrade command via the CLI user interface, specifying one or more LPs to be upgraded. A task (SCP task) in the active MP sends an upgrade event 2b to the L2 task. The L2 task in the active MP begins the upgrade process in each LP by sending a save protocol packet (event 2c) to the LP. This initiates a series of upgrade activities in the LP. The first such activity is to save certain information that is used by the LP application (event 3).

Turning to FIG. 14 for a moment, in order to maintain the data integrity of the operational LP during a “live” software upgrade, an appropriate amount of memory is reserved to store certain live data. FIG. 14 shows the CPU memory address space 1402 of the LP's memory according to an embodiment of the present invention. In this particular embodiment, the LP's memory includes an SDRAM memory that occupies the first 512 MB (megabytes, where 1 MB = 2^20 bytes) of the address space. Of that 512 MB, 64 MB is set aside as reserved memory 1416 for the upgrade process of the present invention. The live LP application data from the various data structures used by the LP application is saved to a memory region 1416c of the reserved memory 1416 before the software upgrade takes place. The stored data is then reloaded into the data structures when the new application is loaded.
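In byte terms, 512 MB is 512 × 2^20 = 536,870,912 bytes, and the 64 MB reservation (67,108,864 bytes) is one eighth of it. Saving live data to region 1416c and reloading it after the reset can be sketched with a header that lets the new application detect whether the saved data is valid. This is a minimal sketch: the magic value, the header layout, the JSON encoding and the function names are all hypothetical, and a small `bytearray` stands in for the reserved SDRAM region.

```python
import json
import struct
import zlib
from typing import Optional

MAGIC = 0x4C505550                 # hypothetical "valid saved data" marker
HEADER = struct.Struct("<III")     # magic, payload length, CRC-32


def save_live_data(region: bytearray, tables: dict) -> None:
    """Serialize LP application tables into the reserved region (1416c)."""
    payload = json.dumps(tables).encode()
    if HEADER.size + len(payload) > len(region):
        raise MemoryError("live data does not fit in the reserved region")
    HEADER.pack_into(region, 0, MAGIC, len(payload), zlib.crc32(payload))
    region[HEADER.size:HEADER.size + len(payload)] = payload


def restore_live_data(region: bytearray) -> Optional[dict]:
    """Reload the tables after the new application starts; None if invalid."""
    magic, length, crc = HEADER.unpack_from(region, 0)
    if magic != MAGIC:
        return None
    payload = bytes(region[HEADER.size:HEADER.size + length])
    if len(payload) != length or zlib.crc32(payload) != crc:
        return None
    return json.loads(payload)
```

The checksum lets the new application fall back to a normal (non-hitless) initialization if the reserved region was not written or was corrupted.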

Returning to FIG. 13, the L2 task in the active MP also sends a message of type IPC_MSGTYPE_SW_UPGRAGE_INFO at event 2d, which contains the compressed image of the new software to be installed in the LP. The LP's memory also includes a flash memory component. The compressed image is stored in another part of the CPU address space 1402 that is occupied by the flash memory device, referred to as the code flash 1402a. There are two major software components in the LP: the monitor and the application. The monitor corresponds to an operating system and provides low-level access to the LP hardware (e.g., UARTs). The application provides the L1 to L7 functionality (per the OSI model) of the LP. The image that is downloaded to the LP from the active MP may be a new monitor, a new application, or both a new monitor and a new application. Each image is stored in some area of the code flash 1402a. This process typically requires about 50 ms to complete. While event 3 is taking place, the LP application is not performing any L1 to L7 functionality. However, the application sends one (or more) “keep-alive” packets at the rate of one packet every 100 ms (event 4a) in order to refresh any devices connected to the LP and maintain the data connection.

At event 5, the monitor image that is stored in the code flash 1402a is decompressed and loaded into a memory space 1416a of the reserved memory 1416. This process typically requires less than 100 ms to complete. Again, during this event, the LP application is not providing L1-L7 functionality; however, one (or more) “keep-alive” packets continue to be sent by the application at a rate of one per 100 ms (event 4b).

Next, the LP application issues a software reset request (event 6) to the currently executing monitor 1412, which resides in the top portion of the CPU address space 1402. Before that reset is described, an explanation of a “hard” reset (or power-on reset) is given.

A power-on reset or hard reset causes the LP's processing unit (CPU, microcontroller, etc., but generally referred to as the CPU) to start instruction/code execution at a fixed location (e.g., at 0xFFFF_FFFC) of the CPU address space 1402. This location is commonly referred to as the reset vector. The vector may be programmed to branch to the starting execution point of a boot image. For example, in one embodiment, a boot flash 1402b occupies a portion of the CPU address space and contains the boot image.

The reset vector points to the beginning of the boot image, and instruction execution begins from that point on. When boot image starts, various hardware components (not shown) are initialized, and certain data structures are set up. For example, the boot code will perform the following:

Configure the basic MMU (memory management unit) to allow access to SDRAM, processor SRAM, and both code and boot flash.

Initialize the flash file system, decompress the compressed monitor image stored in the code flash 1402a, and load it into the top portion of the CPU address space 1402 as the monitor 1412.

After the monitor image is successfully loaded into SDRAM (specifically, the top portion of the CPU address space as shown in FIG. 14), code execution is handed over from the boot image to the monitor code. The monitor does the following:

Complete the MMU initialization and set up addresses mapping for different peripheral interfaces.

Initialize peripheral interfaces, such as i2c, PCI, back plane Ethernet, local bus, and interrupt controller.

Initialize system memory pool to support memory allocation API (e.g., using malloc( )).

Decompress the application image stored in the code flash 1402a and store it, as the LP application 1414, in the portion of the CPU address space occupied by the SDRAM.

Launch “main” task to start execution of the application.

The application provides the functionality to support L1-L7 applications and features. The decompressed application image is executed as a separate task launched by the monitor. The CPU spends around 4 seconds on initial boot-up and loading the monitor code into the SDRAM. The monitor takes another 8 seconds to complete kernel initialization and to decompress and load the application image into SDRAM, for a total of roughly 12 seconds. This completes the hard reset or power-on reset processing.

A software reset is a process that allows the CPU to go through the same initialization sequence as discussed above, but without resetting the CPU and all the hardware devices in the system. In accordance with the present invention, there are two kinds of software reset: a “standard” software reset and a “hitless” software reset. A standard software reset proceeds as follows:

Force the CPU to restart code execution at the reset vector location (0xffff_fffc).

When the CPU starts code execution at the reset vector, it goes through the same reset process and reloads all images (boot, monitor, and application), but bypasses SDRAM reset and CPU memory controller initialization.

Returning to the discussion of FIG. 13, the software reset that is performed at event 6 is a “hitless” software reset. In accordance with the present invention, the hitless software reset performs the same tasks as the “standard” software reset, with the following difference. In the “hitless” software reset scenario, the monitor image is not decompressed as it would be for a hard reset or a “standard” software reset. Instead, the boot code loads the already-decompressed monitor stored in the area 1416a of the reserved memory 1416 (i.e., performs memory copy operation(s)) into the portion of the CPU address space 1402 that stores the monitor code 1412. Skipping the decompression step improves upgrade performance.
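The differences between the three reset kinds can be summarized as ordered step lists. This is a minimal sketch that only restates the sequences described above; the hard-reset memory initialization step is inferred from the statement that software resets bypass it, and `ResetKind` and `reset_steps` are hypothetical names.

```python
from enum import Enum
from typing import List


class ResetKind(Enum):
    HARD = "hard (power-on)"
    STANDARD_SOFT = "standard software"
    HITLESS_SOFT = "hitless software"


def reset_steps(kind: ResetKind) -> List[str]:
    """Ordered boot steps for each reset kind, as summarized from the text."""
    steps: List[str] = []
    if kind is ResetKind.HARD:
        steps += ["reset CPU and hardware devices",
                  "initialize SDRAM and CPU memory controller"]
    steps += ["start execution at reset vector 0xFFFF_FFFC",
              "boot code: configure basic MMU, initialize flash file system"]
    if kind is ResetKind.HITLESS_SOFT:
        steps.append("boot code: copy decompressed monitor from reserved area 1416a")
    else:
        steps.append("boot code: decompress monitor from code flash 1402a")
    steps.append("monitor: finish MMU setup, init peripherals and memory pool")
    if kind is ResetKind.HITLESS_SOFT:
        steps.append("monitor: run protocol sending service from region 1416b")
    steps += ["monitor: decompress application from code flash 1402a",
              "monitor: launch application main task"]
    return steps


for kind in ResetKind:
    print(kind.value, len(reset_steps(kind)))
```

The hitless variant differs from the standard software reset in two places only: the monitor is copied rather than decompressed, and the protocol sending service keeps packets flowing until the application is up.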

Since the LP application is not running while a “hitless” software reset is in progress, the monitor performs a protocol sending service in accordance with the present invention. This is a software feature that allows the CPU to continue sending protocol control packets while it is loading new images. According to the present invention, when the LP receives an upgrade message at event 2d, all pending protocols are saved. In particular, the protocol state of the communication between the LP and each device with which the LP is performing a protocol interaction is saved in the reserved memory 1416. Thus, for any service that is required at the time a “hitless” software reset is initiated, the LP application composes protocol packets and stores them in the reserved memory 1416 before the soft reset. After the monitor code starts, the protocol sending service software stored in a region 1416b of the reserved memory 1416 executes, reconstructs the PCI memory mapping between the CPU and the packet processors, and launches a temporary task. In one embodiment, this task is activated every 100 ms (or some other appropriate time period) and sends out the protocol packets stored in the reserved memory. The service is stopped when the application completes the hitless software upgrade initialization.

At this time, the monitor will also send a REBOOT message to the active MP (event 8), indicating to the active MP that the LP is booting up. The active MP will ACK the message (event 9), and in response the LP begins decompressing the compressed application image (event 10) that is stored in the code flash 1402a.

The decompressed application is stored in an area of the CPU address space 1402 to be executed as the LP application 1414 (event 11). At this point, the monitor ceases the protocol sending service (event 12a) and allows the application to start sending protocol packets for L1-L7 functions or “keep-alive” packets (event 12b). The application sends an upgrade complete message to the active MP (event 13), after which the active MP resumes its operation with the now-booted LP (events 14a and 14b).

F. MP-to-MP Synchronization Framework (MPSF)

The following discussion is presented to provide a more complete description of active MP and standby MP switchover processing.

The MP-to-MP Synchronization Framework (MPSF) is a generic mechanism for synchronization between MPs. The MPSF provides the following functionalities:

1) Proxy server based baseline synchronization

2) Peer-to-peer based update synchronization

3) Blocking or non-blocking mode

4) Queuing service for non-blocking update synchronization

MPSF synchronizes between software components. A software component can be a functional unit implemented in a program and executing on the management processor as a task (e.g., the CLI or VLAN task), or a service implemented in a library (e.g., forwarding identifier management). All MPSF functionalities operate per software component, keyed by the software component id that is associated with each software component.

In order to add a new synchronization service, the following elements are needed:

1) a new software component id to identify the new service;

2) a baseline synchronization routine and an update synchronization routine to execute on the active MP; and

3) corresponding baseline and update processing routines to be executed on the standby MP.

This architecture is schematically illustrated in FIG. 5. A task baseline_S( ) executing on the active MP (AMP) communicates with its counterpart task baseline_P( ) executing on the standby MP (SMP) to establish a baseline configuration for the task identified by the software component id parameter. Similarly, a task update_S( ) executing on the active MP communicates with its counterpart task update_P( ) executing on the standby MP to effect updates of the baseline configuration for the task identified by the software component id parameter.
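The three elements needed for a new synchronization service can be pictured as a registry keyed by software component id. This is a minimal sketch in which the send and process routines are called directly instead of through inter-MP messaging; `SyncService`, `register_service`, `sync_baseline` and `sync_update` are hypothetical names, with the four callables standing in for baseline_S( ), update_S( ), baseline_P( ) and update_P( ).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class SyncService:
    baseline_send: Callable[[], bytes]         # baseline_S() on the active MP
    update_send: Callable[[Any], bytes]        # update_S() on the active MP
    baseline_process: Callable[[bytes], None]  # baseline_P() on the standby MP
    update_process: Callable[[bytes], None]    # update_P() on the standby MP


REGISTRY: Dict[int, SyncService] = {}


def register_service(component_id: int, service: SyncService) -> None:
    """Element 1: each new service needs its own software component id."""
    if component_id in REGISTRY:
        raise KeyError(f"component id {component_id} already registered")
    REGISTRY[component_id] = service


def sync_baseline(component_id: int) -> None:
    svc = REGISTRY[component_id]
    svc.baseline_process(svc.baseline_send())  # message transport elided


def sync_update(component_id: int, change: Any) -> None:
    svc = REGISTRY[component_id]
    svc.update_process(svc.update_send(change))  # message transport elided
```

In a real system the bytes produced on the active MP would travel over the inter-MP link before the matching process routine runs on the standby MP.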

As discussed above, during startup the running configuration read in by the active MP is synchronized by the active MP to the standby MP to establish the baseline state for the standby MP; i.e., baseline synchronization. The baseline synchronization of the running configuration is managed by the redundancy_task executing on the standby MP to establish the baseline state for all of the other tasks that will run on the standby MP subsequent to the startup sequence. The redundancy_task therefore acts as a proxy for the other tasks, hence “proxy server based” baseline synchronization. It is noted that a baseline synchronization operation can also be initiated when a standby MP is inserted into an already running system, in order to establish a baseline state for the newly inserted standby MP.

During normal operation, updates to the active MP will occur; e.g., updates by a user, updates from the linecard modules, etc. The standby MP is synchronized with such updates by the active MP. Specifically, a task in the active MP that performs an update initiates a sync operation (via its corresponding update routine) to sync the corresponding task in the standby MP, by transferring the update information to the update routine in the standby MP corresponding to that task. This is referred to herein as peer-to-peer based synchronization, since the task that performs the update in the active MP is the task that initiates the sync operation to the corresponding task on the standby MP.

The baseline or update synchronization operations can be performed in blocking mode or non-blocking mode. Critical states and data that must be synchronized before further processing can proceed should use blocking mode; otherwise, non-blocking mode is more appropriate. In addition, non-blocking sync operations need to be queued in the active MP so that no configuration or state information that needs to be synced from the active MP to the standby MP is lost. The term “synced” as used herein means copying information from a source (e.g., active MP) to a destination (e.g., standby MP).

1. Baseline Synchronization

The baseline in MPSF refers to states and databases upon which updates can be applied. Each software component has its corresponding set of state information. A software component must synchronize its baseline before any update can be synchronized. The baseline synchronization operation in MPSF is performed via scp_task. An important parameter in MPSF related to baseline is called “baseline_sync_done”, which is initialized to zero (“0”). This parameter is set to one (“1”) when the baseline synchronization successfully completes. When a software component gets out of sync between MPs, the baseline_sync_done parameter is reset to zero. This condition can arise, for example, when an update sync operation fails.

Depending on the value of “baseline_sync_done” and other conditions, an update synchronization request may be ignored, blocked or queued in accordance with the following:

1) baseline_sync_done=0

If the baseline synchronization hasn't started, then all update synchronization requests are ignored.

If the baseline synchronization has started, then an update synchronization request is:

a) blocked (i.e., the caller task blocks on a semaphore) if the update synchronization is requested in blocking mode, or

b) queued if the update synchronization is requested in non-blocking mode.

2) baseline_sync_done=1

A non-blocking update synchronization request is:

a) sent to standby MP if the queue is empty, or

b) queued.

A blocking update synchronization request is:

a) blocked (i.e., the caller task blocks on a semaphore) if previous update synchronization is in progress, or

b) sent to standby MP if it is the only update request at this time.
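The rules above reduce to a small decision function. This is a minimal sketch; the `Action` enum and the `update_action` parameters are hypothetical names for the conditions listed in cases 1) and 2).

```python
from enum import Enum


class Action(Enum):
    IGNORE = "ignore"
    BLOCK = "block caller on semaphore"
    QUEUE = "queue"
    SEND = "send to standby MP"


def update_action(baseline_sync_done: bool, baseline_started: bool,
                  blocking: bool, queue_empty: bool,
                  update_in_progress: bool) -> Action:
    """Fate of one update synchronization request."""
    if not baseline_sync_done:
        if not baseline_started:
            return Action.IGNORE                        # 1) not started
        return Action.BLOCK if blocking else Action.QUEUE  # 1) a), b)
    if blocking:
        return Action.BLOCK if update_in_progress else Action.SEND  # 2) blocking
    return Action.SEND if queue_empty else Action.QUEUE            # 2) non-blocking
```

Keeping the decision separate from the queue and semaphore machinery makes each rule easy to check in isolation.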

If a task does not want to be blocked, it can call an API to check if the baseline synchronization for a particular software component is busy. If yes, it can alter its processing accordingly. One example is the CLI. If the baseline (i.e., CLI session mode & running configuration) synchronization has not completed, the CLI will prohibit users from entering configuration commands.

2. Update Synchronization

As discussed in the previous section, after the baseline synchronization is done, update synchronization is performed based on its mode.

a) Non-Blocking Update Synchronization

If the queue is empty, the update is sent to the standby MP immediately. Otherwise, the update is put into a queue in the active MP. When the standby MP sends a message to the active MP to inform the active MP that the previous update synchronization has completed, the request sitting at the top of the queue in the active MP is sent to standby MP.

b) Blocking Update Synchronization

In the case where update requests come from different tasks (e.g., forwarding identifier synchronization), it is possible that when one update request is issued, there is already another one in progress. In this case, the second caller is blocked. When the one in progress is done, it releases the second one.
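Both update modes can be sketched for a single software component on the active MP. This is a minimal sketch, assuming an update still awaiting the standby's completion message counts as occupying the queue, and that the blocking path is given a `send_and_wait` callable that returns once the standby acknowledges; `UpdateChannel` and its method names are hypothetical.

```python
import threading
from collections import deque
from typing import Any, Callable, Deque


class UpdateChannel:
    """Active-MP side of update synchronization for one software component."""

    def __init__(self, send: Callable[[Any], None]) -> None:
        self._send = send                       # transmits one update to the standby
        self._queue: Deque[Any] = deque()       # non-blocking updates waiting
        self._awaiting_ack = False              # a non-blocking update is in flight
        self._state_lock = threading.Lock()     # guards the two fields above
        self._blocking_lock = threading.Lock()  # serializes blocking callers

    def update_nonblocking(self, update: Any) -> None:
        with self._state_lock:
            if not self._awaiting_ack and not self._queue:
                self._awaiting_ack = True
                self._send(update)
            else:
                self._queue.append(update)

    def on_standby_done(self) -> None:
        """Standby reports the previous update completed: send the next one."""
        with self._state_lock:
            self._awaiting_ack = False
            if self._queue:
                self._awaiting_ack = True
                self._send(self._queue.popleft())

    def update_blocking(self, update: Any,
                        send_and_wait: Callable[[Any], None]) -> None:
        """A second blocking caller waits on the lock until the first is done."""
        with self._blocking_lock:
            send_and_wait(update)
```

A `threading.Lock` plays the role of the semaphore described above: the second caller blocks in `update_blocking` until the first call returns and releases it.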

c) Error Handling

When a software component is out of sync between MPs, its consequences depend on the software component and the actual data. If the result of this out of sync is only to cause a potential traffic hit during switchover, this is considered an example of a non-critical failure. Otherwise, it is considered a critical failure.

In a non-critical failure, it is sufficient for the software component to re-establish the baseline synchronization. A timer can be scheduled to perform the baseline synchronization again.

In a critical failure, more drastic actions may be taken. There are two options:

1) Reset the standby MP, or put it in reset mode. This prevents an out of sync standby MP from taking over if the active MP crashes or is removed.

2) Use a “dirty-bit” to alert the standby MP as to which software component is out of sync. A hardware version of the “dirty-bit” implementation can be very efficient. With this mechanism, the order in which the actions are performed by the task is important. FIG. 6 illustrates the process:

a) step 602—Suppose an action for a given task requires update synchronization. Before processing the action, perform the update synchronization first. For example, if a CLI task receives a command from the user, an update sync is performed to send that command from the active MP to the standby MP.

b) step 604—However, before sending the update, the “dirty-bit” corresponding to that task is sent to the standby MP to indicate that this software component (the CLI in our example) is out of sync (or, more precisely, about to go out of sync) with respect to the active MP.

c) step 606—After the “dirty-bit” is synchronized, send the update. In our example, the CLI command is sent to the standby MP after syncing the dirty bit corresponding to the CLI task.

d) step 608—After the update is synchronized, perform the action in the active MP. Suppose in our example, the command is to “down” a port, the result would be a state change indicating that the specified port is down.

e) step 610—After the action is taken, clear the “dirty-bit” and synchronize the state change to the standby MP. In our example, the down state is synced (sent) to the standby MP.

In this way, the standby MP can determine that a software component is out of sync by checking its dirty-bit when a switchover event happens. The actions that the newly active MP takes in this case can range from restarting the software component to resetting the linecard modules.
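The ordering of FIG. 6 can be sketched as a wrapper around one action. This is a minimal sketch, assuming a dictionary stands in for the dirty bits held on the standby MP; `synced_action` and `components_to_recover` are hypothetical names. It syncs the resulting state before clearing the dirty bit, so an interruption between the two still leaves the component marked dirty.

```python
from typing import Any, Callable, Dict, List


def synced_action(component: str, update: Any, standby_dirty: Dict[str, bool],
                  send_update: Callable[[str, Any], None],
                  perform: Callable[[Any], Any],
                  sync_state: Callable[[str, Any], None]) -> None:
    """Process one action in the FIG. 6 order. standby_dirty stands in for
    the dirty bits held on the standby MP."""
    standby_dirty[component] = True    # step 604: mark the component dirty
    send_update(component, update)     # step 606: send the update
    result = perform(update)           # step 608: act on the active MP
    sync_state(component, result)      # step 610: sync the resulting state...
    standby_dirty[component] = False   # ...then clear the dirty bit


def components_to_recover(standby_dirty: Dict[str, bool]) -> List[str]:
    """Checked by the newly active MP on switchover."""
    return [name for name, dirty in standby_dirty.items() if dirty]
```

If the active MP fails partway through (simulated below by an exception in `perform`), the component's bit is still set when the standby takes over.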

Following is an illustrative listing of the MPSF API (application program interface). As noted above, each software (SW) component has a unique ID.

This section discusses the synchronization of CLI (command line interface) configurations. Configuration commands typically involve configuration of one or more of the linecard modules 108. A CLI session is provided by executing the console_task on the active MP, and can be initiated from the console or from a telnet session. Each CLI session operates independently of any other CLI sessions.

Referring to FIG. 7, when the standby MP comes up, a particular CLI session on the active MP may be active and, if so, will be in a particular configuration mode, such as in an interface mode (e.g., “interface Ethernet 3/1”), or in VLAN mode (e.g., “vlan 101”). It is important that the CLI sessions on the standby MP set up the correct mode before individual CLI commands can be synchronized to and correctly executed on the standby MP. This is the first piece of configuration related data that is synchronized to standby MP, step 702. Before an individual CLI command can be synchronized to the standby MP, the CLI's running configuration on the active MP must be synchronized to the standby MP. This is the second piece of configuration related data that is synchronized to standby MP, step 704. Together, these two pieces of data constitute the baseline for CLI configurations.

Once the baseline is established, each CLI configuration command is synchronized to the corresponding console_task running on the standby MP and executed. A filter is implemented such that non-configuration CLI commands (e.g., “show version”) are not synchronized. Such non-configuration commands do not change the configuration state and so need not be synced.

A particular CLI configuration command is executed on the active MP in the following order:

1) step 706—A CLI configuration command is received.

2) step 708—The CLI configuration command is synchronized to the standby MP in blocking mode. That is, the console_task on the active MP blocks.

3a) step 710—When a synchronization acknowledgement is returned from the standby MP (step 710a), the console_task on the active MP unblocks and the CLI command is executed (710b).

3b) step 712—Asynchronously, the synchronized CLI command is executed on the standby MP.
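The filter and the blocking order on the active MP can be sketched as follows. This is a minimal sketch; the list of non-configuration prefixes is hypothetical (the text only names "show version"), and `handle_cli_command` receives the blocking sync and local execution as callables.

```python
from typing import Any, Callable

# Hypothetical prefixes of commands that do not change configuration.
NON_CONFIG_PREFIXES = ("show", "ping", "traceroute")


def is_config_command(cmd: str) -> bool:
    return not cmd.strip().lower().startswith(NON_CONFIG_PREFIXES)


def handle_cli_command(cmd: str,
                       sync_to_standby_blocking: Callable[[str], None],
                       execute_locally: Callable[[str], Any]) -> Any:
    """Steps 706-710 on the active MP."""
    if is_config_command(cmd):
        # step 708: blocks until the standby MP acknowledges (step 710a)
        sync_to_standby_blocking(cmd)
    return execute_locally(cmd)  # step 710b
```

Non-configuration commands skip the sync entirely, so they never wait on the standby MP.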

The configuration information in the standby MP is likely to be more recent than the configuration of the linecard modules, because each configuration command is synced to the standby MP, which acknowledges it and begins executing it, before the active MP itself executes the command. Therefore, after an MP switchover completes, the newly active MP should re-send its configurations to the linecard modules in case the failed MP did not have a chance to configure the linecard modules before failing. Each linecard module can then update its cached configurations with the resent configurations. The linecard modules should execute those configurations that are missing from their cached configurations and ignore those that were already executed. This re-sending of configurations should be handled on a per-software-component basis. It ensures that the actual configurations of the linecard modules match the configurations in the newly active MP.
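On the linecard side, handling the re-sent configurations is a set difference against the cache. This is a minimal sketch, assuming each configuration entry can be compared by equality (here, a command string); `apply_resent_config` is a hypothetical name.

```python
from typing import Callable, List


def apply_resent_config(cache: List[str], resent: List[str],
                        execute: Callable[[str], None]) -> List[str]:
    """Execute only re-sent entries missing from the cache, then cache them;
    entries already executed are ignored."""
    already = set(cache)
    for entry in resent:
        if entry not in already:
            execute(entry)
            cache.append(entry)
            already.add(entry)
    return cache
```

Only the configuration the failed MP never delivered is executed, which keeps the re-send safe to apply to a linecard that was already fully configured.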

H. Forwarding Identifier Synchronization

Software components use forwarding identifiers to manipulate traffic, e.g., to forward data packets to their destinations. Forwarding identifier synchronization consists of two parts: 1) maintaining correct forwarding identifiers for the software components in the active and standby MPs; and 2) synchronizing forwarding identifier changes between the MPs.

If a software component that uses forwarding identifiers runs on both MPs, the software component in one MP must be guaranteed to receive the same forwarding identifier as its counterpart in the other MP when it requests the allocation of a forwarding identifier. To accomplish this, the process of allocating forwarding identifiers is made context-aware. In other words, a forwarding identifier that is allocated to a particular software component in the active MP is assigned an “application context”. This “application context” is synchronized to the standby MP as part of the forwarding identifier structure. When the corresponding software component on the standby MP asks for a free forwarding identifier, the forwarding identifier mechanism can locate the correct forwarding identifier based on the “application context” supplied by the caller.

One issue with this mechanism is that the request for a free forwarding identifier on the standby MP and the synchronization of an allocated forwarding identifier (and its associated “application context”) are performed asynchronously. Consequently, the allocated forwarding identifier and its “application context” may not yet have been synchronized when the software component on the standby MP requests a forwarding identifier, in which case any identifier handed out at that point would not match the forwarding identifier allocated in the active MP. A solution is to return “Invalid FWDING_ID” in this case. When the allocated forwarding identifier is eventually synchronized, its associated context is used to locate the software component data, and the “Invalid FWDING_ID” is replaced with the synchronized forwarding identifier.
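The context-aware allocation and the placeholder-then-replace behaviour can be sketched in Python as follows. The class names are hypothetical, the application context is assumed to be a string such as the CLI command text, the value chosen for `INVALID_FWDING_ID` is arbitrary, and a simple counter stands in for the real free-identifier pool.

```python
INVALID_FWDING_ID = -1  # arbitrary placeholder value for this sketch


class ActiveFwdIdManager:
    def __init__(self) -> None:
        self._next_free = 1
        self.allocated = {}  # application context -> forwarding identifier

    def allocate(self, context: str) -> int:
        fid = self._next_free
        self._next_free += 1
        self.allocated[context] = fid
        return fid  # (context, fid) is later synchronized to the standby MP


class StandbyFwdIdManager:
    def __init__(self) -> None:
        self.synced = {}      # context -> identifier received from the active MP
        self.components = {}  # context -> identifier held in the component data

    def allocate(self, context: str) -> int:
        # Use the synchronized identifier if it has arrived, else the placeholder.
        fid = self.synced.get(context, INVALID_FWDING_ID)
        self.components[context] = fid
        return fid

    def on_sync(self, context: str, fid: int) -> None:
        self.synced[context] = fid
        if context in self.components:
            self.components[context] = fid  # replace the placeholder
        # No matching component data: nothing else to do for now.


active, standby = ActiveFwdIdManager(), StandbyFwdIdManager()
active.allocate("vlan 100")
active.allocate("vlan 101")
early = standby.allocate("vlan 101")       # sync for "vlan 101" not received yet
for ctx, fid in active.allocated.items():  # synchronization arrives later
    standby.on_sync(ctx, fid)
late = standby.allocate("vlan 100")        # sync already received
```

The usage lines replay the “vlan 101” scenario of FIG. 8, together with a second VLAN whose synchronized identifier arrives before the standby component asks for it.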

Following is an example of this mechanism with reference to FIG. 8:

On the active MP:

1) At step 802, a user initiates an application which requires a forwarding identifier, e.g., the user enters “vlan 101”.

2) Before “vlan 101” is executed, it is synchronized to the standby MP in blocking mode, step 804. That is, the standby MP is made aware of the initiation of the application.

3) After the synchronization returns, “vlan 101” is executed, and it will ask for a free forwarding identifier from the forwarding identifier mechanism, step 806.

3) On the standby MP, when the forwarding identifier with the “vlan 101” application context is received, the forwarding identifier mechanism uses the application context “vlan 101” to locate the matching software component data (in this case, the VLAN entry). If the matching software component data is not found, nothing is done; otherwise, the entry's forwarding identifier is replaced with the one just received, step 828.

I. Trunk Synchronization

A “trunk” command issued via a CLI is synchronized to the standby MP and executed there. The trunks on the standby MP will have all ports disabled since all of the linecard modules 108 are in a “Not Present” state. When switchover happens, trunk configurations are re-sent to the linecard modules by the newly active MP. Each linecard module will execute those configurations that are missing from its cached configurations, and ignore those configurations that were already executed. In this way, the actual trunk configurations of the linecard modules will match the trunk configurations in the newly active MP.

For a trunk to work properly across an MP switchover, the following issues need to be resolved:

1) trunk id—A trunk created with the same ports must have the same trunk id. To achieve this, a trunk id allocated in the active MP is synchronized to the standby MP, using a mechanism similar to that used for forwarding identifiers.

2) forwarding identifier groups allocated for a server trunk—Sixteen forwarding identifier groups are allocated for a server trunk at initialization time. These forwarding identifier groups are fixed. The only change needed is to associate each trunk id with a fixed forwarding identifier group.

3) cached configurations in the linecard modules—As discussed above, each linecard module needs to cache its trunk configurations so that on MP switchover, it can compare its cached configuration against those re-sent by the newly active MP.
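Item 1 can be sketched by keying trunk ids on the set of member ports, so that a trunk created with the same ports maps to the same id on both MPs. The names below are hypothetical, port names are assumed to be strings such as “3/1”, and the fixed forwarding identifier groups of item 2 are not modelled.

```python
from typing import Dict, FrozenSet, Iterable, Optional


class ActiveTrunkIds:
    def __init__(self) -> None:
        self._next = 1
        self.by_ports: Dict[FrozenSet[str], int] = {}

    def allocate(self, ports: Iterable[str]) -> int:
        key = frozenset(ports)  # port order does not matter
        if key not in self.by_ports:
            self.by_ports[key] = self._next
            self._next += 1
        return self.by_ports[key]  # then synchronized to the standby MP


class StandbyTrunkIds:
    def __init__(self) -> None:
        self.by_ports: Dict[FrozenSet[str], int] = {}

    def on_sync(self, ports: Iterable[str], trunk_id: int) -> None:
        self.by_ports[frozenset(ports)] = trunk_id

    def lookup(self, ports: Iterable[str]) -> Optional[int]:
        return self.by_ports.get(frozenset(ports))


active, standby = ActiveTrunkIds(), StandbyTrunkIds()
tid = active.allocate(["3/1", "3/2"])
standby.on_sync(["3/1", "3/2"], tid)
```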

J. L2 Design

The foregoing sections introduced various components of the underlying architecture for redundant operation in network switching equipment in accordance with the present invention, and discussed various Layer 1 (L1) entities and their processes. The following sections provide an overview of various Layer 2 (L2) entities that execute on the active MP and their functions, and summarize the requirements for each of these entities. Each L2 entity is discussed in more detail in its own subsection. Within each subsection, two specific aspects are addressed:

1. Interaction between L2 entities across the management modules.

2. Failover event handling on the standby MP including changes to the L2 entity agent in the line card.

Typical layer 2 entities include a MAC (media access control) manager, a VLAN (virtual local area network) manager, and L2 protocol managers for non-proprietary protocols such as STP (spanning tree protocol), RSTP (rapid spanning tree protocol), and MSTP (multiple spanning tree protocol), and for protocols such as MRP (metro ring protocol) and VSRP (virtual switch redundancy protocol), which are proprietary protocols owned by the assignee of the present invention. On MP switchover, the newly active MP is faced with line cards (also referred to herein as linecard modules) that are already initialized and contain configuration and state that may or may not be in sync with the newly active MP. Therefore, the configuration and state in the line cards need to be verified, updated, and/or synchronized to match the newly active MP.

A function of the MAC manager is to perform MAC station learning and propagation/synchronization functions. When a MAC station is unknown, the database manager learns the MAC by adding it to its database and synchronizes the newly learned MAC station to all line cards that may be interested. When the line cards detect an MP switchover event, they send the list of MAC addresses that were learned locally to the newly active MP, which then updates its own MAC station table. In any event, the MAC station learning process in the MAC manager is self-healing: it auto-corrects even if synchronization by the line cards does not take place correctly.

The VLAN manager allows a user to configure port memberships and properties associated with corresponding VLAN identifiers. It interacts with the protocols to propagate port state changes and MAC station flush requests. It also interacts with a VLAN agent executing on a line card to program its hardware. Further, the VLAN manager handles grouping mechanisms such as topology groups and vlan-groups. On MP switchover, the VLAN manager executing on the standby MP is expected to know the configuration (port memberships and properties) associated with individual VLANs. This configuration may need to be verified against the information that is currently stored in the line cards.

Each protocol manager (e.g., STP manager, MRP manager, VSRP manager, etc.) operates specific protocol instances over certain sets of ports, and can operate multiple instances of the protocol with different port memberships. The relationship of each protocol manager with the line cards is limited to programming the CAM (content addressable memory) to allow protocol packets to be processed on a blocked port. The MRP manager also sets up CAM entries to allow protocol packets to be forwarded by hardware. On MP switchover, the protocol manager needs to check whether the line cards have their CAMs programmed as expected. Other issues include handling acknowledgments from the line card associated with setting the port state, and packet sequence number matching (discussed in more detail below).

The synchronization paradigm for L2 entities generally follows the MPSF framework. The L1 processes discussed above synchronize the configuration to the standby MP, while the L2 processes focus on enforcing the configuration on the line cards. The L2 processes running on the standby MP do not perform an explicit step of learning the linecard configurations. Instead, events from the line cards that are normally sent to the active MP are also sent to the standby MP, so the L2 processes on the standby MP receive all line card events as they are received by the active MP. Consequently, there is no need for the active MP to sync the events it receives to the standby MP. This approach has the following benefits:

1. Reduces the number of IPC messages between the active MP and the standby MP that would otherwise be needed to synchronize the states in the active MP.

2. Reduces the complexity of keeping the management modules in sync, since both management modules receive the same events.

On MP switchover, the L2 processes on the newly active MP enforce their configuration and state onto the line cards, thus guaranteeing that the configuration of the line cards matches the configuration information in the newly active MP.
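The event fan-out described above can be sketched as follows: each line card event is delivered to both MPs, only the active instance sends anything to the line cards, and on switchover the new active instance pushes its own view. The class, the event names, and the message tuples are hypothetical placeholders for the real IPC.

```python
class L2ProcessSketch:
    """Hypothetical L2 entity instance running on one MP."""

    def __init__(self, active: bool) -> None:
        self.active = active
        self.ports_up = set()
        self.to_linecards = []  # messages this MP sends to line cards

    def on_linecard_event(self, event: str, port: str) -> None:
        if event == "port_up":
            self.ports_up.add(port)
        elif event == "port_down":
            self.ports_up.discard(port)
        if self.active:
            # placeholder for whatever the active MP sends in response
            self.to_linecards.append(("handled", event, port))

    def on_switchover(self):
        """Become active and enforce this MP's state onto the line cards."""
        self.active = True
        enforce = [("set_port_up", p) for p in sorted(self.ports_up)]
        self.to_linecards.extend(enforce)
        return enforce


def linecard_emit(event: str, port: str, mps) -> None:
    for mp in mps:  # same event to both MPs; the MPs never sync events to each other
        mp.on_linecard_event(event, port)


active, standby = L2ProcessSketch(True), L2ProcessSketch(False)
linecard_emit("port_up", "3/1", [active, standby])
linecard_emit("port_up", "3/2", [active, standby])
linecard_emit("port_down", "3/2", [active, standby])
enforced = standby.on_switchover()
```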

1. MAC Synchronization

The MAC manager keeps track of the MAC station table. In a specific embodiment of the present invention, this table can store up to 2 million MAC entries at capacity. The standby MP does not keep track of the MAC station table. There is no synchronization of MAC table entries between the active MP and the standby MP in the start-up sequence discussed above. Therefore, when the standby MP becomes active, its MAC table is empty.

Refer to FIG. 9 for the following discussion of MAC synchronization according to the present invention. On MP switchover, each line card detects that the active MP has changed (step 902). In response to this detection, the line card causes the newly active MP to initialize its MAC station table. The MAC station table is initialized during the L1 synchronization process (step 904), which establishes the baseline MAC station table. During a process known as the MAC software aging cycle, the MAC agent on the line card verifies all the MAC entries that have been learned on its local ports, i.e., the ports that belong to the line card in question (step 906). Each MAC entry contains information about the management module under which the MAC entry was learned. If the management module id in the MAC entry does not match the currently active MP, the MAC linecard agent assumes that the active MP may not know about this MAC entry. Consequently, the MAC agent sends an IPC message to the newly active MP to indicate and correct this discrepancy (step 908).

The newly active MP processes this request as a learn request by updating its MAC station table with the information provided by the line card (step 910). Once it updates its own MAC station table, the MAC manager executing on the newly active MP synchronizes the MAC information to the other line cards (step 912). When this MAC entry is synchronized, each receiving line card updates its MAC entry to reflect the change in the management module id (step 914). This approach has the following benefits:

1. Simplicity: The scheme is innately simple and easy to understand/debug.

2. Synchronization: By avoiding the need to communicate between management modules to synchronize the MAC station table, significant compute and IPC network resources are conserved.
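Steps 906 through 914 can be sketched in Python as follows. The names and the table layout are assumptions; in particular, the sketch synchronizes the re-learned entry to every line card, including the one that reported it, so that the reporting card's own entry also picks up the new management module id and is not reported again on the next aging cycle.

```python
from dataclasses import dataclass, replace


@dataclass
class MacEntry:
    mac: str
    port: str
    mp_id: int  # management module under which the entry was learned


class LinecardMacAgent:
    def __init__(self, local_ports) -> None:
        self.local_ports = set(local_ports)
        self.table = {}  # MAC -> MacEntry

    def aging_cycle(self, active_mp_id: int):
        """Steps 906/908: locally learned entries tagged with a different MP."""
        return [e for e in self.table.values()
                if e.port in self.local_ports and e.mp_id != active_mp_id]

    def on_mac_sync(self, entry: MacEntry) -> None:
        self.table[entry.mac] = replace(entry)  # step 914: store a copy


class MacManager:
    def __init__(self, mp_id: int, linecards) -> None:
        self.mp_id = mp_id
        self.linecards = linecards
        self.table = {}  # empty right after switchover

    def on_learn_request(self, entry: MacEntry) -> None:
        learned = replace(entry, mp_id=self.mp_id)  # step 910: learn under this MP
        self.table[learned.mac] = learned
        for lc in self.linecards:                   # step 912: propagate
            lc.on_mac_sync(learned)


MAC = "00:11:22:33:44:55"
lc1, lc2 = LinecardMacAgent(["1/1"]), LinecardMacAgent(["2/1"])
for lc in (lc1, lc2):
    lc.on_mac_sync(MacEntry(MAC, "1/1", mp_id=1))  # learned while MP 1 was active
new_mp = MacManager(mp_id=2, linecards=[lc1, lc2])
reported = lc1.aging_cycle(2) + lc2.aging_cycle(2)
for stale in reported:
    new_mp.on_learn_request(stale)
```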

2. VLAN Synchronization

The VLAN manager mostly handles configuration requests and protocol requests. It also triggers events and updates when required. The L1 process will synchronize the baseline configuration. This means that the VLAN entity on the standby MP will be configured exactly as in the active MP. This may not always be true, however, since it is possible that the configuration may have gone out of sync, for example, if the active MP crashed during a specific configuration command.

The basic framework, as discussed above, is that the VLAN processes on the standby and active MPs operate more or less independently of each other. The L1 process gives the same inputs to both the active and the standby MP, e.g., port up events, trunk create events, etc. This allows the VLAN operational state on both MPs to stay in sync.

a) Interaction Between Standby and Active MP

The standby MP will not perform any operation that requires communication with a line card. Effectively, it will not send out IPC messages to the line card and will not process IPC messages from the line card. The VLAN manager in the active MP expects an ACK from the line card; e.g., when setting the state associated with an RSTP port. This ACK will not be required by the corresponding VLAN manager executing on the standby MP. There is little synchronization between the VLAN manager processes across management modules.

Forwarding identifier synchronization between the standby and active MPs is an important issue. Since the VLAN manager processes are independent across MPs, it is possible that the forwarding identifier manager in one MP (e.g., the active MP) will not return the same forwarding identifiers for matching VLANs as the forwarding identifier manager in the other MP. This is especially true if the standby MP is booted up after numerous configuration steps have taken place on the active MP (i.e., the VLANs have been configured out of order). Since the active MP has already synchronized all the forwarding identifier database information to the line cards, it would be difficult to modify the forwarding identifiers associated with VLANs. Thus, in accordance with an embodiment of the present invention, forwarding identifier synchronization is handled in L1 processing as discussed above. VLAN synchronization proceeds as follows:

1. The VLAN manager on the active MP requests a forwarding identifier from the forwarding identifier manager. As discussed above, the VLAN request is synced to the standby MP, and on the standby MP the forwarding identifier manager will return INVALID_FWDING_ID. The VLAN manager on the standby MP nevertheless continues servicing the request without valid forwarding identifier information.

2. The forwarding identifier manager on the active MP will eventually synchronize a valid forwarding identifier to the standby MP. At this time, the forwarding identifier manager on the standby MP will update the VLAN data structures with the appropriate forwarding ID.

This same process is used for an uplink-vlan configuration, since uplink-vlan configurations also lead to a forwarding identifier request.

b) Failover Event Handling

When a failover event is detected, an MP switchover occurs and the standby MP becomes the active MP. The newly active MP begins to send and receive IPC messages. As discussed above, the newly active MP synchronizes its current state with all the line cards that are currently operational. This includes the VLAN configuration, the topology group configuration, the SuperSPAN™ and VE configurations, and so on. This could lead to some structures in the line cards being overwritten. However, if the management modules were synchronized at the time of the MP switchover, this would not lead to any changes. Changes are required in the line card to enable it to interpret these updates.

In accordance with another embodiment of the present invention, the management modules may be kept in sync at every step, making it more likely that they are synchronized when FAILOVER happens. Another approach is to apply a dead time on the line card during which conflict resolution is attempted, for cases in which there are differences between the standby module and the line cards. With this approach, the information in the line cards is taken into account, and traffic is not stopped even if the line card and management module states differ.

c) Line Card VLAN Agent

There are minimal changes in the line card to process an MP switchover event; these are mostly verification routines to make sure that the VLANs are all configured correctly. If the VLAN information in the newly active MP does not match the line card configuration or is out of date, it is still used, because that is the state the newly active MP sees at this point. The current approach of overwriting the line card state and configuration with what is known to the newly active MP simplifies the implementation of the VLAN agent.

In operation, the VLAN agent compares the configuration on the newly active MP with the configuration in the line card. If the configuration is different, the VLAN agent modifies its configuration in the line card to match the configuration on the newly active MP. This could be enhanced as described earlier with conflict-resolution type implementation where the differences are flagged and revisited after a certain interval (called dead-time).

In one embodiment, the line card VLAN agent always starts with a clean slate when the active MP fails and the standby MP takes over. Processing in the line card includes:

1. Verification to check the validity of the various configurations on the line card.

2. Differentiating between the configuration that is currently in place in the line card and the configuration that is being received from the newly active MP.

3. Overwriting the configuration received from the previously active MP with the configuration received from the newly active MP.
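The compare-then-overwrite processing of the line card VLAN agent can be sketched as a single function. VLAN configuration is reduced here to a hypothetical map from VLAN id to member ports; the returned differences are what a dead-time based conflict-resolution enhancement could flag and revisit.

```python
from typing import Dict, FrozenSet, Optional, Tuple

VlanTable = Dict[int, FrozenSet[str]]  # VLAN id -> member ports (simplified)


def reconcile(linecard: VlanTable, newly_active: VlanTable):
    """Compare, report differences, then overwrite with the newly active MP's view."""
    diffs: Dict[int, Tuple[Optional[FrozenSet[str]], Optional[FrozenSet[str]]]] = {}
    for vid in set(linecard) | set(newly_active):
        before, after = linecard.get(vid), newly_active.get(vid)
        if before != after:
            diffs[vid] = (before, after)
    linecard.clear()
    linecard.update(newly_active)  # the newly active MP's view always wins
    return diffs


linecard = {101: frozenset({"3/1"}), 102: frozenset({"3/2"}), 104: frozenset({"3/4"})}
new_mp = {101: frozenset({"3/1", "3/3"}), 103: frozenset(), 104: frozenset({"3/4"})}
diffs = reconcile(linecard, new_mp)
```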

3. Protocol Synchronization

Protocol synchronization is different from VLAN synchronization in that there is little interaction between the protocol manager on the active MP and the protocol agent in the line card. In fact, most of the protocol interaction with the line card is through the VLAN manager. However, the interaction between the protocol entities across management modules may need to be significantly higher.

An example is the case in which the standby MP is booted up after the STP (spanning tree protocol) manager on the active MP has converged. The corresponding STP manager on the standby MP requires some time to converge, so the spanning trees on the active MP and the standby MP will not match for a period of time.

In one embodiment of the present invention, protocol synchronization is performed in a manner similar to that of the VLAN synchronization. In this embodiment, each protocol may run independently on the standby MP. It would receive all events and protocol packets from the line card in order to keep it up to date with the active MP. This is referred to herein as “protocol redundancy.”

a) Protocol Redundancy for STP (Spanning Tree Protocol)

In the case of STP, the port states are computed in a predictable manner for a given configuration of bridge protocol data units (BPDUs). This allows the standby MP to arrive at the same STP result (port state configuration) as the active MP by the time it becomes the newly active MP. Referring to FIG. 10, the STP manager on the standby MP receives all events as received by the active MP. Specifically, when a line card sends an STP event (step 1002), both the active MP and the standby MP detect the occurrence of the event. The active MP computes the new port states (step 1004) and sends BPDUs to communicate the change in ports (step 1006). The STP manager on the standby MP, on the other hand, does not send out BPDUs, but receives and processes BPDUs in order to calculate port states from the information contained in the received BPDUs (step 1008). These port states are synchronized to the VLAN manager on the standby MP. When a FAILOVER event is received, the standby STP manager enables transmission of BPDUs, and the VLAN manager takes care of synchronizing the states of individual ports.

STP TCNs (Topology Change Notifications) require acknowledgement from the root bridge. If the standby MP comes up after the active MP, its TCNs may go unacknowledged, because the root bridge does not actually see the BPDUs of the standby MP. To account for this case, TCNs raised by the STP manager on the standby MP are treated as acknowledged.
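The split between computing port states and transmitting BPDUs, and the treatment of standby TCNs as acknowledged, can be sketched as follows. This is a deliberately simplified toy, not IEEE 802.1D: port states are derived from a single comparison of root path costs, only to show that identical inputs give identical results on both MPs, and all names are hypothetical.

```python
class StpManagerSketch:
    """Toy model only: real STP port roles involve far more than one comparison."""

    def __init__(self, active: bool) -> None:
        self.tx_enabled = active  # only the active MP transmits BPDUs
        self.root_cost = {}       # port -> root path cost learned from BPDUs
        self.port_states = {}
        self.bpdus_sent = 0
        self.tcn_pending = False

    def on_bpdu(self, port: str, root_path_cost: int) -> None:
        self.root_cost[port] = root_path_cost
        best = min(self.root_cost, key=lambda p: (self.root_cost[p], p))
        self.port_states = {p: ("forwarding" if p == best else "blocking")
                            for p in self.root_cost}
        if self.tx_enabled:
            self.bpdus_sent += 1  # step 1006: announce the change

    def raise_tcn(self) -> None:
        if self.tx_enabled:
            self.tcn_pending = True  # wait for the root bridge's acknowledgement
        # standby: the TCN is treated as acknowledged, so nothing stays pending

    def on_failover(self) -> None:
        self.tx_enabled = True


active, standby = StpManagerSketch(True), StpManagerSketch(False)
for mgr in (active, standby):  # step 1002: both MPs see the same input
    mgr.on_bpdu("1/1", 20)
    mgr.on_bpdu("1/2", 10)
```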

b) Protocol Redundancy for RSTP

In the case of the RSTP protocol, the RSTP manager on the standby MP will receive all events as received by the active MP. The RSTP manager in the standby MP will not send out BPDUs but will receive and process BPDUs.

The relationship with the VLAN manager is tricky due to the ACK mechanism (a blocking port state set call). To advance the RSTP port state transition (PST) state machine, an ACK from the line card is needed to confirm that the port has been set to the appropriate state.

Therefore, according to an embodiment of the present invention, all port state set calls will be non-blocking on the standby MP (both the VLAN manager and the RSTP manager use non-blocking calls), and the standby MP will ignore ACKs sent by the line card. In addition, the line card will send the ACK to the management module that made the request, rather than to the active MP. This avoids corner cases during failover, such as an ACK being incorrectly sent to the active MP and causing a transition before the hardware is set up.
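The non-blocking behaviour on the standby MP and the routing of ACKs to the requesting module can be sketched as follows. The classes are hypothetical stand-ins; a synchronous call that returns only after the ACK has been delivered models the blocking port state set on the active MP.

```python
class Linecard:
    """Stand-in for the line card's port-state programming path."""

    def __init__(self) -> None:
        self.hw_state = {}

    def set_port_state(self, port: str, state: str, requester) -> None:
        self.hw_state[port] = state    # program the hardware first
        requester.on_ack(port, state)  # ACK goes only to the requesting MP


class RstpPortStateClient:
    """Hypothetical port-state interface used by the VLAN and RSTP managers."""

    def __init__(self, active: bool, linecard: Linecard) -> None:
        self.active = active
        self.linecard = linecard
        self.port_state = {}
        self.acks = []

    def set_port_state(self, port: str, state: str) -> None:
        self.port_state[port] = state
        if not self.active:
            return  # standby: non-blocking, and no IPC to the line card
        self.linecard.set_port_state(port, state, self)
        # Active MP: the PST state machine advances only once this ACK is in.
        if not self.acks or self.acks[-1] != (port, state):
            raise RuntimeError(f"no ACK for {port} -> {state}")

    def on_ack(self, port: str, state: str) -> None:
        if self.active:
            self.acks.append((port, state))  # the standby ignores ACKs


lc = Linecard()
active = RstpPortStateClient(active=True, linecard=lc)
standby = RstpPortStateClient(active=False, linecard=lc)
standby.set_port_state("1/1", "forwarding")  # recorded locally only
active.set_port_state("1/1", "forwarding")   # programmed and acknowledged
standby.on_ack("1/1", "forwarding")          # stray ACK: ignored
```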

RSTP requires handshaking (in some cases) between peers on point-to-point links. An example is the proposal-agree mechanism between a designated port and the root port. If this handshake is already completed by the time the standby MP comes up, the RSTP manager running on the standby MP will never know that the handshake had been performed.

Solution: Two cases must be handled: (1) the standby MP has a designated port that should have received a BPDU with the “agree flag” set, or (2) the standby MP has a root port that should have received a BPDU with the “propose flag” set.

c) Protocol Redundancy for MRP

Although MRP is a proprietary protocol, a brief discussion of redundancy processing for MRP is included for completeness. The MRP manager on the standby MP will receive all events and packets, called “ring PDUs” (similar to BPDUs), as received by the MRP manager in the active MP. MRP also has issues that do not arise in spanning-tree protocols:

1. Sequence number sent per ring PDU: The metro-ring protocol sends out ring PDUs every 100 ms. Each PDU has a sequence number associated with it that is used in diagnostics. This sequence number needs to be matched on both the standby MP and the active MP. The metro-ring protocol on the standby MP skips the sequence number check the first time and updates the sequence number based on the received sequence number.

2. MRP sessions for each MRP flow: These are required to hardware forward ring PDUs. Each MRP session needs to be learned and deleted when done.

3. Short convergence times: MRP has very short convergence times of about 300 ms, which may not be achievable while FAILOVER is in progress. When a FAILOVER event is received on a master MRP node, the dead-time is therefore increased significantly (e.g., from 300 ms to about 8 seconds). This allows the management module to verify the sanity of the linecard data without MRP re-converging.
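The sequence number handling in item 1 and the dead-time change in item 3 can be sketched together. The assumptions are that sequence numbers increase by one per ring PDU and do not wrap, and that the two timing constants simply reuse the example figures above; the class name is hypothetical.

```python
NORMAL_DEAD_TIME_MS = 300     # example figure from the text
FAILOVER_DEAD_TIME_MS = 8000  # "about 8 seconds" in the text


class MrpRingSketch:
    def __init__(self) -> None:
        self.expected_seq = None  # unknown until the first ring PDU is seen
        self.seq_mismatches = 0   # diagnostic counter
        self.dead_time_ms = NORMAL_DEAD_TIME_MS

    def on_ring_pdu(self, seq: int) -> None:
        # The check is skipped the first time, when nothing is expected yet.
        if self.expected_seq is not None and seq != self.expected_seq:
            self.seq_mismatches += 1
        self.expected_seq = seq + 1  # adopt the received number either way

    def on_failover(self, is_master: bool) -> None:
        if is_master:
            self.dead_time_ms = FAILOVER_DEAD_TIME_MS


standby = MrpRingSketch()
standby.on_ring_pdu(41)  # first PDU: check skipped, number adopted
standby.on_ring_pdu(42)  # in sequence
standby.on_ring_pdu(50)  # gap: counted as a mismatch
standby.on_failover(is_master=True)
```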

d) Protocol Redundancy for VSRP

Although VSRP is a proprietary protocol, a brief discussion of redundancy processing for VSRP is included for completeness. VSRP shares issues similar to those of MRP. Due to the increased sensitivity of the protocol, it is imperative that VSRP/MRP packets be sent out from the standby MP as soon as it becomes active. This is achieved by running the VSRP manager on both the active MP and the standby MP. VSRP will re-converge within 800 ms if VSRP packets cannot be sent or received.

When a FAILOVER event is detected, VSRP freezes its timers to allow the line card to initialize. The following timers are frozen: the dead timer, the hold-down timer, and the backup expiry timer. The dead timer and the hold-down timer keep track of the time since the last VSRP message was received from the VSRP master.

Transmission of VSRP packets will continue as scheduled, since the VSRP manager on the standby MP operates under the assumption that it is the active MP. Once L1 detects the switchover, it will allow transmission of L2 protocol packets that were previously black-holed.
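Freezing the three VSRP timers on FAILOVER can be sketched as follows. The timer values in the usage lines are invented for the example, and resuming the timers once the line cards are ready (`on_linecards_ready`) is an assumption; the text states only that the timers are frozen to allow the line card to initialize.

```python
class FreezableTimer:
    def __init__(self, timeout_ms: int) -> None:
        self.timeout_ms = timeout_ms
        self.elapsed_ms = 0
        self.frozen = False

    def tick(self, ms: int) -> None:
        if not self.frozen:
            self.elapsed_ms += ms

    def expired(self) -> bool:
        return self.elapsed_ms >= self.timeout_ms


class VsrpTimersSketch:
    def __init__(self, dead_ms: int, hold_down_ms: int, backup_expiry_ms: int) -> None:
        self.timers = {
            "dead": FreezableTimer(dead_ms),
            "hold_down": FreezableTimer(hold_down_ms),
            "backup_expiry": FreezableTimer(backup_expiry_ms),
        }

    def _set_frozen(self, frozen: bool) -> None:
        for t in self.timers.values():
            t.frozen = frozen

    def on_failover(self) -> None:
        self._set_frozen(True)   # give the line cards time to initialize

    def on_linecards_ready(self) -> None:
        self._set_frozen(False)  # assumed resume point (hypothetical)

    def on_master_message(self) -> None:
        # dead and hold-down timers measure time since the last master message
        self.timers["dead"].elapsed_ms = 0
        self.timers["hold_down"].elapsed_ms = 0

    def tick(self, ms: int) -> None:
        for t in self.timers.values():
            t.tick(ms)

    def expired(self):
        return [name for name, t in self.timers.items() if t.expired()]


vsrp = VsrpTimersSketch(dead_ms=3000, hold_down_ms=2000, backup_expiry_ms=3000)
vsrp.tick(1000)
vsrp.on_failover()
vsrp.tick(10_000)  # frozen: no progress during the switchover
frozen_expired = vsrp.expired()
vsrp.on_linecards_ready()
vsrp.tick(1000)
```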

e) Protocol Redundancy—VSRP Aware

Although VSRP aware is a proprietary protocol, a brief discussion of redundancy processing for VSRP aware will be made for completeness. VSRP aware sessions are synchronized by the linecard on startup. This process is similar to the MAC table synchronization. The standby MP does not store VSRP aware sessions. On failover, the newly active MP does not have any VSRP aware sessions stored.

When a line card detects the arrival of the newly active MP, it traverses all of its local VSRP aware sessions and sends the information to the newly active MP. The newly active MP then synchronizes these sessions to the other line cards. An aware session that no longer receives VSRP packets will simply age out.

Although specific embodiments of the invention have been described, various modifications, alterations, alternative constructions, and equivalents are also encompassed within the scope of the invention. The described invention is not restricted to operation within certain specific data processing environments, but is free to operate within a plurality of data processing environments. Additionally, although the present invention has been described using a particular series of transactions and steps, it should be apparent to those skilled in the art that the scope of the present invention is not limited to the described series of transactions and steps.

Further, while the present invention has been described using a particular combination of hardware and software, it should be recognized that other combinations of hardware and software are also within the scope of the present invention. The present invention may be implemented only in hardware, or only in software, or using combinations thereof.

The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that additions, subtractions, deletions, and other modifications and changes may be made thereunto without departing from the broader spirit and scope of the invention.

Claims (6)

1. A method for installing program code in a network device that is in data communication with a plurality of connected devices, said network device having a processor that is executing first program code, the method comprising steps of:

receiving a message containing compressed data;

storing said compressed data to a first memory region of a memory of said network device;

decompressing said compressed data stored in said first memory region to produce second program code;

storing said second program code to a second memory region of said memory;

copying said second program code from said second memory region to a region of said memory within which said first program code is stored, thereby replacing said first program code with said second program code;

setting said processor to execute said second program code; and

while performing the foregoing steps, transmitting data packets to said connected devices at a rate substantially equal to or greater than a predetermined rate, wherein the time to copy said second program code from said second memory region to said first memory region is less than the period of said predetermined rate,

wherein said transmitting said data packets maintains data connections with said connected devices.

2. The method of claim 1 further comprising performing a reset operation on said processor prior to said copying said second program code.

3. The method of claim 1 further comprising:

performing a reset operation on said processor prior to said copying said second program code;

in response to said reset operation, saving protocol packets destined for one or more of the connected devices;

decompressing second compressed data stored in a third memory region of said memory to produce third program code, including storing said third program code to a fourth memory region of said memory; and

while performing said steps of decompressing and storing, sending out said protocol packets to said connected devices.

4. The method of claim 1 wherein the data packets include a keep-alive packet.

5. The method of claim 1 wherein the network device is a linecard processor.

6. A network device comprising:

a data processor component;

a plurality of communication ports for data connection with a plurality of connected devices;

a memory component; and

first program code stored in said memory component that is configured for execution by said data processor component,

said program code configured to operate said data processor component to:

receive a message containing compressed data;

store said compressed data to a first region of said memory component;

decompress said compressed data stored in said first region to produce second program code;

store said second program code to a second region of said memory component;

copy said second program code from said second region to a region of said memory component within which said first program code is stored to thereby replace said first program code with said second program code;

set said data processor component to execute said second program code; and

transmit data packets to said connected devices at a rate substantially equal to or greater than a predetermined rate during said foregoing steps, wherein the time to copy said second program code from said second region of memory to said first region of memory is less than the period of said predetermined rate,

wherein said network device is actively in communication with said connected devices,

wherein said data packets maintain said active communication with said connected devices.