English Abstract

An internetworking system operating over an ATM backbone. The physical internetworking devices within the system are shared to provide the internetworking functions while servicing two or more distinct and isolated user networks. This is accomplished by logically partitioning the devices into distinct sub-elements which provide all or part of the internetworking functions. These sub-elements are uniquely allocated to independent realms which are then assigned to specific user networks.

THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:

1. A method of forwarding packets in a communication system having multiple incoming and output service interfaces providing service to multiple user networks, said method comprising:
providing said system with multiple forwarding rules said rules based on routing information;
receiving said packets at one of said incoming service interfaces;
selecting an appropriate forwarding rule based on a source address in said packets; and
forwarding said packets to one of said output service interfaces based on a destination address in said packets and information in said forwarding rules.

2. The method as defined in claim 1 wherein said services interfaces support realms each relating to a specific instance of an internetworking service function.

3. The method as defined in claim 2 wherein said specific instance is a public Internet access service.

4. The method as defined in claim 2 wherein said specific instance is a virtual private network (VPN) service.

5. The method as defined in claim 2 wherein said VPN service is a bridged and/or routed connectivity service.


6. The method as defined in claim 2 wherein said VPN service is a network layer connectivity service.

7. The method as defined in claim 1 wherein said communication system includes an ATM transport fabric.

8. A packet forwarding entity for a communication system comprising:
a plurality of user networks;
multiple service interfaces providing instances of service to said user networks;
multiple route servers for calculating multiple forwarded rules relating to instances of service to which said service interfaces belong based on customer information; and
edge forwarders to direct said service interfaces to user networks based on information in said forwarding rules.

9. The packet forwarding entity as defined in claim 8 wherein said instances of service are assigned to specific network users.

10. The packet forwarding entity as defined in claim 8 wherein service interfaces relate to physical and logical connections.

11. The packet forwarding entity as defined in claim 8 wherein said logical connections include multiple traffic flows from one or more ingress ports.


12. The packet forwarding entity as defined in claim 8 wherein said one of said instances of services is an internetworking service function.

13. The packet forwarding entity as defined in claim 8 wherein said internetworking service function is a Public Internet access service.

14. The packet forwarding entity as defined in claim 12 wherein said internetworking service function is a virtual private network (VPN) service.

15. The packet forwarding entity as defined in claim 14 wherein said VPN service is a bridged and/or routed connectivity service.

16. The packet forwarding entity as defined in claim 15 wherein said internetworking service functions are provided over an ATM network.

17. The packet forwarding entity as defined in claim 15 wherein said internetworking devices support multiple protocols.

18. The packet forwarding entity as defined in claim 17 wherein said internetworking devices provide services at both the packet and frame levels.

19. The packet forwarding entity as defined in claim 18 wherein said internetworking services are managed by a single service provider.

CA 02217275 1997-10-03

Multiple Internetworking Realms Within an Internetworking Device

Field of the Invention

This invention relates to the provision of internetworking service functions utilizing multi-protocol over ATM (MPOA) and more particularly to a system and method wherein a common backbone infrastructure is shared by several distinct user networks.

Background

Multi-protocol over ATM (MPOA) represents an important development in the communications industry in that it permits the internetworking of local area networks (LANs) over an ATM backplane. This internetworking leads to the delivery of multimedia services such as video, voice, image and data.

Currently, MPOA internetworking architectures are not capable of servicing more than one user network.

Internetworking devices within the network architecture provide one or more functions related to forwarding data packets through a network. The primary keys used to control internetworking forwarding functions are network addresses. Within a particular network these network address keys must be unique for the correct operation of the forwarding functions. In many internetworking systems, in particular those based on the Internet protocol, the correct operation of the forwarding functions requires the additional constraint that these network address keys are organized in an ordered hierarchy of partial address prefixes where the unique set of keys used to control the internetworking forwarding function at different points within the network are different. In current systems, a router and bridge combination sometimes known as a ridge provides the address keys in order to forward the data packets to the proper destination.

CA 02217275 2004-07-23

Summary of the Invention

The purpose of the present invention is to permit the sharing of physical devices which provide the internetworking functions while servicing two or more distinct and isolated user networks. This is accomplished by logically partitioning the devices into distinct sub-elements which provide all or part of a specific internetworking function including: physical interfaces; connectivity contexts; dynamic storage and context for routing calculations; storage and context for forwarding information; storage for queuing of packets being forwarded; and the necessary storage and context of secondary elements of the internetworking forwarding functions. The sub-elements of the devices are then uniquely allocated to independent realms. These independent realms are assigned to specific user networks preserving the necessary uniqueness and any local differences in the primary address keys and all other secondary information used in the correct operation of the internetworking forwarding function.

The present invention provides a distributed system built from collaborating internetworking devices and provides for large-scale internetworking services for carriers and service providers. This is known as carrier scale internetworking or CSI.
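The partitioning idea in the summary can be illustrated with a minimal Python sketch. It is not taken from the patent: the class names `Realm` and `SharedDevice`, the interface labels and the route entries are all hypothetical. It assumes only what the summary states, namely that each sub-element (here, a service interface and a forwarding table) is allocated to exactly one realm, so that two user networks can reuse the same address keys on one physical device without colliding.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network
from typing import Dict, Optional


@dataclass
class Realm:
    """One isolated slice of a shared device (hypothetical model)."""
    name: str
    # Forwarding entries private to this realm: prefix -> output interface.
    routes: Dict = field(default_factory=dict)

    def add_route(self, prefix: str, out_interface: str) -> None:
        self.routes[ip_network(prefix)] = out_interface

    def lookup(self, destination: str) -> Optional[str]:
        """Longest-prefix match restricted to this realm's own table."""
        addr = ip_address(destination)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


class SharedDevice:
    """A physical device whose sub-elements are allocated to realms."""

    def __init__(self) -> None:
        self.realms: Dict[str, Realm] = {}
        self.interface_realm: Dict[str, str] = {}

    def add_realm(self, name: str) -> Realm:
        realm = Realm(name)
        self.realms[name] = realm
        return realm

    def bind_interface(self, interface: str, realm_name: str) -> None:
        # Each interface is allocated to exactly one realm.
        if interface in self.interface_realm:
            raise ValueError(f"{interface} is already allocated to a realm")
        if realm_name not in self.realms:
            raise KeyError(realm_name)
        self.interface_realm[interface] = realm_name

    def forward(self, in_interface: str, destination: str) -> Optional[str]:
        """Pick the realm from the incoming interface, then look up there."""
        realm = self.realms[self.interface_realm[in_interface]]
        return realm.lookup(destination)


device = SharedDevice()
# Both user networks use 10.0.0.0/8; the realms keep them apart.
device.add_realm("vpn-a").add_route("10.0.0.0/8", "if-a-out")
device.add_realm("vpn-b").add_route("10.0.0.0/8", "if-b-out")
device.bind_interface("if-a-in", "vpn-a")
device.bind_interface("if-b-in", "vpn-b")
```

In this sketch the same destination address resolves to a different output interface depending on which realm the incoming interface belongs to, which is the isolation property the summary describes.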

Therefore, in accordance with a first aspect of the present invention there is provided a method of forwarding packets in a communication system having multiple incoming and output service interfaces providing service to multiple user networks, the method comprising: providing the system with multiple forwarding rules the rules based on routing information; receiving the packets at one of the incoming service interfaces; selecting an appropriate forwarding rule based on a source address in the packets; and forwarding the packets to one of the output service interfaces based on a destination address in the packets and information in the forwarding rules.

In accordance with a second aspect of the present invention there is provided a packet forwarding entity for a communication system comprising: a plurality of user networks; multiple service interfaces providing instances of service to the user networks; multiple route servers for calculating multiple forwarded rules relating to instances of service to which the service interfaces belong based on customer information; and edge forwarders to direct the service interfaces to user networks based on information in said forwarding rules.

Brief Description of the Drawings

The invention will now be described in greater detail with reference to the attached drawings wherein:
Figure 1 is a service view of a CSI system;
Figure 2 is an architectural view of a CSI system;
Figure 3 illustrates control and data traffic for Internet service;

Figure 4 illustrates control and data traffic for routed VPN;
Figure 5 shows one PIPE implementation;
Figure 6 is a Realm level Service Differential example;
Figure 7 shows intra-realm Vnet level service differential;
Figure 8 illustrates a CSI management model;
Figure 9 is a diagram of traffic and control flow to and from a PIPE;
Figure 10 illustrates a simplified CSI system;
Figure 11 shows a network layer forwarding mechanism; and
Figure 12 is a PIPE instance screen.

Detailed Description of the Invention

CSI has a number of new terms which are described here in the hope that this will help the reader better understand the balance of this document. Refer to Figures 1 to 4 for further information on how these functions are related and interconnected in a CSI system.

1. Internetworking Services: Internet connectivity, routed VPNs, and bridged VPNs are three examples of internetworking services that a carrier may provide to customers through the CSI system.

2. Access Interfaces: Access Interfaces are the physical interfaces that are used to deliver one or more internetworking Service Interfaces between the customer and the CSI system (e.g. T1 Frame Relay interface, STM1 UNI interface, 10BaseT interface, etc.).

3. Service Interfaces: Service interfaces are the logical interfaces through which internetworking services are provided to the customers. Frame Relay VCs, ATM VCCs, PPP links, 10/100 Ethernets, 802.1Q explicitly-tagged VLANs, and FDDI LANs are examples of service interfaces to be supported by CSI.

4. Service Interface Groups: A Service Interface Group is simply a collection of Service Interfaces. Service Interface Groups are part of the CSI Management Model.

5. Access Termination: Access Terminations are the components that terminate the data and control planes of Access Interfaces at the customer side of the network.

6. Edge Forwarder: In the CSI architecture, the Edge Forwarder (EF) refers to the logical components of the CSI system that perform the layer 3 edge forwarding functions (e.g. PIPE card, Ridge forwarding engine).

7. Default Forwarder: The next-hop forwarder to which a packet is sent when a forwarder has no specific forwarding entry for that packet in its forwarding information base.

8. Core Forwarder: In the CSI architecture, the Core Forwarder (CF) refers to a low overhead, low functionality forwarding device in the core of the CSI network. The CF has no direct service interfaces and gets its forwarding table downloaded from the RS (i.e. CF runs no routing protocols).

9. Route Server: In the CSI architecture, the Route Server's main task is to generate and download the forwarding tables to the Edge and Core Forwarders. The RSes run all the required internal and external routing protocols in the CSI system to provide both default connectivity and shortcuts. The RSes are not part of the user data path.

10. Config. or Configuration Server: In the CSI architecture, the Config Server's main tasks are: 1) to reply to requests from Edge Forwarders as to the whereabouts of their Route Servers, 2) to download configuration information to Route Servers regarding VPNs, routing protocols, and other configuration information required by the Route Server to run, and 3) to track the Route Servers' status and activity (i.e. which RSes should be active and which ones should be on standby).

11. Shortcut VCs: These are direct SVC connections between two Edge Forwarders established for forwarding. Shortcuts are established by the EFs as a result of flow detection policies or administrative control.

12. Customer: In the CSI System, a customer is the owner of a Realm. A customer can have one or more realms.

13. Realm: The CSI System allows 3 types of realms: Routed VPN, Bridged VPN or Public Internet realms.

14. Bridged VLAN: Bridged VLAN is a way of providing Bridged VPN service. A Bridged VLAN belongs to a Bridged Realm and supports multiple protocols. A Bridged VLAN operates over a set of Service Interface Groups.

15. Virtual Subnet: A Virtual Subnet is a way of providing Routed VPN service. A Virtual Subnet belongs to a Routed

Protocol Realm and supports one protocol (IP) in this description. A Virtual Subnet can be configured to operate on one or more Service Interface Groups: a Virtual Subnet corresponds to one IP subnet.

16. Subnet Group: A collection of Subnets. A Subnet Group is part of the CSI Management model.

The purpose of Carrier Scale Integration (CSI) is to meet the future needs of large providers of internetworking (frame- and packet-based) services. To do so, CSI strives to meet ambitious goals in:
number of customer connection points;
number of simultaneous connected individual users;
number of simultaneous flows;
support for public and multiple private Internet packet services;
support for multiple private bridged services;
access resale with distinction of customers to a fine degree of granularity, e.g. to different end stations within a customer site;
differentiated service for both configured and dynamically detected flows;
reduction of relative management complexity;
modularity of functions, such that the CSI system works together as a whole, but functions can be replaced individually with constrained impact;
high availability;
high stability, including routing.

CSI is a distributed system built from collaborating ATM switches, route servers, access terminations, edge forwarders, default forwarders, core forwarders, a management system, and auxiliary servers. As a whole, the CSI system provides internetworking services at both the packet and frame levels. The CSI architecture defines the external interfaces between the CSI system and the outside

world and the internal interfaces between CSI components. A CSI system is expected to be managed as a whole, by or on behalf of a single service provider.

External interfaces are classified as either access interfaces or service interfaces.

Access interfaces are the interfaces over which one or more service interfaces are provided between the customer and the CSI system (e.g. STM1 UNI or 10BaseT). Access interfaces interconnect the CSI system and customer access networks, which can be any of various technologies, from a PSTN modem to a campus LAN. The concept of the access interface includes all aspects of the interface which are specific to the particular physical type of the interface as well as any interface-specific transmission protocol issues.

Access interfaces are provided by CSI components known as Access Terminations. Packets transmitted towards (and received from) the access network are encapsulated (and de-capsulated) by the access termination components. The access termination device provides all the control and auxiliary functions required by the access interfaces and transmission across them, e.g. switched-access signaling and Frame Relay LMI. Access interface does not refer to a physical interface of the access termination, but rather to a set of functions performed by the access termination. Conceptually the access interface is internal to the access termination.

Service interfaces are logical interfaces through which services are provided to the customers. A service interface is expected to carry traffic for one customer, although a customer may encompass many end systems. The control and user data flows for each service are those appropriate to the service.

Service interfaces are provided by Edge Forwarders. Edge forwarders exchange encapsulated, interface-independent PDUs (Protocol Data Units) with the access terminations, and provide all control and auxiliary functions required by

higher layer encapsulations and control protocols such as PPP.

A service is coordinated communication between an access termination and a specific customer across a service interface, using sets of supported protocols and the management of control and user information according to those protocols. Three services are available in CSI:
1) Public Internet access service, which is managed connectivity to the public Internet.
2) Virtual private network (VPN) service, which is managed connectivity to a virtual private network. A virtual private network may include both virtual LANs (bridged connectivity) and virtual subnets (network layer connectivity).

A service enables connectivity to a Realm. A realm is a specific instance of an Internet or VPN service. Within a VPN realm, there may be multiple virtual LANs for different protocol families, but only one of each. A single service interface may support multiple virtual subnet services (within a VPN realm), but only if their Internet address spaces are distinct. Different PDUs from a single end station may be injected into different virtual LANs or virtual subnets.

An access interface may support more than one service interface simultaneously, but a service interface may support only one service at a time, and a service may be provided for only one realm at a time. The particular service and realm available on a particular service interface shall be controlled by configured policy, authentication and authorization.

Mechanisms for providing services and distinguishing realms are discussed later.

One aspect of service is differentiated service. Depending on the capabilities of individual access and service interfaces, and customer configuration, service may be differentiated in several ways. For example, some

traffic may be given simple priority, or weighted fair queuing schemes may be enforced. The CSI architecture is intended to allow for service differentiation at the level of individual flows, but does not require it. In some cases service differentiation might be done at the level of a whole service interface.

Finally, one or more route servers may communicate with other routing entities outside of the CSI system, for the exchange of Internet routing information. From the point of view of routing, the route servers represent the CSI system to the outside world. This communication takes place at the Internet layer, across an access termination or an edge forwarder.

The foundation of a CSI system is an ATM network. On this ATM network, CSI coexists with other services which might be offered, such as circuit emulation. In practice, a single ATM network may serve as all of access network, distribution fabric and transport fabric. The role of the ATM network is to provide high-speed, complete connectivity between components of a CSI system. The purpose of the nomenclature of the three fabrics is to aid in discussion.

All interfaces between the fabric and the components of a CSI system are ATM UNI (User Network Interface) interfaces.

In the CSI system, all packets within a flow of either control or user data are encapsulated using LLC (Logical Link Control) encapsulation. This permits, but does not require, multiple flows to be carried over a single VCC. Control and user data flows cannot be carried in the same VCC.

The management system provides all other CSI components with the basic configuration information they need to communicate and to establish bindings between interfaces, services and realms. Configuration information is given to each component when it becomes operational, and may also be updated at any time.
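The rule that flows may share a VCC while control and user data may not can be sketched as a simple grouping step. This is an illustration only: the `Flow` type, its field names and the choice to key a VCC on (peer, kind) are assumptions made for the example, not details from the description.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Flow(NamedTuple):
    """A flow between this component and a peer (hypothetical model)."""
    name: str
    peer: str   # the remote CSI component
    kind: str   # "control" or "user"


def group_flows_into_vccs(flows: List[Flow]) -> Dict[Tuple[str, str], List[str]]:
    """Multiplex flows onto VCCs, one VCC per (peer, kind) pair.

    Flows of the same kind towards the same peer share a VCC, which LLC
    encapsulation permits. Control and user flows never share one.
    """
    vccs: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for flow in flows:
        if flow.kind not in ("control", "user"):
            raise ValueError(f"unknown flow kind: {flow.kind!r}")
        vccs[(flow.peer, flow.kind)].append(flow.name)
    return dict(vccs)


example = group_flows_into_vccs([
    Flow("config-updates", "management", "control"),
    Flow("topology-reports", "route-server-1", "control"),
    Flow("customer-x-ip", "edge-forwarder-2", "user"),
    Flow("customer-y-ip", "edge-forwarder-2", "user"),
])
```

Since sharing is permitted but not required, an implementation could equally open one VCC per flow; grouping by peer and kind is just one policy that respects the separation rule.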

The management system itself may be made up of one or more components.

Access Terminations provide access interfaces. On the access network side they terminate data and control planes. On the CSI side of the network they provide a uniform connection mechanism and traffic stream to edge forwarders. Access terminations act as aggregation and distribution points, collecting traffic from access networks to distribute to one or more edge forwarders, and distributing traffic from one or more edge forwarders to one or more access networks. The distribution of traffic is controlled by configuration information.

The primary motivation for separating the access termination functions from the edge forwarding functions is to enable the access resale capability.

Access terminations provide limited service differentiation through traffic prioritization between interfaces. This is done under the control of the management system. Access terminations do not do any filtering or traffic shaping for incoming (i.e. from the access network) traffic. Outbound queues are FIFO queues with Random Early Drop (RED).

Edge forwarders terminate service interfaces and provide all functions related to forwarding in the CSI system, for both packets and frames. Edge forwarders are potentially the most sophisticated components in a CSI system. While access terminations may distinguish between traffic destined to different edge forwarders, edge forwarders are responsible for more sophisticated service differentiation.

Edge forwarders receive encapsulated PDUs from access terminations and other forwarders, examine them according to rules given by the management system, categorize them, manipulate them as necessary, and forward them using rules appropriate for the realm in which the PDUs are placed. The

processing rules may lead to forwarding of either bridged frames or routed packets, in private or public nets, on a per-PDU basis.

Where the control plane of a service interface includes authentication, for example with PPP, the edge forwarder will perform preliminary authentication of users, since this may affect the distribution of traffic. Edge forwarders also provide all other functions ancillary to higher layer protocols, such as support for proxy ARP (Address Resolution Protocol) and inverse ARP, and may act as a proxy for some services such as DHCP (Dynamic Host Configuration Protocol). They may make use of other resources, such as route servers, to perform these functions. Edge forwarders represent the CSI system at the Internet level, for example by responding to IP-based echo requests.

Edge forwarders inform route servers of all changes in topology concerning links to access terminations and configured links to other forwarders. Edge forwarders differentiate between flows and provide differential queuing services for flows where configured. Edge forwarders may also detect flows and create "shortcut" VCCs to other forwarders where appropriate, when allowed by configuration.

While not a basic architectural component, the concept of an Access Forwarder is used in practice. "Access Forwarder" is shorthand for a close association of the functions of Access Termination and Edge Forwarder. Architecturally the functions remain separate. In reality an access forwarder need not use the standard interface between access termination and edge forwarder.

A core forwarder is a low overhead, low functionality, possibly high speed Internet-level forwarding device in the core of the CSI network, for use only by public Internet services. Core forwarders are not necessary to the functioning of a CSI system, and are provided to support scalability (by making it possible to reduce the number of VCCs between edge forwarders and by offering a default

forwarding path for forwarders which cannot hold full forwarding databases). A core forwarder has no direct service interfaces and runs no routing protocols. Special features, where necessary, should be implemented in the edge forwarders and access terminations, thus allowing the core forwarder to support high speed and high capacity without high overhead. Although some end-to-end features (e.g. in Resource Reservation Protocol (RSVP) and Integrated Services) require support in all forwarders, in the core forwarder speed and capacity are far more important than feature richness.

For scaling of VPN realms, it is anticipated that it will be possible to support core forwarders which are dedicated to particular VPN realms. At this time core forwarders are intended particularly for public Internet realms.

A default forwarder is essentially a more intelligent core forwarder, used in support of private realms. In private realms, edge forwarders may not have complete forwarding information. Rather than drop packets/frames while they are retrieving this information (from route servers) they forward them to the default forwarder. The default forwarder is more sophisticated than a core forwarder, in that it must take VPN policy information into account when deciding how to forward.

In the cases of both packets and frames, route servers are responsible for routing, while forwarders are responsible for forwarding. The functions of routing are explicitly separated from the functions of forwarding, in order to make it possible for individual components to do each more efficiently. Route servers are not in any user data path, and are not responsible for forwarding any user data.

Route servers are responsible for:

providing forwarders with service-related configuration information and interface bindings, and updating this information as necessary;
exchanging routing information with internal and external routing agents;
gathering information internally to keep track of internal topology;
computing forwarding databases as needed from the above information and from configured policy;
disseminating these databases to the edge and core forwarders (full tables in the public Internet case; partial, full, or on-demand for private services); and
answering queries in support of other functions the forwarders may perform such as ARP.

Auxiliary servers provide support for such services as DHCP, DNS, and NTP, which run at a higher layer but are considered fundamental to normal network use. Such services are beyond the scope of the CSI architecture, but support for their functioning across the CSI system is not.

In some cases, the auxiliary server may not be directly associated with the CSI system, e.g. an exogenous RADIUS server may be used to provide AAA services, or even if it is part of the system, e.g. an internal RADIUS server, it may not be user-visible.

This category does not include "content" servers such as NetNews, web servers, electronic mail, or user directory services.

Interfaces between CSI components support both control and user information. Interfaces occur over either "persistent" or "non-persistent" ATM SVCs. Persistent SVCs (SVC: Switched Virtual Circuit) are established per configuration, are maintained regardless of inactivity, and are re-established in the case of failure. Non-persistent SVCs are established only as needed and are released on inactivity. The particular definition of "inactivity" is a

matter for local policy, and may be part of the information obtained from the management system.

A flow of either control or user information is carried in a single VCC. Multiple flows may be carried in a single VCC, but control flows are separate from user information flows.

All configured control flows within the CSI system take place over persistent SVCs. User data flows used to provide default connectivity--that is, flows established based on configuration information and not on observed behavior of traffic or other criteria--are also carried over persistent SVCs. All other flows are carried over non-persistent SVCs.

In all cases, when a VCC is set up, ATM signaling is used to indicate the particular realm the VCC is being set up for. ATM signaling may also be used to indicate that a VCC is to be used for multiple realms, using B-LLI, B-HLI, and/or L2TP.

Each component has, as part of its basic configuration, one or more anycast ATM addresses for contacting the management system. The first connection a component establishes is with the management system over a persistent SVC. In the usual case, the management system then gives the component the information it needs to establish other default connections, and to know how to use them. These "default forwarding" connections are then established and maintained.

Specifics of internal interfaces follow.

The first connection established by any component except the management system is with the management system. This is a control interface, with no user data flow. Every component must maintain a persistent connection to the management system. In the usual case, the management system then passes configuration information to the component which the component needs in its specific situation. This policy information may include:
Access interfaces and service interfaces to be enabled.

ATM addresses and other necessary information for establishing connections with other components. Other components may include: edge forwarders, core forwarders (for all but access terminations), access terminations (for edge forwarders), and default forwarders and route servers (for all but access terminations).
Access terminations are given rules to use in determining how incoming traffic should be processed and forwarded. However, such information is not given to forwarders for their service interfaces--they obtain that information from their route servers.
What to accept connections from.
Information for route servers regarding realms, routing peers and protocols, and components for which they are responsible.
Bindings of route servers to realms and services.

The management system may update a component's configuration information at any time using the interface provided by the persistent VCC.

Components may have information configured statically. Although they must connect to the management system, there is no requirement that they receive their policy information from the management system. CSI system managers may trade off the ease of central configuration management for the sake of simplicity and robustness. Hybrid schemes are possible where management information is statically configured into a component, but can be overridden by dynamically downloaded information. Protocols used for carrying information between the management system and other CSI components must be reliable.

An access termination examines incoming traffic and redistributes it to one or more edge forwarders in one or more VCCs, according to configured policy. An access termination interacts only with the management system and with one or more edge forwarders.
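The hybrid configuration scheme mentioned above, in which statically configured information can be overridden by dynamically downloaded information, might look like the following sketch. The class name `ComponentConfig`, its methods and the configuration keys are invented for illustration; the description does not define a configuration format.

```python
from typing import Any, Dict, Optional


class ComponentConfig:
    """Static configuration with optional downloaded overrides (hypothetical)."""

    def __init__(self, static: Dict[str, Any]) -> None:
        self._static = dict(static)
        self._downloaded: Dict[str, Any] = {}

    def apply_update(self, downloaded: Optional[Dict[str, Any]]) -> None:
        """Record an update from the management system; it may arrive at any time."""
        if downloaded:
            self._downloaded.update(downloaded)

    def effective(self) -> Dict[str, Any]:
        """Downloaded values win; static values fill in everything else."""
        merged = dict(self._static)
        merged.update(self._downloaded)
        return merged


config = ComponentConfig({
    "route_server_address": "atm-address-of-rs-1",   # placeholder value
    "inactivity_timeout_s": 300,                      # placeholder value
})
config.apply_update({"inactivity_timeout_s": 60})
```

A component that never receives an update simply runs on its static values, which matches the statement that components are not required to obtain their policy information from the management system.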

An access termination may bypass nearby edge forwarders and use VCCs to remote edge forwarders. This practice is known as access resale, and allows the CSI system operator to deliver traffic transparently from an access termination in one location to an edge forwarder in another location, for example to an interface to an Internet service provider.

In large-scale environments, in order to reduce the number of VCs from access terminations to edge forwarders, access terminations should support L2TP directly over AAL5 or some other scaling mechanism. Flows with different service requirements shall be carried in different L2TP tunnels.

There is no direct communication between Access Terminations. All traffic from an access termination which flows into the CSI system must flow to an edge forwarder.

A particular implementation of an access termination may allow traffic to make "hairpin turns," entering on one service interface and exiting immediately on another. Such implementations must take policy configuration into consideration. Configured policy may affect such traffic in two ways: first, with regard to the legality of the traffic flow, and second, differentiation of service.

Edge and core forwarders are responsible for establishing persistent connections to those route servers dictated by their configuration.

Route servers provide forwarders with configuration information related to service interfaces, including bindings between service interfaces and particular realms.

Route servers obtain reachability information from two sources: external routing entities (in peer networks and customer networks) and from edge and core forwarders.

The route servers obtain external reachability information through use of standard routing protocols (BGP-4 for external providers; RIPv2, OSPFv2 or BGP-4 for customer networks).
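The protocol choices listed for external reachability can be expressed as a small configuration check. The peer-kind labels, the table name and the function below are hypothetical; only the protocol lists themselves come from the description.

```python
from typing import Dict, Set

# Routing protocols the description names for each kind of external peer.
# The dictionary keys are labels invented for this sketch.
ALLOWED_PROTOCOLS: Dict[str, Set[str]] = {
    "external_provider": {"BGP-4"},
    "customer_network": {"RIPv2", "OSPFv2", "BGP-4"},
}


def validate_peering(peer_kind: str, protocol: str) -> None:
    """Raise ValueError if a configured peering uses an unlisted protocol."""
    try:
        allowed = ALLOWED_PROTOCOLS[peer_kind]
    except KeyError:
        raise ValueError(f"unknown peer kind: {peer_kind!r}") from None
    if protocol not in allowed:
        raise ValueError(
            f"{protocol} is not listed for {peer_kind}; "
            f"expected one of {sorted(allowed)}"
        )


validate_peering("customer_network", "OSPFv2")   # passes silently
validate_peering("external_provider", "BGP-4")   # passes silently
```

A route server's configuration loader could run such a check before opening any routing session, though where validation actually happens is not specified in the description.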

Edge forwarders send internal connectivity information (including information they obtain from access terminations) to the route servers using OSPFv2. Only topological connectivity information is sent, not information about reachable destinations. Also, ad hoc shortcut VCCs are not advertised. Finally, access terminations do not appear in this topological information.

The route servers use the routing information from external sources, topology information from the forwarders, and policy information from the management system, to compute forwarding rules for each forwarder in the CSI system for which they are responsible.

They then download this forwarding information to the forwarders. As a given forwarder may participate in multiple realms, forwarding information includes at least the incoming service interface, PDU characteristics such as source and destination addresses, the output service interface and the output queuing regime.

Route servers are also responsible for computing multicast forwarding rules for the forwarders, for use within and between realms. Multicast within bridged realms is managed following the usual mechanisms for VLANs. Since unicast forwarding rules may already include information such as incoming interface and source address, no new protocol features are required to support distribution of multicast forwarding information to the forwarders.

Multicast join and leave requests are sent from the forwarders to the route servers, which then distribute the appropriate forwarding rules in response.

Finally, edge forwarders may query route servers to resolve MAC or internetworking addresses to ATM addresses in the case of VPN traffic (both bridged and routed).

Route servers establish connections to other route servers according to configuration.
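The forwarding information described above (incoming service interface, source and destination addresses, output service interface and output queuing regime) can be pictured as a small rule table. The following is a sketch under stated assumptions, not the patented implementation: the class and field names, the interface labels and the queue labels are hypothetical, and the choice of the longest matching destination prefix as the tie-breaker is an assumption added for the example.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class ForwardingRule:
    incoming_si: str   # incoming service interface
    source: str        # source prefix, e.g. "10.1.0.0/16"
    destination: str   # destination prefix
    output_si: str     # output service interface
    queue: str         # output queuing regime (label is hypothetical)


def lookup(rules: List[ForwardingRule], incoming_si: str,
           src: str, dst: str) -> Optional[ForwardingRule]:
    """Return the matching rule with the longest destination prefix, or None."""
    src_ip, dst_ip = ip_address(src), ip_address(dst)
    best: Optional[ForwardingRule] = None
    for rule in rules:
        if rule.incoming_si != incoming_si:
            continue  # rule belongs to a different service interface
        if src_ip not in ip_network(rule.source):
            continue  # source address is not covered by this rule
        dst_net = ip_network(rule.destination)
        if dst_ip not in dst_net:
            continue
        if best is None or dst_net.prefixlen > ip_network(best.destination).prefixlen:
            best = rule
    return best


# Two realms reuse the same private prefixes on different service interfaces.
rules = [
    ForwardingRule("si-a", "10.1.0.0/16", "10.2.0.0/16", "si-a-out", "best-effort"),
    ForwardingRule("si-b", "10.1.0.0/16", "10.2.0.0/16", "si-b-out", "better-effort"),
    ForwardingRule("si-a", "10.1.0.0/16", "10.2.3.0/24", "si-a-branch", "best-effort"),
]
```

Because the incoming service interface and source address are part of the match, two realms with overlapping address space can share one forwarder without their traffic being mixed.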

Route servers use iBGP4 to communicate external reachability information to each other. The BGP Next-Hop attribute is used to distribute the ATM address of the appropriate Edge Forwarder for external routes. This is required because the route servers may be physically separate from the forwarders.

Route servers use OSPFv2 to communicate internal topology information among themselves. Only information about configured connections is distributed between route servers. Information about dynamic, "shortcut" connections is never propagated.

Route Servers may propagate NHRP and MAC-layer address resolution queries to the next Route Server along the "default" path to the destination within that particular realm.

Given the forwarding tables delivered from the route servers, the edge and core forwarders forward IP packets as required by "Router Requirements"; this includes generating ICMP messages as required. The Forwarders also respond to ICMP Echo Messages. Further, for packets received from a customer network, the Edge Forwarders may verify that the source address is valid for the network from which the packet was received.

Edge forwarders establish connections with each other for two reasons: first, if configured to do so for a particular realm, and second, if a flow is detected and the edge forwarder considers a direct "shortcut" connection to be appropriate. In the case of a configured connection, either edge forwarder may attempt to open the connection.

Core forwarders only support the public Internet realm. Private realms (bridged or routed) always use direct connections between edge forwarders.

Edge forwarders communicate with each other using protocols appropriate to the type of realm being supported. All packets or frames are encapsulated as required by the Fabric. Data transferred as part of a routed realm are

transferred as encapsulated internetworking level packets while data transferred as part of a bridged service are transferred as MAC frames.

Shortcut connections are direct SVC connections between two Edge Forwarders, for flows which are high-volume or require specified Quality of Service (QoS) or other segregated handling. Shortcuts are established by the edge forwarders as a result of flow detection policies or administrative control. The decision of when a flow has been detected for which a shortcut connection is useful is an implementation issue.

Within a single Realm and a single QoS, multipoint-to-point VCCs can be used to reduce the number of VCCs a forwarder must support. VCCs between two forwarders may carry traffic from multiple realms. With appropriate signaling and encapsulation a single VCC may carry traffic for multiple realms as described previously.

Core forwarders forward between each other as dictated by configuration and by downloaded forwarding databases. Core forwarders do not exchange routing information, do not detect flows, and do not create dynamic "shortcut" SVCs.

With CSI, WAN internetworking service providers (e.g. ISPs, telcos, IXCs (Inter Exchange Carriers), large private enterprises, etc.) can:

1. Support as a service multiple instances of the routed Virtual Private Network service over a variety of service interfaces.

2. Support as a service multiple instances of the bridged Virtual Private Network service over a variety of service interfaces.

3. Support as a service multiple instances of the public Internet connectivity service over a variety of service interfaces. Note that in release Amethyst, only one instance of the Public Internet connectivity service will be supported.

4. Provide, support and manage all services above (routed VPNs, bridged VPNs, and Public Internet) over a single ATM network infrastructure.

5. Provide differentiated classes of service to customers for all service types.

6. Build scalable and high bandwidth internetworks.

7. Coexist with other services offered by an ATM switch such as Newbridge Network Corporation's 36170 (Frame Relay, Cell Relay, Circuit Emulation, etc.), as well as other ATM services.

Figure 1 shows a service view of a CSI system.

The CSI network consists of the following four entities (see Figure 2):

1. A connection oriented transport fabric infrastructure provided by ATM switches.

2. Access terminations with separate or integrated edge forwarding are provided by access forwarding devices.

3. Internetworking functions (layer 3) are provided by the Route Servers, Edge Forwarders, and Core Forwarders. Core forwarders will be optional. In Figure 2, all internetworking layer devices are shaded. All data forwarding devices are lightly shaded, except that the RSCP (Routing Service Control Point) is shaded differently (darker) to indicate that although the RSCP is involved in the internetworking layer it has a different function. The RSCP does not participate in the forwarding of user data but instead is responsible for running the system's routing protocols and generating forwarding tables.

4. The NMS consists of element, network, and service management systems and is responsible for managing all components of the CSI system as listed above.

The RSCP supports routing protocols, generates forwarding tables for the edge and core forwarders, and provides address resolution as required. For scaling and availability reasons, multiple RSCPs can be deployed in a single network.

At Layer 3, the second most intelligent components in the CSI architecture are the Edge Forwarders (EFs). EFs forward IP traffic over the ATM fabric via ATM SVCs, either short or long hold SVCs depending on the type of service.

There are three types of traffic in a CSI Network:

Routing traffic: this is routing information exchanged between various routers in the network.

Control traffic: the RSCP stores control information (e.g. forwarding tables) for each of the EFs. EFs obtain this information using ATM SVCs.

Data traffic: bridged or routed PDUs being exchanged between EFs.

For routed and bridged VPN traffic, the Edge Forwarders will forward traffic to the Default Forwarder prior to a short-cut SVC being set up. Once the Edge Forwarder has set up shortcut connections across the ATM transport fabric, it will forward the traffic across the SVC and not the Default Forwarder. Address resolution is provided on demand by the RS.

For Internet traffic in CSI, the Edge Forwarders always forward the Internet traffic to an assigned CF or directly to an egress EF. The CF will then relay user data traffic based on its forwarding tables to either another CF, an EF, or an external interface (i.e. another ISP). Differentiated service for Internet traffic is possible and handled by the EFs. Edge Forwarders, with support from RSs via NHRP, will set up short-cut connections with appropriate QoS across the ATM transport fabric.

The ATM fabric provides complete data path interconnection of the CSI components. The SVC and connection oriented nature of ATM allows for cut-through connections to be made on demand as required by the internetworking layer, and the sophisticated QoS/TM features of ATM are ideal for mapping prioritized customer traffic to different classes of service.
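The VPN forwarding behaviour described above, where traffic goes to the Default Forwarder until a short-cut SVC exists and then uses the short-cut, can be sketched as a simple next-hop choice. The class, method and identifier names below are hypothetical and are used only for illustration; the sketch does not model flow detection or SVC signalling.

```python
from typing import Dict


class EdgeForwarderPaths:
    """Tracks which egress Edge Forwarders are reachable over a short-cut SVC."""

    def __init__(self, default_forwarder: str) -> None:
        self.default_forwarder = default_forwarder
        self.shortcuts: Dict[str, str] = {}  # egress EF -> short-cut SVC id

    def shortcut_established(self, egress_ef: str, svc_id: str) -> None:
        self.shortcuts[egress_ef] = svc_id

    def shortcut_cleared(self, egress_ef: str) -> None:
        self.shortcuts.pop(egress_ef, None)

    def next_hop(self, egress_ef: str) -> str:
        # Use the short-cut if one is set up; otherwise fall back to the
        # Default Forwarder.
        return self.shortcuts.get(egress_ef, self.default_forwarder)


paths = EdgeForwarderPaths(default_forwarder="df-1")
before = paths.next_hop("ef-2")            # no short-cut yet
paths.shortcut_established("ef-2", "svc-17")
after = paths.next_hop("ef-2")             # short-cut now in place
```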

The services supported include public Internet connectivity, Routed VPNs and Bridged VPNs. Each service interface must be configured for one service only, although a single access interface may support multiple service interfaces.

The following sections provide a brief summary of the general functionality of each of the services.

The Routed VPN service provides IPv4 unicast and multicast forwarding of packets received on service interfaces. Each service interface supports one or more IPv4 subnets; the subnet prefixes need only be unique within the VPN. Routing information is exchanged between the VPN and external equipment using standard routing protocols.

The Routed VPN service will not forward traffic outside of the VPN; however, nothing precludes external gateways (e.g. routers, firewalls) from providing connectivity between VPNs or between a VPN and a Public Internet service.

The Bridged VPN service provides IEEE 802.1(d) transparent bridging across a set of service interfaces, including an instance of the Spanning Tree Protocol. Each Bridged VPN can support a configurable set of protocols. Frames from a single service interface may be delivered to multiple Bridged VPNs; however, the set of protocols supported by each VPN must be distinct.

The Public Internet service provides IPv4 unicast and multicast forwarding of packets received on service interfaces. Each service interface supports one or more IPv4 subnets; the subnet prefixes must be globally unique. Routing information is exchanged between the Public Internet and external equipment using standard routing protocols. Subnets within the Public Internet service can be partitioned into multiple Autonomous Systems to allow multiple (routing) policy domains within a single service.

In the example shown in Figure 3, the following interfaces and protocols are required to support public Internet services:

Both RSCP_1 and RSCP_2 support Internet routing (eBGP, iBGP and OSPF). NHRP is run on both RSCP_1 and RSCP_2 (server-server) to support EF-to-EF shortcuts as described below.

Both EF_1 and EF_2 support service interfaces to Internet customers. Full forwarding tables are downloaded from RSCP_1 to EF_1 and from RSCP_2 to EF_2 via the Table Download protocol.

Shortcut data paths for higher CoS may be established for Internet services between EF_1 and EF_2 based on administrative control or configured policies in the EFs. A client is run in the EFs to perform address resolutions.

In the example of Figure 4, the following interfaces and protocols are required to support Virtual Subnet services:

EF_1 supports R-VPN A Service Interfaces using RIP as the routing protocol and R-VPN B Service Interfaces with OSPF as the routing protocol. EF_2 supports R-VPN A and R-VPN C running RIP and R-VPN B running OSPF.

For VPN A, an instance of RIP will run between RSCP_1 and EF_1 VPN A attached devices and similarly between RSCP_2 and EF_2 VPN A attached devices. For full reachability, an instance of RIP associated with VPN A operates between RSCP_1 and RSCP_2.

For VPN B, an instance of OSPF will run between RSCP_1 and EF_1 VPN B attached devices and an instance of OSPF between RSCP_2 and EF_2 VPN B attached devices. To fully manage VPN B across the two RSCPs, an instance of OSPF associated with VPN B is run between RSCP_1 and RSCP_2.

For VPN C, an instance of RIP will run between RSCP_2 and EF_2 VPN C attached devices.

Shortcut data paths are established between EF_1 and EF_2 for all unicast data traffic. A client is run in the EFs to perform address resolutions for shortcuts via the RSCPs. NHRP is run on both RSCP_1 and RSCP_2 to support EF-to-EF shortcuts. EFs maintain a cache of most frequent connections (to minimize EF-RSCP activity) and connections

are based on resilient SVCs (to minimize SVC set-up/tear-down).

Directed broadcast and multicast traffic is forwarded to the RSCP's internal DF as shown in Figure 4. Using direct p-to-mp connections, the DF is responsible for forwarding the traffic to the egress EFs. The internal DF is also used for providing unicast forwarding for VPNs during the detection and set-up time of short-cut connections (SVCs).

Table 1 summarizes the performance of a CSI System. Unless otherwise noted, the numbers shown are the minimum supported performance level under any condition.

The latency numbers quoted for the PIPE should be valid for situations where the total traffic being forwarded by the PIPE is less than the backplane bandwidth available to the PIPE, and when all traffic is treated at the same priority level. Once the backplane bandwidth is exceeded, latency becomes a function of PIPE output queue depth at any given instant. In the case of multiple COS, the latency numbers quoted should be valid for the highest-priority output queue.

Note 1: Latency targets assume no congestion in the network. To calculate the typical latency of a packet traversing a CSI network, simply sum the individual latencies. For example, if a packet goes from a PIPE to a Ridge, traversing 5 switches in the process, the typical latency will be P2 + P3 + (5 * P4).

ID | Criteria | Phase 1 Target | Phase 2 Target
P1 | System restart time (cold start) | 15 minutes | 15 minutes
P2 | Packet latency, PIPE (128 byte packet, typical) | 40 us | 40 us
P3 | Packet latency, Ridge (128 byte packet, typical) | 100 us | 100 us
P4 | Packet latency, 36170 (typical) (note 1) | 35 ms | 35 ms
P5 | Shortcut path setup time, once a flow has been detected (typical) | 20 ms | 20 ms
P6 | RSCP Integral Default Forwarder unicast forwarding | 10,000 pps | 50,000 pps
Pn | RSCP Integral Default Forwarder multicast forwarding | 1,000 pps | 50,000 pps
P7 | Yellow Ridge IP Unicast Packet Forwarding Rate (packet size = 128 bytes) | 84,400 pps | 84,400 pps
P8 | Orange Ridge IP Unicast Packet Forwarding Rate (packet size = 128 bytes) | 84,400 pps | 84,400 pps
P9 | Red Ridge IP Unicast Packet Forwarding Rate (packet size = 128 bytes) | TBD (has not been characterized yet) | TBD (has not been characterized yet)
P10 | PIPE IP Unicast Forwarding Rate (packet size = 128 bytes) | 118,000 | 118,000
P11 | PIPE IP Multicast Forwarding Rate (packet size = 128 bytes) | TBD | TBD
P12 | ICMP request handling on RSCP | 10 per second | 10 per second
P13 | ARP requests handled per RSCP | 500 per second | 500 per second
P14 | IGMP requests handled per RSCP | TBD per second | TBD per second
P15 | OSPF updates absorbed per RSCP | TBD per second | TBD per second
P16 | BGP-4 updates absorbed per RSCP | TBD per second | TBD per second
P17 | Maximum service outage during RSCP activity switch | 2 minutes | 2 minutes
P18 | Maximum service outage during PIPE activity switch | 20 seconds | 20 seconds
P19 | RSCP max route change processing rate (routes) | 25k per second | 25k per second
P20 | Forwarding Table Download Rate (routes) | 1000 per second | 1000 per second
P21 | Number of SVCs per second originating from PIPE | 100 calls/second | 100 calls/second
P22 | Number of Address Resolution Requests per PIPE | 800 per second | 800 per second
P23 | Number of Address Resolution Requests per Ridge | 50 per second | 50 per second
P24 | Number of Address Resolution Requests per Tigris | TBD | TBD
P25 | Multicast Forwarding Rate (Multicast Server) | 50,000 pps | 50,000 pps
P26 | Service Unit Restart Time | TBD | TBD
P27 | RSCP Restart Time | TBD | TBD

Table 1 CSI Performance Summary
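Note 1 states that the typical end-to-end latency is obtained by summing the individual latencies along the path. The following minimal sketch shows that arithmetic for a packet that crosses a PIPE, a Ridge and five switches. The function name is hypothetical; the PIPE and Ridge figures are the 40 us and 100 us targets from the table, while the per-switch figure is a placeholder chosen for the example and is not the table's P4 value.

```python
from typing import Iterable


def path_latency_us(hop_latencies_us: Iterable[float]) -> float:
    """Sum the per-element latencies (in microseconds) along a path."""
    return sum(hop_latencies_us)


pipe_us = 40.0     # P2 target from Table 1
ridge_us = 100.0   # P3 target from Table 1
switch_us = 5.0    # placeholder per-switch latency, assumed for illustration
n_switches = 5

total_us = path_latency_us([pipe_us, ridge_us] + [switch_us] * n_switches)
```

As the note says, this sum only holds when there is no congestion in the network; once queues build up, queue depth dominates.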

The Packet Internetworking Processing Engine (PIPE) provides a high-fanout Edge Forwarder as a 36170 UCS card. This engine is used to forward IP traffic delivered to the system on FR, PPP or ATM interfaces (see Figure 5). In the case of FR or PPP traffic, the sessions must first traverse a Frame Relay card in the 36170; however, this card can be in a different shelf or system from the PIPE.

The PIPE, based on the Extended Processing Engine Platform, provides the following functions:

a) automatic download of configuration information from the Configuration Server,
b) initiation of SVCs as required to provide connectivity,
c) termination of PPP sessions and FR connections,
d) support for C10 independent forwarding contexts with a limit of C18 total forwarding entries per PIPE,
e) retrieval of forwarding information from a Route Server,
f) packet classification and output queue selection in support of system-level traffic management policing,
g) transparent bridging in support of the Bridged VPN service,
h) IP unicast and multicast forwarding in support of the VPN and Public Internet services, and
i) N+1 redundancy.

The ATM fabric provides interconnection of the CSI components for both control and user-data traffic. As shown in Figure 2, each component of the CSI System is connected to the ATM fabric; connectivity between components uses ATM Virtual Channel Connections (VCCs).

Most inter-component SVCs are "resilient, long hold time" SVCs, i.e. they are (re)established on component restart. On-demand SVCs are only used to provide shortcuts for the VPN service. The "resilient" nature of the SVCs indicates that the component that originally initiated an SVC will persistently attempt to re-establish the SVC if it is ever cleared by the network. The interval between such

re-establishment attempts is subject to an exponential backoff. The generation of SVC setups by a component is rate-limited.

There are three primary categories of inter-component connectivity; these are described in the sections that follow.

The CSI System uses three sets of VCCs for connectivity in the control plane:

a) from an Edge Forwarder to the Configuration Server for configuration information download,
b) from the Edge Forwarder to the Route Server for basic control function and on-demand address resolution for VPN services, and
c) from the Route Server to all of the Edge Forwarders for distribution of forwarding table information in support of the Public Internet service and basic control.

A unicast SVC is established from the Edge Forwarder to the RS/CS for registration and cache management. The RS/CS then establishes a LAN Control SVC back to the Edge Forwarder over which configuration is downloaded with guaranteed delivery. The RS/CS also adds the Edge Forwarder as a leaf of P2MP SVCs, one for each VPN.

Traffic descriptors for all types of connections, except the RS SVCs, are configurable. The non-service interface connections are only configurable on a per-category, per-realm basis.

The defaults for all data connections (service interfaces, short-cuts, default forwarder connections, etc.) are UBR, PIR = line_rate, MIR = 0 bps.

The defaults for all control connections (to control server and route server from PIPE) are: nrtVBR, PIR = line_rate, SIR = TBD, MBS = 32 cells, CDVT = 250 us.

Each Edge Forwarder obtains from the Configuration Server the ATM addresses of all Edge Forwarders involved in

Public Internet traffic forwarding, or of a Core Forwarder, to which it maintains ATM connectivity.

The Edge Forwarder maintains a VCC to each Edge Forwarder and/or Core Forwarder for each class of service; this VCC is established upon restart and/or (re)configuration.

Each Edge Forwarder obtains from the Configuration Server the ATM address of at least one Default Forwarder to which it maintains ATM connectivity. The configuration information supplied by the Configuration Server results from the configuration of the system.

The Edge Forwarder maintains a VCC to each Default Forwarder for each class of service; this VCC is established upon restart and/or (re)configuration. If the VCC is released by the network, the Edge Forwarder persistently attempts to re-establish the VCC.

In addition to the base connectivity, an Edge Forwarder will set up a new short-cut VCC or re-use an existing short-cut VCC when it detects a flow that requires a class of service for which there is no short-cut VCC. Short-cut VCCs are disestablished, using a distinct clearing cause, when the VCC has been idle for some period of time.

Traffic Management is handled independently on a per-connection basis. There are two major types of connections in CSI: Service Interfaces and the set of SVCs comprising the CSI Core. Each connection needs the standard ATM Traffic Descriptor plus additional parameters comprising the packet-level traffic information. Note that control and routing traffic gets priority over the data traffic.

Two classes of service are provided by the CSI system. These are:

- Best Effort (no guarantees for either delay or packet loss)
- Better Effort

Different levels of service can be offered to different Realms in a CSI system. The Realm differentiation is achieved by configuring different sets of ATM traffic

parameters to apply to the ATM fabric SVCs for each Realm (see Figure 6). This differentiation applies only to EF-EF SVCs. There is no differentiation on the EF-RS and EF-CONS SVCs that are shared between the different Realms. In fact, there are two different SVCs per EF-EF pair, in order to allow intra-Realm service differentiation.

The VNet level service differentiation allows prioritization of the traffic inside a given Realm. Each VNet can be configured with the standard Best Effort Class of Service or with the higher Better Effort COS. The traffic received from or transmitted to a VNet configured with Better Effort gets the Better Effort Class of Service. (See Figure 7.)

This same principle applies in the same way in a VPN Realm, to traffic routed or bridged between virtual subnets or VLANs, or in a Public Internet Realm, to traffic routed between subnets.

Effectively, Better Effort COS is delivered when required by the use of separate transmission queues on the Service Interfaces of the EFs or a separate EF-EF SVC over the ATM fabric for each COS and each Realm.

The role of the Packet Classification is to determine the COS for each packet in the CSI System. The Packet Classification is performed on each Packet Receive Interface of the Edge Forwarders. The RS does not perform any Packet Classification in this version.

There are three different COS, from the highest to the lowest priority:

a) Control Traffic,
b) User Data Better Effort,
c) User Data Best Effort.

In general, higher priority means lower delay and lower packet loss rate.

The Control Traffic gets the highest priority in the system to provide immunity from data-plane congestion. The Control Traffic includes,

- ARM and CCP protocols
- Routing protocols: RIP, OSPF, and BGP
- Spanning Tree BPDUs.

The Best vs. Better classification of Data Traffic requires explicit configuration at the VNet level.

Packets received from an SI and forwarded to the RS with BME encapsulation can get 'Control Traffic' or Best Effort COS. Routing protocols and the Spanning Tree Protocol get Control Traffic COS. Every other User Data packet falling in one of the exception cases gets the standard Best Effort.

Packets received from an SI and forwarded directly to another EF or internally to another SI of the receiving EF get Best Effort or Better Effort COS. The general principle is that the COS is configured per VNet on the NMS. Each VNet is configured with Best Effort or Better Effort COS. Each forwarded packet gets the higher COS configured on the source and destination VNets.

COS(packet) = MAX(COS(source VNet), COS(destination VNet))

The term VNet refers to,
- Virtual Subnet for routed traffic in VPN service,
- VLAN for Bridged traffic in VPN,
- Subnet for the Public Internet case.

In the Public Internet case, there is a one to one mapping between Subnet and SI, so that the 'per Subnet' COS configuration is equivalent to a 'per SI' configuration.

The following exceptions apply to IP routed traffic.

- There is no COS differentiation between different Subnets behind a router. For traffic received from (resp. transmitted to) IP Hosts behind a router, the source (resp. destination) VNet that is taken into account is the VNet where the router is admitted.

- In case of IP multinetting on a given SI, the different Virtual Subnets appearing on a single SI must be

configured with the same COS. This is because there are some cases when the EF cannot determine the source VNet for traffic received from hosts behind a router. This restriction does not apply to different VLANs configured on a single SI.

- In multiple RS architectures, because COS parameters cannot be exchanged between RSs, when an EF transmits a packet to an EF that belongs to another RS domain, this packet gets the COS of the source VNet.

Packets received from another EF on an EF-EF SVC can get Best Effort or Better Effort COS. The packet classification is similar to the Ingress EF case but, as there is no Source Look-Up on Egress, the source VNet COS cannot be taken into account.

It is replaced by the COS of the EF-EF SVC where the packet is received. A COS is associated with each EF-EF SVC.

COS(packet) = MAX(COS(receiving EF-EF SVC), COS(destination VNet))

Every ARM and CCP protocol packet received on the EF-CONS SVC or on the different EF-RS SVCs gets 'Control Traffic' COS. Notice that this is only an internal EF classification, as this type of packet is not sent to any SI.

Packets received on the LAN Broadcast SVC from the RS with BME encapsulation can get 'Control Traffic' or Best Effort COS for transmission to an SI. Routing protocols and the Spanning Tree Protocol get Control Traffic COS. Every User Data packet falling in one of the exception cases gets the standard Best Effort.

The table below summarizes the distribution of the User Data Packet Classification on the CSI components.

NMS - Configuration of the COS for each VNet.

RS - Distribution of the COS configuration to the Forwarders.
   - Although the COS is configured at the VNet level on the NMS, it is also stored at the Cache Entry level in ARM messages and in the EF Forwarding table. This allows later evolution to finer granularity.
   - The RS gets from the VNM the COS configured with each VNet of its own domain and uses this information to determine the COS to assign to each Cache Entry downloaded to the EFs.
   - A COS parameter is thus configured with each (MAC Station, Protocol Family) for Bridged traffic and with each IP Host for IP routed traffic.

EF - Support of the COS parameter in Forwarding Table Entries.
   - Packet classification on Ingress and Egress Forwarding.

Table 3-1 Packet Classification on the CSI components

The role of this function is to offer to each packet the level of priority requested as the result of the Packet Classification function. This function is implemented in each Packet Output Queuing point in the Edge Forwarders. The RS does not perform any kind of Packet Traffic Management.

The EFs implement two separate output Queues for each SI. Packets classified in User Data Best Effort are placed in the Low Priority Queue. Packets classified in User Data Better Effort COS and Control Traffic COS are placed in the High Priority Queue.

The High Priority Queue needs to be completely exhausted prior to the Low Priority Queue being processed.
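The classification rule COS(packet) = MAX(COS(source VNet), COS(destination VNet)) and the two output queues per SI described above can be sketched together. This is an illustrative sketch only: the class and method names are hypothetical, packets are represented as plain strings, and the discard thresholds are deliberately left out.

```python
from collections import deque
from enum import IntEnum
from typing import Deque, Optional


class COS(IntEnum):
    BEST_EFFORT = 0
    BETTER_EFFORT = 1
    CONTROL = 2


def classify(source_vnet_cos: COS, destination_vnet_cos: COS) -> COS:
    """A forwarded packet gets the higher COS of its source and destination VNets."""
    return max(source_vnet_cos, destination_vnet_cos)


class ServiceInterfaceQueues:
    """Two output queues for one SI, served in strict priority order."""

    def __init__(self) -> None:
        self.high: Deque[str] = deque()  # Control Traffic and Better Effort
        self.low: Deque[str] = deque()   # Best Effort

    def enqueue(self, packet: str, cos: COS) -> None:
        if cos == COS.BEST_EFFORT:
            self.low.append(packet)
        else:
            self.high.append(packet)

    def dequeue(self) -> Optional[str]:
        # The high priority queue is emptied before the low priority queue
        # is served at all.
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None


queues = ServiceInterfaceQueues()
queues.enqueue("a", classify(COS.BEST_EFFORT, COS.BEST_EFFORT))
queues.enqueue("b", classify(COS.BEST_EFFORT, COS.BETTER_EFFORT))
queues.enqueue("c", COS.CONTROL)
order = [queues.dequeue() for _ in range(4)]
```

In the example, packet "a" arrives first but leaves last, because both "b" and "c" land in the high priority queue.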

Within the High Priority Queue, a lower packet loss rate is ensured for the Control Traffic COS through the use of two different Discard thresholds: one threshold for User Data Better Effort COS and one higher threshold for Control Traffic COS.

A single Discard threshold is used for the Low Priority Queue.

In summary, three Discard thresholds are defined for each SI, one threshold per COS. A simple tail-of-queue discard is performed for each COS: the arriving packet is discarded if the threshold is reached. Three Discard statistic counters are associated with each SI, one for each COS.

In the Ridge case, each SI is a separate Ethernet port and there is no need for a Queue servicing algorithm across the SIs. There are multiple separate transmission Queues on the Ridge ATM port, corresponding to different transmission priorities.

Figure 8 is an illustration of the CSI management model. As this figure shows, customers can have one or more realms. Each realm will have a type associated with it, one of bridged VPN, routed VPN or public Internet access. A bridged realm can have one or more VLANs associated with it. A routed or public Internet access realm can have one or more subnets or subnet groups associated with it. With each subnet group there is a set of subnets.

In addition to the common features listed above, the following features are provided for the Public Internet service:

i) The maximum number of prefixes (routes) per Public Internet Service is C4.

ii) The CSI system uses External BGP (eBGP) to exchange routing information with peers.

iii) The CSI system can use iBGP, eBGP, OSPF or RIPv2 to exchange routing information with customers; alternatively

it can use static information about what is reachable on the customer end of a service interface.

iv) The CSI system uses Internal BGP (iBGP) to synchronize the externally-obtained reachability across the Route Servers.

v) The CSI system uses OSPF and/or static routes to manage the internal topology, i.e. the pre-defined reachability between Edge Forwarders, of the components that support the Public Internet Service.

vi) The CSI system combines both the internal and external topology information while building the forwarding table.

vii) Support for multiple autonomous systems within a single Public Internet service.

viii) Unnumbered interfaces are supported.

The 36170 Access Forwarder, the Packet Internetworking Processing Engine (or PIPE), is an element developed for the Carrier Scale Internetworking System (CSI).

The following covers all functionality relating to: the termination of PPP and FR connections for carrying network traffic between the PIPE and the access interface cards; and the internetworking forwarding services necessary to process the network traffic to and from the peers on the PPP and FR service interfaces. The hardware used to support the PIPE is a 36170 control card.

The PIPE is used within 36170 networks as an element of the Carrier Scale Internetworking System. The primary function of the PIPE is to provide a packet internetworking (layer 3+) service boundary for a wide range of low to medium speed 36170 access interfaces.

The Packet Internetworking Processing Engine provides the following primary functions:

Fn1: UCS behavior
Fn2: Virtual Connection support

Fn3: Packet forwarding
Fn4: PPP/ATM link termination
Fn5: 802.1(d) Spanning Tree Protocol (STP)
Fn6: Realm identity & network address assignment
Fn7: "MPOA" client
Fn8: Redundancy

Within the CSI system the PIPE provides the routed (layer 3) and bridged (layer 2) forwarding services for various physical Access Interfaces across a range of 36170 packet and cell interface cards. Together the PIPE and its associated Access Interfaces create a high fan-out Edge Forwarder. The two network elements described in detail herein are the PIPE card and the Access Termination/Access Interfaces as provided by the various packet and cell cards.

The CSI system is designed to give a network operator facilities to provide a range of internetworking services to customers. Figure 9 provides a simplified schematic diagram of the flows of traffic and control data to and from the PIPE. The two boxes at the left and right represent Customer Equipment (CE) that requires internetworking connectivity. Typically these boxes are routers and/or bridges with some form of WAN interface which would be connected into the CSI system.

In a simple application CE1 might be a router with an Ethernet interface servicing a customer LAN and a T1 interface providing the connection into the CSI system. The Access Termination (AT) on the 36170 would be a T1 port on a UFR card. There are two internetworking packet encapsulations which can be supported in this case. The first is Frame Relay and the second is PPP. In both cases the UFR card provides an Access Interface onto an ATM VC which connects to the PIPE across the 36170 ATM fabric. And again in both cases the PIPE provides all the necessary

functions to process the encapsulations and forward the internetworking packets flowing to and from CE1.

The Route Server (RS) provides the control information about forwarding so the PIPE can select the correct paths for delivering packets. The Default Forwarder (DF) and Edge Forwarder (EF) elements together provide the internetworking path between the PIPE and CE2. The EF element could be either another PIPE/AT pair, a VIVID Ridge or a Tigris.

In the simple case packets will flow to and from CE1 through a path that goes from the PIPE up to the DF and on through the EF to CE2. When it has been determined, either automatically or through configuration, that traffic between CE1 and CE2 (or more correctly traffic between the PIPE and the EF) is significant enough to require a more direct path, a "short-cut" connection is established directly between the PIPE and EF. Once the "short-cut" is set up, traffic between CE1 and CE2 will flow over the "short-cut", bypassing the DF.

In the "Public Internet" service case the connection providing the direct path between the PIPE and EF is configured administratively as a fixed link. This connection is established within the system at initialization, when the component elements involved reach the full operational state, and is maintained continuously.

Figure 10 provides a more complete picture of a small but typical system, showing the relationships between the various elements of the CSI application. There are a few elements, the Configuration Server (CS) and the Core Forwarder (CF), added that complete the system along with a few PIPEs, ATs and RSs illustrating the modular nature of the CSI system. The CS provides the PIPEs and other elements in the system with the details about connections and other parameters necessary to bring the system to an operational state. The CF provides a function similar to the default

forwarder in networks where the traffic characteristics require very high capacity default forwarding paths, e.g. services providing access to the Public Internet.

Figure 10 also illustrates how a small but typical CSI system could be used by a network operator to provide a mix of services to various customers while maintaining the necessary partitioning of control information and traffic load.

The PIPE does not provide any external physical ports; consequently ports are not physical but are simply implementation abstractions.

The EPEC card hosting the PIPE card can be reset through system software as a maintenance function or mode reconfiguration from NMTI. Software resets will tear down all active circuits and PPP connections immediately.

The PIPE has its primary physical attachment to the network fabric via the Newbridge ATM interface to the 36170 backplane. Connections into the PIPE for the various functions detailed below are provided via PVCs, SVCs and SPVCs.

Aggregates to the CSI core are supported on conventional multiprotocol VC terminations and are either statically assigned or dynamically bound SVCs using the "MPOA" client function (Fn7). Frame Relay, PPP or ATM circuits providing network layer encapsulation services are terminated on the PIPE as PVCs or SPVCs, using this same termination function, via the FRF.8 Inter-Working Unit on the various supported 36170 frame relay interface cards. PPP packets are transferred between the PIPE and the supported 36170 interface cards using PVCs or SPVCs over a PPP/ATM transparent HDLC encapsulation.

[Table 3-1: Connection Types supported by the PIPE]

Several SVC connections must be maintained continuously to provide proper functioning of the CSI system. If one of these persistent connections is released, a call attempt is made again, to the same destination address or, if more than one destination address is available, to the full set of possible destinations. The call attempts are made with an exponential backoff on failure, with the initial time between attempts starting at a base interval (e.g. 1 second). After 8 attempts the interval does not increase further (e.g. starting at 1 second, the final backoff interval will be just over a minute, 64 seconds), but the PIPE may continue to attempt the call indefinitely. The behavior if the 8th and final attempt fails is particular to the type of connection: some will persist indefinitely and others will stop at the 8th attempt and raise an alarm. The PIPE is responsible for determining if any information preserved over the reconnect has changed during the outage and for reacting to these changes.

Transport services and applications above IP (and other best-effort layer 3 protocols) are sensitive to cell loss, and the upper-layer windowing protocols will tend to drive loads to the threshold of congestion for the network. However, early packet discard schemes are available which reduce the effect of congestion in the ATM fabric and provide improved feedback to properly behaving windowing mechanisms. A simple form of ATM traffic shaping is

performed on the PIPE on a per-VC basis for traffic toward the backplane. Traffic Policing is unnecessary for the PIPE as it is a trusted UNI device. The operator can define the traffic contracts for specific categories of VCs initiated from the PIPE. These categories are:
1) Connections to the Configuration Servers;
2) Connections to the Route Servers; and
3) Short-cut connections to other Access Forwarders.

The service interface traffic parameters can be any valid selection as specified in the traffic management documents referred to above. It is intended that a network management platform support a profile mechanism for service interfaces. This reduces the amount of configuration required for each service interface. This is solely a management construct. Each service interface at the PIPE is controllable separately.

The PIPE implements services within ATM AAL5 encapsulation that are compatible with the multiprotocol LLC/SNAP encapsulation. This provides IP/ATM, transparent bridging over ATM and PPP/ATM functions. These are used to provide two features within the CSI System. The first is to provide the termination for connections provided on the Access Interfaces of the CSI system, including:
1. access over native ATM services;
2. internetworking with external Frame Relay attached network layer devices via the FRF.8 service IWU; and
3. PPP attached devices as provided on the various 36170 FR interface cards.

The second is to provide the connectivity over short-cuts and statically configured VC paths across the core fabric to other networking elements in the CSI System.

The basic network layer forwarding mechanism is common to both bridged and routed networks. The model for this mechanism is illustrated in Fig. 11.

The PIPE supports a fixed number of realms. The realms on the PIPE are autonomous such that each realm has its own set of FIBs and no forwarding/routing information or other state is shared between the realms. This allows the realms to have non-unique address spaces if required and, more generally, isolates the realms from one another with respect to network address assignments.

For any particular Realm, one of the aggregate interfaces will likely be configured as a connection to the default forwarder. Forwarding information about the other interfaces is either configured statically through one of the management interfaces or obtained via "MPOA" (Fn7). Finally, the FIB will be updated automatically with the new link-local forwarding information when PPP, Bridged or IP/ATM and Bridged or IP/FR-ATM Service Interfaces are initiated or when a Service Interface is disabled (either administratively or when the underlying connection closes).

An essential element of packet forwarding on the PIPE is the process used for discarding traffic when queues reach an overflow state. The PIPE provides two discard disciplines which are applied to the output queues. The first is a variant of Random Early Discard and the second is simple head-drop discard. The output queuing control is provided per service interface with a default setting of RED enabled.

With RED turned on, as the output queue approaches an overflow state, packets are discarded with a pseudo-random selection of the packets to discard exponentially weighted

towards the earliest packets arriving. This is a simplified description of RED.

When RED is disabled, the transmit queues operate in a simple FIFO discipline with discards performed at the tail of the queue as it reaches an overflow state.

In the extreme case where overflow occurs on input, the PIPE card discards on the tail of the input queue as new packets arrive.

In addition to the packet output queuing controls and ATM level traffic descriptors applied against a connection (an access service interface, a connection to a default/core forwarder, or a short-cut path), the following additional network-level traffic management parameters apply.

The class of service (CoS) can be one of the following values:
1. Best Effort - There are no guarantees regarding packet loss or delay in the PIPE;
2. Better Effort - There is at least a 10^-7 probability of packet loss within the PIPE and the packet delay is also less than for the "Best Effort" class of service; or
3. Mixed Effort - elements and attributes of each packet determine whether a best effort or a better effort class of service is to be chosen.

For packets flowing from ingress connection A to egress connection B, connection B has one best effort queue and one better effort queue. If connection A is specified to have a best effort CoS, then the best effort queue is used. If connection A has a better effort CoS, then the better effort queue is used. If connection A has a mixed effort CoS, then both queues are used. Traffic is shaped out from the aggregate of these two queues at a rate which matches the ATM traffic descriptor. The packets are emitted at a packet rate approximating the bit rates specified in the traffic

descriptors. When a packet is allowed to be transmitted according to the shaping rule table, segmentation into cells occurs and the cells are sent back-to-back across the backplane of the 36170. The MBS (when shaping to the SIR) or CDVT (when shaping to the PIR) values must be chosen to ensure that the traffic contract is maintained using this form of shaping.

For each service category, the traffic is shaped according to the following traffic descriptor values:

Service category    Traffic descriptor
UBR                 PIR (peak)
nrtVBR              SIR (sustained)
rtVBR               SIR (sustained)
~R                  PIR

Table 3-2 Packet Transmit Rate for different Service Categories

For routed and bridged VPNs which use the "MPOA" client lookup cache management function, the packet forwarding function applies a flow detection mechanism on source-destination sets which are not currently in the cache. This mechanism monitors the traffic for the new source-destination pair and identifies the traffic as a flow when the traffic reaches a rate of at least M packets in N seconds. The default values are 4 packets in 10 seconds. Only when a flow is detected does the "MPOA" client establish a short-cut path.

IP forwarding is the internetworking layer applied to each packet received on an IP routed service interface. This includes applying error checking rules and policy filtering, determining what to do with the packet in terms of the next hop to its ultimate destination and finally queuing the packet for output or possible local delivery. Although Routed VPNs and Internet Access appear on the surface to be significantly different features, when examining the PIPE IP

CA 02217275 2004-07-23
forwarding function those differences are mostly superficial.

Routed VPNs tend to have a smaller set of address prefixes which change over time, driven by supporting flow detection and consequently triggering "short-cuts". Internet Access typically requires a very large set of address prefixes which will change over time mostly based on updates provided by the route server via the Full Table Download function, and the set of active interfaces will be relatively constant.

The IP forwarding function on the PIPE provides support for processing IP packets which are forwarded in and out of service interfaces which are operating using the LLC/SNAP bridged encapsulation. This function provides the necessary ARP capabilities to bind and maintain MAC addresses for the IP hosts on the remote LAN segment. This function is not supported for PPP bridged interfaces.

The IP forwarding mechanism (IFM) works by using various layer 3 information within each packet (along with information about which interface the packet arrived on) and switches packet traffic between the various PPP and IP/ATM links.

The following is a simplified description of the IFM with the terminology aligned to CSI:
1) the forwarder receives the IP packet (plus other details) from the link layer;
2) the forwarder validates the IP header;
3) the forwarder performs processing of most of any IP options;

4) the forwarder examines the destination IP address in the IP header against the FIB, assuming it satisfies basic requirements for forwarding;
5) the address of the next hop for the packet (and the correct output interface) is determined;

6) the source address is tested for validity and any administrative constraints are applied;
7) the forwarder decrements TTL and then tests for expiry;
8) the forwarder performs processing of any IP options which could not be completed in step 3;
9) the forwarder performs any necessary IP fragmentation;
10) the forwarder determines the link layer address of the next hop for the packet; and
11) finally the forwarder queues the packet for delivery on the interface out to the next hop.

For directed diagnostics an IP forwarding table dump is provided to verify the operational state of the FIBs.

The PIPE supports bridge forwarding within designated bridged VPNs. Bridging is available between service interfaces which belong to the same VLAN and protocol family(s). Bridge forwarding on the PIPE can be characterized as half bridging since it is connected to another bridge via a point-to-point link.

Diagnostics on the PIPE for Bridge Forwarding include a bridge table dump and a view of the current state configuration of spanning tree. This forwarding table dump and STP view match the elements contained in the Bridge MIB.

The bridging function on the PIPE card is determined by the configuration information sent to it by the RS. This configuration includes the definition of VPNs, VLANs and the services they offer. A service interface or set of service interfaces can only be bound to a VLAN or set of VLANs. With this information configured on the PIPE, the bridge function only forwards traffic between service interfaces in the same VLAN. In this way, traffic is forwarded to only a subset of service interfaces.
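As an illustration of the core of the IP forwarding mechanism listed above (FIB lookup in steps 4 and 5, TTL handling in step 7), the following Python sketch models per-realm FIBs with overlapping address spaces. It is a simplified sketch and not the PIPE implementation: the names `lookup` and `forward`, the dictionary layout of the FIB and the use of longest-prefix match as the lookup rule are assumptions made for illustration only.

```python
import ipaddress

def lookup(fib, dst):
    """Return the next hop for dst using longest-prefix match (assumed rule)."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in fib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return None if best is None else best[1]

def forward(realm_fibs, realm, dst, ttl):
    """Forward within one realm only; no state is shared between realms."""
    next_hop = lookup(realm_fibs[realm], dst)
    if next_hop is None:
        return ("discard", "no route")
    ttl -= 1  # decrement TTL, then test for expiry
    if ttl <= 0:
        return ("discard", "ttl expired")
    return ("forward", next_hop, ttl)

# Two hypothetical realms reuse the same prefix without interfering.
fibs = {
    "vpn-a": {"10.0.0.0/8": "if-1", "10.1.0.0/16": "if-2"},
    "vpn-b": {"10.0.0.0/8": "if-9"},
}
print(forward(fibs, "vpn-a", "10.1.2.3", 64))
print(forward(fibs, "vpn-b", "10.1.2.3", 64))
```

Keeping one FIB per realm is what allows two customer networks to use the same private address range on a shared device in this sketch.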

The Bridging Algorithm used for the PIPE follows the standard defined in IEEE 802.1. The following functions are performed by the PIPE as part of its bridging role:
1) Bridge packets from one Bridging interface to another;
2) Learning and Cache Management; and
3) Filter packets to prevent loops (informed by Fn5, the 802.1(d) Spanning Tree Protocol).

The first function is the basic relay of packets from one end station to another on a different interface. The basic process is:
1) Bridged Packets are received by the PIPE;
2) The MAC address and service interface association of the sender are recorded in the PIPE's cache;
3) The Destination MAC contained in the packet is examined and matched to an entry in the PIPE's existing cache;
4) If an entry exists (the cache contains permanent entries for the reserved MAC broadcast and multicast addresses), the packet is passed out the associated output interface (for the broadcast/multicast entries this is the DF which then provides the correct flooding);
5) If an entry does not exist, a message is sent to the "MPOA" client function (Fn7) which will attempt to get a resolution for the Destination MAC;
6) If the Destination MAC is resolved, the packet is passed out the associated service interface (in the same manner as step 4); otherwise
7) The packet is discarded.

The second function is MAC address learning and cache management. When packets are received by the PIPE, a record of the source MAC address and its related service interface is kept in a cache. This cache allows the PIPE to easily look up the relationship between the source and destination identified in the packet. If the configuration for the source and destination match, the packet is forwarded to the

appropriate service interface. However, if the configuration does not match, the packet is discarded or checked for special handling, as in the case of the RS, which is required to communicate with all stations.

The size of the cache, however, is not infinite, so an aging mechanism is required to maintain a set of recently used records for source and destination to service interface/VLAN mappings. The aging function determines whether a cache entry has been used recently. If the entry has been used it is refreshed and maintained in the cache. If it has not been used, the entry is deleted to make room for new cache entries.

The PIPE card will generate billing records every fifteen minutes using the same format as used by 36170 SVC records. Information will be provided in the records for transmitted packets, received packets, transmitted bytes and received bytes. Records will also be created when the PVC is disconnected. This will provide the data for the final portion of a fifteen minute interval for which the PVC was connected.

The Point-to-Point Protocol (PPP) provides an interoperable method for communicating multi-protocol network datagrams. The PIPE provides for the PPP termination of standard bit-synchronous PPP over HDLC connections into the 36170 CSI system by internetworking with the transparent HDLC frame forwarding function on 36170 FR cards, which has an optional mode for providing an internetworking service which supports conversion of PPP packets to and from the PPP over AAL5 encapsulation. This function is intended to support the "leased-line" mode of operation for permanent IP services, for example T1/E1 ISP customer "feeds".

LCP options are set by the network management entities through the service configuration for a particular realm and loaded through the "MPOA" Configuration.
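The MAC learning cache and aging function described earlier for bridge forwarding can be sketched as follows. This is a minimal illustration under stated assumptions, not the PIPE's cache: the class name `MacCache`, the `max_age` parameter and the use of caller-supplied timestamps are hypothetical, and the source does not give an aging interval.

```python
class MacCache:
    """Maps a MAC address to a service interface and ages out unused entries."""

    def __init__(self, max_age):
        self.max_age = max_age   # hypothetical idle limit, in seconds
        self.entries = {}        # mac -> (service_interface, last_used)

    def learn(self, mac, service_interface, now):
        # Record the sender's MAC and the interface it arrived on.
        self.entries[mac] = (service_interface, now)

    def lookup(self, mac, now):
        # A hit refreshes the entry so recently used records are kept.
        entry = self.entries.get(mac)
        if entry is None:
            return None          # caller would ask the "MPOA" client to resolve
        service_interface, _ = entry
        self.entries[mac] = (service_interface, now)
        return service_interface

    def age(self, now):
        # Delete entries not used within max_age to make room for new ones.
        stale = [m for m, (_, used) in self.entries.items()
                 if now - used > self.max_age]
        for mac in stale:
            del self.entries[mac]
        return stale

cache = MacCache(max_age=10)
cache.learn("00:00:5e:00:53:01", "si-1", now=0)
cache.learn("00:00:5e:00:53:02", "si-2", now=0)
cache.lookup("00:00:5e:00:53:01", now=5)   # refreshes the first entry
removed = cache.age(now=12)                # only the unused entry is aged out
```

Passing `now` explicitly keeps the sketch deterministic; a real implementation would more likely read a monotonic clock.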

The PIPE provides for static configuration of the authentication control information, including the shared secrets used within the protocol. These are configurable via the network management entities and are normally loaded through the "MPOA" Configuration Server.

The IP Control Protocol is used on fully established and authenticated PPP links to negotiate the IP address at each end of the PPP link and to negotiate VJ TCP/IP header compression. The peer's IP address can be assigned or discovered and verified with this protocol, dependent on how the link has been configured to negotiate this option. By default, address assignment for the link peer and link local assignment from the peer are both disabled on the PIPE.

Van Jacobson TCP/IP header compression, an option that can be negotiated in IPCP, can reduce a standard 40 byte TCP/IP header to a variable size header between 3 and 16 bytes for most of the TCP packets transmitted over a PPP connection.

VJ header compression and decompression is a function supported on the PIPE. By default, it is disabled but it can be enabled on individual PPP service interfaces through the management interfaces. The use of VJ header compression does have an impact on performance and other resources in the PIPE. In addition, depending on the nature of traffic flowing across the link and the number of "VJ slots" assigned to it, VJ header compression may provide little or no compression.

The IETF standard PPP network control protocol (NCP) for bridging, the Bridge Control Protocol, is used on fully established and authenticated PPP links terminating on the PIPE to negotiate the operation of transparent bridging of 802.3 LAN traffic. Until PPP has reached the Network Layer and BCP is fully negotiated, bridged data packets will be discarded by the PIPE.

Transparent bridging is accomplished by negotiating the following BCP options:

The CSI system provides no support for the LAN-Identification option and, because there is no requirement, there is no support for options related to source-route bridging or proprietary Spanning Tree Protocols.

The Internetworking Realms on the PIPE provide an abstraction for organizing related service interfaces: the lower layer PPP and FR access ATM VC interfaces and associated aggregate interfaces into the core networks; and the addressing information of external network services required for normal operation.

The PIPE supports a fixed number of independent realms and a fixed number of service interfaces. These interfaces are distributed across realms, ensuring that each realm will have a fixed number of interfaces. For example, a PIPE supporting a maximum of 500 interfaces and 5 realms might be configured to handle 3 routed IP realms, 1 with 200 interfaces and 2 with 50 interfaces, and 2 bridged realms each with 100 interfaces. If a connection is attempted which exceeds the configured interface limit for a particular realm, the connection is refused.

The PIPE supports a few methods of administratively assigning network addresses and, where required, netmasks and forwarding prefixes (static routes), to the various FR, PPP and ATM link interfaces. In addition to the various link

interfaces the PIPE provides an abstracted "null" interface which can be used in conjunction with the forwarding function to provide for discard (or black-holing) of various categories of traffic. The appropriate methods are determined when a new interface is configured on the PIPE, depending on the specific type of Access Interface/Service Interface/Core Interface required. Once an interface is defined, but before the configuration is applied and it is activated, the interface is linked to the appropriate realm, ensuring that the traffic associated with that interface will only be forwarded within the correct network address spaces.

Typically, PPP links will either be configured using the "numbered-numbered" model, where the PPP peers are the only two nodes in a distinct point-to-point subnet, or the "unnumbered-unnumbered" model, where the peers have no IP addresses for the PPP interfaces on the PPP link. The link simply provides a bi-directional path between two distinct subnets. The PPP links may also be configured using the "numbered-unnumbered" model, which means that only the interface address of the remote peer from the PIPE is set for the link. For the "unnumbered-unnumbered" and the "numbered-unnumbered" models the PIPE supports the use of the "local route server" address to help manage control of these types of connections.

The local address assignments for ATM and Frame Relay service interfaces are provided from the Configuration Server/Route Server based on the PIPE providing the information required to determine which Service Interface/Access Interface is currently serviced by the PIPE.

Inverse ARP (InARP) is the standard method, in older, non-MPOA environments, for network devices to discover the IP

address of a peer device associated with a particular virtual circuit (e.g. ATM or Frame Relay). This allows for verification and dynamic configuration of address mappings rather than relying on static configuration of the ARP table. The PIPE can be configured to use InARP to discover the IP addresses of the network neighbors connected to the aggregate interfaces. Some existing implementations of IP over NBMA media have no support for Inverse ARP. To allow interoperation, controls for disabling/enabling InARP and for static ARP table administration are provided via the PIPE management entities. Service interfaces established and configured using MPOA do not support InARP.

Address assignments for the "MPOA" ATM VC core interfaces are provided from the Configuration Server and Route Server.

The common controls for all Service Interfaces are Enabled/Disabled and Reset. In addition to being able to disable, enable or reset the interface, the operator can examine the state of the interface and view various interface statistics. There are many statistics and configuration details which are common to all interfaces. The PIPE provides all of the relevant values defined in the current IF MIB and also provides a number of useful summary statistics through various management interfaces. In addition, diagnostics and controls have specific behaviors related to the various types of interfaces. Disable and Enable are used to temporarily block an interface from being used.

For PPP interfaces, Reset causes the PPP state machines to gracefully tear down the link and return to the initial state. This control is intended for forcing the controlled disconnection of specific PPP connections. For FR and ATM service interfaces, Reset causes the connection to redo any defined initial exchange. For both PPP and FR/ATM service

interfaces a reset causes all queues for the interface to be flushed.

Information relevant to tracing the PPP connection state is collected and made available through various management interfaces. Tracing of CHAP does not expose security specific details of the authentication protocol. The trace facility recognizes all assigned numbers for these PPP protocols listed in current IANA assigned numbers, including protocols and options not supported on the PIPE.

Information related to tracking the state of FR and ATM Service Interfaces and the ATM Core Interfaces is collected and made available through various management interfaces.

The PIPE provides a few control interfaces to aid in network and system diagnostics and maintenance:
1) Echo packet generation - provided to verify the IP protocol connectivity between the PIPE and other network entities. The ICMP echo request is the basis of the commonly used PING command. The PIPE can generate such requests and forward them to other network entities. The PIPE also replies to ICMP echo requests.
2) Network path tracing - provided for tracing the route IP traffic takes to reach a particular destination host interface. This function is equivalent to the "traceroute" command in UNIX. The mechanism involves launching a specially sequenced stream of UDP probe packets and then listening for ICMP time-exceeded (TTL-expired) responses from the forwarding devices along the path. The addresses of intermediate devices that responded as IP packets traversed the path are displayed along with an estimate of the delay based on the round trip for each transaction.

The PIPE supports Spanning Tree Protocol as defined in IEEE 802.1(d). The Spanning Tree implementation allows for a loop-free topology such that a path exists between every pair of LANs in the network.

STP is negotiated on a per VPN basis, enabling each VPN to have a separate STP instance. STP does not apply to the Internet Access case.

Extensions to the standards are based on those defined below:
1) If the PIPE becomes unregistered all established SVCs are torn down, such that bridging traffic and STP BPDUs are not forwarded.
2) A configuration BPDU is recognized and ignored if it is received by its originator on the same port from which it was sent.
3) BPDUs received over ATM from anything other than through the "MPOA" client are ignored by the PIPE. (The "MPOA" client will drop any BPDU that is not received from a registered device.)
4) If the Bridge Aggregate interface for the particular realm goes into a blocking state, the destination cache must be flushed to ensure that no entries point to the [now blocked] interface. In addition, when the Bridge Aggregate returns to the forwarding state, the source cache for the realm is flushed so that it can be resynchronized with the MPOA client.
5) Negotiation for the version of STP supported between registered devices is limited to protocol 1 (IEEE 802.1(d)) or NULL in the case where an external bridge does not support STP.

STP on the PIPE affects the state of one or more of its interfaces. Current STP states of the Service Interfaces are viewable via the NMTI management interface. The STP

standard, described in IEEE 802.1(d), provides for the following configurable parameters:

Parameter        Description
Priority         used to determine the cost of using this bridge as root.
Max Age          amount of time before a configuration message should be [illegible]
Hello Time       time between configuration BPDUs advertising root status
Forward Delay    length of time spent in intermediate state before changing from blocked to forward state
Aging Time       length of time since a root sent a configuration [illegible]

These parameters are configurable through a management interface and accessible via SNMP. Default STP parameters are used in the absence of user configured values.

The PIPE communicates with the Configuration Server to resolve which Route Server is controlling each of the Realms supported by the PIPE. The PIPE communicates with each Route Server to register and verify new service interfaces, to declare new locally attached hosts and subnets, and to resolve remote bridged or network-layer addresses to ATM addresses.

After being initialized from the Control Card, the PIPE first connects to the Configuration Server. It uses the address configured for the Configuration Server, which defaults to a well-known AESA anycast address. The traffic parameters are configurable.

The PIPE will be downloaded with information about each VPN/IA/Realm within the system. This includes the ATM addresses of the primary and backup route servers.

As the information changes, the Configuration Server keeps each of the PIPEs updated.

The connection to the Configuration Server is maintained continuously using a persistent SVC. If the connection fails

or is released the persistent SVC mechanism will attempt a reconnect (with an initial period of 1 second) to the same anycast address and will continue to attempt the call indefinitely. Because of the nature of the anycast address mechanism, when the new connection is eventually established it may even be to a different Configuration Server. The exact same procedures as explained for Initialization above apply to the new connection.

The Configuration Servers, in an N+1 redundant system of databases, distribute to each of the PIPEs the information necessary for establishing the LAN data and LAN control connections required for all the realms each of the PIPEs is serving.

After receiving the ATM Addresses of all of the Route Servers, the PIPE establishes a LAN Data connection to each of the Route Servers for each of the VPN/IA/Realms that it has Service Interfaces for. The traffic parameters are configurable on a per-VPN/IA/Realm basis. The connection does not use any assured delivery capabilities.

When a Route Server detects a LAN Data connection having been established, the Route Server starts the registration mechanism by sending the Register Server message (i.e. supplies the features it supports) to the PIPE. The PIPE responds with a Register Client message (supplies the features the PIPE supports) back to the Route Server. The Route Server then sends a Register Response message which indicates a successful registration.

Following successful registration, the PIPE establishes a LAN Control connection to the Route Server. This connection uses different traffic parameters that are again configurable on a per-Realm basis, and uses the Q.SAAL assured delivery bearer mechanism. This connection is used to provide various elements of configuration information.

Also following successful registration, the Route Server will add the newly registered PIPE to a LAN Broadcast (point-to-multipoint) connection. The Route Server uses this connection for broadcast packets, multicast packets and for table downloads.

The LAN Data, LAN Control and LAN Broadcast connections are maintained continuously as long as Service Interfaces exist for the VPN. If a LAN Data or LAN Control connection is released the persistent SVC mechanism (with an initial period of 1 second) will attempt a reconnect using the current Route Server (e.g. primary) address. If the persistent SVC mechanism fails on the final exponential backoff to the current address, the PIPE clears any LAN Data, LAN Control and LAN Broadcast connections to the failed Route Server. An attempt is then made to set up the LAN Data connection to the other Route Server (e.g. backup) address, thereby restarting the registration process.

Since the PIPE cannot control its addition to the LAN Broadcast connection, it cannot engage in the persistent SVC mechanism for this connection. Instead, the PIPE relies on the current (e.g. primary) Route Server to perform the persistent SVC mechanism. On detection of the loss of the LAN Broadcast connection the PIPE will however begin a timer of duration equivalent to, but slightly longer than, the total duration of the persistent SVC mechanism's retry period. This timer is canceled should the errant LAN Broadcast connection be re-established. On expiry of this timer, the PIPE will clear any LAN Data or LAN Control connections to the failed Route Server. The PIPE will then attempt to set up the LAN Data connection to the other Route Server (e.g. backup) address, thereby restarting the registration process.
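The persistent SVC retry schedule and the fallback from the current Route Server to the other one can be sketched as below. This is only a sketch under stated assumptions: the function names, the `try_call` callback and the doubling factor are hypothetical. The source says the backoff is exponential and that a 1 second base reaches 64 seconds by the 8th attempt, which is consistent with doubling but does not state the factor explicitly.

```python
def backoff_intervals(base=1.0, attempts=8, factor=2.0):
    """Waits between successive call attempts; 8 attempts have 7 gaps."""
    return [base * factor ** i for i in range(attempts - 1)]

def reconnect(try_call, servers, base=1.0, attempts=8, sleep=lambda s: None):
    """Try each server address in turn with exponential backoff.

    try_call(server) is a hypothetical callback returning True once the SVC
    is established. sleep defaults to a no-op so the sketch runs instantly;
    pass time.sleep to wait for real. Returns the server reached, or None if
    every address failed (the point at which an alarm would be raised).
    """
    for server in servers:
        waits = backoff_intervals(base, attempts)
        for attempt in range(attempts):
            if try_call(server):
                return server
            if attempt < attempts - 1:
                sleep(waits[attempt])
    return None

# Example: the primary never answers, so the backup is tried next.
calls = []
def try_call(server):
    calls.append(server)
    return server == "backup"

reached = reconnect(try_call, ["primary", "backup"])
```

With the defaults the waits are 1, 2, 4, 8, 16, 32 and 64 seconds, so a full retry cycle against one address lasts a little over two minutes before the other address is tried.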

If the persistent SVC mechanism fails on the final exponential backoff to both Route Servers for a VPN/IA/Realm, then the PIPE informs the Configuration Server that that particular set of Route Servers is unreachable and a major alarm is raised on the 36170.

After the LAN Data connection has been out for ~1.3 times the Route Server cold-start time, including a random factor of +0.15 RS cold-start time, the operation of this Realm ceases. All cache entries are removed. This limits the potential for creating forwarding loops and unintended black holes within the network.

The PIPE supports bridged VLANs for any protocol family. Bridged VLANs separate traffic of different protocols and limit the protocols that can be used to communicate from specific hosts. They can carry all network-layer protocol families or any of the following:

1) IP
2) IPX (Internetwork Packet eXchange)
3) XNS (Xerox Network Systems)
4) SNA (Systems Network Architecture)
5) NetBIOS (Network Basic Input/Output System)
6) CLNP
7) Banyan VINES (Virtual Integrated Network Service)
8) AppleTalk
9) DECnet
10) LAT (Local Area Transport)

VLAN membership is configured from the route server. There is no local support for configuring bridged VLANs.

The PIPE supports routed virtual subnets for the IP protocol only. Membership in a virtual subnet determines PPP IP address assignment, broadcast groups, etc.
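The outage limit mentioned earlier on this page (about 1.3 times the Route Server cold-start time, with a random factor of 0.15 cold-start times) can be illustrated with a short calculation. This is a sketch under stated assumptions: the random factor is taken to be added on top of the 1.3 multiple and drawn uniformly, neither of which the text confirms, and the function names are hypothetical.

```python
import random


def realm_outage_limit(rs_cold_start_s, rng=random):
    """Seconds of LAN Data outage after which a Realm ceases operation.

    Assumption: limit = 1.3 * cold start + U(0, 0.15) * cold start.
    """
    base = 1.3 * rs_cold_start_s
    jitter = rng.uniform(0.0, 0.15) * rs_cold_start_s
    return base + jitter


def realm_should_cease(outage_s, limit_s):
    """True once the outage has lasted at least as long as the limit."""
    return outage_s >= limit_s
```

Under these assumptions a Route Server cold-start time of 100 seconds gives a limit between 130 and 145 seconds. The random component would spread the moment at which different PIPEs stop forwarding for the Realm.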

Membership in virtual subnets is configured from the route server. There is no local support.

Service Interfaces can belong to multiple VLANs and Virtual Subnets. A Service Interface can belong to no more than one VLAN which supports the same protocol. A Service Interface can belong to many virtual subnets provided there is no overlap in assigned subnet IP addresses.

All Realms other than the Internet Access service (that is, the VPNs) use the VIVID cache management protocols with the route server to learn and provide information about MAC and Network-layer addresses.

The Internet Access service uses Table Download (TD) in addition to the Cache Management protocols described above. The Table Download process begins with the Route Server providing the minimal set of cached Network-layer (IP) addresses required to allow the PIPE to begin processing. Following the initial table phase, the Table Download process continues with the final table phase. During this phase, the Route Server provides all remaining applicable Network-layer (IP) addresses.

At any time following the initial table download, table maintenance (adds and deletes) is performed using the VIVID cache management protocols described above.

Table Download may occur under any of three conditions:

1) Network cold start.
2) Partial network restart / cold start (multiple PIPEs).
3) Single PIPE restart / reconfig.

In fact, Table Download may begin under a single PIPE restart condition (3) which may later turn out to be a partial network restart condition (2). Table Download will utilize the unicast LAN Control SVC during the initial table

phase of Table Download. In order to provide good system start-up performance without impacting the system when only a single PIPE is restarting, Table Download will utilize unicast (LAN Control) or multicast (LAN Broadcast) facilities depending on the number of PIPEs in the final table phase of Table Download. Table Download will also be capable of switching from using unicast (LAN Control) to multicast (LAN Broadcast) facilities as PIPEs enter the final table phase of Table Download.

Paths are constructed between Forwarders using SVCs set up using the ATM Address in the path table, the configured traffic descriptor for paths in the particular Realm, and B-HLI parameters indicating the type of device (the PIPE) that is establishing the connection. Parallel paths between Forwarders are disallowed except where different levels of CoS are required.

Two types of paths may be created between Forwarders (PIPEs):

1) aged; and
2) permanent.

The determination that a path is aged or permanent is made based on aging information provided by the Route Server when a path table entry (egress IP to ATM address mapping) is downloaded to the PIPE. The Route Server provides path table entries either as part of initial table download or on an exception basis.

Aged paths are set up on demand, whenever a datagram is received whose Network-layer (IP) address is mapped to an ATM Address where no SVC currently exists. These paths are aged out when there has been no data flowing over the connection for a configurable period of time. Age time is configurable on a per-path basis. The default age time is 30 seconds. Aging out causes the SVC for the path to be released. When new data arrives for the path, the SVC is re-established. While the path is being established or re-established, data is forwarded to the Default Forwarder.

Permanent paths are set up as soon as a path table entry is provided to the PIPE by the Route Server and are maintained using the persistent SVC mechanisms. Should the persistent SVC for a path fail on its final exponential backoff, the Route Server will be informed so that routing information can be re-calculated. The PIPE will continue periodic attempts to re-establish the persistent SVC for the path. When the persistent SVC for the path is re-established, the Route Server is again notified so that routing information can again be re-calculated.

Paths may be viewed from a management interface. The paths the connections take through the network can only be derived manually. There is no call trace support for these connections.

N+M PIPE Redundancy is a form of warm redundancy that can optionally be enabled for the PIPE. The redundancy applies only within an individual 36170 and applies to the whole 36170. Separate independent N+M partitions are not available.

N PIPE cards provide service to the N PIPE Instances that have Service Interfaces programmed. M PIPE cards, referred to as the spare cards, remain idle until one of the N PIPE cards fails.

A PIPE Instance is a floating set of functionality which can be placed on any PIPE Card within the 36170. It is identified by an 8-bit number. Service Interfaces are assigned to a PIPE Instance through management interaction. All CSI configuration, application maintenance, and statistics are performed by identifying the PIPE Instance,

not the PIPE slotId. The slotId is only used for card-specific maintenance, such as resetting, software downloading, etc. Everywhere else the PIPE Instance is referred to as a PIPE.

The operation and the alarms that result from the operation of this redundancy scheme will be similar. The FS describes the dynamic nature of the assignment of PIPE Instances for service on PIPE Cards. It is to be noted that lower PIPE Instance numbers receive higher priority for assignment to a PIPE Card, although the priority is non-preemptive.

When a non-spare (active) PIPE running applications becomes unavailable, all applications on the card are moved to a spare PIPE if one is available. Since PIPE N+M redundancy is not hot redundancy, the service interfaces and other applications are reset to the initial state. All current short-cuts and connections to the RS/CS are released. One of the formerly spare PIPEs becomes active. This PIPE card starts setting up connections to the Configuration Server and the appropriate Route Servers and creates the necessary short-cuts.

Functions that are provided by the PIPE can be configured and managed through an external network management entity such as NMTI.

The major new managed entities provided on the PIPE are PIPE Instances, Realms, and Service Interfaces. Most of the configuration is downloaded to the PIPE by either the Configuration Server (CS) or the Route Server (RS), but can be displayed using NMTI. The configuration elements coming from the CS/RS arrive via a few different paths and methods dependent on the specific nature of the element. To distinguish these differences the following tags are provided:

- G: provided from the initial global configuration server
- X: generated as the result of an exception exchange with the RS
- W: direct write via basic configuration path
- D: derived from other configured elements and the active state of the PIPE

The 36170 control card has no actual knowledge of Realms, and does not have any NVM storage allocated for the configuration of those entities.

The following table summarizes parameters that are configured and/or displayed for an entire 36170.

The following table summarizes parameters that can be configured and/or displayed for a Realm.

Table 5-2.1 Internetworking Realm Configuration Table

The following table summarizes parameters that can be configured and/or displayed for a Realm configured on a specific PIPE Instance (some of these parameters may be configured system-wide from the 46020, but they are downloaded to each PIPE card individually).

Figure 5.a shows the top-level NMTI Softkey Legend for "CONFIG". The corresponding sub-menus are shown in the sections that follow.

Most of the information shown on these menus is downloaded from either the Config Server or the Route Server. In fact, the only items that can be modified via NMTI are the items that appear under the CONFIG SERVER softkey.

The NMTI menus go several levels deep on many screens. In many cases, the complete chain of commands will not fit on a single command line. In those circumstances, the word "CONFIG/MAINT/STATS" remains on the command line, but words immediately to the right of the word "CONFIG/MAINT/STATS" may be deleted to make room for the most recent softkey displays. In those instances, the deleted text is restored as the "CANCEL" softkey is used to back out of the lower menus.

[Figure 5.a General Softkey Infrastructure. Softkeys shown: F1: CONFIG, F7: MORE, F1: INTERNETWORK, F1: PIPE_INSTANCE, F2: CONFIG_SERVER. Legend: existing functionality; new functionality; keystroke input; toggle keys are defined twice at the same level (default value is on top).]

Virtually all the configuration information the PIPE card receives from the Configuration Server and the Route Server can be viewed from the PIPE INSTANCE NMTI screens.

The PIPE-INSTANCE menu tree is shown in Figure 12. The SERVICE-I/F, BRIDGING, and IP ROUTING subtrees are expanded upon later in this document. Note that the "BRIDGING" softkey is only available if the Realm has been configured as a VPN. If the Realm has been configured to provide Internet Access, the error message "Bridging Only Supported For VPN's" will be displayed when the "BRIDGING" softkey is selected.

It should be noted that the "SERVICE-I/F" softkey directly below <pipeInstance> is simply a shortcut to "PIPE INSTANCE <pipeId> REALM <realmId> SERVICE I/F".

While particular embodiments of the invention have been described and illustrated, it will be apparent to one skilled in the art that numerous changes can be made to the basic concept. It is to be understood that such changes will fall within the full scope of the invention as defined by the appended claims.