Avoiding Problems Through Regular Monitoring

Monitoring system resources allows you to detect potential problems before they happen, thus avoiding outages. The following examples show the advantages of regular monitoring:

•In a real-life example, customers installed new line cards. After the line cards were in operation for a few years, lack of memory on those line cards caused major outages in some cases. Monitoring memory usage would have identified a memory issue and avoided an outage.

•Regular monitoring establishes a baseline for a normal system load. You can use this information as a basis for comparison when you upgrade hardware or software—to see if the upgrade has affected resource usage.

Control Plane Overview

The following sections contain a high-level overview of the control plane:

Cisco ASR 1000 Series Routers have a distributed control plane architecture. A separate control processor is embedded on each major component in the control plane, as shown in Figure 5-1:

•Route Processor (RP)

•Forwarding Engine Control Processor (FECP)

•I/O Control Processor (IOCP)

The RP manages and maintains the control plane using a dedicated Gigabit Ethernet out-of-band channel (EOBC). The internal EOBC is used to continuously exchange system state information among the different major components. For example, in the event of a failure condition, a switchover event occurs and the standby RP and ESP are immediately ready to assume the data forwarding functions or the control plane functions for the failed component.

The inter-integrated circuit (I2C) monitors the health of hardware components. The Enhanced SerDes Interconnect (ESI) is a set of serial links that are the data path links on the midplane connecting the RP, SIPs, and standby ESPs to the active ESP.

Figure 5-1 Cisco ASR 1000 Series Routers Control Plane Architecture

The control plane processors perform the following functions:

RP

•Runs the router control plane (Cisco IOS), including processing network control packets, computing routes, and setting up connections.

FECP

•Provides direct CPU access to the forwarding engine subsystem, the Cisco QuantumFlow Processor (QFP) subsystem, which is the forwarding processor chipset and also resides on the ESP.

•Manages the forwarding engine subsystem and its connection to I/O.

•Manages the forwarding processor chipset.

IOCP

•Provides direct CPU access to SPAs installed in a SIP.

•Manages the SPAs.

•Handles SPA online insertion and removal (OIR) events.

•Runs SPA drivers that initialize and configure SPAs.

Cisco IOS XE Software Architecture

The control plane processors run Cisco IOS XE software, which is an operating system that consists of a Linux-based kernel and a common set of operating system-level utility programs. It is a distributed software architecture that moves many operating system responsibilities out of the IOS process.

In this architecture, IOS runs as one of many Linux processes while allowing other Linux processes to share responsibility for running the router. IOS runs as a user process on the RP. Hardware-specific components have been removed from the IOS process and are handled by separate middleware processes in Cisco IOS XE software. If a hardware-specific issue is discovered, the middleware process can be modified without touching the IOS process.

Figure 5-2 shows the main components of the Cisco IOS XE software architecture. This modular architecture increases network resiliency by distributing operating responsibility among separate processes. The architecture also allows for better allocation of memory so the router can run more efficiently.

All of the Cisco IOS XE software modules run in their own protective memory spaces, which facilitates fault containment. Any software outages of an individual software module are localized to that particular module. All other software processes continue to operate. For example, for each SPA, a separate driver process is executed on the SIP, even if multiple SPAs of the same type are present. Because each SPA driver runs in its own protective memory, failure or upgrade of an individual driver is localized to the affected SPA.

Figure 5-2 Cisco IOS XE Software Architecture

Using the Linux architecture, Cisco IOS XE provides the following benefits:

•The ability to integrate multi-core (multiple CPUs on a single piece of silicon) processors.

•The IOS process operates as a virtual machine under the RP Linux kernel. Upon bootup, the RP Linux kernel allocates 50 percent of available memory to IOS processes as a one-time event. For systems that have a single IOS process, IOS is allocated approximately 45 percent of total RP memory. For redundant IOS process systems, each IOS process is allocated approximately 20 percent of total RP memory.
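The approximate shares quoted above can be turned into rough figures for a given route processor. The following is a minimal sketch of that arithmetic only; the function name and the 8 GB example value are illustrative assumptions, and the percentages are the approximate ones stated in the text, not exact allocations.

```python
def ios_memory_estimate_gb(total_rp_memory_gb: float, redundant_ios: bool = False) -> float:
    """Estimate the memory given to each IOS process (hypothetical helper).

    Uses the approximate shares quoted in the text: about 45 percent of total
    RP memory for a single IOS process, about 20 percent per process when
    redundant IOS processes are running.
    """
    share = 0.20 if redundant_ios else 0.45
    return total_rp_memory_gb * share


# Example: a route processor with 8 GB of RAM (illustrative value).
single = ios_memory_estimate_gb(8)
redundant = ios_memory_estimate_gb(8, redundant_ios=True)
print(f"single IOS process: ~{single:.1f} GB")
print(f"each redundant IOS process: ~{redundant:.1f} GB")
```

For an 8 GB route processor this works out to roughly 3.6 GB for a single IOS process and roughly 1.6 GB for each redundant IOS process.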

IOS Process Resources

For information about memory and CPU utilization from within the IOS process, use the show memory command and the show process cpu command. Note that these commands provide a representation of memory and CPU utilization from the perspective of the IOS process only; they do not include information for resources on the entire route processor. For example, show memory on an RP2 with 8 GB of RAM running a single IOS process shows the following memory usage:

For the dual-core RP2, the show process cpu command reports a single IOS CPU utilization average using both processors:

Router# show process cpu
CPU utilization for five seconds: 0%/0%; one minute: 0%; five minutes: 0%
 PID Runtime(ms)   Invoked   uSecs    5Sec    1Min    5Min TTY Process
   1         583     48054      12   0.00%   0.00%   0.00%   0 Chunk Manager
   2         991    176805       5   0.00%   0.00%   0.00%   0 Load Meter
   3           0         2       0   0.00%   0.00%   0.00%   0 IFCOM Msg Hdlr
   4           0        11       0   0.00%   0.00%   0.00%   0 Retransmission o
   5           0         3       0   0.00%   0.00%   0.00%   0 IPC ISSU Dispatc
   6      230385    119697    1924   0.00%   0.01%   0.00%   0 Check heaps
   7          49        28    1750   0.00%   0.00%   0.00%   0 Pool Manager
   8           0         2       0   0.00%   0.00%   0.00%   0 Timers
   9       17268    644656      26   0.00%   0.00%   0.00%   0 ARP Input
  10         197    922201       0   0.00%   0.00%   0.00%   0 ARP Background
  11           0         2       0   0.00%   0.00%   0.00%   0 ATM Idle Timer
  12           0         1       0   0.00%   0.00%   0.00%   0 ATM ASYNC PROC
  13           0         1       0   0.00%   0.00%   0.00%   0 AAA_SERVER_DEADT
  14           0         1       0   0.00%   0.00%   0.00%   0 Policy Manager
  15           0         2       0   0.00%   0.00%   0.00%   0 DDR Timers
  16           1        15      66   0.00%   0.00%   0.00%   0 Entity MIB API
  17          13      1195      10   0.00%   0.00%   0.00%   0 EEM ED Syslog
  18          93        46    2021   0.00%   0.00%   0.00%   0 PrstVbl
  19           0         1       0   0.00%   0.00%   0.00%   0 RO Notify Timers

Overall Control Plane Resources

For information about control plane memory and CPU utilization on each control processor, use the show platform software status control-processor brief command (summary view) or the show platform software status control-processor command (detailed view).

All control processors should show a status of Healthy. Other possible status values are Warning and Critical. Warning indicates that the router is operational but that the operating level should be reviewed. Critical implies that the router is near failure.

If you see a status of Warning or Critical, take the following actions:

•Reduce static and dynamic loads on the system by reducing the number of elements in the configuration or by limiting the capacity for dynamic services.

•Reduce the number of routes and adjacencies, limit the number of ACLs and other rules, reduce the number of VLANs, and so on.

The following sections describe the fields in the show platform software status control-processor command output.

Load Average

Load average represents the process queue or process contention for CPU resources. For example, on a single-core processor, an instantaneous load of 7 means that seven processes are ready to run, one of which is currently running. On a dual-core processor, a load of 7 means that seven processes are ready to run, two of which are currently running.
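The single-core and dual-core cases described above follow the same rule: at most one process runs per core, and the remainder wait in the queue. A minimal sketch of that reasoning follows; the function name is a hypothetical helper, not part of any Cisco tool.

```python
from typing import Tuple


def load_breakdown(load: int, cores: int) -> Tuple[int, int]:
    """Split an instantaneous load into (running, waiting) process counts."""
    running = min(load, cores)  # at most one running process per core
    waiting = load - running    # the rest are ready to run but queued
    return running, waiting


for cores in (1, 2):
    running, waiting = load_breakdown(7, cores)
    print(f"{cores} core(s): {running} running, {waiting} waiting")
```

With a load of 7, this gives one running and six waiting processes on a single core, and two running and five waiting on a dual-core processor.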

Memory Utilization

Memory utilization is represented by the following fields:

•Total—Total line card memory

•Used—Consumed memory

•Free—Available memory

•Committed—Virtual memory committed to processes

CPU Utilization

CPU utilization is an indication of the percentage of time the CPU is busy and is represented by the following fields:

•CPU—The allocated processor

•User—Non-Linux kernel processes

•System—Linux kernel processes

•Nice—Low priority processes

•Idle—Percentage of time the CPU was inactive

•IRQ—Interrupts

•SIRQ—System Interrupts

•IOwait—Percentage of time CPU was waiting for I/O
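Because each field is a percentage of CPU time, the fields for one core should add up to roughly 100, and everything other than Idle is time the CPU was not inactive. The sketch below checks this using the RP0 values from the sample output in this chapter; the function name is a hypothetical helper, and treating IOwait as non-idle time is an assumption of this sketch rather than a statement about how the router computes its status.

```python
# CPU fields for one core, copied from the RP0 row of the sample output.
rp0 = {
    "User": 30.12, "System": 1.69, "Nice": 0.00, "Idle": 67.63,
    "IRQ": 0.13, "SIRQ": 0.41, "IOwait": 0.00,
}


def non_idle_percent(fields: dict) -> float:
    """Sum every field except Idle (assumption: IOwait counts as non-idle)."""
    return sum(value for name, value in fields.items() if name != "Idle")


total = sum(rp0.values())
print(f"sum of all fields: {total:.2f}")            # close to 100; the gap is rounding
print(f"non-idle time:     {non_idle_percent(rp0):.2f}")
```

A total that is far from 100 would suggest that a field was misread or dropped while collecting the data.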

The following are examples of the show platform software status control-processor command.

Router# show platform software status control-processor brief
Load Average
 Slot  Status  1-Min  5-Min 15-Min
  RP0 Healthy   0.25   0.30   0.44
  RP1 Healthy   0.31   0.19   0.12
 ESP0 Healthy   0.01   0.05   0.02
 ESP1 Healthy   0.03   0.05   0.01
 SIP1 Healthy   0.15   0.07   0.01
 SIP2 Healthy   0.03   0.03   0.00

Memory (kB)
 Slot  Status    Total     Used (Pct)     Free (Pct) Committed (Pct)
  RP0 Healthy  3722408  2514836 (60%)  1207572 (29%)   1891176 (45%)
  RP1 Healthy  3722408  2547488 (61%)  1174920 (28%)   1889976 (45%)
 ESP0 Healthy  2025468  1432088 (68%)   593380 (28%)  3136912 (149%)
 ESP1 Healthy  2025468  1377980 (65%)   647488 (30%)  3084412 (147%)
 SIP1 Healthy   480388   293084 (55%)   187304 (35%)    148532 (28%)
 SIP2 Healthy   480388   273992 (52%)   206396 (39%)     93188 (17%)

CPU Utilization
 Slot  CPU   User System   Nice   Idle    IRQ   SIRQ IOwait
  RP0    0  30.12   1.69   0.00  67.63   0.13   0.41   0.00
  RP1    0  21.98   1.13   0.00  76.54   0.04   0.12   0.16
 ESP0    0  13.37   4.77   0.00  81.58   0.07   0.19   0.00
 ESP1    0   5.76   3.56   0.00  90.58   0.03   0.05   0.00
 SIP1    0   3.79   0.13   0.00  96.04   0.00   0.02   0.00
 SIP2    0   3.50   0.12   0.00  96.34   0.00   0.02   0.00
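If the brief output is captured as text (for example, by a script that logs in to the router), the Load Average section can be scanned for any slot whose status is not Healthy. The following is a minimal sketch under that assumption; the function names are hypothetical, the sample text reuses rows from the output above, and the parser relies only on whitespace-separated columns as they appear in this example.

```python
SAMPLE = """\
Router# show platform software status control-processor brief
Load Average
Slot Status 1-Min 5-Min 15-Min
RP0 Healthy 0.25 0.30 0.44
ESP0 Healthy 0.01 0.05 0.02
SIP1 Healthy 0.15 0.07 0.01
Memory (kB)
Slot Status Total Used (Pct) Free (Pct) Committed (Pct)
RP0 Healthy 3722408 2514836 (60%) 1207572 (29%) 1891176 (45%)
"""


def parse_load_average(text: str) -> dict:
    """Return {slot: (status, [1-min, 5-min, 15-min])} from the Load Average section."""
    rows = {}
    in_section = False
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if line.strip() == "Load Average":
            in_section = True
            continue
        if not in_section or parts[0] == "Slot":
            continue  # before the section, or the column header row
        if len(parts) != 5:
            break  # a line with a different shape starts the next section
        slot, status, *loads = parts
        rows[slot] = (status, [float(x) for x in loads])
    return rows


def needs_attention(rows: dict) -> list:
    """List the slots whose status is anything other than Healthy."""
    return [slot for slot, (status, _) in rows.items() if status != "Healthy"]


rows = parse_load_average(SAMPLE)
print(rows["RP0"])
print("slots needing attention:", needs_attention(rows))
```

For the sample above, every slot is Healthy, so the list of slots needing attention is empty.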

Router# show platform software status control-processor
RP0: online, statistics updated 10 seconds ago
Load Average: healthy
  1-Min: 0.30, status: healthy, under 5.00
  5-Min: 0.31, status: healthy, under 5.00
  15-Min: 0.47, status: healthy, under 5.00
Memory (kb): healthy
  Total: 3722408
  Used: 2514776 (60%), status: healthy, under 90%
  Free: 1207632 (29%), status: healthy, over 10%
  Committed: 1891176 (45%), status: healthy, under 90%
Per-core Statistics
CPU0: CPU Utilization (percentage of time spent)
  User: 30.12, System: 1.69, Nice: 0.00, Idle: 67.63
  IRQ: 0.13, SIRQ: 0.41, IOwait: 0.00

RP1: online, statistics updated 5 seconds ago
Load Average: healthy
  1-Min: 0.14, status: healthy, under 5.00
  5-Min: 0.11, status: healthy, under 5.00
  15-Min: 0.09, status: healthy, under 5.00
Memory (kb): healthy
  Total: 3722408
  Used: 2547488 (61%), status: healthy, under 90%
  Free: 1174920 (28%), status: healthy, over 10%
  Committed: 1889976 (45%), status: healthy, under 90%
Per-core Statistics
CPU0: CPU Utilization (percentage of time spent)
  User: 21.98, System: 1.13, Nice: 0.00, Idle: 76.54
  IRQ: 0.04, SIRQ: 0.12, IOwait: 0.16

ESP0: online, statistics updated 5 seconds ago
Load Average: healthy
  1-Min: 0.06, status: healthy, under 5.00
  5-Min: 0.09, status: healthy, under 5.00
  15-Min: 0.03, status: healthy, under 5.00
Memory (kb): healthy
  Total: 2025468
  Used: 1432088 (68%), status: healthy, under 90%
  Free: 593380 (28%), status: healthy, over 10%
  Committed: 3136912 (149%), status: healthy, under 300%
Per-core Statistics
CPU0: CPU Utilization (percentage of time spent)
  User: 13.37, System: 4.77, Nice: 0.00, Idle: 81.58
  IRQ: 0.07, SIRQ: 0.19, IOwait: 0.00

ESP1: online, statistics updated 5 seconds ago
Load Average: healthy
  1-Min: 0.22, status: healthy, under 5.00
  5-Min: 0.08, status: healthy, under 5.00
  15-Min: 0.02, status: healthy, under 5.00
Memory (kb): healthy
  Total: 2025468
  Used: 1377980 (65%), status: healthy, under 90%
  Free: 647488 (30%), status: healthy, over 10%
  Committed: 3084412 (147%), status: healthy, under 300%
Per-core Statistics
CPU0: CPU Utilization (percentage of time spent)
  User: 5.76, System: 3.56, Nice: 0.00, Idle: 90.58
  IRQ: 0.03, SIRQ: 0.05, IOwait: 0.00

SIP1: online, statistics updated 6 seconds ago
Load Average: healthy
  1-Min: 0.05, status: healthy, under 5.00
  5-Min: 0.06, status: healthy, under 5.00
  15-Min: 0.00, status: healthy, under 5.00
Memory (kb): healthy
  Total: 480388
  Used: 293084 (55%), status: healthy, under 90%
  Free: 187304 (35%), status: healthy, over 10%
  Committed: 148532 (28%), status: healthy, under 90%
Per-core Statistics
CPU0: CPU Utilization (percentage of time spent)
  User: 3.79, System: 0.13, Nice: 0.00, Idle: 96.04
  IRQ: 0.00, SIRQ: 0.02, IOwait: 0.00

SIP2: online, statistics updated 8 seconds ago
Load Average: healthy
  1-Min: 0.03, status: healthy, under 5.00
  5-Min: 0.03, status: healthy, under 5.00
  15-Min: 0.00, status: healthy, under 5.00
Memory (kb): healthy
  Total: 480388
  Used: 273992 (52%), status: healthy, under 90%
  Free: 206396 (39%), status: healthy, over 10%
  Committed: 93188 (17%), status: healthy, under 90%
Per-core Statistics
CPU0: CPU Utilization (percentage of time spent)
  User: 3.50, System: 0.12, Nice: 0.00, Idle: 96.34
  IRQ: 0.00, SIRQ: 0.02, IOwait: 0.00
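In the detailed output, each load and memory line carries its own status and a limit such as "under 5.00" or "over 10%". A script can use those limits to report how much margin remains before a value leaves the healthy range. The sketch below assumes that the number after "under" or "over" is the boundary of the healthy range, which is an interpretation of the sample output rather than a documented rule; the regular expression and function name are likewise hypothetical.

```python
import re

# Matches lines such as:
#   1-Min: 0.30, status: healthy, under 5.00
#   Used: 2514776 (60%), status: healthy, under 90%
LINE = re.compile(
    r"^\s*(?P<name>[\w-]+):\s*(?P<value>[\d.]+)"
    r"(?:\s*\((?P<pct>\d+)%\))?"
    r",\s*status:\s*(?P<status>\w+)"
    r",\s*(?P<direction>under|over)\s*(?P<limit>[\d.]+)%?\s*$"
)


def headroom(line: str):
    """Return (field, status, margin to the limit), or None if the line has no limit."""
    m = LINE.match(line)
    if m is None:
        return None
    # Memory lines are compared by percentage, load lines by raw value.
    measured = float(m["pct"]) if m["pct"] is not None else float(m["value"])
    limit = float(m["limit"])
    margin = limit - measured if m["direction"] == "under" else measured - limit
    return m["name"], m["status"], margin


for text in (
    "1-Min: 0.30, status: healthy, under 5.00",
    "Used: 2514776 (60%), status: healthy, under 90%",
    "Free: 1207632 (29%), status: healthy, over 10%",
    "Total: 3722408",
):
    print(text, "->", headroom(text))
```

Lines without a status and limit, such as the Total line or the per-core CPU lines, return None and can be skipped. A shrinking margin over successive samples is a signal to review the load before the status changes.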

For More Information

For more information about the topics discussed in this chapter, see the following documents: