This document is not restricted to specific software and hardware
versions.

The information presented in this document was created from devices in
a specific lab environment. All of the devices used in this document started
with a cleared (default) configuration. If you work in a live network, ensure
that you understand the potential impact of any command before you use
it.

The NPE-300 uses parity checking in shared memory (SDRAM), on the PCI bus,
and on the CPU's external interface to protect the system from malfunctions
caused by bit errors. Parity checking detects a single bit error with a
simple method: it adds one check bit per eight bits of data. If it detects a
bit error when data passes between hardware components, the system discards
the erroneous data. Single bit errors at any location in the diagram above
cause the router to reset.
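The one-check-bit-per-eight-data-bits scheme described above can be
illustrated with a minimal sketch. It assumes even parity (the text does not
say whether the hardware uses even or odd parity), and the function names
parity_bit and check are hypothetical, chosen only for this illustration.

```python
def parity_bit(byte: int) -> int:
    """Return the even-parity check bit for one 8-bit value (assumed scheme)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    # The check bit makes the total number of 1 bits (data + check) even.
    return bin(byte).count("1") % 2


def check(byte: int, stored_parity: int) -> bool:
    """Return True when the data byte still matches its stored check bit."""
    return parity_bit(byte) == stored_parity


data = 0b1011_0010                # four 1 bits, so the check bit is 0
p = parity_bit(data)

one_flip = data ^ 0b0000_0100     # a single bit error
two_flips = data ^ 0b0001_0100    # two bit errors in the same byte

print(check(data, p))        # True: data is intact
print(check(one_flip, p))    # False: the single bit error is detected
print(check(two_flips, p))   # True: the double bit error goes unnoticed
```

Parity can only report that something is wrong. It cannot tell which bit
flipped, so the data cannot be repaired, and an even number of flipped bits
in the same byte is not detected at all.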

The NPE-400 uses Single Bit Error Correction and Multi-bit Error
Detection ECC (Error Correcting Code) for shared memory (SDRAM). To increase
system availability, ECC corrects single bit errors in SDRAM, which allows
the system to operate normally without resetting and without downtime.
For more information on how ECC enhances system availability, refer to the
Increasing Network Availability page.

A multi-bit error in SDRAM causes the router to reset with a cache
error exception or bus error. The rest of the memory and buses in the system
use single bit parity detection. Single bit errors at 1 and 3 in the diagram
above cause the router to reset.
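To show how a code can correct a single bit error yet only detect a double
bit error, here is a minimal sketch of an extended Hamming code that protects
4 data bits with 4 check bits. This is an assumption made for illustration
only: the text does not describe the actual code or word size used by the
NPE-400 SDRAM controller, and the names encode and decode are hypothetical.

```python
def encode(nibble: int) -> list[int]:
    """Encode 4 data bits into an 8-bit extended Hamming codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0 holds the overall parity bit
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2         # makes the whole word even parity
    return code


def decode(word: list[int]):
    """Return (nibble, status); nibble is None when the word is uncorrectable."""
    code = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of the positions of all 1 bits
    overall = sum(code) % 2             # 0 when the overall parity still holds
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treat as a single error at index `syndrome`
        # (index 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Overall parity holds but the syndrome is non-zero: two bits flipped.
        return None, "uncorrectable"
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return nibble, status


word = encode(0b1011)
single = list(word)
single[5] ^= 1                          # one flipped bit
double = list(word)
double[5] ^= 1
double[6] ^= 1                          # two flipped bits

print(decode(word))     # (11, 'ok')
print(decode(single))   # (11, 'corrected')
print(decode(double))   # (None, 'uncorrectable')
```

The "uncorrectable" result corresponds to the multi-bit case described above,
where the router has no valid data to continue with and resets.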

Several of the parity checking devices on the C7200/NPE router can
report data with bad parity for any read or write operation. Here is a
description of the various error messages reported on a C7200/NPE system:

As with all computer and networking devices, the NPE is susceptible to
the rare occurrence of parity errors in processor memory. Parity errors may
cause the system to reset. They can be transient Single Event Upsets (SEUs,
or soft errors), or they can occur multiple times (often referred to as hard
errors) due to damaged hardware. For more information on SEUs, refer to the
Increasing Network Availability page. A CPU parity error is reported if the
CPU detects a parity error when accessing any of the processor's caches (L1,
L2, or, if fitted, L3).

The NPE has an R7K processor with a non-blocking cache. Non-blocking means
that when the CPU executes an instruction to load data into a register and
this data is not in the L1 cache, the CPU loads the data from a lower-level
cache or from SDRAM. The CPU does not block execution of further
instructions unless there is another cache miss or another instruction
depends upon the data being loaded. This can greatly speed up the processor
and improve performance, but it can also make parity errors imprecise. An
imprecise parity error occurs when the CPU reads information without
blocking and later determines that there was a parity error in the
associated cache line. The R7K processor cannot report which instruction was
being executed when the cache line was loaded, which is why the error is
called an imprecise parity error.

Even on systems that use ECC, it is still possible to see an occasional
parity error when more than a single error has occurred in the 64 bits of
data due to a hard error in the cache.

A parity error occurs when a single bit value is changed from its
original value (0 or 1) to the opposite value. A parity error can be either
soft or hard.

Soft parity errors occur when an external influence on the memory of the
device changes the bit value. This type of problem is transient and does not
recur. Hard parity errors occur when the bit value is changed by the memory
itself because of damage to the memory. In that case, the problem occurs
every time that area of memory is used, which means that the problem can
repeat multiple times within a couple of days to a week.
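The distinction between soft and hard errors can be turned into a simple
monitoring rule. The sketch below is illustrative only: the function name
classify_parity_events, the labels it returns, and the 7-day window (taken
from "a couple of days to a week" above) are all assumptions, and the
timestamps would have to come from your own crash or log records.

```python
from datetime import datetime, timedelta


def classify_parity_events(timestamps, window=timedelta(days=7)):
    """Label recorded parity-error times as 'none', 'likely soft', or 'possibly hard'.

    Two events closer together than `window` suggest a recurring (hard) error.
    The window length is an assumed threshold, not a vendor-specified value.
    """
    events = sorted(timestamps)
    if not events:
        return "none"
    for earlier, later in zip(events, events[1:]):
        if later - earlier <= window:
            return "possibly hard"
    return "likely soft"


# Hypothetical example data, not taken from a real device.
single_event = [datetime(2004, 2, 1, 10, 0)]
repeated = [datetime(2004, 2, 1, 10, 0), datetime(2004, 2, 3, 16, 30)]

print(classify_parity_events(single_event))  # likely soft
print(classify_parity_events(repeated))      # possibly hard
```

A "possibly hard" result is only a hint that the same hardware should be
watched more closely or considered for replacement.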

The following course of action is recommended when you encounter such
errors:

Monitor the affected hardware to see if the same problem happens
again. If it does not, then it was a transient Single Event Upset (SEU) and you
do not need to take any action.

In the unlikely event that the problem does recur, the cache L3 bypass or
cache L3 disable command is an option that may help reduce the impact of the
issue. These commands are only available on the following platforms:

7200 with processor engine NPE-300, NPE-400, or NSE-1

7400 with processor engine NSE-1

Because the NPE-300 does not support ECC memory, this feature is
especially important to increase system availability and handle these parity
errors without service interruption. This resolves many soft parity errors. The
caveat is that there is a slight performance hit to the system when L3 cache is
disabled. The performance degradation is anywhere between 1% and 10% depending
upon the system configuration. The syntax for using this command is dependent
on the Cisco IOS software version.

The cache L3 disable command can be found in Cisco IOS Software Releases
12.3(5a) and later. It will also be available in 12.1(22)E. In these
versions, L3 cache is disabled by default, so no action is needed to take
advantage of this feature. L3 cache can be re-enabled with the
no cache L3 disable command.

The cache L3 bypass command can be found in Cisco IOS Software Releases
12.2(6)S, 12.2(6)B, 12.2(8)BC1b, 12.0(20)SP, 12.2(6)PB, 12.2(2)DD2,
12.0(20)ST3, 12.0(21)S, 12.1(11)EC, 12.2(7)T, 12.1(13), and 12.2(7) or
later, and 12.1(11)E through 12.1(21)E. L3 cache bypass is disabled by
default.

To enable L3 cache bypass, enter the following from configuration
mode:

Router(config)#cache L3 bypass

To disable L3 cache bypass, enter the following from configuration
mode:

Router(config)#no cache L3 bypass

The new cache setting does not take effect until the router is
reloaded.

When the router boots up, the system information that is displayed still
includes information about the L3 cache, even if L3 cache bypass is
configured. This is because the startup-config file has not yet been
processed by the system at that point. After the startup-config file is
processed, the L3 cache is bypassed if the cache L3 bypass command is in the
configuration.

To verify the L3 cache setting, you can issue the show version command. If
the L3 cache is bypassed, there is no reference to the L3 cache in the
show version output.

Another feature that helps increase system availability is the
Cache Error Recovery Function (CERF). When this feature is enabled (this is
the default in the latest Cisco IOS software releases, but as of February
2004, only for NPE-300 and NPE-400), the Cisco IOS software makes an attempt
to resolve the parity error and keep the processor from crashing. This
feature resolves around 75% of certain types of soft parity errors. With
this feature enabled, the system sees less than 5% performance degradation.

CERF for the NPE-300 can be found in Cisco IOS Software Releases
12.1(15), 12.1(12)EC, 12.0(22)S, 12.2(10)S, 12.2(10)T, 12.2(10), 12.2(2)XB4,
12.2(11)BC1b, and 12.1(5)XM8 or later.

CERF for the NPE-400 can be found in 12.3(3)B, 12.2(14)S3,
12.1(20)E, 12.1(19)E1, 12.3(1a), 12.2(13)T5, 12.2(18)S, 12.3(2)T, 12.2(18),
12.3(3), and 12.3(1)B1 or later.

CERF for the NPE-300 requires hardware revision 4.1 or higher. To identify
the hardware version of your NPE-300, use the show c7200 command.

If the suggestions above do not resolve the issue, replacing the NPE may
help when parity errors occur repeatedly, because hard parity errors are due
to damaged hardware. Hardware replacements are identical to the original
NPE. Replacing the NPE does not guarantee that no further parity errors will
occur, since Single Event Upsets (SEUs) are inherent in any computer
equipment with memory.