The information in this document is based on these software and
hardware versions:

All Cisco 12000 Series Internet Routers, including the 12008, 12012,
12016, 12404, 12406, 12410, and the 12416.

All Cisco IOS® Software versions that support the Cisco 12000 Series
Internet Router.

The information in this document was created from the devices in a
specific lab environment. All of the devices used in this document started with
a cleared (default) configuration. If your network is live, make sure that you
understand the potential impact of any command.

show context [slot #]

Provides context information about the most recent crashes. This
is often the most useful command for troubleshooting line card
crashes.

core dump

A core dump of a line card is the full content of its memory at
the time of the crash. This data is normally not needed for initial
troubleshooting. It may be required later if the problem turns out to be a new
software bug. In that case, refer to Configuring a Core Dump on a GSR Line
Card.

If you have the output of a show tech-support (from enable mode) command
from your Cisco device, you can use Output Interpreter to display potential
issues and fixes. In order to use Output Interpreter, you must be a registered
customer, be logged in, and have JavaScript enabled.

The Cisco 12000 Series supports the diag [slot#] command for testing the
different board components. This command is useful for troubleshooting
hardware-related crashes and for identifying the faulty board.

The verbose option causes the router to
display the list of tests as they are being performed. Otherwise, it simply
displays a "PASSED" or "FAILURE" message.

Note: Performing this diagnostic stops all activities of the line card for
the duration of the tests (usually around five minutes).

Starting with Cisco IOS Software Release 12.0(22)S, Cisco has unbundled
the Cisco 12000 Series Internet Router field diagnostics line card image from
the Cisco IOS software image. In earlier versions, diagnostics were
launched from the command line and ran from the embedded image. In
order to accommodate customers with 20 MB Flash memory cards, line card field
diagnostics are now stored and maintained as a separate image that must be
available on a Flash memory card or a Trivial File Transfer Protocol (TFTP)
boot server before the field diagnostics commands can be used. Router processor
and switch fabric field diagnostics continue to be bundled and need not be
launched from a separate image. You can find more information at
Field Diagnostics for the Cisco 12000 Series Internet Router.

Depending on the error encountered, the slot might or might not be
automatically reloaded. If it is not, it might be in a stuck or inconsistent
state (check with the show diag [slot #] command) until it is manually
reloaded. This is normal. In order to manually reload the card, use the
hw-module slot [slot#] reload command.

You can identify cache parity exceptions by the SIG=20
in the show context [slot #]
output.

There are two different kinds of parity errors:

Soft parity errors: These occur when an energy level within the chip
(for example, a one or a zero) changes. In case of a soft parity error, there
is no need to swap the board or any of the components.

Hard parity errors: These occur when there is a chip or board failure
that causes data to be corrupted. In this case, you should re-seat or replace
the affected component, which usually means a memory chip swap or a board
swap. Multiple parity errors seen at the same address indicate a hard parity
error. Some cases are harder to identify but, in general, if more than one
parity error is seen in a particular memory region in a relatively short
period of time (several weeks to months), this can be considered a hard parity
error.

Studies have shown that soft parity errors are 10 to 100 times more
frequent than hard parity errors.
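
The soft/hard distinction described above can be sketched in Python as a
small helper. This is only an illustration, not a Cisco tool: the function name,
the (timestamp, address) event format, and the 90-day window are assumptions
chosen for the example, since the text only says "several weeks to months".

```python
from collections import defaultdict
from datetime import datetime, timedelta


def classify_parity_errors(events, window=timedelta(days=90)):
    """Label each address "hard" or "soft".

    events: iterable of (timestamp, address) pairs (hypothetical format).
    An address is labeled "hard" when two parity errors at that address
    fall within `window` of each other; otherwise it is labeled "soft".
    The 90-day default is an assumption, not a documented value.
    """
    by_address = defaultdict(list)
    for timestamp, address in events:
        by_address[address].append(timestamp)

    result = {}
    for address, stamps in by_address.items():
        stamps.sort()
        # Compare each error with the next one at the same address.
        repeated = any(b - a <= window for a, b in zip(stamps, stamps[1:]))
        result[address] = "hard" if repeated else "soft"
    return result


sample_events = [
    (datetime(2023, 1, 1), 0x1000),
    (datetime(2023, 1, 31), 0x1000),  # same address, 30 days later
    (datetime(2023, 2, 1), 0x2000),   # seen only once
]
labels = classify_parity_errors(sample_events)
```

A "soft" label here only means that no repeat has been observed so far; the
diag command remains the way to confirm whether the hardware is faulty.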

In order to troubleshoot these errors, find a maintenance window to run
the diag command for that slot.

If the diagnosis results in a failure, replace the line card.

If there is no failure, it is likely to be a soft parity error, and
the line card does not have to be replaced (unless it crashes a second time
with parity error after a short period of time).

You can identify bus error exceptions by the SIG=10 in
the show context [slot #]
output.

This type of crash is normally software-related, but if for some reason
(for example, it is a brand new card, or the crashes start after a power
outage) you think the problem could be hardware-related, run the
diag command for that slot.

Note: Some software bugs have been known to cause the
diag command to report errors, even though there is
no problem with the hardware. If a card has already been replaced, but still
fails at the same test in the diagnostic, you might be affected by this issue.
In that case, treat the crash as a software problem.

Upgrading to the latest version of your Cisco IOS software release
train eliminates all fixed bugs causing line card bus errors. If the crash is
still present after the upgrade, collect the relevant information (see
Gather Information about the Crash), along
with a show tech-support, and any information that
you think might be useful (such as recent topology change, or a new feature
recently implemented) and contact your Cisco support representative.

You can identify software-forced crashes by the SIG=23
in the show context [slot #]
output. Despite the name, these crashes are not always software-related.
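
The SIG values mentioned in this document (SIG=20, SIG=10, and SIG=23) can be
looked up with a short Python sketch when you review saved show context
output. The helper name and the sample line are hypothetical, and the sketch
assumes only that the signal appears in the text as "SIG=" followed by a
number; the rest of the output format is not relied on.

```python
import re

# SIG values and crash types as listed in this document.
CRASH_TYPES = {
    20: "cache parity exception",
    10: "bus error exception",
    23: "software-forced crash",
}

# Assumption: the signal appears as "SIG=<number>" somewhere in the text.
SIG_PATTERN = re.compile(r"\bSIG\s*=\s*(\d+)")


def classify_crashes(show_context_output: str) -> list[str]:
    """Return one crash-type label per SIG value found in the text."""
    labels = []
    for match in SIG_PATTERN.finditer(show_context_output):
        sig = int(match.group(1))
        labels.append(CRASH_TYPES.get(sig, f"other crash type (SIG={sig})"))
    return labels


# Hypothetical fragment, not real router output.
sample = "crash record 1: SIG=23\ncrash record 2: SIG=20"
found = classify_crashes(sample)
```

Any SIG value outside the three listed above is reported as "other crash
type", which matches how the document treats the less common crashes.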

The most common reason for software-forced crashes is the "Fabric Ping
Timeout". During normal router operation, the Route Processor (RP) continually
pings the line cards. If a line card doesn't answer, the route processor
decides to reset it. This results in a software-forced crash (SIG=23) of the
affected line card, and you should see these errors in the router's logs:

Software bugs: There are software bugs in Inter-Process Communication (IPC),
or the line card is running out of IPC buffers. Most of the time, these
software-forced reloads are caused by software bugs.

Upgrading to the latest version of your Cisco IOS software release
train eliminates all fixed bugs causing fabric ping timeouts. If the crash is
still present after the upgrade, collect the relevant information (see
Getting Information about the Crash), along
with a show tech-support, a show ipc
status, and any information that you think may be useful (such as
recent topology change, or a new feature recently implemented) and contact your
Cisco support representative.

Hardware failure: If the card has been running fine for a long time
and no recent topology, software, or feature changes have taken place, or if
the problems started after a move or a power outage, defective hardware may be
the cause. Run the diag command on the affected line
card. Replace the line card if it is faulty. If multiple line cards are
affected or the diagnostic passes, replace the fabric.

A TXECCERR/RXECCERR error occurs when the number of unrecoverable RxFIFO or
TxFIFO ECC error interrupts in the MAC exceeds the threshold value within the
time interval. Unrecoverable ECC errors cannot be corrected by the ECC logic.
When an unrecoverable error occurs during an RxFIFO read, the packet to which
the data belongs is marked with EOP/Abort on the SPI4 receive interface and is
discarded by upper layers.

These errors are caused by the hardware and clear once the SIP/SPA is
reloaded. The permanent solution is to replace the SIP/SPA.
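
The "more than the threshold value within the time interval" condition can be
illustrated with a sliding-window counter in Python. This is a sketch of the
general technique only, not the actual driver logic: the class name, the
threshold of 3, and the 10-second interval are hypothetical, because this
document does not state the real values.

```python
from collections import deque


class EccErrorMonitor:
    """Count unrecoverable ECC interrupts inside a sliding time window."""

    def __init__(self, threshold, interval_seconds):
        self.threshold = threshold        # hypothetical value
        self.interval = interval_seconds  # hypothetical value
        self._times = deque()

    def record(self, now):
        """Record one interrupt at time `now` (in seconds).

        Returns True when the number of interrupts still inside the
        window is greater than the threshold.
        """
        self._times.append(now)
        # Drop interrupts that are older than the interval.
        while self._times and now - self._times[0] > self.interval:
            self._times.popleft()
        return len(self._times) > self.threshold


monitor = EccErrorMonitor(threshold=3, interval_seconds=10)
results = [monitor.record(t) for t in (0, 1, 2, 3, 20)]
```

With these example values, the fourth interrupt inside the same 10-second
window trips the condition, and an isolated interrupt much later does not.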

Other crash types are, by far, less common than the types described
above. In most cases, the diag command should
indicate whether the card needs to be replaced or not. If the card passes the
diagnostic test, consider upgrading the software.

If you still need assistance after following the troubleshooting steps
above and want to open a service request (registered customers only) with the
Cisco TAC, be sure to include the following information:

Troubleshooting performed before opening the service request.

show tech-support output (in
enable mode if possible).

show log output or console
captures, if available.

execute-on slot [slot #]
show tech for the slot which experienced the line card
crash.
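
As a convenience, the commands in the list above can be generated for a given
slot with a few lines of Python, for example to paste into a terminal session
or a collection script. The function name is hypothetical, and the command
strings are taken from this document as written; verify them against your
Cisco IOS release before use.

```python
def tac_collection_commands(slot: int) -> list[str]:
    """Return the data-collection commands named in this document."""
    if slot < 0:
        raise ValueError("slot number must not be negative")
    return [
        "show tech-support",
        "show log",
        f"execute-on slot {slot} show tech",
    ]


# Example: commands for a line card crash in slot 3.
commands = tac_collection_commands(3)
```

Save the output of each command as plain text so that it can be attached to
the service request.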

Attach the collected data to your service request in
non-zipped, plain text format (.txt). You can attach information to your
service request by uploading it using the TAC Service Request tool (registered
customers only). If you cannot access the Service Request tool, you can send
the information in an email attachment to attach@cisco.com with your service
request number in the subject line of your message.

Note: Do not manually reload or power-cycle the router before
collecting the above information unless required to troubleshoot a line card
crash on the Cisco 12000 Series Internet Router, because this can cause the
loss of information that is needed to determine the root cause of the
problem.