Recovering the Administrator Password

If you forget the administrator password for accessing a Cisco MDS 9000 Family switch, you can recover the password using a local console connection. For the latest instructions on password recovery, go to http://www.cisco.com/warp/public/474/ and click on "MDS 9000 Series Multilayer Directors and Fabric Switches" under Storage Networking Routers.

Troubleshooting System Restarts

This section describes the different types of system crashes and how to respond to each type. It includes the following topics:

Overview

•Unrecoverable: A process is not restartable, or it has restarted more than the maximum number of times within a fixed period (in seconds) and will not be restarted again.

•System Hung/Crashed: No communication of any kind is possible with the switch.

Most system restarts generate a Call Home event, but the condition causing a restart may become so severe that a Call Home event is not generated. Be sure that you configure the Call Home feature properly, follow up on any initial messages regarding system restarts, and fix the problem before it becomes more severe. For information about configuring Call Home, refer to the Cisco MDS 9000 Family Configuration Guide or the Cisco MDS 9000 Family Fabric Manager User Guide.

Working with Recoverable Restarts

Every process restart generates a Syslog message and a Call Home event. Even if the event does not affect service, you should identify and resolve the condition immediately, because future occurrences could cause a service interruption.

To respond to a recoverable system restart, follow these steps:

Step 1 Enter the following command to check the Syslog file to see which process restarted and why it restarted.

switch# show logging logfile | include error

For information about the meaning of each message, refer to the Cisco MDS 9000 Family System Messages Guide.

Step 4 Enter the following command to show detailed information about a specific process that has restarted:

switch# show processes log pid 898

The system output looks like the following:

Service: idehsd

Description: ide hotswap handler Daemon

Started at Mon Sep 16 14:56:04 2002 (390923 us)

Stopped at Thu Sep 19 14:18:42 2002 (639239 us)

Uptime: 2 days 23 hours 22 minutes 22 seconds

Start type: SRV_OPTION_RESTART_STATELESS (23)

Death reason: SYSMGR_DEATH_REASON_FAILURE_SIGTERM (3)

Exit code: signal 15 (no core)

CWD: /var/sysmgr/work

Virtual Memory:

CODE 08048000 - 0804D660

DATA 0804E660 - 0804E824

BRK 0804E9A0 - 08050000

STACK 7FFFFD10

Register Set:

EBX 00000003 ECX 0804E994 EDX 00000008

ESI 00000005 EDI 7FFFFC9C EBP 7FFFFCAC

EAX 00000008 XDS 0000002B XES 0000002B

EAX 00000003 (orig) EIP 2ABF5EF4 XCS 00000023

EFL 00000246 ESP 7FFFFC5C XSS 0000002B

Stack: 128 bytes. ESP 7FFFFC5C, TOP 7FFFFD10

0x7FFFFC5C: 0804F990 0804C416 00000003 0804E994 ................

0x7FFFFC6C: 00000008 0804BF95 2AC451E0 2AAC24A4 .........Q.*.$.*

0x7FFFFC7C: 7FFFFD14 2AC2C581 0804E6BC 7FFFFCA8 .......*........

0x7FFFFC8C: 7FFFFC94 00000003 00000001 00000003 ................

0x7FFFFC9C: 00000001 00000000 00000068 00000000 ........h.......

0x7FFFFCAC: 7FFFFCE8 2AB4F819 00000001 7FFFFD14 .......*........

0x7FFFFCBC: 7FFFFD1C 0804C470 00000000 7FFFFCE8 ....p...........

0x7FFFFCCC: 2AB4F7E9 2AAC1F00 00000001 08048A2C ...*...*....,...

PID: 898

SAP: 0

UUID: 0

switch#
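When many process logs need to be reviewed, the fields that matter most for triage (service name, death reason, exit code, and whether a core was written) can be pulled out with a short script. The following Python sketch is an illustration only: the function name `summarize_process_log` is hypothetical, and it assumes the output has been saved as text with one field per line, as in the example above.

```python
import re

# A few header lines copied from the example output above.
SAMPLE = """\
Service: idehsd
Description: ide hotswap handler Daemon
Start type: SRV_OPTION_RESTART_STATELESS (23)
Death reason: SYSMGR_DEATH_REASON_FAILURE_SIGTERM (3)
Exit code: signal 15 (no core)
PID: 898
"""

# Only these "Key: value" fields are collected; everything else is ignored.
FIELD_PATTERN = re.compile(r"^(Service|Death reason|Exit code|PID):\s*(.+)$")


def summarize_process_log(text):
    """Return the key triage fields and a flag showing whether a core exists."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_PATTERN.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2).strip()
    # The exit code line reads "(core dumped)" or "(no core)" in the examples.
    fields["core_dumped"] = "core dumped" in fields.get("Exit code", "")
    return fields


summary = summarize_process_log(SAMPLE)
print(summary["Service"], summary["Death reason"], summary["core_dumped"])
```

A process whose exit code line reads "(core dumped)" is a candidate for the core file steps that follow.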

Step 5 Enter the following command to determine if the restart recently occurred:

switch# show system uptime

Start Time: Fri Sep 13 12:38:39 2002

Up Time: 0 days, 1 hours, 16 minutes, 22 seconds

To determine if the restart is repetitive or a one-time occurrence, compare the length of time that the system has been up with the timestamp of each restart.
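The comparison described above can be sketched in Python. This is a minimal illustration, not a Cisco tool: the function name `classify_restarts` and the two restart timestamps are hypothetical, while the start time and uptime are taken from the example output. It assumes timestamps use the format shown in the command output.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the command output, for example "Fri Sep 13 12:38:39 2002".
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def classify_restarts(start_time, uptime, restart_times):
    """Classify process restarts that fall inside the current system uptime."""
    boot = datetime.strptime(start_time, TIMESTAMP_FORMAT)
    now = boot + uptime
    parsed = (datetime.strptime(r, TIMESTAMP_FORMAT) for r in restart_times)
    recent = [t for t in parsed if boot <= t <= now]
    if len(recent) > 1:
        return "repetitive"
    if len(recent) == 1:
        return "one-time"
    return "none since boot"


# Start time and uptime from the example; restart timestamps are hypothetical.
uptime = timedelta(hours=1, minutes=16, seconds=22)
restarts = ["Fri Sep 13 12:50:10 2002", "Fri Sep 13 13:20:45 2002"]
print(classify_restarts("Fri Sep 13 12:38:39 2002", uptime, restarts))  # prints "repetitive"
```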

Step 6 Enter the following command to view the core files:

switch# show cores

The system output looks like the following:

Module-num  Process-name  PID   Core-create-time
----------  ------------  ---   ----------------
5           fspf          1524  Jan 9 03:11
6           fcc           919   Jan 9 03:09
8           acltcam       285   Jan 9 03:09
8           fib           283   Jan 9 03:08

This output shows all the cores currently available for upload from the active supervisor. The Module-num column shows the slot number on which the core was generated. In the example above, an fspf core was generated on the active supervisor module in slot 5, and an fcc core was generated on the standby supervisor module in slot 6. The core dumps generated on the line card in slot 8 are acltcam and fib.

To copy the FSPF core dump in this example to a TFTP server with the IP address 1.1.1.1, enter the following command:

switch# copy core://5/1524 tftp://1.1.1.1/abcd
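If several cores are listed, the copy commands can be generated from the show cores output rather than typed by hand. The Python sketch below is an illustration under stated assumptions: the function name `copy_commands` and the destination file naming (`process_pid.core`) are hypothetical choices, and the command form follows the copy core://module/pid tftp://server/file pattern used in this section.

```python
# Output of "show cores" from the example above, saved as text.
SHOW_CORES = """\
Module-num Process-name PID Core-create-time
---------- ------------ --- ----------------
5 fspf 1524 Jan 9 03:11
6 fcc 919 Jan 9 03:09
8 acltcam 285 Jan 9 03:09
8 fib 283 Jan 9 03:08
"""


def copy_commands(show_cores_output, tftp_server):
    """Build one copy command per core listed in the show cores output."""
    commands = []
    for line in show_cores_output.splitlines():
        parts = line.split()
        # Data rows start with a numeric module number and carry a numeric PID;
        # the header and separator rows fail this check and are skipped.
        if len(parts) >= 3 and parts[0].isdigit() and parts[2].isdigit():
            module, process, pid = parts[0], parts[1], parts[2]
            # Destination file name is a hypothetical convention.
            target = f"tftp://{tftp_server}/{process}_{pid}.core"
            commands.append(f"copy core://{module}/{pid} {target}")
    return commands


for command in copy_commands(SHOW_CORES, "1.1.1.1"):
    print(command)
```

The first generated line is copy core://5/1524 tftp://1.1.1.1/fspf_1524.core, which matches the manual command above apart from the destination file name.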

The following command displays the log for the process with PID 1473. In this example, the process is the IPS Manager service, which exited with a core dump.

switch# show processes log pid 1473

======================================================

Service: ips

Description: IPS Manager

Started at Tue Jan 8 17:07:42 1980 (757583 us)

Stopped at Thu Jan 10 06:16:45 1980 (83451 us)

Uptime: 1 days 13 hours 9 minutes 9 seconds

Start type: SRV_OPTION_RESTART_STATELESS (23)

Death reason: SYSMGR_DEATH_REASON_FAILURE_SIGNAL (2)

Exit code: signal 6 (core dumped)

CWD: /var/sysmgr/work

Virtual Memory:

CODE 08048000 - 080FB060

DATA 080FC060 - 080FCBA8

BRK 081795C0 - 081EC000

STACK 7FFFFCF0

TOTAL 20952 KB

Register Set:

EBX 000005C1 ECX 00000006 EDX 2AD721E0

ESI 2AD701A8 EDI 08109308 EBP 7FFFF2EC

EAX 00000000 XDS 0000002B XES 0000002B

EAX 00000025 (orig) EIP 2AC8CC71 XCS 00000023

EFL 00000207 ESP 7FFFF2C0 XSS 0000002B

Stack: 2608 bytes. ESP 7FFFF2C0, TOP 7FFFFCF0

0x7FFFF2C0: 2AC8C944 000005C1 00000006 2AC735E2 D..*.........5.*

0x7FFFF2D0: 2AC8C92C 2AD721E0 2AAB76F0 00000000 ,..*.!.*.v.*....

0x7FFFF2E0: 7FFFF320 2AC8C920 2AC513F8 7FFFF42C ... ..*...*,...

0x7FFFF2F0: 2AC8E0BB 00000006 7FFFF320 00000000 ...*.... .......

0x7FFFF300: 2AC8DFF8 2AD721E0 08109308 2AC65AFC ...*.!.*.....Z.*

0x7FFFF310: 00000393 2AC6A49C 2AC621CC 2AC513F8 .......*.!.*...*

0x7FFFF320: 00000020 00000000 00000000 00000000 ...............

0x7FFFF330: 00000000 00000000 00000000 00000000 ................

0x7FFFF340: 00000000 00000000 00000000 00000000 ................

0x7FFFF350: 00000000 00000000 00000000 00000000 ................

0x7FFFF360: 00000000 00000000 00000000 00000000 ................

0x7FFFF370: 00000000 00000000 00000000 00000000 ................

0x7FFFF380: 00000000 00000000 00000000 00000000 ................

0x7FFFF390: 00000000 00000000 00000000 00000000 ................

0x7FFFF3A0: 00000002 7FFFF3F4 2AAB752D 2AC5154C .

... output abbreviated ...

Stack: 128 bytes. ESP 7FFFF830, TOP 7FFFFCD0

Step 7 Enter the following command to configure the switch to use TFTP to send the core dump to a TFTP server.

switch(config)# system cores tftp:[//servername][/path]

This command causes the switch to enable the automatic copy of core files to a TFTP server. For example, the following command sends the core files to the TFTP server with the IP address 10.1.1.1.

switch(config)# system cores tftp://10.1.1.1/cores

The following conditions apply:

•The core files are copied every 4 minutes. This time is not configurable.

•The copy of a specific core file can be manually triggered using the command copy core://module#/pid# tftp://tftp_ip_address/file_name.

•The maximum number of times a process can be restarted is part of the HA policy for any process (this parameter is not configurable). If the process restarts more than the maximum number of times, the older core files are overwritten.

•The maximum number of core files that can be saved for any process is part of the HA policy for any process (this parameter is not configurable, and it is set to 3).
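The retention rule in the last two conditions (at most three core files per process, with older files overwritten) behaves like a small fixed-size buffer. The Python sketch below models that behavior only as an illustration; it is not how the switch implements it, and the class name `CoreStore` and the core file names are hypothetical.

```python
from collections import deque

# Limit stated in the text above; it is not configurable on the switch.
MAX_CORES_PER_PROCESS = 3


class CoreStore:
    """Toy model of per-process core retention with overwrite of the oldest."""

    def __init__(self):
        self._cores = {}

    def save(self, process, core_name):
        # A deque with maxlen drops the oldest entry once the limit is reached.
        slot = self._cores.setdefault(process, deque(maxlen=MAX_CORES_PER_PROCESS))
        slot.append(core_name)

    def list(self, process):
        return list(self._cores.get(process, ()))


store = CoreStore()
for name in ["core.1", "core.2", "core.3", "core.4"]:  # hypothetical file names
    store.save("fspf", name)
print(store.list("fspf"))  # prints ['core.2', 'core.3', 'core.4']
```

Because the oldest core is the one that is lost, copying cores off the switch promptly preserves the dump from the first failure, which is often the most useful one for analysis.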

Step 8 To determine the cause and resolution for the restart condition, call Cisco TAC and ask them to review your core dump.

Working with Unrecoverable System Restarts

An unrecoverable system restart may occur in the following cases:

•A critical process fails and is not restartable

•A process restarts more times than is allowed by the system configuration

•A process restarts more frequently than is allowed by the system configuration

The effect of a process restart is determined by the policy configured for each process. Unrecoverable restarts may cause loss of functionality, restart of the active supervisor, a supervisor switchover, or restart of the switch.
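The second and third cases describe a restart limit applied over a fixed period. The Python sketch below shows one way such a check could work, using a sliding time window. It is an assumption-laden illustration: the class name `RestartPolicy` and the limit values (3 restarts in 60 seconds) are hypothetical, and the actual limits are part of each process's HA policy on the switch.

```python
from collections import deque


class RestartPolicy:
    """Sliding-window restart limit (hypothetical model of an HA policy)."""

    def __init__(self, max_restarts, window_seconds):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._times = deque()

    def record_restart(self, timestamp):
        """Record a restart; return False once the limit is exceeded."""
        self._times.append(timestamp)
        # Forget restarts that fall outside the window ending at this restart.
        while timestamp - self._times[0] > self.window_seconds:
            self._times.popleft()
        return len(self._times) <= self.max_restarts


# Hypothetical limits: at most 3 restarts within any 60-second window.
policy = RestartPolicy(max_restarts=3, window_seconds=60)
results = [policy.record_restart(t) for t in (0, 10, 20, 30)]
print(results)  # prints [True, True, True, False]
```

In this model the fourth restart within the window returns False, which corresponds to the point at which the process is no longer restarted and the restart is treated as unrecoverable.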