| Copy the startup-config file to a snapshot configuration in NVRAM. This step creates a backup copy of the startup-config file (see the Rollback chapter in the [http://www.cisco.com/en/US/docs/switches/datacenter/sw/4_0/nx-os/system_management/configuration/guide/sm_nx-os_config.html Cisco NX-OS System Management Configuration Guide, Release 4.0]).

+

| Copy the startup-config file to a snapshot configuration in NVRAM. This step creates a backup copy of the startup-config file (see the Rollback chapter in the [http://www.cisco.com/en/US/docs/switches/datacenter/sw/4_0/nx-os/system_management/configuration/guide/sm_nx-os_config.html Cisco NX-OS System Management Configuration Guide]).

| rowspan="1" colspan="1" |

| rowspan="1" colspan="1" |

Line 389:

Line 389:

[[Image:79950.jpg]]

[[Image:79950.jpg]]

-

===Recovery from the loader> Prompt on Supervisor Modules===

+

=== Recovery from the loader&gt; Prompt on Supervisor Modules ===

-

{{caution|This procedure uses the '''init system''' command, which reformats the file system of the device. Be sure that you have made a backup of the configuration files before you begin this procedure.}}

+

{{caution|This procedure uses the '''init system''' command, which reformats the file system of the device. Be sure that you have made a backup of the configuration files before you begin this procedure.}}

-

The loader> prompt is different from the regular switch# prompt. The CLI command completion feature does not work at the loader> prompt and may result in undesired errors. You must type the command exactly as you want the command to appear.

+

The loader&gt; prompt is different from the regular switch# prompt. The CLI command completion feature does not work at the loader&gt; prompt and may result in undesired errors. You must type the command exactly as you want the command to appear.

-

{{note|If you boot over TFTP from the loader> prompt, you must supply the full path to the image on the remote server.}}

+

{{note|If you boot over TFTP from the loader> prompt, you must supply the full path to the image on the remote server.}}

-

Use the '''help''' command at the loader> prompt to display a list of commands available at this prompt or to obtain more information about a specific command in that list.

+

{{note| The TFTP boot method is available only as a backup for diagnostics and for repairing bootflash corruption. The TFTP boot method is not intended to bring up the system to a fully operational state. Reloading the system is mandatory after all diagnostics and repairs have been completed.}}

-

To recover a corrupted kickstart image (system error state) for a system with a single supervisor module, follow these steps:

+

Use the '''help''' command at the loader&gt; prompt to display a list of commands available at this prompt or to obtain more information about a specific command in that list.

+

To recover a corrupted kickstart image (system error state) for a system with a single supervisor module, follow these steps:

-

1. Enter the local IP address and subnet mask for the system at the loader> prompt, and press '''Enter'''.

+

<br>1. Enter the local IP address and subnet mask for the system at the loader&gt; prompt, and press '''Enter'''.

{{caution|Be sure that you have made a backup of the configuration files before you enter this command.}}

+

{{caution|Be sure that you have made a backup of the configuration files before you enter this command.}}

-

5. Follow the procedure specified in the [[#Recovery from the switch(boot)# Prompt|Recovery from the switch(boot)# Prompt]] procedure.

+

5. Follow the procedure specified in the [[#Recovery_from_the_switch.28boot.29.23_Prompt|Recovery from the switch(boot)# Prompt]] procedure.

===Recovery from the loader> Prompt===

===Recovery from the loader> Prompt===

Line 940:

Line 945:

-

See the [http://www.cisco.com/en/US/docs/switches/datacenter/sw/4_0/nx-os/high_availability/configuration/guide/ha_nx-os_book.html Cisco NX-OS High Availability and Redundancy Guide, Release 4.0] for more information on high-availability policies.

+

See the [http://www.cisco.com/en/US/docs/switches/datacenter/sw/4_0/nx-os/high_availability/configuration/guide/ha_nx-os_book.html Cisco NX-OS High Availability and Redundancy Guide] for more information on high-availability policies.

===Unrecoverable System Restarts===

===Unrecoverable System Restarts===

Revision as of 20:55, 24 May 2012

This article describes how to identify and resolve problems that might occur when upgrading or restarting.

Information About Upgrades and Reboots

Cisco NX-OS consists of two images: the kickstart image and the system image. To bring up the system, both images should have the same image version.

Upgrades and reboots are ongoing network maintenance activities. You should try to minimize the risk of disrupting the network when performing these operations in production environments and to know how to recover quickly when something does go wrong.

Note:

This publication uses the term upgrade to refer to both Cisco NX-OS upgrades and downgrades.

Upgrades and Reboot Checklist

Use the following checklist to prepare for an upgrade:

Checklist

Check off

Read the Release Notes for the release that you are upgrading or downgrading to.

Ensure that an FTP or TFTP server is available to download the software images.

Copy the new image onto your supervisor modules in bootflash: or slot0:.

Use the show install all impact command to verify that the new image is healthy and to see the impact that the new image will have on the hardware, including any compatibility issues.

After you have completed the checklist, you are ready to upgrade the systems in your network.

Note:

It is normal for the active supervisor to become the standby supervisor during an upgrade.

Note:

Log messages are not saved across system reboots. However, a maximum of 100 log messages with a severity level of critical and below (levels 0, 1, and 2) are saved in NVRAM. You can view this log at any time by entering the show logging nvram command.

Verifying Software Upgrades

You can use the show install all status command to watch the progress of an ongoing install all command or to view the log of the last completed install all command from a console, SSH, or Telnet session. This command shows the install all output for both the active and standby supervisor modules even if you are not connected to the console terminal.

switch# show install all status
There is an on-going installation... <---------------------- in progress installation
Enter Ctrl-C to go back to the prompt.
Verifying image bootflash:/b-4.0.0.104
-- SUCCESS
Verifying image bootflash:/i-4.0.0.104
-- SUCCESS
Extracting system version from image bootflash:/i-4.0.0.104.
-- SUCCESS
Extracting kickstart version from image bootflash:/b-4.0.0.104.
-- SUCCESS
Extracting loader version from image bootflash:/b-4.0.0.104.
-- SUCCESS

switch# show install all status
This is the log of last installation. <----------------- log of last install
Verifying image bootflash:/b-4.0.0.104
-- SUCCESS
Verifying image bootflash:/i-4.0.0.104
-- SUCCESS
Extracting system version from image bootflash:/i-4.0.0.104.
-- SUCCESS
Extracting kickstart version from image bootflash:/b-4.0.0.104.
-- SUCCESS
Extracting loader version from image bootflash:/b-4.0.0.104.
-- SUCCESS
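The two outputs above share a simple shape: a banner line that tells you whether the installation is still running, followed by pairs of a step description and a result line that begins with "-- ". The following is a minimal Python sketch, not a Cisco tool, that turns captured output into structured data. The function name parse_install_status and the SAMPLE text are illustrative assumptions. Only the SUCCESS result appears in this article, so the sketch returns whatever text follows "-- " without interpreting it.

```python
# Illustrative sample, shortened from the output shown in this article.
SAMPLE = """\
There is an on-going installation...
Enter Ctrl-C to go back to the prompt.
Verifying image bootflash:/b-4.0.0.104
-- SUCCESS
Verifying image bootflash:/i-4.0.0.104
-- SUCCESS
Extracting system version from image bootflash:/i-4.0.0.104.
-- SUCCESS
"""


def parse_install_status(text):
    """Return (in_progress, steps) for captured 'show install all status' output.

    in_progress is True, False, or None when no banner line was found.
    steps is a list of (description, result) tuples; result is None when a
    step has no result line yet.
    """
    in_progress = None
    steps = []
    pending = None  # step description still waiting for its result line
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("There is an on-going installation"):
            in_progress = True
        elif line.startswith("This is the log of last installation"):
            in_progress = False
        elif line.startswith("-- "):
            if pending is not None:
                steps.append((pending, line[3:].strip()))
                pending = None
        elif line.startswith(("Verifying", "Extracting")):
            if pending is not None:
                steps.append((pending, None))
            pending = line
    if pending is not None:
        steps.append((pending, None))
    return in_progress, steps


if __name__ == "__main__":
    running, parsed = parse_install_status(SAMPLE)
    print("in progress:", running)
    for description, result in parsed:
        print(f"{result or 'PENDING':8} {description}")
```

A step whose result is None is one that had started but not yet reported when the output was captured, which is a convenient place to look first if an upgrade appears to stall.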

Verifying a Nondisruptive Upgrade

When you initiate a nondisruptive upgrade, Cisco NX-OS notifies all services that an upgrade is about to start and finds out whether or not the upgrade can proceed. If a service cannot allow the upgrade to proceed at this time, then the service aborts the upgrade and you are prompted to enter the show install all failure-reason command to determine the reason why the upgrade cannot proceed.

switch# show install all failure-reason
Service: "cfs" failed to respond within the given time period.
switch#
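If you collect this output in a script before contacting support, the service name and the reason can be pulled out of the line that starts with Service:. The sketch below assumes the line always has the form shown in the example above (a quoted service name followed by free text); other formats are not covered, and the name parse_failure_reason is hypothetical.

```python
import re

# Matches lines such as:
#   Service: "cfs" failed to respond within the given time period.
# The format is assumed from the single example in this article.
FAILURE_RE = re.compile(r'^Service:\s+"(?P<service>[^"]+)"\s+(?P<reason>.+)$')


def parse_failure_reason(output):
    """Return (service, reason) from the first matching line, or None."""
    for line in output.splitlines():
        match = FAILURE_RE.match(line.strip())
        if match:
            return match.group("service"), match.group("reason")
    return None


if __name__ == "__main__":
    captured = (
        "switch# show install all failure-reason\n"
        'Service: "cfs" failed to respond within the given time period.\n'
        "switch#\n"
    )
    print(parse_failure_reason(captured))
```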

If a failure occurs for whatever reason (such as a save runtime state failure or module upgrade failure) after the upgrade is in progress, then the device reboots disruptively because the changes cannot be rolled back. In such cases, the upgrade has failed.

If you need further assistance to determine why an upgrade is unsuccessful, you should collect the details from the show tech-support command output and the console output from the installation, if available, before you contact your technical support representative.

Using ROM Monitor Mode

If your device does not find a valid system image to load, the system will start in ROM monitor mode. ROM monitor mode can also be accessed by interrupting the boot sequence during startup. From ROM monitor mode, you can boot the device or perform diagnostic tests.

On most systems, you can enter ROM monitor mode by entering the reload EXEC command and then pressing the Break key on your keyboard or by using the Break key-combination (the default Break key combination is Ctrl-C) during the first 60 seconds of startup.

Troubleshooting Software Upgrades and Downgrades

This section describes how to troubleshoot a software installation upgrade or downgrade failure.

Software Upgrade Ends with Error

Problem

Possible Cause

Solution

The upgrade ends with an error.

The standby supervisor module bootflash: file system does not have sufficient space to accept the updated image.

Use the delete command to remove unnecessary files from the file system.

The specified system and kickstart images are not compatible.

Check the output of the installation process for details on the incompatibility. Possibly update the kickstart image before updating the system image.

Verify the state of the system at every stage and restart the upgrade after 10 seconds. If you restart the upgrade within 10 seconds, the command is rejected. An error message displays, indicating that an upgrade is currently in progress.

Upgrading Cisco NX-OS Software

To perform an automated software upgrade on any system from the CLI, follow these steps:

Log in to the system through the console, Telnet, or SSH port of the active supervisor.

Create a backup of your existing configuration file, if required.

Perform the upgrade by entering the install all command.

Exit the system console and open a new terminal session to view the upgraded supervisor module by using the show module command.

Tip: Always carefully read the output of the install all compatibility check command. This compatibility check tells you exactly what needs to be upgraded (such as the BIOS, loader, or firmware) and what modules will experience a disruptive upgrade. If there are any questions or concerns about the results of the output, type n to stop the installation and contact the next level of support.

The following example shows an upgrade using the install all command with the source images located on an SCP server.

Power cycle the switch if required, and press Ctrl-] when the switch displays "Checking all filesystems....r. done." to interrupt the boot process at the switch(boot)# prompt. Use the Recovery from the switch(boot)# Prompt procedure to update the system image.

Corrupted Bootflash Recovery

All device configurations reside in the internal bootflash. If you have a corrupted internal bootflash, you could potentially lose your configuration. Be sure to save and back up your configuration files periodically. The regular system boot goes through the following sequence (see Figure 1):

The basic input/output system (BIOS) loads the loader.

The loader loads the kickstart image into RAM and starts the kickstart image.

The kickstart image loads and starts the system image.

The system image reads the startup-configuration file.

Figure 1 Regular Boot Sequence

If the images on your system are corrupted and you cannot proceed (error state), you can interrupt the system boot sequence and recover the image by entering the BIOS configuration utility described in the following section. Access this utility only when needed to recover a corrupted internal disk.

Caution:

The BIOS changes explained in this section are required only to recover a corrupted bootflash.

Recovery procedures require the regular sequence to be interrupted. The internal sequence goes through four phases between the time that you turn on the system and the time that the system prompt appears on your terminal: BIOS, boot loader, kickstart, and system.

Recovery Interruption

For each phase, the normal prompt appears at the end of the phase, and the recovery prompt appears when the system cannot progress to the next phase.

Phase: BIOS
Normal prompt: loader>
Recovery prompt: No bootable device
Description: The BIOS begins the power-on self test, memory test, and other operating system applications. While the test is in progress, press Ctrl-C to enter the BIOS configuration utility and use the netboot option.

Phase: Boot loader
Normal prompt: Starting kickstart
Recovery prompt: loader>
Description: The boot loader uncompresses the loaded software to boot an image using its filename as a reference. These images are made available through bootflash. When the memory test is over, press Esc to enter the boot loader prompt.

Phase: Kickstart
Normal prompt: Uncompressing system
Recovery prompt: switch(boot)#
Description: When the boot loader phase is over, press Ctrl-] (Control key plus right bracket key) to enter the switch(boot)# prompt. Depending on your Telnet client, these keys may be reserved, and you may need to remap the keystroke. See the documentation provided by your Telnet client. If the corruption causes the console to stop at this prompt, copy the system image and reboot the system.

Phase: System
Normal prompt: Login:
Recovery prompt: None
Description: The system image loads the configuration file of the last saved running configuration and returns a switch login prompt.

Figure 2 Regular and Recovery Sequence
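The phase and prompt relationships described above can be kept as a small lookup structure, which is handy when scanning captured console text to work out where a boot attempt stopped. This is an illustrative Python sketch only. The names BootPhase, BOOT_PHASES, and phases_stuck_at are assumptions, and the prompt strings are copied from this article rather than from any device API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BootPhase:
    name: str
    normal_prompt: str              # shown at the end of the phase
    recovery_prompt: Optional[str]  # shown when the next phase cannot start


# Values taken from the Recovery Interruption information in this article.
BOOT_PHASES = (
    BootPhase("BIOS", "loader>", "No bootable device"),
    BootPhase("Boot loader", "Starting kickstart", "loader>"),
    BootPhase("Kickstart", "Uncompressing system", "switch(boot)#"),
    BootPhase("System", "Login:", None),
)


def phases_stuck_at(console_text: str) -> List[str]:
    """Return the names of phases whose recovery prompt appears in the text."""
    return [
        phase.name
        for phase in BOOT_PHASES
        if phase.recovery_prompt and phase.recovery_prompt in console_text
    ]


if __name__ == "__main__":
    print(phases_stuck_at("switch(boot)# "))
    print(phases_stuck_at("loader> "))
```

Note that loader> is the normal prompt at the end of the BIOS phase and also the recovery prompt of the boot loader phase, so the sketch matches only on recovery prompts.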

Recovery from the loader> Prompt on Supervisor Modules

Caution:

This procedure uses the init system command, which reformats the file system of the device. Be sure that you have made a backup of the configuration files before you begin this procedure.

The loader> prompt is different from the regular switch# prompt. The CLI command completion feature does not work at the loader> prompt and may result in undesired errors. You must type the command exactly as you want the command to appear.

Note:

If you boot over TFTP from the loader> prompt, you must supply the full path to the image on the remote server.

Note:

The TFTP boot method is available only as a backup for diagnostics and for repairing bootflash corruption. The TFTP boot method is not intended to bring up the system to a fully operational state. Reloading the system is mandatory after all diagnostics and repairs have been completed.

Use the help command at the loader> prompt to display a list of commands available at this prompt or to obtain more information about a specific command in that list.

To recover a corrupted kickstart image (system error state) for a system with a single supervisor module, follow these steps:

1. Enter the local IP address and subnet mask for the system at the loader> prompt, and press Enter.

loader> set ip 172.16.1.2 255.255.255.0

2. Specify the IP address of the default gateway.

loader> set gw 172.16.1.1

3. Boot the kickstart image file from the required server.

loader> boot tftp://172.16.10.100/tftpboot/n7000-s1-kickstart-4.0.bin

In this example, 172.16.10.100 is the IP address of the TFTP server, and n7000-s1-kickstart-4.0.bin is the name of the kickstart image file that exists on that server.

The switch(boot)# prompt indicates that you have a usable kickstart image.

4. Enter the init system command at the switch(boot)# prompt.

switch(boot)# init system

Caution:

Be sure that you have made a backup of the configuration files before you enter this command.
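Because command completion does not work at the loader> prompt, it can help to prepare the exact command strings for steps 1 through 3 beforehand and paste them in. The following is a minimal Python sketch under stated assumptions: the function name loader_recovery_commands is hypothetical, the gateway-in-subnet check is an added sanity check rather than a documented requirement, and the sketch only builds text; it does not connect to a device.

```python
import ipaddress


def loader_recovery_commands(local_ip, netmask, gateway, tftp_server, image_path):
    """Build the loader> commands from steps 1 to 3 of the procedure above.

    image_path must be the full path on the TFTP server, starting with "/".
    """
    # ipaddress raises ValueError on malformed addresses or masks.
    interface = ipaddress.IPv4Interface(f"{local_ip}/{netmask}")
    gw = ipaddress.IPv4Address(gateway)
    server = ipaddress.IPv4Address(tftp_server)

    # Assumption: the default gateway should sit in the local subnet.
    if gw not in interface.network:
        raise ValueError(f"gateway {gw} is not in {interface.network}")
    if not image_path.startswith("/"):
        raise ValueError("supply the full path to the image on the server")

    return [
        f"set ip {str(interface.ip)} {str(interface.netmask)}",
        f"set gw {str(gw)}",
        f"boot tftp://{str(server)}{image_path}",
    ]


if __name__ == "__main__":
    for command in loader_recovery_commands(
        "172.16.1.2",
        "255.255.255.0",
        "172.16.1.1",
        "172.16.10.100",
        "/tftpboot/n7000-s1-kickstart-4.0.bin",
    ):
        print("loader>", command)
```

The init system command in step 4 is left out on purpose: it is entered at the switch(boot)# prompt, and it reformats the file system, so it should remain a deliberate manual action.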

Recovery from the loader> Prompt

Caution:

This procedure uses the init system command, which reformats the file system of the device. Be sure that you have made a backup of the configuration files before you begin this procedure.

Note:

The loader> prompt is different from the regular switch# or switch(boot)# prompt. The CLI command completion feature does not work at the loader> prompt and may result in undesired errors. You must type the command exactly as you want the command to appear.

Note:

If you boot over TFTP from the loader> prompt, you must supply the full path to the image on the remote server.

Tip: Use the help command at the loader> prompt to display a list of commands available at this prompt or to obtain more information about a specific command in that list.

To recover a corrupted kickstart image (system error state) for a system with a single supervisor module, follow these steps:

Recovery from the switch(boot)# Prompt

To recover a system image using the kickstart image for a system with a single supervisor module, follow these steps:

1. Change to configuration mode and configure the IP address of the mgmt0 interface.

switch(boot)# config t
switch(boot)(config)# interface mgmt0

2. Follow this step if you entered an init system command. Otherwise, skip to Step 3.

a. Enter the ip address command to configure the local IP address and the subnet mask for the system.

switch(boot)(config-mgmt0)# ip address 172.16.1.2 255.255.255.0

b. Enter the ip default-gateway command to configure the IP address of the default gateway.

switch(boot)(config-mgmt0)# ip default-gateway 172.16.1.1

3. Enter the no shutdown command to enable the mgmt0 interface on the system.

switch(boot)(config-mgmt0)# no shutdown

4. Enter end to exit to EXEC mode.

switch(boot)(config-mgmt0)# end

5. If you believe there are file system problems, enter the init system check-filesystem command. This command checks all internal file systems and fixes any errors that are encountered. This command takes a few minutes to complete.

switch(boot)# load bootflash:system-image1
Uncompressing system image: bootflash:/system-image1
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
Would you like to enter the initial configuration mode? (yes/no): yes

Note:

If you enter no, you will return to the switch# login prompt, and you must manually configure the system.
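Steps 1 through 4 of this procedure differ only in whether the addressing commands of step 2 are needed, which depends on whether you entered init system beforehand. The sketch below, offered as an illustration rather than a supported tool, builds the command list for both cases. The function name mgmt0_setup_commands and its parameters are hypothetical; the command strings themselves are the ones shown in the steps above.

```python
def mgmt0_setup_commands(ran_init_system, ip=None, netmask=None, gateway=None):
    """Return the switch(boot)# commands for steps 1 to 4 of the procedure.

    The ip address and ip default-gateway commands are included only when
    init system was entered earlier (step 2).
    """
    commands = ["config t", "interface mgmt0"]
    if ran_init_system:
        if not (ip and netmask and gateway):
            raise ValueError("ip, netmask, and gateway are required after init system")
        commands.append(f"ip address {ip} {netmask}")
        commands.append(f"ip default-gateway {gateway}")
    commands.extend(["no shutdown", "end"])
    return commands


if __name__ == "__main__":
    for command in mgmt0_setup_commands(
        True, "172.16.1.2", "255.255.255.0", "172.16.1.1"
    ):
        print(command)
```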

Recovery for Systems with Dual Supervisor Modules

This section describes how to recover when one or both supervisor modules in a dual supervisor system have corrupted bootflash.

Recovering One Supervisor Module With Corrupted Bootflash

If one supervisor module has a functioning bootflash and the other has a corrupted bootflash, follow these steps:

Boot the functioning supervisor module and log on to the system.

At the switch# prompt on the booted supervisor module, enter the reload module slot force-dnld command, where slot is the slot number of the supervisor module with the corrupted bootflash.

The supervisor module with the corrupted bootflash performs a netboot and checks the bootflash for corruption. When the bootup scripts discover that the bootflash is corrupted, they generate an init system command, which fixes the corrupt bootflash. The supervisor boots as the HA Standby.

Caution:

If your system has an active supervisor module currently running, you must enter the system standby manual-boot command in EXEC mode on the active supervisor module before entering the init system command on the standby supervisor module to avoid corrupting the internal bootflash:. After the init system command completes on the standby supervisor module, enter the system no standby manual-boot command in EXEC mode on the active supervisor module.

Recovering Both Supervisor Modules with Corrupted Bootflash

If both supervisor modules have corrupted bootflash, follow these steps:

1. Boot the system and press the Esc key after the BIOS memory test to interrupt the boot loader.

Note:

Press Esc immediately after you see the following message:

00000589K Low Memory Passed
00000000K Ext Memory Passed
Hit ^C if you want to run SETUP....
Wait.....

If you wait too long, you will skip the boot loader phase and enter the kickstart phase.

You see the loader> prompt.

Caution:

The loader> prompt is different from the regular switch# or switch(boot)# prompt. The CLI command completion feature does not work at the loader> prompt and may result in undesired errors. You must type the command exactly as you want the command to appear.

Tip: Use the help command at the loader> prompt to display a list of commands available at this prompt or to obtain more information about a specific command in that list.

If you do not enter the reload module command when a boot failure has occurred, the active supervisor module automatically reloads the standby supervisor module within 3 to 6 minutes after the failure.

System or Process Resets

When a recoverable or nonrecoverable error occurs, the system or a process on the system may reset. See Table 2-4 for possible causes and solutions.

Problem

Possible Cause

Solution

The system or a process on the system resets.

A recoverable error occurred on the system or on a process in the system.

Verify that a clock module failed. Replace the failed clock module during the next maintenance window.

Recoverable System Restarts

Every process restart generates a syslog message and a Call Home event. Even if the event does not affect service, you should identify and resolve the condition immediately because future occurrences could cause a service interruption.

To respond to a recoverable system restart, follow these steps:

1. Check the syslog file to see which process restarted and why it restarted.

The output shows all cores that are presently available for upload from the active supervisor. The module-num column shows the slot number on which the core was generated. In the previous example, an FSPF core was generated on the active supervisor module in slot 5. An FCC core was generated on the standby supervisor module in slot 6. Core dumps generated on the module in slot 8 include ACLTCAM and FIB.

Copy the FSPF core dump to a TFTP server with the IP address 1.1.1.1, as follows:

switch# copy core://5/1524 tftp://1.1.1.1/abcd

Display the file named zone_server_log.889 in the log directory as follows:

7. Enter the system cores tftp:[//servername][/path] command to configure the system to use TFTP to send the core dump to a TFTP server.

This command causes the system to enable the automatic copy of core files to a TFTP server. For example, the following command sends the core files to the TFTP server with the IP address 10.1.1.1:

switch(config)# system cores tftp://10.1.1.1/cores

The following conditions apply:

The core files are copied every 4 minutes. This time interval is not configurable.

The copy of a specific core file to a TFTP server can be manually triggered by using the command copy core://module#/pid# tftp://tftp_ip_address/file_name.

The maximum number of times that a process can be restarted is part of the high-availability (HA) policy for any process. (This parameter is not configurable.) If the process restarts more than the maximum number of times, the older core files are overwritten.

The maximum number of core files that can be saved for any process is part of the HA policy for any process. (This parameter is not configurable, and it is set to three.)

8. Determine the cause and resolution for the restart condition by contacting your technical support representative and asking the representative to review your core dump.
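The two TFTP-related commands in step 7 follow fixed URL patterns, so they are easy to mistype (for example, writing tftp::/ instead of tftp://). The following hedged Python sketch builds both command strings from their parts. The function names copy_core_command and system_cores_command are hypothetical, and the sketch assumes the server is given as an IPv4 address, as in the examples in this article.

```python
import ipaddress


def copy_core_command(module, pid, tftp_ip, file_name):
    """Build: copy core://module#/pid# tftp://tftp_ip_address/file_name"""
    ipaddress.IPv4Address(tftp_ip)  # raises ValueError if malformed
    return f"copy core://{int(module)}/{int(pid)} tftp://{tftp_ip}/{file_name}"


def system_cores_command(tftp_ip, path=""):
    """Build: system cores tftp:[//servername][/path]"""
    ipaddress.IPv4Address(tftp_ip)  # raises ValueError if malformed
    suffix = "/" + path.lstrip("/") if path else ""
    return f"system cores tftp://{tftp_ip}{suffix}"


if __name__ == "__main__":
    # Manual copy of the FSPF core from slot 5, process ID 1524.
    print("switch#", copy_core_command(5, 1524, "1.1.1.1", "abcd"))
    # Automatic copy of core files to a TFTP server.
    print("switch(config)#", system_cores_command("10.1.1.1", "cores"))
```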

Unrecoverable System Restarts

A process restarts more times than is allowed by the system configuration.

A process restarts more frequently than is allowed by the system configuration.

The effect of a process reset is determined by the policy configured for each process. An unrecoverable reset may cause functionality loss, the active supervisor to restart, a supervisor switchover, or the system to restart.

The show system reset-reason command displays the following information:

The last four reset-reason codes for the supervisor modules are displayed. If either supervisor module is absent, the reset-reason codes for that supervisor module are not displayed.

The show system reset-reason module number command displays the last four reset-reason codes for a specific module in a given slot. If a module is absent, then the reset-reason codes for that module are not displayed.

The overall history of when and why expected and unexpected reloads occur

The time stamp of when the reset or reload occurred

The reason for the reset or reload of a module

The service that caused the reset or reload (not always available)

The software version that was running at the time of the reset or reload

Standby Supervisor Fails to Boot

Explanation: This message is printed if the standby supervisor does not complete its boot procedure (that is, it does not reach the login prompt on the local console) within 3 to 6 minutes after the loader has been loaded by the BIOS. This message is usually caused by boot variables that are not properly set for the standby supervisor. This message can also be caused by a user intentionally interrupting the boot procedure at the loader prompt (by pressing Esc).

Recommended Action: Connect to the local console of the standby supervisor. If the supervisor is at the loader prompt, try to use the boot command to continue the boot procedure. Otherwise, issue a reload command for the standby supervisor from a vsh session on the active supervisor, specifying the force-dnld option. Once the standby is online, fix the problem by setting the boot variables appropriately.

Symptom

Possible Cause

Solution

Standby supervisor does not boot.

Active supervisor kickstart image booted from TFTP.

Reload the active supervisor from bootflash:.

Recovering the Administrator Password

You can access the system if you forget the administrator password.

Problem

Solution

You forgot the administrator password for accessing the system.

Use the Password Recovery procedure to recover the password using a local console connection.