8.1 Replacing a SCSI Controller Module

SCSI controller modules are hot-swappable. In the event that it is impossible or impractical to halt I/O from hosts to the array, a controller can be replaced while the surviving controller is active and servicing I/O.

However, if your configuration enables you to halt I/O during the controller replacement procedure without disruption, it is a good idea to do so. With write-back cache enabled under heavy I/O, cache synchronization can take three hours or more to complete. During this time, the controller displays "preparing failback" status and the LED remains amber.

Note - Under firmware version 4.1x, a failed controller forces write-through cache mode by default and is subject only to the 10-minute cross-load time. However, if the default event trigger for a failed controller has been set to DISABLED, cache synchronization could take over three hours. For details regarding write policy and event triggers, refer to the Sun StorEdge 3000 Family RAID Firmware User's Guide.

Hardware errors or configuration errors might also be present but undetected, resulting in an unserviceable condition that is revealed only when the hot-swap is underway. For instance, a hardware module might be improperly seated.

Before replacing a controller FRU, it is good practice to verify your system's health as far as possible. Do not replace a working controller on a channel that is offline.

When a controller fails in a dual-controller configuration, the remaining controller automatically becomes the primary controller, if it is not already. If you replace a controller that has not failed, force-fail the controller to be replaced via one of the following methods.

Note - If you are running version 2.0 or greater of the Sun StorEdge CLI, before issuing the fail command, perform a show redundancy command to check the status of the redundant pair. This command will also display the position of the primary controller. Also perform a show events command for each controller to view any error messages. The dual-controller array is healthy if the Redundancy mode is Active-Active and the status is Enabled. The single-controller array is healthy if the Redundancy mode is Active-Active and the status is Scanning.

For details on the fail, show redundancy, and show events commands, refer to the Sun StorEdge 3000 Family CLI User's Guide.
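The health rule in the note above can be written as a small check. The following Python sketch is illustrative only: the function name array_is_healthy and its parameters are assumptions and are not part of the Sun StorEdge CLI. The mode and status strings are expected to be copied by hand from the show redundancy output.

```python
def array_is_healthy(dual_controller: bool, mode: str, status: str) -> bool:
    """Apply the health rule from the note above.

    Hypothetical helper: the caller supplies the Redundancy mode and status
    strings exactly as read from the `show redundancy` output.
    """
    if mode.strip().lower() != "active-active":
        return False
    # Dual-controller arrays should report Enabled;
    # single-controller arrays should report Scanning.
    expected = "enabled" if dual_controller else "scanning"
    return status.strip().lower() == expected
```

For example, array_is_healthy(True, "Active-Active", "Enabled") evaluates to True, while the same call with a status of "Scanning" evaluates to False for a dual-controller array.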

To provide maximum usability, the Sun StorEdge 3310 SCSI array controller module has controller firmware version 3.66, which is available only in SCSI FRUs and provides the following functions:

In a dual-controller chassis, the firmware in the active controller will automatically be loaded to this FRU. The 3.66 firmware provides this cross-load functionality.

When a controller is replaced in a dual-controller configuration, the controller firmware of the remaining functional controller automatically overwrites the firmware of the new replacement controller to maintain compatibility. This is referred to as cross-loading. Cross-loading uses the NVRAM configuration settings to synchronize the firmware version of the newly installed controller to match the firmware version of the running controller.

In a single-controller chassis, the 3.66 firmware cannot be used by itself and must have a released version of 3.25 or 4.1x firmware downloaded to the controller FRU. The SCSI firmware patches available on SunSolve for the single-controller FRU configurations will be:

SCSI patch ID 113722-09 for 3.25W firmware

SCSI patch ID 113722-10 for 4.13B or later firmware

To upgrade the 3.66 firmware to 4.1x firmware for a single-controller configuration, refer to the patch README instructions.
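The patch selection for a single-controller FRU amounts to a two-entry lookup. The Python sketch below restates the patch IDs listed above. The function name and the keys "3.25" and "4.1x" are labels chosen for this illustration, and the patch IDs might be superseded, so confirm them on SunSolve before use.

```python
# Patch IDs as listed above for single-controller FRU configurations.
# The dictionary keys are labels invented for this sketch.
SINGLE_CONTROLLER_PATCHES = {
    "3.25": "113722-09",  # for 3.25W firmware
    "4.1x": "113722-10",  # for 4.13B or later firmware
}


def single_controller_patch(target_family: str) -> str:
    """Return the patch ID for the firmware family to be installed.

    3.66 is not a valid target: it cannot be used by itself in a
    single-controller chassis.
    """
    if target_family not in SINGLE_CONTROLLER_PATCHES:
        raise ValueError(f"no released single-controller patch for {target_family!r}")
    return SINGLE_CONTROLLER_PATCHES[target_family]
```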

Always check the SunSolve Download Center at http://sunsolve.sun.com for the latest available firmware upgrades.

Note - Be sure to check and update SAF-TE firmware in all configurations to provide the best performance. The latest SAF-TE firmware is included in each controller firmware patch.

Note - When a controller is installed and initialized or when configuration settings are changed, you are strongly advised to make a record of the new configuration settings and firmware version. This is particularly important in a single controller configuration for re-establishing your configuration settings when a controller is replaced. You can record this information in the "Record of Settings" appendix in the Sun StorEdge 3000 Family RAID Firmware User's Guide.

Note - The batteries in controller FRUs experience discharge during shipment and might require an extended charging cycle upon initial power-up. Nominal battery operation is achieved when the battery status LED changes from amber to flashing green within 25 minutes after the initial power cycle. If the battery status LED remains amber for more than 25 minutes after the initial power-up, then the unit must be power cycled to initiate the extended charging cycle. If the battery status LED remains amber for more than 30 minutes after initiating the extended charging cycle, contact Sun service personnel for additional instructions.
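The timing thresholds in the battery note can be summarized as a simple decision. This Python sketch is only an aid for reading the note: the function name, its parameters, and the returned strings are assumptions made for illustration, and the elapsed time must be measured by the operator.

```python
def battery_action(minutes_amber: float, extended_cycle_started: bool) -> str:
    """Suggest the next step while the battery status LED stays amber.

    minutes_amber is counted from the initial power-up, or from the start
    of the extended charging cycle when extended_cycle_started is True.
    """
    if not extended_cycle_started:
        # Nominal operation: LED leaves amber within 25 minutes.
        if minutes_amber > 25:
            return "power-cycle the unit to start the extended charging cycle"
        return "keep waiting"
    # Extended charging cycle: allow up to 30 minutes.
    if minutes_amber > 30:
        return "contact Sun service personnel"
    return "keep waiting"
```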

8.2 Saving the NVRAM Configuration Settings

Before replacing a controller module, save the NVRAM configuration settings to disk. The saved NVRAM file can be used to restore configuration settings only if the replacement controller has the same firmware version as the defective controller.

Caution - Cross-loading does not work when you replace a 4.1x SCSI controller with a 3.25 SCSI controller in a dual-controller chassis. If you replace a 4.1x controller with a 3.25 controller in this configuration without upgrading the 3.25 controller, the replacement controller will not be recognized.

For all other dual-controller configurations, perform the following procedures.

8.3.1 Removing a SCSI Controller Module

1. Keep the array powered on.

2. Remove all cables from the controller module.

3. Turn the thumbscrews on the left and right sides of the controller module counterclockwise until the thumbscrews are disengaged from the chassis.

4. Hold the thumbscrews, and pull out the controller module.

8.3.2 Installing a SCSI Controller Module

1. Keep the array powered on.

Caution - DO NOT POWER OFF the chassis when you replace a controller module. Multiple problems can occur. If you power off the array and replace a controller module in a dual-controller configuration, the replacement controller could become the primary controller and overwrite any configuration settings previously set. Additionally, if the array is powered off incorrectly, data that is written to cache and that has not been completely written to the disks will be lost. If you powered off the array during replacement, see Section 8.3.4, Restoring the Configuration Settings of a Powered-Off Array.

a. Insert the controller FRU into the slot, and push forward until you begin to feel resistance as the connecting pins engage.

b. Push the controller FRU until the connecting pins are fully seated and the RAID controller fits flush against the back plate of the RAID array.

Failure to insert the controller carefully as described above can cause one of the following problems:

The surviving controller might reset, causing both controllers to go offline, or the replacement controller might become primary and the surviving controller might become secondary which can cause the controllers to go offline. Recovery: Wait until both RAID controllers initialize and come up in redundant mode with no intervention required.

If the controller status LED is blinking green on both controllers, then both controllers are primary controllers. Recovery: Take out the replacement controller and reinsert it, carefully following the instructions above. If this does not remedy the problem, power-cycle the array.

3. Turn the thumbscrews on the left and right sides of the controller module clockwise until they are finger-tight, to secure the module and to make the module's front panel flush with the chassis.

To ensure that a thumbscrew is finger-tight, tighten it with a screwdriver and then loosen the thumbscrew counterclockwise a quarter turn.

The new controller automatically becomes the secondary controller.

In a redundant controller configuration where a new controller FRU is installed, the controller status LED will remain amber until the controllers complete the redundant controller process, which can take more than 10 minutes. Identical firmware versions on both controllers are required for proper redundant controller operation.

The redundant controller process automatically cross-loads the firmware version of the newly installed controller FRU to match the firmware version of the other running controller. For example, if the running controller has firmware 4.15C and the new controller has 4.15E, the new controller will be cross-loaded with the 4.15C firmware of the running controller.

Caution - Wait a minimum of 10 minutes for the firmware cross-load to be completed. If the newly installed controller is removed for any reason during the period when the status LED is amber (for 10 minutes or more), the controller can be rendered inoperable and must be returned for repair.

4. If you want the most current version of firmware on your controllers, download the latest firmware patch as described in the release notes for your array.

Caution - Follow the upgrade instructions in the patch README file or the downgrade instructions in this document with great care. If the wrong firmware is installed, or the firmware is installed on the wrong device, your controller might be rendered inoperable.

Note - If you are using 3.25W or earlier controller firmware and do not want to upgrade to version 4.1x, you can download the most recent 3.25W firmware patch 113722-09 for the Sun StorEdge 3310 SCSI array from sunsolve.sun.com.

8.3.3 Monitoring the Automatic Firmware Update for a Recently Installed Controller FRU

To monitor the status of the automatic firmware update, use the Sun StorEdge CLI show redundancy command. The Sun StorEdge CLI will display the progression of "Failed," "Scanning," "Detected" and "Enabled" states.

Note - If you have not installed the Sun StorEdge CLI software, you must install it from the product CD for your array, or from the Sun Download Center website. For details, see the release notes for your array.

Initial Failed Status Response: This is the response to the command upon a controller failure and is shown for completeness.

Scanning Status: Install Controller FRU. The installed controller is performing self-test and scanning disk channels. This is also the state where the controller will update the firmware on the newly installed controller if not identical to the running firmware revision. The controllers can remain in this state for up to 10 minutes depending upon system activity.

...

Redundancy status: Scanning

Secondary controller serial number: 0

Detected Status: Redundant Controller Process Starts. The installed controller has completed the scanning of the disk channels, updated installed controller firmware as required, and communicated to the primary controller. This status is transitional and normally cannot be detected unless repetitive operations are executed.
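When show redundancy output is captured to a text file, the status can be checked with a short script. The Python sketch below assumes only what the fragment above shows, namely lines of the form "Label: value" and the label "Redundancy status". The function names are hypothetical, the full output layout is not reproduced in this section, and the script does not replace the 10-minute minimum wait for cross-loading.

```python
def parse_status(output: str) -> dict:
    """Collect 'Label: value' lines into a dictionary; other lines are ignored."""
    fields = {}
    for line in output.splitlines():
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


def redundancy_enabled(output: str) -> bool:
    """Return True only when the captured output reports the Enabled state."""
    return parse_status(output).get("Redundancy status") == "Enabled"


# Fragment copied from the Scanning example above.
sample = "...\nRedundancy status: Scanning\nSecondary controller serial number: 0\n"
print(parse_status(sample)["Redundancy status"])  # prints: Scanning
print(redundancy_enabled(sample))                 # prints: False
```

Treating every state other than Enabled as "not ready" is a deliberately conservative choice, because Failed, Scanning, and Detected all precede a completed redundant controller process.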

8.3.4 Restoring the Configuration Settings of a Powered-Off Array

If the array was inadvertently powered off during the controller replacement in a dual-controller configuration, you must perform the following steps to ensure successful operation of your array.

1. In a dual-controller configuration where both controllers have the same version number (such as 4.15C and 4.15E), power on the array and wait a minimum of 10 minutes for firmware cross-load to occur.

2. In a dual-controller configuration where the controller modules have different firmware versions (such as 3.25 and 3.66), perform the following steps:

f. Wait a minimum of 10 minutes for the firmware cross-load to be completed.

Caution - If the newly installed controller is removed for any reason during the period when the status LED is amber (for 10 minutes or more), the controller can be rendered inoperable and must be returned for repair.

3. Confirm that the secondary controller is active on the array by entering one of the following commands.

Using the serial port connection to the firmware application, from the RAID firmware Main Menu, choose "view and edit Peripheral devices → View Peripheral Device Status."

8.4 Converting a Dual-Controller Array to a Single-Controller Array

If one controller fails in a dual-controller configuration, you might want to run a single controller for an extended period of time so that the array does not display as degraded.

For instructions on converting a dual-controller configuration to a single-controller configuration, refer to the section titled "To Convert a Dual Controller Array to a Single Controller Array" in the Sun StorEdge 3000 Family Configuration Service User's Guide.

8.5 Converting a Single-Controller Array to a Dual-Controller Array

Note - SCSI single-controller arrays ship with a blanking panel covering the secondary controller slot. This must be removed in order to install a secondary controller.

1. Keep the array powered on and make sure that the connected hosts are inactive.

2. Turn the thumbscrews on the left and right sides of the blanking panel counterclockwise until they are disengaged from the chassis, and remove the panel.

3. Gently slide the new controller module into the array.

Caution - Be sure that the module is properly inserted in the guide rails of the array and that you keep the power on.

a. Insert the controller FRU into the slot, and push forward until you begin to feel resistance as the connecting pins engage.

b. Push the controller FRU until the connecting pins are fully seated and the RAID controller fits flush against the back plate of the RAID array.

Failure to insert the controller carefully as described above can cause one of the following problems:

The original controller might reset, causing both controllers to go offline, or the new controller might become primary and the original controller might become secondary which can cause the controllers to go offline. Recovery: Wait until both RAID controllers initialize and come up in redundant mode with no intervention required.

If the controller status LED is blinking green on both controllers, then both controllers are primary controllers. Recovery: Take out the new controller and reinsert it, carefully following the instructions above. If this does not remedy the problem, power-cycle the array.

4. Turn the thumbscrews on the left and right sides of the controller module clockwise until they are finger-tight to secure the module and to make the module's front panel flush with the chassis.

To ensure that a thumbscrew is finger-tight, tighten it with a screwdriver and then loosen the thumbscrew counterclockwise a quarter turn.

The new controller automatically becomes the secondary controller.

When the new controller FRU is installed, the controller status LED will remain amber until the controllers complete the redundant controller process, which can take more than 10 minutes. The same firmware versions must be installed on both controllers for proper redundant-controller operation.

The redundant-controller process automatically cross-loads the firmware version of the newly installed controller FRU to match the firmware version of the other running controller. For example, if the running controller has firmware 4.12B and the new controller has 4.15, the new controller will be cross-loaded with the 4.12B firmware of the running controller. To monitor this process, see Section 8.3.3, Monitoring the Automatic Firmware Update for a Recently Installed Controller FRU.

Caution - Wait a minimum of 10 minutes for the firmware cross-load to be completed. If the newly installed controller is removed for any reason during the period when the status LED is amber (for 10 minutes or more), the controller can be rendered inoperable and must be returned for repair.

5. If you want the most current version of firmware on your controllers, download the latest firmware patch as described in the release notes for your array.

6. Set up the host channels for the new controller module.

For host channel set up information, refer to the "Connecting Your Array" chapter in the Sun StorEdge 3000 Family Installation, Operation, and Service Manual for your array.

Caution - You must set the hosts to the correct host channels on the controller module or your configuration will not work correctly.

8.6 SCSI Controller Replacement for a Single-Controller Array

To replace a SCSI controller module in a single-controller configuration, perform the following steps.

1. If possible, make a record of the firmware version and configuration settings before replacing the controller.

a. Use the show configuration CLI command to output the configuration settings to a file. Refer to the Sun StorEdge 3000 Family CLI User's Guide for more information.

b. Save NVRAM configuration settings to disk.

From the RAID firmware Main Menu, choose "system Functions → Controller maintenance → Save nvram to disks," and choose Yes to save the contents of NVRAM to disk.

c. Record the Controller Unique Identifier which combines the serial number and MAC address for each chassis and is used for network connections.

2. Remove the controller module.

a. Keep the array powered on and make sure that the connected hosts are inactive.

b. Remove all cables from the controller module.

c. Turn the thumbscrews on the left and right sides of the controller module counterclockwise until the thumbscrews are disengaged from the chassis.

d. Hold the thumbscrews, and carefully pull out the controller module.

3. Insert the replacement controller.

a. Keep the array powered on.

b. Insert the controller FRU into the slot, and push forward until you begin to feel resistance as the connecting pins engage.

c. Push the controller FRU until the connecting pins are fully seated and the RAID controller fits flush against the back plate of the RAID array.

Caution - Be sure that the module is properly inserted into the guide rails of the array.

d. Turn the thumbscrews on the left and right sides of the controller module clockwise until they are finger-tight, to secure the module and to make the module's front panel flush with the chassis.

To ensure that a thumbscrew is finger-tight, tighten it with a screwdriver and then loosen the thumbscrew counterclockwise a quarter turn.

4. Reconnect the original cables to the new controller module.

5. Download the desired firmware version from SunSolve.

Note - Firmware version 3.66 is a special bridge firmware that allows cross-loading from 3.25 or 4.1x firmware modules in a dual-controller configuration. In a single-controller configuration, you must download the latest released firmware version 3.25 or 4.1x into the new controller after installing it.

b. If the "Controller Unique Identifier" is not set to the value recorded in step 1c, type the value 0 (to automatically read the chassis serial number from the midplane) or type the hex value for the original serial number of the chassis (used when the midplane has been replaced).

The Controller Unique Identifier is used to create Ethernet MAC addresses and worldwide names. The value 0 is immediately replaced with the hex value of the chassis serial number. A nonzero value should be specified only if the chassis has been replaced, but the original chassis serial number must be retained; this feature is especially important in a Sun Cluster environment, to maintain the same disk device names in a cluster.

c. To implement the revised configuration settings, choose "system Functions → Reset controller" from the RAID firmware Main Menu, then choose Yes to confirm.

7. If the Sun StorEdge Configuration Service agent was stopped, restart it.

On Solaris and other UNIX systems, use the following command:

# /etc/init.d/ssagent start

On Microsoft Windows systems, use the "Services" utility to start the agent.

If other software such as StorADE was stopped, restart it following the procedures in the documentation for that software.

8.6.1 Downgrading From Controller Firmware Version 3.66 to 3.25

If you have a 3.66 SCSI replacement controller module for a single-controller configuration (see TABLE 8-1 for controller part numbers), you must downgrade the controller firmware to the released 3.25 firmware version or upgrade to a released 4.1x firmware version.

The 3.25 firmware works with the latest SAF-TE, PLD, and software versions. There is no need to downgrade any of these components if you downgrade the controller version. For instance, CLI version 2.0 is compatible with controller version 3.25. CLI 1.6.2 can be used to downgrade to 3.25 controller firmware but cannot be used to upgrade to 4.1x controller firmware.

The CLI download controller-firmware command restores factory defaults with downgrades and does not restore:

Controller IP address - You must have a serial connection to restore the IP address, and the serial connection must be set to 38400 baud.

Net mask, gateway settings and baud rate for the serial port

Customized parameter settings - Record all custom settings prior to the downgrade. The CLI show configuration command does not include all firmware parameters. Be sure to record settings that are in the firmware only, namely sector/head/cylinder parameters and host LUN filter parameters.
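Because these settings are not restored by the downgrade, it can help to keep them in a file that is stored off the array. The Python sketch below shows one possible layout. The class name, field names, and sample values (documentation-range IP addresses) are all assumptions made for illustration, and the values must be read from your own array and entered manually.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class PreDowngradeRecord:
    """Settings that must be re-entered by hand after the downgrade."""
    ip_address: str
    netmask: str
    gateway: str
    serial_baud_rate: int
    # Firmware-only settings, such as sector/head/cylinder parameters
    # and host LUN filter parameters, recorded as free-form notes.
    custom_parameters: dict = field(default_factory=dict)


def to_json(record: PreDowngradeRecord) -> str:
    """Serialize the record so that it can be saved outside the array."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Placeholder values only; substitute the settings of your own array.
record = PreDowngradeRecord(
    ip_address="192.0.2.10",
    netmask="255.255.255.0",
    gateway="192.0.2.1",
    serial_baud_rate=38400,
    custom_parameters={"host LUN filter": "see handwritten notes"},
)
print(to_json(record))
```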

1. Change your working directory to the directory in which the patch was unpacked and confirm that the SUN325W-3310.bin file is present using the ls command.

2. Invoke the Sun StorEdge CLI.

3. Verify that you are running CLI version 1.6.2 or 2.x with the version command.

4. Verify the product and revision of the array by typing the following command:

sccli> show inquiry

Confirm that the correct product name is displayed. Otherwise, this patch does not apply. Select a different device or discontinue installation of this patch. If the firmware version reported is 3.66, continue with Step 5.

5. If feasible, save the configuration in a separate location.

If you are unable to restore the 3.25 configuration, you can reference this file.

At the sccli> command prompt, type the following command:

sccli> show configuration filename.txt

where filename.txt is a text file, or

sccli> show configuration --xml filename.xml

where filename.xml is an xml file.

Note - These commands may take several minutes to complete.

6. Stop all I/O to the array before beginning the controller firmware downgrade and unmount any filesystems or volumes mounted from the array.

7. At the sccli> prompt, type:

sccli> download controller-firmware -r filename

where filename is SUN325W-3310.bin for the Sun StorEdge 3310 SCSI array.

Note - Disregard the CLI message that a script is available to automate the download; the script is only for upgrades.

The download controller-firmware command will display messages indicating that it is downloading the firmware, programming the controller's flash memory, and "engaging" the new firmware. Wait until the sccli> prompt appears again before proceeding. This might take 10 minutes or more.

Caution - DO NOT POWER OFF the array or remove a controller FRU within 10 minutes of performing a controller firmware upgrade or downgrade, or the controller may be rendered inoperable.

8. To re-establish communication with the array, use the serial connection to restore the IP address, netmask, and gateway.

9. To complete the downgrade, access the CLI and enter the following commands:

sccli> reset nvram

sccli> reset controller

10. To re-establish communication with the array, use the serial connection to restore the IP address, netmask, and gateway.

11. Verify the firmware revision of the array by typing the following command:

sccli> show inquiry

Confirm that the firmware revision is now reported as 325W for the SCSI array.

12. Reconfigure your array to the desired configuration.

a. If you saved a 3.25 configuration file to restore, type:

sccli> download nvram filename

where filename is the name of the file that contains the configuration.

b. Otherwise, configure the array using the CLI or the firmware application.

Caution - Do not restore a 4.1x configuration on a 3.25 controller. This may cause data loss.

14. If the "Controller Unique Identifier" is not set correctly, perform the following steps.

a. Type the value 0 (to automatically read the chassis serial number from the midplane) or type the hex value for the original serial number of the chassis (used when the midplane has been replaced).

The Controller Unique Identifier is used to create Ethernet MAC addresses and worldwide names. The value 0 is immediately replaced with the hex value of the chassis serial number. A nonzero value should be specified only if the chassis has been replaced, but the original chassis serial number must be retained; this feature is especially important in a Sun Cluster environment to maintain the same disk device names in a cluster.

Caution - If the "Controller Unique Identifier" parameter has the wrong value, network connections will not work correctly and the worldwide name will be incorrect which will cause problems accessing the array.

b. To implement the revised configuration settings, choose "system Functions → Reset controller" from the RAID firmware Main Menu, then choose Yes to confirm.

15. If the Sun StorEdge Configuration Service agent was stopped, restart it.

On Solaris and other UNIX systems, use the following command:

# /etc/init.d/ssagent start

On Microsoft Windows systems, use the "Services" utility to start the agent.

If other software such as StorADE was stopped, restart it following the procedures in the documentation for that software.