Seagate Barracuda HDD Replacement

Version 1.0

Revised: April 2012 OL-27061-01

Purpose

This document addresses customer concerns about a recent increase in SATA disk failure frequency observed in live production environments. It provides a Method of Procedure (MOP) that describes the process, and the order of execution, to follow when replacing Seagate Barracuda HDD models with Seagate Constellation or Western Digital RE4 disks. The operation normally takes approximately 45-60 minutes per device.

Note Please read the whole document before attempting this operation.

Before going onsite, gather the following items:

1. USB DVD drive

2. Keyboard

3. Mouse

4. Monitor with cable

5. CDS-IS 2.5.11.11 Rescue CD

6. Software

7. Seagate ST500NM0011 or Western Digital WD5003ABYX disks, 12 per device (some SR and CDSM models have only three disks). This document assumes the disks are new, formatted, and blank. If you are using disks from another source, you may need to erase old data or format the drives.

Identifying a Drive

Customers have reported a recent increase in SATA disk failure frequency in their live Internet Streamer CDS production network (CDD). These disk failures are specific to the Barracuda HDD model type (Seagate Part Number ST3500320NS). Table 1 shows the different HDD model types.

Table 1 HDD Model Types

Manufacturer     Part Number   Product Name   Cisco Part Number  Firmware Version  Approximate Introduction  Status
Seagate          ST3500320NS   Barracuda      74-5720-01         SN06              2010                      EOL
Seagate          ST3500514NS   Constellation  74-5720-02         SN11              2011                      EOL
Seagate          ST500NM0011   Constellation  74-5720-03         SN33              2012                      Available
Western Digital  WD5003ABYX    RE4            74-5720-03         1S02              2012                      Available

The Root Cause Analysis that Cisco received from Seagate indicated degraded heads caused by interactions with thermal asperities, which manifest themselves as read errors. Head interactions with buried defects caused read/write issues and resulted in degraded heads.

ii. To prevent problems with memory fragmentation on the SRs, offload streamers and acquirers by setting the memory threshold to one, either through the device CLI or from the CDSM.

Note You cannot offload the CDSM itself, because the CDSM does not actively service requests; it is a configuration and monitoring tool.

Step 2 Confirm that all active connections have completed, using the following commands and examples:

a. Check Web Cache with the following command:

ServiceEngine# show stat web detail | i Active

Active HTTPSession : 0

...

b. Check WMT with the following command:

ServiceEngine# show stat wmt usage

Usage Summary

=============

Concurrent Unicast Client Sessions

----------------------------------

Current: 0
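The two checks in Step 2 can be scripted when the console output is captured as text. The following Python sketch is illustrative only and assumes the output format shown in the examples above. The function names (active_http_sessions, wmt_current_sessions, is_drained) are hypothetical and are not part of the CDS software.

```python
import re


def active_http_sessions(output: str) -> int:
    """Return the value on the 'Active HTTPSession' line."""
    match = re.search(r"Active HTTPSession\s*:\s*(\d+)", output)
    if match is None:
        raise ValueError("no 'Active HTTPSession' line found")
    return int(match.group(1))


def wmt_current_sessions(output: str) -> int:
    """Return the 'Current' value under 'Concurrent Unicast Client Sessions'."""
    match = re.search(
        r"Concurrent Unicast Client Sessions.*?Current:\s*(\d+)",
        output,
        re.DOTALL,
    )
    if match is None:
        raise ValueError("no concurrent unicast session count found")
    return int(match.group(1))


def is_drained(web_output: str, wmt_output: str) -> bool:
    """True only when both counters read zero."""
    return (
        active_http_sessions(web_output) == 0
        and wmt_current_sessions(wmt_output) == 0
    )


# Sample text copied from the Step 2 examples.
WEB_SAMPLE = "Active HTTPSession : 0"
WMT_SAMPLE = """Usage Summary
=============
Concurrent Unicast Client Sessions
----------------------------------
Current: 0
"""

if __name__ == "__main__":
    print("drained:", is_drained(WEB_SAMPLE, WMT_SAMPLE))
```

A non-zero count from either function means that connections are still active and the device should not be taken out of service yet.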

Step 3 Determine where the system disk is located and on which disk the "Key CDNFS data" exists.

a. Run the "globedrvreplmnt.sh.signed" script using the following command:

ServiceEngine# script execute globedrvreplmnt.sh.signed

=================================================================

Information

Disk Model and Firmware Version

=== disk00 ===

Device Model: ST3500514NS

Firmware Version: SN11

=== disk01 ===

Device Model: ST3500514NS

Firmware Version: SN11

=== disk02 ===

Device Model: ST3500514NS

Firmware Version: SN11

...

=== disk11 ===

Device Model: ST3500514NS

Firmware Version: SN11

Key CDNFS data location: /disk00-06/uns-symlink-tree

Key CDNFS data location as dev: disk00 or sda

System Drive - First: disk00 or sda

System Drive - Second: disk01 or sdb

Device Mode: service-engine

RAID status:

SYSFS : RAID-1

Status: Normal

Partitions: disk00/05 disk01/05

SYSTEM: RAID-1

Status: Normal

Partitions: disk00/01 disk01/01

SYSTEM: RAID-1

Status: Normal

Partitions: disk00/02 disk01/02

SYSTEM: RAID-1

Status: Normal

Partitions: disk00/04 disk01/04

=================================================================

=================================================================

*** Procedure 20 - The Key CDNFS data exist on a disk00 ***

*** Procedure 20 - The Key CDNFS data exist on a disk00 ***

*** Procedure 20 - The Key CDNFS data exist on a disk00 ***

=================================================================

=================================================================

Script globedrvreplmnt.sh.signed exited with return code 0

Note If you run the disk unuse disk [disk number 00-11] command and it returns the message "disk [disk number 00-11] has key CDNFS data and cannot be unused!", then that disk holds the "Key CDNFS data".
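When the script output is saved to a file, the disk models and the recommended procedure number can be extracted automatically. The following Python sketch is a minimal illustration, assuming the output layout shown in Step 3. The function names are hypothetical, and the SAMPLE text is a shortened, constructed excerpt (a Barracuda in disk00 and a Constellation in disk01), not output captured from a real device.

```python
import re

BARRACUDA_MODEL = "ST3500320NS"  # affected Seagate part number from Table 1


def parse_disk_models(output: str) -> dict:
    """Map each disk name (for example 'disk00') to its 'Device Model' value."""
    models = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"=== (disk\d+) ===", line)
        if header:
            current = header.group(1)
        elif line.startswith("Device Model:") and current is not None:
            models[current] = line.split(":", 1)[1].strip()
    return models


def disks_to_replace(models: dict) -> list:
    """Return the disks whose model matches the affected Barracuda part."""
    return sorted(name for name, model in models.items()
                  if model == BARRACUDA_MODEL)


def recommended_procedure(output: str):
    """Return the procedure number from the banner lines, or None if absent."""
    match = re.search(r"\*\*\* (?:Use )?Procedure (\d+)", output)
    return int(match.group(1)) if match else None


# Constructed excerpt for illustration only (assumption, not real output).
SAMPLE = """=== disk00 ===
Device Model: ST3500320NS
Firmware Version: SN06
=== disk01 ===
Device Model: ST3500514NS
Firmware Version: SN11
*** Procedure 20 - The Key CDNFS data exist on a disk00 ***
"""

if __name__ == "__main__":
    models = parse_disk_models(SAMPLE)
    print("replace:", disks_to_replace(models))
    print("procedure:", recommended_procedure(SAMPLE))
```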

Step 4 The "globedrvreplmnt.sh.signed" script tells you which procedure to run to deal with the system drive and the "Key CDNFS data":

a. Procedure 10 - The "Key CDNFS data" exist on a non-system drive.

i. If disk00 is a Constellation disk, skip steps 'ii' to 'vii', and proceed to step 'viii'.

ii. If the drive is mounted, enter the disk unuse disk00 command to fully unuse the drive.

iii. Remove disk00 and insert the new disk.

iv. As a precaution, run the disk erase disk00 command to place the drive in the unformatted state.

v. Enter the disk policy apply command to format and mount the drive, examine all disks and RAID volumes, and make any necessary changes.

vi. Enter the show disk details command to see if the drive was added as a SYSTEM drive.

vii. If so, enter the show disk raid command to verify that the RAID volumes have been completely resynchronized. See the section "Checking RAID Synchronization".
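Steps ii through vii can be written down as an ordered checklist before going onsite. The following Python sketch only builds that list; it does not connect to a device. The function name procedure_10_commands is hypothetical, and accepting a disk name other than disk00 is an assumption added for illustration, because the steps above name disk00 specifically.

```python
import re


def procedure_10_commands(disk: str = "disk00") -> list:
    """Return steps ii-vii as an ordered list of CLI commands and manual actions."""
    # Accept only disk00 through disk11, the range used in this document.
    if re.fullmatch(r"disk(0\d|1[01])", disk) is None:
        raise ValueError(f"unexpected disk name: {disk!r}")
    return [
        f"disk unuse {disk}",                                   # step ii
        "(manual) remove the old disk and insert the new disk",  # step iii
        f"disk erase {disk}",                                   # step iv
        "disk policy apply",                                    # step v
        "show disk details",                                    # step vi
        "show disk raid",                                       # step vii
    ]


if __name__ == "__main__":
    for number, step in enumerate(procedure_10_commands(), start=1):
        print(number, step)
```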

a. Remove disk01 through disk11 according to physical position and discard them, for a total of 11 physical drives. Any of these disks that are Constellation disks do not have to be replaced. Note that a Service Router or CDSM may have only three drives.

Note Depending on the previous procedure used, disk01 may already be removed.

b. Insert 11 new replacement disks, or fewer if Constellation disks were already present in the CDE-220 chassis.

c. Power up the SE.

d. Enter the disk policy apply command to format and mount the drive, examine all disks and RAID volumes, and make any necessary changes.

e. On SE devices only, once the system is up and running, run the "recover-cdnfs utility" script to remove references to old data and content, using the following command: script execute recover-cdnfs utility.

Note This script does not need to be executed on SR or CDSM.

Step 6 Verify that the new disks have the correct size and other details with the show disks details command; see the sample below:

List of all disk drives:

disk00: Normal (h02 c00 i00 l00 - mptsas) 476940MB(465.8GB)

disk00/01: SYSTEM 1019MB( 1.0GB) mounted internally

disk00/02: SYSTEM 509MB( 0.5GB) mounted internally

disk00/04: SYSTEM 8189MB( 8.0GB) mounted internally

disk00/05: SYSFS 32765MB( 32.0GB) mounted at /local1

disk00/06: CDNFS 434445MB(424.3GB) mounted internally

disk01: Normal (h02 c00 i01 l00 - mptsas) 476940MB(465.8GB)

disk01/01: SYSTEM 1019MB( 1.0GB) mounted internally

disk01/02: SYSTEM 509MB( 0.5GB) mounted internally

disk01/04: SYSTEM 8189MB( 8.0GB) mounted internally

disk01/05: SYSFS 32765MB( 32.0GB) mounted at /local1

disk01/06: CDNFS 434445MB(424.3GB) mounted internally

disk02: Normal (h02 c00 i02 l00 - mptsas) 476940MB(465.8GB)

disk02/01: CDNFS 476929MB(465.8GB) mounted internally

...

disk11: Normal (h02 c00 i11 l00 - mptsas) 476940MB(465.8GB)

disk11/01: CDNFS 476929MB(465.8GB) mounted internally

(*) Disk drive won't be used after reload.

Note Disk 00 and Disk 01 should now be "System Disks" in the RAID.
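The size check in Step 6 can be automated against captured show disks details output. The following Python sketch is a minimal illustration, assuming the line format shown in the sample above. The function names are hypothetical, and the expected size of 476940 MB is taken from that sample rather than from a specification.

```python
import re

# Matches top-level disk lines such as:
# disk00: Normal (h02 c00 i00 l00 - mptsas) 476940MB(465.8GB)
DISK_LINE = re.compile(r"^(disk\d+): (\w+) \(.*?\)\s+(\d+)MB")


def parse_disks(output: str) -> dict:
    """Return {disk name: (state, size in MB)}; partition lines are skipped."""
    disks = {}
    for raw in output.splitlines():
        match = DISK_LINE.match(raw.strip())
        if match:
            disks[match.group(1)] = (match.group(2), int(match.group(3)))
    return disks


def problem_disks(disks: dict, expected_mb: int) -> list:
    """Return disks that are not 'Normal' or whose size differs from expected_mb."""
    return sorted(name for name, (state, size_mb) in disks.items()
                  if state != "Normal" or size_mb != expected_mb)


# Excerpt copied from the Step 6 sample.
SAMPLE = """disk00: Normal (h02 c00 i00 l00 - mptsas) 476940MB(465.8GB)
disk00/01: SYSTEM 1019MB( 1.0GB) mounted internally
disk02: Normal (h02 c00 i02 l00 - mptsas) 476940MB(465.8GB)
disk02/01: CDNFS 476929MB(465.8GB) mounted internally
"""

if __name__ == "__main__":
    print(problem_disks(parse_disks(SAMPLE), expected_mb=476940))
```

An empty list means that every disk reported the Normal state and the expected size.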

Step 7 Use the show alarm detail command to check for disk alarms. All disks should be alarm free.

Step 8 Verify that the disks have the correct device model and firmware version with the show disks SMART-info command; see the example below:

Note After you insert the new Constellation replacement disks, the Activity LEDs are OFF when the disks are idle. This behavior differs from the older Barracuda disks, whose Activity LEDs were ON when the disks were idle. This behavior does not apply to the Western Digital drives.

Step 9 Re-run the "globedrvreplmnt.sh.signed" script using the following command to check status:

b. To prevent problems with memory fragmentation on the SRs, onload streamers and acquirers by restoring the memory threshold to 90, either through the device CLI or from the CDSM.

Checking RAID Synchronization

Step 1 Check whether the two SYSTEM drives (typically disk00 and disk01) are fully synced; see the following example:

ServiceEngine# show disks raid-state

SYSFS : RAID-1

Status: Normal

Partitions: disk00/05 disk01/05

SYSTEM: RAID-1

Status: Normal

Partitions: disk00/01 disk01/01

SYSTEM: RAID-1

Status: Normal

Partitions: disk00/02 disk01/02

SYSTEM: RAID-1

Status: Normal

Partitions: disk00/04 disk01/04
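The RAID check in Step 1 can be scripted against captured show disks raid-state output. The following Python sketch is a minimal illustration, assuming the three-line layout (volume, Status, Partitions) shown above. The function names are hypothetical. This document shows only the "Normal" status, so the sketch treats any other status text as not yet synchronized, which is an assumption.

```python
def parse_raid_state(output: str) -> list:
    """Return a list of (volume, status, partitions) tuples."""
    volumes = []
    name = None
    status = None
    for raw in output.splitlines():
        line = raw.strip()
        if ":" in line and "RAID-1" in line:
            # Lines such as 'SYSFS : RAID-1' or 'SYSTEM: RAID-1'
            name = line.split(":", 1)[0].strip()
            status = None
        elif line.startswith("Status:"):
            status = line.split(":", 1)[1].strip()
        elif line.startswith("Partitions:") and name is not None:
            partitions = line.split(":", 1)[1].split()
            volumes.append((name, status, partitions))
            name = None
    return volumes


def all_synced(volumes: list) -> bool:
    """True when at least one volume was parsed and every status is 'Normal'."""
    return bool(volumes) and all(status == "Normal" for _, status, _ in volumes)


# Excerpt copied from the example above.
SAMPLE = """SYSFS : RAID-1
Status: Normal
Partitions: disk00/05 disk01/05
SYSTEM: RAID-1
Status: Normal
Partitions: disk00/01 disk01/01
"""

if __name__ == "__main__":
    print("synced:", all_synced(parse_raid_state(SAMPLE)))
```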

Step 2 Use the show alarm detail command to ensure that there are no outstanding alarms against either SYSTEM drive. The following alarms may be displayed during the RAID resynchronization process:

ServiceEngine# show alarms

Critical Alarms:

----------------

None

Major Alarms:

-------------

None

Minor Alarms:

-------------

Alarm ID Module/Submodule Instance

-------------------- -------------------- -------------

3 SoftRAID_Event sysmon md02

4 SoftRAID_Event sysmon md03

5 SoftRAID_Event sysmon md04

6 SoftRAID_Event sysmon md05

Note If the show disks raid-state command does not eventually show that all SYSTEM drives are re-synchronized OR if any alarms (including the above SoftRAID* alarms) are still pending and do not clear, please do not continue with this procedure. Instead, proceed with the "Emergency Procedures using Rescue CD" section.
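The alarm check in Step 2 can also be scripted against captured show alarms output. The following Python sketch is a minimal illustration, assuming the four-column row layout shown in the sample above (index, alarm ID, module, instance). The function names and the dictionary keys are hypothetical.

```python
import re

# Matches alarm rows such as: 3 SoftRAID_Event sysmon md02
ALARM_ROW = re.compile(r"^(\d+)\s+(\S+)\s+(\S+)\s+(\S+)$")


def parse_alarm_rows(output: str) -> list:
    """Return one dict per alarm row; headers, rulers, and 'None' are skipped."""
    rows = []
    for raw in output.splitlines():
        match = ALARM_ROW.match(raw.strip())
        if match:
            rows.append({
                "index": int(match.group(1)),
                "alarm_id": match.group(2),
                "module": match.group(3),
                "instance": match.group(4),
            })
    return rows


def pending_softraid(rows: list) -> list:
    """Return the instances (for example 'md02') with a SoftRAID alarm."""
    return [row["instance"] for row in rows
            if row["alarm_id"].startswith("SoftRAID")]


# Excerpt copied from the sample above.
SAMPLE = """Minor Alarms:
-------------
Alarm ID Module/Submodule Instance
-------------------- -------------------- -------------
3 SoftRAID_Event sysmon md02
4 SoftRAID_Event sysmon md03
"""

if __name__ == "__main__":
    print("pending:", pending_softraid(parse_alarm_rows(SAMPLE)))
```

An empty result from pending_softraid means that no SoftRAID alarms remain in the captured output.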

Emergency Procedures using Resident Rescue System Image

The SE, SR and CDSM have a resident rescue system image that is invoked should the image in flash memory be corrupted. A corrupted system image can result from a power failure that occurs while a system image is being written to flash memory. The rescue image can download a system image to the main memory of the device and write it to flash memory.

Note The .sysimg file is located under the images folder on the Recovery CD-ROM. If you have upgraded the CDS software, download the corresponding rescue CD ISO image, copy it to a CD, and use that rescue ISO image.

To install a new system image using the rescue image, do the following:

Step 1 Download the system image file (*.sysimg) to a host that is running an FTP server.

Step 2 Establish a console connection to the device and open a terminal session.

Step 3 Reboot the device by toggling the power switch.

The rescue image dialog appears. The following example demonstrates how to interact with the rescue dialog and use a port channel for the network connection (user input is denoted by entries in bold typeface). This example is for the CDE220-2G2, which has ten Gigabit Ethernet interfaces. The CDE110 and CDE205 have two Gigabit Ethernet interfaces, and the CDE220-2S3i has 14 Gigabit Ethernet interfaces.

The system boots from the image on the CD. This requires a terminal server to be hooked up to the serial port of the SE205 or SE220. All communication is done through the serial port (see the "Before You Begin" section for terminal server settings).

Note Boot and installation output is directed to the terminal server console and cannot be viewed from a monitor.

Step 5 Power on the external USB DVD-ROM drive.

Step 6 Power on the SE.

Once the CD starts booting, it displays a spinning "|" symbol for approximately five minutes. Allow the booting to proceed and monitor the sequence from a remote terminal provided by the terminal server.

The Installer main menu is displayed at the conclusion of the boot sequence. You should see something similar to the following; inputs or options to select are highlighted in bold:

Step 14 After a few minutes, approximately two polling intervals, the device status shows online and all configurations (delivery service assignments, programs, and so on) are the same as those on the device that was replaced.

Step 15 Once the new device is up and running, as noted by the online status, the old device can be removed from the CDS network.

Appendix A: Sample Output after Replacing Disks

This section contains sample output from an SE, SR, and CDSM after replacing a disk.

SE

ServiceEngine# script execute globedrvreplmnt.sh.signed

---------- Global drive replacement log ----------

Fri Mar 2 10:48:28 UTC 2012

====================================================================

Information

Current Alarms

Critical Alarms:

----------------

None

Major Alarms:

-------------

None

Minor Alarms:

-------------

None

System Initialization Finished.

Disk Model and Firmware Version

=== disk00 ===

Device Model: ST500NM0011

Firmware Version: SN33

=== disk01 ===

Device Model: ST500NM0011

Firmware Version: SN33

=== disk02 ===

Device Model: ST500NM0011

Firmware Version: SN33

=== disk03 ===

Device Model: ST500NM0011

Firmware Version: SN33

=== disk04 ===

Device Model: ST500NM0011

Firmware Version: SN33

=== disk05 ===

Device Model: ST500NM0011

Firmware Version: SN33

=== disk06 ===

Device Model: ST500NM0011

Firmware Version: SN33

=== disk07 ===

Device Model: ST500NM0011

Firmware Version: SN33

=== disk08 ===

Device Model: ST500NM0011

Firmware Version: SN33

=== disk09 ===

Device Model: ST500NM0011

Firmware Version: SN33

=== disk10 ===

Device Model: ST500NM0011

Firmware Version: SN33

=== disk11 ===

Device Model: ST500NM0011

Firmware Version: SN33

System drive and Key CDNFS data location

Key CDNFS data location: /disk00-06/uns-symlink-tree

Key CDNFS data location as dev: disk00 or sda

System Drive - First: disk00 or sda

System Drive - Second: disk01 or sdb

Device Mode: service-engine

RAID status:

SYSFS : RAID-1

Status: Normal

Partitions: disk00/05 disk01/05

SYSTEM: RAID-1

Status: Normal

Partitions: disk00/01 disk01/01

SYSTEM: RAID-1

Status: Normal

Partitions: disk00/02 disk01/02

SYSTEM: RAID-1

Status: Normal

Partitions: disk00/04 disk01/04

====================================================================

====================================================================

====================================================================

*** Use Procedure 20 - The Key CDNFS data exist on a disk00 ***

*** Use Procedure 20 - The Key CDNFS data exist on a disk00 ***

*** Use Procedure 20 - The Key CDNFS data exist on a disk00 ***

====================================================================

====================================================================

Done

The log file name is /local/local1/globedrvlog.cds-esc-is-g2L1-SE4.txt

Script globedrvreplmnt.sh.signed exited with return code 0

SR

ServiceRouter# script execute globedrvreplmnt.sh.signed

---------- Global drive replacement log ----------

Fri Mar 2 11:36:26 UTC 2012

====================================================================

Information

Current Alarms

Critical Alarms:

----------------

None

Major Alarms:

-------------

None

Minor Alarms:

-------------

None

System Initialization Finished.

Disk Model and Firmware Version

=== disk00 ===

Device Model: ST500NM0011

Firmware Version: SN33

=== disk01 ===

Device Model: ST500NM0011

Firmware Version: SN33

=== disk02 ===

Device Model: ST500NM0011

Firmware Version: SN33

=== disk03 ===

Device Model: ST500NM0011

Firmware Version: SN33

=== disk04 ===

Device Model: ST500NM0011

Firmware Version: SN33

=== disk05 ===

Device Model: ST500NM0011

Firmware Version: SN33

System drive and Key CDNFS data location

Key CDNFS data location: No Key CDNFS data found

System Drive - First: disk00 or sda

System Drive - Second: disk01 or sdb

Device Mode: service-router

RAID status:

SYSFS : RAID-1

Status: Normal

Partitions: disk00/05 disk01/05

SYSTEM: RAID-1

Status: Normal

Partitions: disk00/01 disk01/01

SYSTEM: RAID-1

Status: Normal

Partitions: disk00/02 disk01/02

SYSTEM: RAID-1

Status: Normal

Partitions: disk00/04 disk01/04

====================================================================

====================================================================

====================================================================

*** Service Router * Service Router * Service Router ***

*** This is a Service Router. It does not have a CDNFS ***

*** partition or Key CDNFS data or more than 3 drives. ***

*** Use Procedure 60 - Service Router ***

*** Use Procedure 60 - Service Router ***

*** Use Procedure 60 - Service Router ***

====================================================================

====================================================================

Done

The log file name is /local/local1/globedrvlog.cds-esc-is-g2SR1.txt

Script globedrvreplmnt.sh.signed exited with return code 0

CDSM

CDSM# script execute globedrvreplmnt.sh.signed

---------- Global drive replacement log ----------

Fri Mar 2 10:49:55 UTC 2012

====================================================================

Information

Critical Alarms:

----------------

None

Major Alarms:

-------------

None

Minor Alarms:

-------------

None

System Initialization Finished.

Disk Model and Firmware Version

=== disk00 ===

Device Model: ST500NM0011

Firmware Version: SN33

=== disk01 ===

Device Model: ST500NM0011

Firmware Version: SN33

=== disk02 ===

Device Model: ST500NM0011

Firmware Version: SN33

System drive and Key CDNFS data location

Key CDNFS data location: No Key CDNFS data found

System Drive - First: disk00 or sda

System Drive - Second: disk01 or sdb

Device Mode: content-delivery-system-manager

RAID status:

SYSFS : RAID-1

Status: Normal

Partitions: disk00/05 disk01/05

SYSTEM: RAID-1

Status: Normal

Partitions: disk00/01 disk01/01

SYSTEM: RAID-1

Status: Normal

Partitions: disk00/02 disk01/02

SYSTEM: RAID-1

Status: Normal

Partitions: disk00/04 disk01/04

====================================================================

====================================================================

====================================================================

*** Content Delivery System Manager * CDSM * CDSM * CDSM ***

*** This is a CDSM. It does not have a CDNFS partition ***

*** or Key CDNFS data or more than 3 drives. ***

*** Use Procedure 60 - Content Delivery System Manager ***

*** Use Procedure 60 - Content Delivery System Manager ***

*** Use Procedure 60 - Content Delivery System Manager ***

====================================================================

====================================================================

Done

The log file name is /local/local1/globedrvlog.cds-esc-is-g2CDSM-Pr.txt
