Weird Seagate Drive Failure

While migrating data from my ReiserFS-formatted disks over to ext4 volumes, I ran into a weird issue with a Seagate drive. It’s a Barracuda 7200.14, model ST3000DM001, with the latest firmware. It’s been running fine, and I just copied all of its data off with no problems. Copying new data onto it, though, a short bit into the transfer it slows way down, to below 1MB/s, and eventually drops off of the SATA link entirely. Upon a reboot, it’s all back, and SMART diagnostics show no errors ever detected by the drive. Doing a diagnostic test on the drive shows nothing wrong. Reading the data works fine. I’ve tried the drive in 3 different drive controllers so far, disabled Native Command Queuing (NCQ), replaced cables, no difference. At this point I can just power up the system (which contains multiple drives of the same model that don’t exhibit this problem), and start writing information to that drive without ever reading it, and it starts to slow down within 30 seconds. It drops offline a few minutes later. When I turned off NCQ, it didn’t drop offline during the time I tested it, but it did slow way down, then speed back up, then slow way down again, repeatedly.

It’s not just that this is not how drives are supposed to behave. This isn’t how drives are supposed to fail, either. If there’s a defect on the media, it’s detected when the drive tries to read that section, then reported as a failure and put on a list of sectors pending relocation to a spare area on the disk. The relocation doesn’t happen until that section is overwritten, because the drive then knows that it’s safe to give up on ever reading the old data. None of this explains the behavior of reading being fine, and writing hosing everything without logging a problem on the drive.

I’ve seen 2 or 3 posts online from people clearly describing the exact same problem with this model of drive, but never with a solution; the thread either never went anywhere, or the poster RMA’d the drive. Mine isn’t under warranty according to Seagate’s web page.

At this point, the easy options seem to be exhausted. The next things I can think of to try are:

Downgrade the firmware to an older version, if it will let me.

Connect a TTL RS232 adapter to the diagnostic port on the drive’s board and see what it says during powerup, and during failure. I haven’t delved into Seagate’s diagnostic commands before, so maybe there’s something there to help.

Pull out my new hot air rework station, swap the drive’s BIOS chip with a spare board from a head-crashed drive, and see if that’s any better.