The sd? drives are actually IDE drives on a 3ware Escalade controller. I have reason to believe the drives are good: before I installed them I scrubbed them with varying data patterns and verified that I got back what I put there. All tested cleanly overnight.

I recently added an integrity check to our backups - the integrity checker writes out the path, the gzip adler32 checksum, the size, and the mtime of each file. Each time I do a backup, the backup scripts look for the integrity listing in the other partitions and compare all files with the same path, size, and modtime.
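
For anyone who wants to try the same kind of check, here is a minimal sketch in Python. It is not the actual backup script; the function names (adler32_of, build_listing, compare_listings) and the in-memory dict layout are made up for illustration. It only assumes what is described above: a per-file record of path, adler32, size, and mtime, and a comparison restricted to files whose path, size, and mtime match.

```python
import os
import zlib


def adler32_of(path, chunk_size=1 << 16):
    """Adler-32 of a file's contents, read in chunks (the checksum zlib provides)."""
    value = 1  # Adler-32 starting value
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            value = zlib.adler32(chunk, value)
    return value & 0xFFFFFFFF


def build_listing(root):
    """Map relative path -> (adler32, size, mtime) for each regular file under root."""
    listing = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # skip symlinks and special files
            st = os.stat(full)
            rel = os.path.relpath(full, root)
            listing[rel] = (adler32_of(full), st.st_size, int(st.st_mtime))
    return listing


def compare_listings(new, old):
    """Return paths present in both listings with equal size and mtime but
    different checksums. Files whose size or mtime differ are treated as
    legitimately changed and are skipped."""
    mismatches = []
    for path, (csum, size, mtime) in new.items():
        if path not in old:
            continue
        old_csum, old_size, old_mtime = old[path]
        if size == old_size and mtime == old_mtime and csum != old_csum:
            mismatches.append(path)
    return sorted(mismatches)
```

Something like compare_listings(build_listing("/backup/new"), build_listing("/weekly")) would then list the suspect files; the /backup/new path is a placeholder, not a real mount point on this machine.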

This morning I had a pile of errors after things had gone smoothly for the last few weeks. I suspected that I had screwed something up, looked over the backup scripts, simplified them down to a simple cpio, and tried again. Another pile of errors, different set of files.

In both cases, the newly created files were corrupted; the ones on the live /home partition as well as the /weekly & /monthly partitions all compared cleanly.

I rebooted into 2.2.19, tried again, no errors. I was running 2.4.5, no patches. I power cycled the machine between each reboot, went through the BIOS memory check, and also went through my own memory check; memory does not seem to be an issue.

I think I can reproduce this; it takes a reboot and about 2 hours. I made it happen twice with 2.4.5; the first try on 2.2.19 did not reproduce it.

The data corruption looks like *extra* bytes added at the beginning of files. I only looked at a few; if we go down the path of debugging this I'll save them all next time. The extra byte counts were small. In one case there was the letter "1" added to the start of the file; other than that it was identical. That's really weird: as a file system guy, I'd expect to see blocks of data, not small chunks of data. Very strange.
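
If it helps with triage, the pattern above (corrupt file = a few extra bytes + the original contents) is easy to test for mechanically. A rough Python sketch, with a made-up helper name (extra_prefix) and assuming both copies fit in memory:

```python
def extra_prefix(good: bytes, bad: bytes):
    """If bad is exactly good with extra bytes prepended, return those bytes.

    Returns None when the corruption does not fit that pattern.
    """
    if len(bad) > len(good) and bad.endswith(good):
        return bad[: len(bad) - len(good)]
    return None


def extra_prefix_of_files(good_path, bad_path):
    """Same check on two files, read whole (fine for small files only)."""
    with open(good_path, "rb") as g, open(bad_path, "rb") as b:
        return extra_prefix(g.read(), b.read())
```

For the case I saw, extra_prefix(original, corrupted) would return b"1". Running this over every mismatched file would show whether all of the corruption has this shape or whether this was just one example.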

One thing I haven't done is to rule out the 3ware controller. I tend to doubt it is the problem but who knows.

There were no kernel messages complaining about anything during the backup, so the kernel doesn't seem to know there is a problem.