Apologies up front if this is not the right area; I couldn't quite figure out what was appropriate.

I've got a file server which I use for all sorts of things. It's an ASUS EeeBox running a hardened profile, with two 2TB USB disks in a RAID-1 configuration. Yes, I was after low power and cheap rather than flat-out performance.

I found the issue when using this server as a repo for ISO images when provisioning oVirt. I'd launch my VMs, they'd get partway through booting and installing from the ISO, and then croak with disk errors. Validating the checksums of the ISOs revealed corruption. Downloading fresh copies didn't help.

After some experiments, I determined that I couldn't successfully sha256sum these images either locally on the server or from any machine on my NFS network. Worse yet, on whatever machine I did the sha256sum, if I caused the file cache to turn over, I could sha256sum the same file again and get a different result - including locally on the server.
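For reference, this is the shape of the check I was doing (a scratch file here for illustration; on the server I pointed it at the actual ISOs and dropped the page cache between runs):

```shell
# Illustration with a scratch file; on the real server, run the same
# check against the ISOs and drop the page cache between passes (as root):
#   sync && echo 3 > /proc/sys/vm/drop_caches
f=$(mktemp)
dd if=/dev/urandom of="$f" bs=1M count=4 2>/dev/null

h1=$(sha256sum "$f" | awk '{print $1}')
h2=$(sha256sum "$f" | awk '{print $1}')

[ "$h1" = "$h2" ] && echo "hashes match" || echo "hashes differ"
rm -f "$f"
```

On a healthy system the two hashes are always identical; on this box, a big enough file plus a cache turnover gave me a different hash each time.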

I've done tests on smaller files and had them work. I've done iozone tests both locally and over nfs, against this server, and those report no errors. I had a backup covering part of the contents of my homedir on this server, and all the files which were still valid checked out.

I'm a bit stumped. The symptom suggests that there's some kind of corruption happening in the read path from the raid device. But only on pretty big files.

I've looked at the md stats (/proc/mdstat, mdadm --detail) and nothing is reporting any errors that I can see.

What else should I try? Are there known issues with RAID-1 across USB disks (I couldn't find any)? Are there any known issues with the hardened profile, i.e. should I switch back to a standard server profile?

One piece of advice: if you're overclocking the system, be careful that it doesn't result in an overclocked USB interface as well - some motherboards use shared clocks for these.

Let's narrow down whether your problem is before or after the RAID layer - try md5summing each drive individually, twice, and compare whether you get the same result on both trials.
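A sketch of that check - the device names here are assumptions, so substitute your actual USB member disks. Reading 2TB in full takes hours, so the `head -c` limits it to a quick first pass:

```shell
# Assumed device names - substitute your real USB member disks (run as root).
DEVICES="${DEVICES:-/dev/sdb /dev/sdc}"
for dev in $DEVICES; do
    [ -r "$dev" ] || { echo "$dev: not readable, skipping"; continue; }
    # head -c limits the read; drop it to checksum the whole disk
    a=$(head -c 1G "$dev" | md5sum | awk '{print $1}')
    b=$(head -c 1G "$dev" | md5sum | awk '{print $1}')
    if [ "$a" = "$b" ]; then
        echo "$dev: consistent across two reads ($a)"
    else
        echo "$dev: INCONSISTENT ($a vs $b)"
    fi
done
```

Note that if the second read is served from the page cache the comparison proves less; drop caches between the two reads (`sync; echo 3 > /proc/sys/vm/drop_caches`) to force both passes to hit the disk.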

If the drives seem to be returning correct info that way, try upping the ante a bit and md5 both drives at the same time. This could bring out potential DMA-sharing problems or similar. If it's still good, chances are your hardware is working properly.
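Something like this (again, device names are assumptions - override them to match your setup), then compare against the single-drive results from the previous step:

```shell
# Read both disks simultaneously to stress shared USB/DMA bandwidth.
# Assumed device names - override D1/D2 with your actual disks (run as root).
D1="${D1:-/dev/sdb}"
D2="${D2:-/dev/sdc}"
if [ -r "$D1" ] && [ -r "$D2" ]; then
    md5sum "$D1" > /tmp/parallel-a.md5 &
    md5sum "$D2" > /tmp/parallel-b.md5 &
    wait
    cat /tmp/parallel-a.md5 /tmp/parallel-b.md5
else
    echo "devices not readable - set D1/D2 to your disks"
fi
```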

Then you would want to look at possible RAID volume corruption. Without going into details I can't remember, one thing to try is bringing up the array (in READ-ONLY MODE) using only one drive (hint: the keyword 'missing' is how you specify the nonexistent drive), and then bringing up another array (again, READ-ONLY MODE) using the other drive. Now you have two arrays which should have the same data contents. md5 the two md devices and see if they're the same. If they're not, md5 them again to see if something weird is happening once the RAID layer is added in.
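One way to sketch this - device and array names are assumptions here, and since `mdadm --create ... missing` rewrites superblocks, assembling each half degraded and read-only is the safer route for an existing array:

```shell
# Sketch - assumes the mirror members are /dev/sdb1 and /dev/sdc1 and the
# normal array is /dev/md0. Run as root, with the normal array stopped:
#   umount /mnt/raid && mdadm --stop /dev/md0
mdadm --assemble --readonly --run /dev/md1 /dev/sdb1
md5sum /dev/md1 > /tmp/half-a.md5
mdadm --stop /dev/md1

mdadm --assemble --readonly --run /dev/md1 /dev/sdc1
md5sum /dev/md1 > /tmp/half-b.md5
mdadm --stop /dev/md1

diff /tmp/half-a.md5 /tmp/half-b.md5 && echo "mirror halves match"
```

`--run` forces the degraded (one-member) array to start, and `--readonly` keeps mdadm from writing anything while you compare.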

If the contents are the same, everything seems good thus far and it must be a filesystem-related problem - use whatever error checking is available for that filesystem and search for data-corruption reports involving it.
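For ext3/ext4 that would be e2fsck; a read-only pass looks roughly like this (a sketch - the array name and mount point are assumptions):

```shell
# Read-only filesystem check, ext3/ext4 shown (run as root).
# Assumes the array is /dev/md0 mounted at /mnt/raid - adjust to your setup.
umount /mnt/raid          # the filesystem must not be mounted read-write
fsck.ext4 -fn /dev/md0    # -f: force a full check, -n: read-only, answer "no" to all fixes
```

The `-n` flag means nothing gets written, so this is safe to run first; only rerun without it once you know what it wants to change.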

If your data access to the USB devices seems okay but things go wonky once RAID is involved, try a kernel upgrade. (In that case your filesystem is probably hosed and you'll have all kinds of problems - I'd recommend recreating the whole RAID.)