I set up an XFS volume a couple of years back. After copying a few files over NFS, it became corrupted. The XFS fsck did something -- it told me the volume was so corrupted that it couldn't be fixed.

I think you mean xfs_repair. On XFS, fsck is a no-op.

I've never yet seen xfs_repair tell me there was an issue it couldn't fix -- that sounds unusual. However, there have been lots of changes to XFS in the Linux kernel in recent years, and occasionally there have been a few nasty bugs, some of which I ran into. Linux 2.6.19 in particular had some nasty XFS filesystem corruption bugs.

1) Weird problem occurs after install. You report the problem to IBM.
2) IBM asks for your software versions, sees they are the newest ones available, and says they will look into it.
3) You ask several months later whether they found anything. They ask for your software versions, then ask you to upgrade and see if the problem goes away.
4) You upgrade to the newest version.
5) Go to 2).

*There are of course non-weird problems where you get the answer from IBM support in 2-3 days, and from Linux forums in 2-3 minutes.

Backups are not really an option past 20 terabytes or so of storage, and they are simply not feasible if the storage is volatile in nature. AFAIK everyone has gone to redundancy over backups at scale.

We run a 200TB (130TB usable) clustered/distributed system with 4x LTO5 drives, and we do a full snapshot to tape every week. With data that size, you either pay up front for proper engineering, or you pay for the life of the system in poor performance and the eventual cleanup of the mess.
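
For anyone who wants to sanity-check whether a weekly full at that size is plausible, here is a rough back-of-envelope sketch. The figures are assumptions, not numbers from the poster's setup: it uses the published LTO-5 native (uncompressed) rate of 140 MB/s and 1.5 TB per cartridge, assumes all four drives stream in parallel at full speed, and assumes the whole 130 TB usable capacity gets written. The function name `full_backup_estimate` is made up for illustration.

```python
# Back-of-envelope: how long does one full dump to tape take?
# Assumptions (not from the post above): LTO-5 native rate of 140 MB/s and
# 1.5 TB per cartridge, every drive streaming in parallel at full speed,
# and the full 130 TB usable capacity being written.
import math


def full_backup_estimate(data_tb, drives, mb_per_s=140.0, cartridge_tb=1.5):
    """Return (hours, cartridges) for one full dump, using decimal units."""
    data_bytes = data_tb * 1e12
    aggregate_bytes_per_s = drives * mb_per_s * 1e6
    hours = data_bytes / aggregate_bytes_per_s / 3600
    cartridges = math.ceil(data_tb / cartridge_tb)
    return hours, cartridges


hours, tapes = full_backup_estimate(data_tb=130, drives=4)
print(f"{hours:.1f} hours, {tapes} tapes per full")
# prints "64.5 hours, 87 tapes per full"
```

Under those assumptions a full pass takes under three days of continuous streaming, so it fits in a week on paper. Real throughput depends on whether the storage can keep all the drives fed, so treat this as a best case rather than a measurement.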