IndblockBad appearing out of the blue

I've been trying to understand what exactly fsck does, with no luck so far. Questions go first, explanation follows.

1) What does IndblockBad translate to, i.e. what exactly is "bad"? The block this inode resides in? The inode data? My metadata are replicated - which copy would it be then? The list of blocks it points to? A checksum? On what? I can't figure out what triggers this message.
2) Some files are detected as corrupted by mmfsck and some are not, and I'm trying to figure out why some files slip through. What's the detection logic here, i.e. what checks exactly does mmfsck perform?
3) How do I verify what gets corrupted - the list of blocks an inode points to, or the content of those blocks? Is there a way to do something like capturing a full snapshot of all inodes, so I can do a comparison when I discover a corrupted file later?

Longer story - my scenario is that fsck reports issues like the one below:

It all begins when a user reports a garbled file (the backup copy confirms that). mmfsck is run on the mounted filesystem and no errors related to the reported file are found - but others are, like the one above, and they do point to other corrupted files. So I'm left with files which are scrambled with mmfsck not reporting them, files which are scrambled and GPFS somehow discovering them, and files becoming corrupt at a rate of 1-2 a week.

In case that helps - my metadata are replicated (i.e. original and one replica), data are not. Corruption is block based - i.e. a GPFS block is either fully fine or every single byte in it differs when compared to the backup copy. Corrupted blocks are spread over tens of LUNs. Files can be read with no errors, but they differ from the backup copy. GPFS is 3.4.0-15, OS is Linux. Files which get corrupted are not recent ones; the majority of them are >4 months old.

Any pointers are welcomed warmly. I'm happy to post any data requested.

Re: IndblockBad appearing out of the blue

That error means that an indirect block for inode 22226371 is corrupted. Online mmfsck cannot fix this problem.

Find out the filename for this inode by running
<pre class="jive-pre">
tsfindinode -i 22226371 $mountpoint
</pre>

You can then delete and restore the file from backup.

However, if one file is corrupted there may be other things gone wrong, so you should run offline mmfsck to assess the damage:
<pre class="jive-pre">
mmfsck $fsname -v -n > mmfsck.$fsname.out 2>&1
</pre>

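As a sketch of how the lookup step can be scripted: the snippet below pulls the inode number out of a report line and hands it to tsfindinode. The sample line format is an assumption (real mmfsck/FSSTRUCT output varies by release), so adjust the awk pattern to whatever your log actually shows.

```shell
# Hypothetical report line; the real mmfsck/FSSTRUCT format may differ.
line='IndblockBad error: inode 22226371 indirect block invalid'

# Grab the number that follows the word "inode".
inode=$(printf '%s\n' "$line" | awk '{for (i = 1; i < NF; i++) if ($i == "inode") print $(i + 1)}')
echo "$inode"    # 22226371

# Map the inode to a pathname, then delete and restore from backup:
#   tsfindinode -i "$inode" "$mountpoint"
```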

Re: IndblockBad appearing out of the blue


To clarify - an indirect block is no different than on any other fs, right? I.e. it is a block attached to an inode, storing a list of all blocks used by the file? That would mean the data themselves are probably fine, but block locations get mixed up somehow? And how does fsck figure out that they got mixed up?

Every scrambled file I encounter gets deleted and restored from backup; that's not a problem. The idea was to wipe them all and forget about the issue, but new cases like this one keep appearing. I can't take downtime, so the plan was to locate and delete all corrupted files.

One more question - my metadata are replicated; does that mean both the original and the replica are scrambled?

Re: IndblockBad appearing out of the blue


Well, forget about that. I just found a legitimate scenario in which, by fixing that, I make it worse. The key to understanding what's going on is the "ind" in IndblockBad - as was pointed out, it means indirect, not inode.

Re: IndblockBad appearing out of the blue


When GPFS finds one replica invalid, it reads the other one and tries to use that one. So if the second replica does not generate an FSSTRUCT error, then you are fine. If you actually modify the metadata, the new version will overwrite both copies, fixing the bad one.

If you run the fsstruct scripts (AIX and Linux versions are in /usr/lpp/mmfs/samples/debugtools) to decode the sense data, the line will show the two disk addresses for the indirect block (repda). If only one of the disk addresses gets an fsstruct log record, then the other one is good.
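To illustrate that check: once the sense data is decoded, the repda field carries both replica disk addresses, and you can extract them and grep the log for each one separately. The decoded line below is a made-up example (real output from the debugtools scripts will differ), so treat the field positions as an assumption.

```shell
# Made-up decoded FSSTRUCT line; real debugtools output will differ.
decoded='IndblockBad inode 22226371 repda 3:441712 7:90314'

# Print the two replica disk addresses (the two fields after "repda").
printf '%s\n' "$decoded" | awk '{for (i = 1; i <= NF - 2; i++) if ($i == "repda") print $(i + 1), $(i + 2)}'

# If only one of these addresses shows up in further FSSTRUCT records,
# the other replica is intact and a metadata update will repair the bad copy.
```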

Re: IndblockBad appearing out of the blue


Everything that follows this point is my assumption with no hard proof, so please correct me if I'm wrong anywhere.

Online mmfsck does only limited checks on the list of blocks allocated to inodes (allocation keeps changing while it's running, so that's understandable), and files having blocks allocated twice (i.e. to two different inodes) slip through. I suspect that what it flags are inodes pointing at freed blocks. When I delete a corrupted file I free some blocks, which may be pointed at by another inode (which is also corrupted, but not reported in this scenario). So, when deleting a corrupted file, I must make sure that the blocks for this file are not referenced anywhere else and, in case they are, all files referencing them must be deleted as well. I guess offline fsck does that for me, but bringing the fs down is out of the question currently.

And my questions:
1) Is there a way to examine a block and say that it's freed/allocated/allocated to inodes X/Y/Z? The last option is probably too much to ask for, but free/allocated should be possible, as this is what GPFS logs as an FSErrDeallocBlock error.
2) How do I delete a file without freeing the blocks it points to? That would stop freeing blocks which are allocated twice, and I could catch lost blocks afterwards.
3) Is offline mmfsck supposed to catch and fix such errors at all? Technically it's possible, but is it implemented?
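As a sketch of what I have in mind for 1): if I could dump the data disk addresses of two suspect inodes (the dump source is hypothetical here - I'm assuming some low-level debug tool can produce such a list), finding doubly-allocated blocks would reduce to intersecting the two address lists:

```shell
# Assume we dumped each suspect inode's data disk addresses, one per line
# (the dump tool is hypothetical; the comparison itself is generic).
printf '3:1024\n3:2048\n7:4096\n' > /tmp/inode_A.addrs
printf '3:2048\n9:1111\n' > /tmp/inode_B.addrs

sort /tmp/inode_A.addrs > /tmp/inode_A.sorted
sort /tmp/inode_B.addrs > /tmp/inode_B.sorted

# comm -12 prints only lines common to both sorted files:
# any address listed by both inodes is allocated twice.
comm -12 /tmp/inode_A.sorted /tmp/inode_B.sorted    # prints 3:2048
```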