Chapter 26. Manually Recovering File Split-brain

Run the following command to obtain the path of the file that is in split-brain:

# gluster volume heal VOLNAME info split-brain

From the command output, identify the files for which file operations performed from the client keep failing with Input/Output error.

Close the applications that opened split-brain file from the mount point. If you are using a virtual machine, you must power off the machine.

Obtain and verify the AFR changelog extended attributes of the file using the getfattr command. Then identify the type of split-brain to determine which of the bricks contains the 'good copy' of the file.

The extended attributes with trusted.afr.VOLNAMEvolname-client-<subvolume-index> are used by AFR to maintain changelog of the file. The values of the trusted.afr.VOLNAMEvolname-client-<subvolume-index> are calculated by the glusterFS client (FUSE or NFS-server) processes. When the glusterFS client modifies a file or directory, the client contacts each brick and updates the changelog extended attribute according to the response of the brick.

subvolume-index is the brick number - 1 of gluster volume info VOLNAME output.

These files do not have entries for themselves, only for the other bricks in the replica. For example, brick1 will only have trusted.afr.vol-client-1 set and brick2 will only have trusted.afr.vol-client-0 set. Interpreting the changelog remains same as explained below.

The same can be extended for other replica pairs.

Interpreting changelog (approximate pending operation count) value

Each extended attribute has a value which is 24 hexa decimal digits. First 8 digits represent changelog of data. Second 8 digits represent changelog of metadata. Last 8 digits represent Changelog of directory entries.

For directories, metadata and entry changelogs are valid. For regular files, data and metadata changelogs are valid. For special files like device files and so on, metadata changelog is valid. When a file split-brain happens it could be either be data split-brain or meta-data split-brain or both.

The following is an example of both data, metadata split-brain on the same file:

The changelog extended attributes on file /rhgs/brick1/a are as follows:

The first 8 digits of trusted.afr.vol-client-0 are all zeros (0x00000000................),

The first 8 digits of trusted.afr.vol-client-1 are not all zeros (0x000003d7................).

So the changelog on /rhgs/brick-a/a implies that some data operations succeeded on itself but failed on /rhgs/brick2/a.

The second 8 digits of trusted.afr.vol-client-0 are all zeros (0x........00000000........), and the second 8 digits of trusted.afr.vol-client-1 are not all zeros (0x........00000001........).

So the changelog on /rhgs/brick1/a implies that some metadata operations succeeded on itself but failed on /rhgs/brick2/a.

The changelog extended attributes on file /rhgs/brick2/a are as follows:

The first 8 digits of trusted.afr.vol-client-0 are not all zeros (0x000003b0................).

The first 8 digits of trusted.afr.vol-client-1 are all zeros (0x00000000................).

So the changelog on /rhgs/brick2/a implies that some data operations succeeded on itself but failed on /rhgs/brick1/a.

The second 8 digits of trusted.afr.vol-client-0 are not all zeros (0x........00000001........)

The second 8 digits of trusted.afr.vol-client-1 are all zeros (0x........00000000........).

So the changelog on /rhgs/brick2/a implies that some metadata operations succeeded on itself but failed on /rhgs/brick1/a.

Here, both the copies have data, metadata changes that are not on the other file. Hence, it is both data and metadata split-brain.

Deciding on the correct copy

You must inspect stat and getfattr output of the files to decide which metadata to retain and contents of the file to decide which data to retain. To continue with the example above, here, we are retaining the data of /rhgs/brick1/a and metadata of /rhgs/brick2/a.

Resetting the relevant changelogs to resolve the split-brain

Resolving data split-brain

You must change the changelog extended attributes on the files as if some data operations succeeded on /rhgs/brick1/a but failed on /rhgs/brick-b/a. But /rhgs/brick2/a should not have any changelog showing data operations succeeded on /rhgs/brick2/a but failed on /rhgs/brick1/a. You must reset the data part of the changelog on trusted.afr.vol-client-0 of /rhgs/brick2/a.

Resolving metadata split-brain

You must change the changelog extended attributes on the files as if some metadata operations succeeded on /rhgs/brick2/a but failed on /rhgs/brick1/a. But /rhgs/brick1/a should not have any changelog which says some metadata operations succeeded on /rhgs/brick1/a but failed on /rhgs/brick2/a. You must reset metadata part of the changelog on trusted.afr.vol-client-1 of /rhgs/brick1/a

Run the following commands to reset the extended attributes.

On /rhgs/brick2/a, for trusted.afr.vol-client-0 0x000003b00000000100000000 to 0x000000000000000100000000, execute the following command:

AFR has the ability to conservatively merge different entries in the directories when there is a split-brain on directory. If on one brick directory storage has entries 1, 2 and has entries 3, 4 on the other brick then AFR will merge all of the entries in the directory to have 1, 2, 3, 4 entries in the same directory. But this may result in deleted files to re-appear in case the split-brain happens because of deletion of files on the directory. Split-brain resolution needs human intervention when there is at least one entry which has same file name but different gfid in that directory.

For example:

On brick-a the directory has 2 entries file1 with gfid_x and file2 . On brick-b directory has 2 entries file1 with gfid_y and file3. Here the gfid's of file1 on the bricks are different. These kinds of directory split-brain needs human intervention to resolve the issue. You must remove either file1 on brick-a or the file1 on brick-b to resolve the split-brain.

In addition, the corresponding gfid-link file must be removed. The gfid-link files are present in the .glusterfs directory in the top-level directory of the brick. If the gfid of the file is 0x307a5c9efddd4e7c96e94fd4bcdcbd1b (the trusted.gfid extended attribute received from the getfattr command earlier), the gfid-link file can be found at /rhgs/brick1/.glusterfs/30/7a/307a5c9efddd4e7c96e94fd4bcdcbd1b.

Warning

Before deleting the gfid-link, you must ensure that there are no hard links to the file present on that brick. If hard-links exist, you must delete them.

Where did the comment section go?

Red Hat's documentation publication system recently went through an upgrade to enable speedier, more mobile-friendly content. We decided to re-evaluate our commenting platform to ensure that it meets your expectations and serves as an optimal feedback mechanism. During this redesign, we invite your input on providing feedback on Red Hat documentation via the discussion platform.