Rodrigo Schmidt
added a comment - 22/Feb/10 05:43 Sure, Dhruba!
Let's assume there's a file /usr/schmidt/file1 that has been recently raided. In that case, the RaidNode will have created a parity file /raid/usr/schmidt/file1.
Now, imagine the user rewrites the original file and creates a new version of /usr/schmidt/file1.
The RaidNode correctly identifies that the file has to be re-raided and creates the parity file in a temporary directory, but when it tries to move the newly created parity file from the temporary directory to /raid/usr/schmidt/file1, the move fails because the destination already exists.
Before renaming, the patch checks whether the destination exists and deletes it if so.
The unit test covers this scenario.
I actually found this bug when you proposed adding the unit test that covers this scenario to my patch for MAPREDUCE-1510. I didn't want to fix this bug there because (i) MAPREDUCE-1510 was an improvement and not a bug, and (ii) neither the title nor the description of MAPREDUCE-1510 covers this failure scenario, so people looking for this problem would have a hard time finding the correct JIRA.
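The delete-before-rename idea described above can be sketched roughly as follows. This is not the code from the patch: it uses java.nio.file on the local filesystem as a stand-in for HDFS, and the class name ParityRename and the helper replaceParityFile are hypothetical names chosen for illustration. On HDFS the same steps would presumably go through FileSystem.exists(), FileSystem.delete() and FileSystem.rename(), but that mapping is an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ParityRename {
    // Hypothetical helper: move a freshly generated parity file into place,
    // removing a stale parity file at the destination first.
    static void replaceParityFile(Path tmp, Path dest) throws IOException {
        // Files.move without REPLACE_EXISTING throws FileAlreadyExistsException
        // if dest is present, which mirrors the failure described in this issue.
        Files.deleteIfExists(dest);
        Files.move(tmp, dest);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("raid-sketch");
        Path dest = dir.resolve("file1");      // stands in for the old parity file
        Path tmp = dir.resolve("file1.tmp");   // stands in for the new parity file
        Files.write(dest, "old parity".getBytes(StandardCharsets.UTF_8));
        Files.write(tmp, "new parity".getBytes(StandardCharsets.UTF_8));

        replaceParityFile(tmp, dest);

        // prints "new parity"
        System.out.println(new String(Files.readAllBytes(dest), StandardCharsets.UTF_8));
    }
}
```

Note that deleting and then moving is not atomic: a reader could briefly see no parity file between the two steps, which may or may not matter for the RaidNode.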

dhruba borthakur
added a comment - 22/Feb/10 05:47 This is confusing me. An HDFS.rename() should succeed even if the target file exists. The rename call deletes the original file and moves the new one into its place. Is this not what you are seeing?

Rodrigo Schmidt
added a comment - 22/Feb/10 05:50 Definitely not! That is why I had to change the code to pass the unit test.
However, my original work was done on hadoop 0.20, not trunk. Let me check whether it works on trunk without the changes to RaidNode.

Rodrigo Schmidt
added a comment - 22/Feb/10 05:58 I just checked, and the new unit test doesn't pass on trunk unless we change the RaidNode code.
The unit test blocks, and the logs start to report the following error when the RaidNode tries to move the parity file to the new location:
[junit] 10/02/21 21:56:15 INFO raid.RaidNode: Exception while invoking action on policy RaidTest1 srcPath hdfs://localhost:59030/user/dhruba/policytest exception java.io.IOException: Unable to rename tmp file hdfs://localhost:59030/destraid/user/dhruba/policytest/file2.tmp to hdfs://localhost:59030/destraid/user/dhruba/policytest/file2
[junit] at org.apache.hadoop.raid.RaidNode.generateParityFile(RaidNode.java:817)
[junit] at org.apache.hadoop.raid.RaidNode.doRaid(RaidNode.java:705)
[junit] at org.apache.hadoop.raid.RaidNode.doRaid(RaidNode.java:642)
[junit] at org.apache.hadoop.raid.RaidNode$TriggerMonitor.doProcess(RaidNode.java:401)
[junit] at org.apache.hadoop.raid.RaidNode$TriggerMonitor.run(RaidNode.java:307)
[junit] at java.lang.Thread.run(Thread.java:637)
Everything works fine with the proposed change.