Hairong Kuang
added a comment - 25/Feb/09 18:04 Nicholas is right, so there is no need to worry about deadlock. It is still better for the NN to choose the same primary datanode as the client, because that makes it easier to detect concurrent pipeline recoveries on the same block.
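
A minimal sketch of why a shared primary helps, under loud assumptions: the class names, the "smallest name wins" rule, and the in-progress set are all hypothetical illustrations, not the actual Hadoop code. If both the client and the NN apply the same deterministic rule, both recovery requests reach one datanode, which can then notice the overlap locally.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: both DFSClient and NameNode would call the same rule.
class SketchPrimaryChooser {
    // Assumed rule for this sketch: the lexicographically smallest name wins,
    // so the result does not depend on the order of the pipeline list.
    static String choosePrimary(List<String> datanodes) {
        return Collections.min(datanodes);
    }
}

// Hypothetical primary datanode that tracks recoveries currently in progress.
class SketchPrimaryDataNode {
    private final Set<String> recovering = new HashSet<>();

    // Returns false when a recovery of the same block is already running here.
    synchronized boolean beginRecovery(String blockId) {
        return recovering.add(blockId);
    }

    // Marks the recovery of the block as finished.
    synchronized void endRecovery(String blockId) {
        recovering.remove(blockId);
    }
}
```

With two different primaries, neither datanode would see both requests, so a check like beginRecovery could not catch the overlap.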

Tsz Wo Nicholas Sze
added a comment - 23/Feb/09 23:53 > One side issue... DFSClient and NameNode used different nodes as the primary datanode for pipeline recovery of the same block. That was why I did not see NameNode-initiated pipeline recovery at first. Will this cause the deadlock reported in HADOOP-3673?
One of the necessary conditions for a deadlock is hold-and-wait. In NameNode-initiated recovery, the NameNode uses the so-called reverse heartbeat to send the recovery command, which does not hold a DataNode RPC thread. The hold-and-wait condition therefore cannot be satisfied.
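
To illustrate the point about hold-and-wait, here is a minimal sketch of the reverse-heartbeat idea. The classes and method names are hypothetical stand-ins, not the real Hadoop classes. The NameNode only queues the command; the datanode picks it up in the reply to its own heartbeat, so the NameNode never blocks on a call into the datanode.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for the NameNode side.
class SketchNameNode {
    // Commands waiting to be picked up by the datanode, oldest first.
    private final Queue<String> pendingCommands = new ArrayDeque<>();

    // The NameNode only enqueues the command and returns at once.
    // It never calls into the datanode, so it never waits on a datanode thread.
    synchronized void scheduleBlockRecovery(String blockName) {
        pendingCommands.add("RECOVER " + blockName);
    }

    // Called BY the datanode. The heartbeat reply carries the queued commands.
    synchronized List<String> heartbeat() {
        List<String> commands = new ArrayList<>(pendingCommands);
        pendingCommands.clear();
        return commands;
    }
}

// Hypothetical stand-in for the datanode side.
class SketchDataNode {
    final List<String> executed = new ArrayList<>();

    // The datanode drives the exchange from its own heartbeat thread and
    // handles the commands only after the heartbeat call has returned.
    void sendHeartbeat(SketchNameNode nameNode) {
        for (String command : nameNode.heartbeat()) {
            executed.add(command);
        }
    }
}
```

Because every call goes from the datanode to the NameNode, no thread on either side holds a resource while waiting for the other side to answer a call in the opposite direction.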

Hairong Kuang
added a comment - 23/Feb/09 23:40 One side issue... DFSClient and NameNode used different nodes as the primary datanode for pipeline recovery of the same block. That was why I did not see NameNode-initiated pipeline recovery at first. Will this cause the deadlock reported in HADOOP-3673?

Tsz Wo Nicholas Sze
added a comment - 23/Feb/09 22:01 It seems that the DFSClient somehow lost contact with the NameNode for at least an hour, so the lease expired and the NameNode started a lease recovery.

Hairong Kuang
added a comment - 23/Feb/09 21:49 After I read the NameNode log, I realized that my conclusion in the previous comment was wrong. It turned out that both the NameNode and DFSClient initiated pipeline recovery. The NameNode started earlier and its pipeline recovery succeeded. The log in my previous comment was from the NN-initiated pipeline recovery. As a result, the block's generation stamp was bumped up on the datanodes and in the NN's memory. When DFSClient initiated its pipeline recovery, it still used the old generation stamp, so the recovery failed with the stack trace shown in the description. But DFSClient kept retrying forever.
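
The failure mode can be sketched as follows. This is a simplified model with hypothetical classes, not the real DataNode or DFSClient code. The stamps 954380 and 989001 come from the log; the stamp 989002 proposed by the client and the retry limit are made up for illustration. Once the NN-initiated recovery has bumped the stamp, a client that keeps presenting the old stamp can never succeed, so only a bounded retry (or refreshing the stamp) ends the loop.

```java
// Hypothetical model of a replica on a datanode.
class SketchReplica {
    private long generationStamp;

    SketchReplica(long generationStamp) {
        this.generationStamp = generationStamp;
    }

    long getGenerationStamp() {
        return generationStamp;
    }

    // Recovery succeeds only if the caller names the replica's current stamp.
    synchronized boolean recover(long expectedStamp, long newStamp) {
        if (expectedStamp != generationStamp) {
            return false; // stale stamp: the block was already recovered by someone else
        }
        generationStamp = newStamp;
        return true;
    }
}

// Hypothetical client that retries with the same stamp each time.
class SketchClient {
    // Returns the number of attempts used on success, or -1 once maxAttempts
    // is exhausted. The bound is an assumption of this sketch, not the
    // behavior described in this issue, where the retry never stopped.
    static int recoverWithRetries(SketchReplica replica, long expectedStamp,
                                  long newStamp, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (replica.recover(expectedStamp, newStamp)) {
                return attempt;
            }
        }
        return -1;
    }
}
```

For example, after replica.recover(954380L, 989001L) has succeeded for the NN, a call to SketchClient.recoverWithRetries(replica, 954380L, 989002L, 4) fails on every attempt and leaves the stamp at 989001.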

Hairong Kuang
added a comment - 23/Feb/09 21:04 Here is more information regarding this failure. The log came from the same primary datanode:
org.apache.hadoop.hdfs.server.datanode.DataNode: oldblock=blk_1415000632081498137_954380(length=31016448), newblock=blk_1415000632081498137_989001(length=31016448), datanode=XX
org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_1415000632081498137_989001 of size 31016448 as part of lease recovery.
It looks like pipeline recovery succeeded at the primary datanode. The new generation stamp is 989001. But the client saw this recovery as a failure and used the old generation stamp 954380 to recover from the error:
WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_1415000632081498137_954380 failed because recovery from primary datanode XX failed 4 times. Will retry...
This retry went on forever.