Tsz Wo Nicholas Sze
added a comment - 21/Jul/11 11:04
1) Can we keep BlockManager#corruptReplicas private?
After the replication code is moved the block manager, we can change it back to private.
I moved getNumberOfRacks(..) to BlockManagerTestUtil since it was used by tests only. So I changed corruptReplicas to package private. It is nothing to do with the replication code.

After the replication code is moved the block manager, we can change it back to private.

2) Is there a jira created to not to use FSNamesystem, FSNamesystem#datanodeMap FSNamesystem#heartbeats for locking in block manager? If not can you please create one?

I am current simply moving the code but not changing the logic (unless for some obvious cases, e.g. use for-each statement instead of iterator.)

For improving locking, I have thought about it. How about introduce a ReadWriteLock interface and then FSNamesystem, BlockManager and DatanodeManager implement it? The methods in ReadWriteLock are very similar to the existing methods in FSNamesystem. I have not created a JIRA for it yet.

3) Should we track with a jira removing references to NameNode.stateChangeLog in blockmanagement package to appropritate logger from blockmanagement package?

We probably should introduce a stateChangeLog in BlockManager.

4) Moved code resolveNetworkLocation() no longer has assert hasWriteLock(). What is the locking mechanism for this code?

FSNamesystem.registerDatanode(..), which has the write-lock, calls DatanodeManager.registerDatanode(..), which is the only method calling resolveNetworkLocation(..). I think the assert statement is not useful. No? We may add the assert statement back when we have read-write lock in DatanodeManager. Again, I am not changing any logic.

I am moving the code step-by-step. After this patch, we are ready to move FSNamesystem#datanodeMap and the corresponding methods. I have HDFS-2108 and HDFS-2112 for moving the heartbeat and replication code, respectively. I will see if moving blockTokenSecretManager could be done with replication.

6) getDatanodeListForReport()
In jiras moving the code, if the moved code remains the same, it would be easier to review. Case in point - getDatanodeListForReport()

My bad. I usually don't change any logic but there are too many redundant code there. I tried to simpify it.

Tsz Wo Nicholas Sze
added a comment - 21/Jul/11 10:36 1) Can we keep BlockManager#corruptReplicas private?
After the replication code is moved the block manager, we can change it back to private.
2) Is there a jira created to not to use FSNamesystem, FSNamesystem#datanodeMap FSNamesystem#heartbeats for locking in block manager? If not can you please create one?
I am current simply moving the code but not changing the logic (unless for some obvious cases, e.g. use for-each statement instead of iterator.)
For improving locking, I have thought about it. How about introduce a ReadWriteLock interface and then FSNamesystem, BlockManager and DatanodeManager implement it? The methods in ReadWriteLock are very similar to the existing methods in FSNamesystem. I have not created a JIRA for it yet.
3) Should we track with a jira removing references to NameNode.stateChangeLog in blockmanagement package to appropritate logger from blockmanagement package?
We probably should introduce a stateChangeLog in BlockManager.
4) Moved code resolveNetworkLocation() no longer has assert hasWriteLock(). What is the locking mechanism for this code?
FSNamesystem.registerDatanode(..), which has the write-lock, calls DatanodeManager.registerDatanode(..), which is the only method calling resolveNetworkLocation(..). I think the assert statement is not useful. No? We may add the assert statement back when we have read-write lock in DatanodeManager. Again, I am not changing any logic.
5) FSNamesystem#isDatanodeDead(), FSNamesystem#updateStats(), FSNamesystem#blockTokenSecretManager, FSNamesystem#datanodeMap, FSNamesystem#heartbeats should move to blockmanagement package.
I am moving the code step-by-step. After this patch, we are ready to move FSNamesystem#datanodeMap and the corresponding methods. I have HDFS-2108 and HDFS-2112 for moving the heartbeat and replication code, respectively. I will see if moving blockTokenSecretManager could be done with replication.
6) getDatanodeListForReport()
In jiras moving the code, if the moved code remains the same, it would be easier to review. Case in point - getDatanodeListForReport()
My bad. I usually don't change any logic but there are too many redundant code there. I tried to simpify it.
> Why are doing adding these two sizes final List<DatanodeDescriptor> nodes = new ArrayList<DatanodeDescriptor>(namesystem.datanodeMap.size() + (mustList == null? 0: mustList.size()));
This is existing code. I think the author tried to prevent array resizing.
> Changing mustList from Map to ArrayList is going to slow down removes.
Good point, I should use HashSet.
7) checkDecommissionStateInternal() no longer checks for hasWriteLock()?
Same as (4), it is an assert statement. Do you think we need it?
8) DatanodeManager should not have namesystem reference?
The main reason is for accessing namesystem.datanodeMap. We should remove namesystem once everything is moved to blockmanagement package.
9) In some tests where you have moved from FSNamesystem methods calls to BlockManager method calls, how is the locking that was done in FSNamesystem call done with this change?
It is because the same FSNamesystem is moved to BlockManager. I should not have changed any logic.

Suresh Srinivas
added a comment - 21/Jul/11 07:08
Can we keep BlockManager#corruptReplicas private?
Is there a jira created to not to use FSNamesystem, FSNamesystem#datanodeMap FSNamesystem#heartbeats for locking in block manager? If not can you please create one?
Should we track with a jira removing references to NameNode.stateChangeLog in blockmanagement package to appropritate logger from blockmanagement package?
Moved code resolveNetworkLocation() no longer has assert hasWriteLock() . What is the locking mechanism for this code?
FSNamesystem#isDatanodeDead(), FSNamesystem#updateStats(), FSNamesystem#blockTokenSecretManager, FSNamesystem#datanodeMap, FSNamesystem#heartbeats should move to blockmanagement package.
getDatanodeListForReport()
In jiras moving the code, if the moved code remains the same, it would be easier to review. Case in point - getDatanodeListForReport()
Why are doing adding these two sizes final List<DatanodeDescriptor> nodes = new ArrayList<DatanodeDescriptor>(namesystem.datanodeMap.size() + (mustList == null? 0: mustList.size()));
Changing mustList from Map to ArrayList is going to slow down removes.
checkDecommissionStateInternal() no longer checks for hasWriteLock()?
DatanodeManager should not have namesystem reference?
In some tests where you have moved from FSNamesystem methods calls to BlockManager method calls, how is the locking that was done in FSNamesystem call done with this change?

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

Hadoop QA
added a comment - 20/Jul/11 08:02 -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12487020/h2167_20110719.patch
against trunk revision 1148348.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
+1 system test framework. The patch passed system test framework compile.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/976//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/976//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/976//console
This message is automatically generated.