Manjunath Anand
added a comment - 03/Dec/16 10:55 - edited Arpit Agarwal, sorry for the late code review; this comment really applies more to the JIRA HDFS-11149. Thanks for the effort on this change. In DatasetVolumeChecker, checkAllVolumes and checkAllVolumesAsync perform the same tasks except for a few extra steps in checkAllVolumes. The code can be restructured so that checkAllVolumes in turn calls checkAllVolumesAsync, which saves lines of code and also ensures both methods keep doing the same intended work without duplication in the future as well:
public ResultHandler checkAllVolumesAsync(
    final FsDatasetSpi<? extends FsVolumeSpi> dataset,
    Callback callback) {
  if (timer.monotonicNow() - lastAllVolumesCheck < minDiskCheckGapMs) {
    numSkippedChecks.incrementAndGet();
    return null;
  }

  lastAllVolumesCheck = timer.monotonicNow();
  final Set<StorageLocation> healthyVolumes = new HashSet<>();
  final Set<StorageLocation> failedVolumes = new HashSet<>();
  final Set<StorageLocation> allVolumes = new HashSet<>();

  final FsDatasetSpi.FsVolumeReferences references =
      dataset.getFsVolumeReferences();
  final CountDownLatch latch = new CountDownLatch(references.size());

  LOG.info("Checking {} volumes", references.size());
  for (int i = 0; i < references.size(); ++i) {
    final FsVolumeReference reference = references.getReference(i);
    allVolumes.add(reference.getVolume().getStorageLocation());
    // The context parameter is currently ignored.
    ListenableFuture<VolumeCheckResult> future =
        delegateChecker.schedule(reference.getVolume(), IGNORED_CONTEXT);
    LOG.info("Scheduled health check for volume {}", reference.getVolume());
    Futures.addCallback(future, new ResultHandler(
        reference, healthyVolumes, failedVolumes, latch, callback));
  }
  numAsyncDatasetChecks.incrementAndGet();
  return new ResultHandler(
      null, healthyVolumes, failedVolumes, latch, callback);
}
public Set<StorageLocation> checkAllVolumes(
    final FsDatasetSpi<? extends FsVolumeSpi> dataset)
    throws InterruptedException {
  ResultHandler resultHandler = checkAllVolumesAsync(dataset, null);
  if (resultHandler == null) {
    return Collections.emptySet();
  }

  // Wait until our timeout elapses, after which we give up on
  // the remaining volumes.
  if (!resultHandler.getResultsLatch().await(
      maxAllowedTimeForCheckMs, TimeUnit.MILLISECONDS)) {
    LOG.warn("checkAllVolumes timed out after {} ms",
        maxAllowedTimeForCheckMs);
  }

  numSyncDatasetChecks.incrementAndGet();
  synchronized (this) {
    // All volumes that have not been detected as healthy should be
    // considered failed. This is a superset of 'failedVolumes'.
    //
    // Make a copy under the mutex as Sets.difference() returns a view
    // of a potentially changing set.
    return new HashSet<>(Sets.difference(
        resultHandler.getAllVolumes(), resultHandler.getHealthyVolumes()));
  }
}
Although this is not a big improvement, it would avoid duplicate code and the drift in core behaviour that already exists between the two methods, for example in their logging:
LOG.info("Scheduled health check for volume {}", reference.getVolume());
appears only in checkAllVolumes, whereas
LOG.info("Checking {} volumes", references.size());
appears only in checkAllVolumesAsync. Please let me know your thoughts on this.

Manjunath Anand
added a comment - 03/Dec/16 15:14 - edited Arpit Agarwal, I am a little confused about the reason for the change in the DatasetVolumeChecker.checkAllVolume methods to use a Semaphore and a CountDownLatch instead of the earlier approach of using only a CountDownLatch.
The confusion is whether the CountDownLatch countdown is called only after all threads scheduled for the volume check have finished their jobs. My understanding is that the countdown is instead called by the first thread to complete its volume check, which would result in incorrect behaviour. Please correct me if I am wrong.
The code below effectively claims references.size() - 1 permits up front:
final Semaphore semaphore = new Semaphore(-references.size() + 1);
invokeCallback releases a permit on the Semaphore and then does a tryAcquire, which would allow it to acquire the permit it just released; it would then call callback.call, which releases the latch via countDown.
So it looks like the first thread to complete its volume check would release the latch, which would mean the await call on the latch does not wait up to maxAllowedTimeForCheckMs for the remaining checks.
private void invokeCallback() {
  try {
    semaphore.release();
    if (callback != null && semaphore.tryAcquire()) {
      callback.call(healthyVolumes, failedVolumes);
    }
  } catch (Exception e) {
    // Propagating this exception is unlikely to be helpful.
    LOG.warn("Unexpected exception", e);
  }
}
 Futures.addCallback(future, new ResultHandler(
-    reference, healthyVolumes, failedVolumes, resultsLatch, null));
+    reference, healthyVolumes, failedVolumes, semaphore, new Callback() {
+      @Override
+      public void call(Set<StorageLocation> ignored1,
+          Set<StorageLocation> ignored2) {
+        latch.countDown();
+      }
+    }));
Please suggest on this.

Arpit Agarwal
added a comment - 11/Dec/16 22:53 Hi Manjunath Anand,
invokeCallback releases a permit on the Semaphore and then does a tryAcquire, which would allow it to acquire the permit it just released; it would then call callback.call, which releases the latch via countDown.
Since the number of permits is initially negative, it will take references.size() releases before any acquire call succeeds. So the first references.size() - 1 threads will not be able to acquire the permit they just released.
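To make the counting concrete, here is a small standalone sketch (not from the patch) showing that with -references.size() + 1 initial permits, only the last of n releasing threads can successfully tryAcquire:
import java.util.concurrent.Semaphore;

// Standalone demo: with 1 - n initial permits, n releases are needed
// before any tryAcquire() can succeed, so only the last releasing
// thread observes a successful acquire.
public class NegativePermitsDemo {
  public static void main(String[] args) {
    final int n = 3;
    final Semaphore semaphore = new Semaphore(-n + 1); // permits = -2
    for (int i = 1; i <= n; i++) {
      semaphore.release(); // permits become -1, then 0, then 1
      System.out.println("release " + i + ", tryAcquire = "
          + semaphore.tryAcquire()); // prints false, false, true
    }
  }
}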

ASF GitHub Bot
added a comment - 19/Dec/16 22:04 Github user arp7 commented on a diff in the pull request:
https://github.com/apache/hadoop/pull/168#discussion_r93129689
— Diff: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/checker/DatasetVolumeChecker.java —
@@ -299,26 +327,35 @@ public void checkVolume(
   private class ResultHandler
       implements FutureCallback<VolumeCheckResult> {
     private final FsVolumeReference reference;
-    private final Set<StorageLocation> failedVolumes;
-    private final Set<StorageLocation> healthyVolumes;
-    private final CountDownLatch latch;
-    private final AtomicLong numVolumes;
+    private final Set<FsVolumeSpi> failedVolumes;
+    private final Set<FsVolumeSpi> healthyVolumes;
+    private final Semaphore semaphore;

     @Nullable
     private final Callback callback;

+    /**
+     *
+     * @param reference FsVolumeReference to be released when the check is
+     *                  complete.
+     * @param healthyVolumes set of healthy volumes. If the disk check is
+     *                       successful, add the volume here.
+     * @param failedVolumes set of failed volumes. If the disk check fails,
+     *                      add the volume here.
+     * @param semaphore semaphore used to trigger callback invocation.
— End diff –
CountDownLatch#countDown returns no value so there is no easy way to detect when the count falls to zero and the callback can be invoked (it must be invoked once only). I was using an AtomicLong to detect the 0->1 transition but it had a bug.
The semaphore approach fixes it. We still need the CountDownLatch which we can use as an event. I could have used an Object mutex instead but that would have required extra code to deal with the spurious wakeup problem which CountDownLatch does not suffer from.
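As a small standalone sketch of the pattern described above (hypothetical names, not the patch itself): the semaphore guarantees the callback fires exactly once, and the callback counts down a one-shot latch that the synchronous caller awaits; unlike Object.wait(), await() cannot wake spuriously, so no guard loop is needed:
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Standalone demo: a CountDownLatch(1) used as a one-shot event.
public class LatchAsEventDemo {
  public static void main(String[] args) throws InterruptedException {
    final CountDownLatch event = new CountDownLatch(1);
    final ExecutorService pool = Executors.newSingleThreadExecutor();
    // Stand-in for the last volume check completing and firing the
    // callback exactly once.
    pool.execute(event::countDown);
    if (event.await(1000, TimeUnit.MILLISECONDS)) {
      System.out.println("all volume checks completed");
    } else {
      System.out.println("timed out waiting for volume checks");
    }
    pool.shutdown();
  }
}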

Arpit Agarwal
added a comment - 19/Dec/16 22:54 - edited Thanks for the feedback Xiaoyu Yao!
Pushed the following commit:
https://github.com/apache/hadoop/pull/168/commits/7ef55e7b2622de9345dd2d138c3e828ad5916bfa
Removed the semaphore, as both you and Manjunath Anand commented on it, and replaced it with a simple counter initialized to the number of volumes.
Factored out some common test code you pointed out into a @Before method.
Manjunath Anand, to address your other point about shared code between the sync and async checks: there are a couple of differences that make it harder to share code.
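A minimal sketch of what such a counter-based completion check can look like (field names here are hypothetical; the actual change is in the commit linked above): AtomicLong#decrementAndGet returns the updated value, so exactly one thread, the last to finish, observes zero and fires the callback:
// Sketch with hypothetical names; see the linked commit for the real code.
private final AtomicLong numVolumes; // set to references.size() up front

private void invokeCallback() {
  try {
    // decrementAndGet() is atomic and returns the new value, so only
    // the last thread to finish sees zero and invokes the callback once.
    if (numVolumes.decrementAndGet() == 0 && callback != null) {
      callback.call(healthyVolumes, failedVolumes);
    }
  } catch (Exception e) {
    // Propagating this exception is unlikely to be helpful.
    LOG.warn("Unexpected exception", e);
  }
}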

ASF GitHub Bot
added a comment - 20/Dec/16 03:19 Github user xiaoyuyao commented on a diff in the pull request:
https://github.com/apache/hadoop/pull/168#discussion_r93165428
— Diff: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java —
@@ -1944,6 +1935,8 @@ public void shutdown() {
}
}
+ volumeChecker.shutdownAndWait(1, TimeUnit.SECONDS);
+
if (storageLocationChecker != null) {
— End diff –
The Datanode#storageLocationChecker is only needed during datanode startup. We don't need to pass it as a parameter to the DataNode constructor and keep it alive for the lifetime of the datanode until shutdown. This can be done as an optimization later.

ASF GitHub Bot
added a comment - 20/Dec/16 04:13 Github user arp7 commented on a diff in the pull request:
https://github.com/apache/hadoop/pull/168#discussion_r93169191
— Diff: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java —
@@ -1944,6 +1935,8 @@ public void shutdown() {
}
}
+ volumeChecker.shutdownAndWait(1, TimeUnit.SECONDS);
+
if (storageLocationChecker != null) {
— End diff –
By the way, regarding the reuse: I really wanted to do that too, but it's non-trivial because the handling logic is different in the two paths. It probably should never have been made different, but reconciling them now is a bit of work. We can look at it in a separate Jira.

ASF GitHub Bot
added a comment - 20/Dec/16 04:25 Github user arp7 commented on the issue:
https://github.com/apache/hadoop/pull/168
@xiaoyuyao I pushed one more commit to improve the logging. Now we log at warn if there is a volume failure and at debug if there is no failure.
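A hedged sketch of that logging behavior (the exact code is in the pushed commit; this only illustrates the levels described above):
// Log at warn when any volume failed, at debug otherwise.
if (!failedVolumes.isEmpty()) {
  LOG.warn("checkAllVolumes: {} volume(s) failed: {}",
      failedVolumes.size(), failedVolumes);
} else {
  LOG.debug("checkAllVolumes: all {} volume(s) are healthy",
      healthyVolumes.size());
}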

Xiaoyu Yao
added a comment - 20/Dec/16 21:50 +1 for the latest patch. I tried the failed Jenkins test but can't repro it.
I plan to commit it shortly with "git apply --whitespace=fix".
The checkstyle indentation level issue with Java8 Lambdas seems to be a known issue. Will open separate ticket to fix that.

Attached a patch for branch-2. Xiaoyu Yao, can you please take a look? The main difference from trunk is that in branch-2 DatasetVolumeChecker returns Set<FsVolumeSpi>, whereas in trunk it returns Set<StorageLocation>.

This is because, due to branch divergence, we cannot trivially get the StorageLocation from an FsVolumeSpi in branch-2. I ran test-patch locally and the results are below. I also tested the branch-2 patch by simulating a disk failure and verifying that the failure was caught and the volume was removed.
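For context, a minimal sketch of the trunk-side conversion that branch-2 cannot do (it reuses the getStorageLocation() accessor shown earlier in this thread; the variable names here are illustrative):
// Trunk only: FsVolumeSpi#getStorageLocation() maps a volume to its
// storage location; branch-2 lacks a trivial equivalent, so the
// branch-2 patch returns Set<FsVolumeSpi> directly.
Set<StorageLocation> healthyLocations = new HashSet<>();
for (FsVolumeSpi volume : healthyVolumes) {
  healthyLocations.add(volume.getStorageLocation());
}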