L518 & L276
I see that you create a single editLogStream at the start and have startLogSegment change it in the background. This is very side-effecty. How about having startLogSegment return an EditLogOutputStream and assigning it to editLogStream? Likewise on like L536, just call editLogStream.close(); and have your implementation of the stream interface call finalize for the stream? It would match better with L556 also.

L409
Are we losing information here, previously we'd get numbers for each journal, now we only get one?

Ivan Kelly
added a comment - 21/Sep/11 18:02 Looks like a pretty straight forward refactor. A few comments.
L518 & L276
I see that you create a single editLogStream at the start and have startLogSegment change it in the background. This is very side-effecty. How about having startLogSegment return an EditLogOutputStream and assigning it to editLogStream? Likewise on like L536, just call editLogStream.close(); and have your implementation of the stream interface call finalize for the stream? It would match better with L556 also.
L409
Are we losing information here, previously we'd get numbers for each journal, now we only get one?
L1099 & L1086
Looks like a debugging statement.

> L518 & L276
Fixed to store editLogStream in startLogSegment.
> L536
The finalizeLogSegments takes parameters which are not available in close method. I am making sure that finalizeLogSegments, abort and close, close the underlying streams. Therefore, editLogStream in FSEditLog can be set to null after these calls.

Jitendra Nath Pandey
added a comment - 23/Sep/11 08:00 > L518 & L276
Fixed to store editLogStream in startLogSegment.
> L536
The finalizeLogSegments takes parameters which are not available in close method. I am making sure that finalizeLogSegments, abort and close, close the underlying streams. Therefore, editLogStream in FSEditLog can be set to null after these calls.
>L409, L1099 & L1086
Good catch. Fixed

You've added a new log statement in EditLogFileOutputStream#flushAndSync. Shouldn't it be a LOG.warn()? There should be any instance of calling flush without data having been written to the stream first. The only case it could happen is when the stream has errored on a previous write, but it should have been removed from JournalSetOutputStream at that point. In fact, Im not sure how this worked in the old code either.

In FSEditLog#close you removed the assert for !journals.isEmpty. Instead you could put assert editLogStream != null;

In FSEditLog#logEdit, you catch the IOException, log and continue as normal. I can't see where the error is handled later. I don't think editLogStream.setReadyToFlush() will throw an exception if no active journals are found.

JournalSet#getSyncTimes doesn't return anything, so it's not really a getter. I think this would be cleaner if getSyncTimes() just returned a String rather than taking in a StringBuilder.

The finalizeLogSegments takes parameters which are not available in close method.

The last txid should be available to stream, as it must have passed through write(). Same for first tx id. In any case, it's not that important to me, just something I thought would make things nicer.

Ivan Kelly
added a comment - 26/Sep/11 10:44 Why not get rid of journal stream completely? Perhaps we can do this in a later JIRA.
Still have a findbug warning because of the custom synchronization in logSync.
<!--
synchronization is manually done in logSync()
//-->
<Match>
<Class name="org.apache.hadoop.hdfs.server.namenode.FSEditLog" />
<Field name="editLogStream" />
<Bug pattern="IS2_INCONSISTENT_SYNC" />
</Match>
You've added a new log statement in EditLogFileOutputStream#flushAndSync. Shouldn't it be a LOG.warn()? There should be any instance of calling flush without data having been written to the stream first. The only case it could happen is when the stream has errored on a previous write, but it should have been removed from JournalSetOutputStream at that point. In fact, Im not sure how this worked in the old code either.
In FSEditLog#close you removed the assert for !journals.isEmpty. Instead you could put assert editLogStream != null;
In FSEditLog#logEdit, you catch the IOException, log and continue as normal. I can't see where the error is handled later. I don't think editLogStream.setReadyToFlush() will throw an exception if no active journals are found.
JournalSet#getSyncTimes doesn't return anything, so it's not really a getter. I think this would be cleaner if getSyncTimes() just returned a String rather than taking in a StringBuilder.
The finalizeLogSegments takes parameters which are not available in close method.
The last txid should be available to stream, as it must have passed through write(). Same for first tx id. In any case, it's not that important to me, just something I thought would make things nicer.

You've added a new log statement in EditLogFileOutputStream#flushAndSync. Shouldn't it be a LOG.warn()? There should be any instance of calling flush without data having been written to the stream first. The only case it could happen is when the stream has errored on a previous write, but it should have been removed from JournalSetOutputStream at that point. In fact, Im not sure how this worked in the old code either.

It can also happen if multiple threads are doing flushAndSync?

The last txid should be available to stream, as it must have passed through write(). Same for first tx id. In any case, it's not that important to me, just something I thought would make things nicer.

To me both approaches seem equivalent. If it is not critical, can we take it up separately later as an improvement?

I have addressed other comments. I haven't excluded the findbugs warning, instead am storing the editLogStream in a local variable with the lock held. That also guards against an NPE if editLogStream is set to null by another thread before flush.

Jitendra Nath Pandey
added a comment - 28/Sep/11 02:29
You've added a new log statement in EditLogFileOutputStream#flushAndSync. Shouldn't it be a LOG.warn()? There should be any instance of calling flush without data having been written to the stream first. The only case it could happen is when the stream has errored on a previous write, but it should have been removed from JournalSetOutputStream at that point. In fact, Im not sure how this worked in the old code either.
It can also happen if multiple threads are doing flushAndSync?
The last txid should be available to stream, as it must have passed through write(). Same for first tx id. In any case, it's not that important to me, just something I thought would make things nicer.
To me both approaches seem equivalent. If it is not critical, can we take it up separately later as an improvement?
I have addressed other comments. I haven't excluded the findbugs warning, instead am storing the editLogStream in a local variable with the lock held. That also guards against an NPE if editLogStream is set to null by another thread before flush.

Hadoop QA
added a comment - 28/Sep/11 03:36 -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12496826/HDFS-2158.10.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 9 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hdfs.TestDfsOverAvroRpc
org.apache.hadoop.hdfs.server.blockmanagement.TestHost2NodesMap
org.apache.hadoop.hdfs.server.datanode.TestReplicasMap
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1307//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1307//console
This message is automatically generated.

Suresh Srinivas
added a comment - 28/Sep/11 23:08 Comments (some are directly not related to this changes in this patch):
JournalSet.java
#remove() #getEditLogManifest() should not throw IOException
Add javadoc comments to the methods where it is missing
Why do you need method level @VisibleForTesting when you have it the class level?
JournalClosure - add /** */ class javadoc to class and method
JounrnalSetOutputStream - add class javadoc
JournalSet#getOuputStream() is not used any where
JournalManager please update the class javadoc, because with JournalSet it no longer holds good.
FSEditLog.java
My preference is to leave import of FSEditLogOp.*
Remove commented NO_JOURNAL_STREAMS_WARNING if it not required
FSEditLog constructor must not throw IOException
logSyncAll() should not throw IOException
logGetDelgationToken() remove @return
make FSEditLog#journalSet final
logSyn() has some indentation issues
You can get rid of log of LOG.error("All journal have failed ...") by putting it is in mapJournalsAndReportErrors()
EditLogBackupOutputStream - remove LOG.info() which is not necessary
EditLogOutputStream.java - has many unused imports and consturctor should not throw IOException
TestStorageRestore.java - remove unused Writable import
Unrelated to this patch, to address in a different jira:
JournalManager.java
Jounrnal manager class javadoc should mention that txid are sequential and no gaps are expected.
startLogSegment() - better name would be openNewLogSegment(), finalizeLogSegment could be closeLogSegment
purgeLogsOlderThan() the method javadoc says minImageTxId instead of minTxIdToKeep
Please more comments to recoverFinalizedSegments() - It should mention details of recovery scenario
"is cannot be" to "cannot be"

Hadoop QA
added a comment - 30/Sep/11 01:34 -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12497063/HDFS-2158.11.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 9 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hdfs.TestDfsOverAvroRpc
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1317//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1317//console
This message is automatically generated.