Konstantin, It would be great if you could take a look at the comments in HDFS-396. You would perhaps see that it has been tested and has even been through hudson. The checks that failed in hudson have been justified. Perhaps the problem is that tests don't cover enough?

Flavio Junqueira
added a comment - 01/Jul/09 07:59 Konstantin, It would be great if you could take a look at the comments in HDFS-396 . You would perhaps see that it has been tested and has even been through hudson. The checks that failed in hudson have been justified. Perhaps the problem is that tests don't cover enough?

Mahadev, yes, tests are probably failing on Windows since the patch has been tested thoroughly under Linux and MacOsX.
I have requested a virtual machine with windows installed, since at the moment I don't have any windows machine available.

Konstantin, I'd like to understand better what you mean with your #1. Do you want to assign root "/" as a Storage Directory?
Also, I would remove #4 from your comments, I think it's unfair, incorrect, and not useful to the issue. I assume that a developer is not requested to have any possible environment ready to test the patch locally; instead, the testing system (hudson?) should take care of it.

Luca Telloli
added a comment - 01/Jul/09 18:18 Mahadev, yes, tests are probably failing on Windows since the patch has been tested thoroughly under Linux and MacOsX.
I have requested a virtual machine with windows installed, since at the moment I don't have any windows machine available.
Konstantin, I'd like to understand better what you mean with your #1. Do you want to assign root "/" as a Storage Directory?
Also, I would remove #4 from your comments, I think it's unfair, incorrect, and not useful to the issue. I assume that a developer is not requested to have any possible environment ready to test the patch locally; instead, the testing system (hudson?) should take care of it.

Luca, (1) in Konstantin's comment is mostly not related to HDFS-356. I think tests were done for the patch but it is a known issue that we don't test for windows but generally would like to fix windows bugs when reported.

I think you can test this on Linux, but by passing in a Windows path for namenode directory ("C:\test"). It may not be necessary for you find a windows machine.

Mostly "C:" is making is an illegal URI. We need to have a fix for windows paths :
a. Does URI class or utilities help in handling windows path? Otherwise, we need treat windows paths specially.. something like what firefox does ("c:/" in url bar is an error, but "c:\" gets mapped to "file:///c:/").

Some time back we used to ask ourselves "What would windows do?" question more often. For this important change, we didn't. This is just another example of how there is "simple change"... there is a tough balance between how much we want to ask a contributor to tweak, test, and refine a patch and making it smooth and simple to contribute. just my 2 cents.

Raghu Angadi
added a comment - 01/Jul/09 19:18
Luca, (1) in Konstantin's comment is mostly not related to HDFS-356 . I think tests were done for the patch but it is a known issue that we don't test for windows but generally would like to fix windows bugs when reported.
I think you can test this on Linux, but by passing in a Windows path for namenode directory ("C:\test"). It may not be necessary for you find a windows machine.
Mostly "C:" is making is an illegal URI. We need to have a fix for windows paths :
a. Does URI class or utilities help in handling windows path? Otherwise, we need treat windows paths specially.. something like what firefox does ("c:/" in url bar is an error, but "c:\" gets mapped to "file:///c:/").
Some time back we used to ask ourselves "What would windows do?" question more often. For this important change, we didn't. This is just another example of how there is "simple change"... there is a tough balance between how much we want to ask a contributor to tweak, test, and refine a patch and making it smooth and simple to contribute. just my 2 cents.

Tsz Wo Nicholas Sze
added a comment - 01/Jul/09 19:34 > ... we need treat windows paths specially.. something like what firefox does ("c:/" in url bar is an error, but "c:\" gets mapped to "file:///c:/").
+1
I think converting Windows path c:\foo to URI file:///c:/foo is a standard but not specific to firefox.
BTW, since we assume Cygwin in Windows, could we just use /cygdrive/c/foo?

Raghu is right, my comment #1 an #3 are not related to the edits patch. This should be taken care in a different jira.
I ran tests on Linux and did not see failures, except for TestBackupNode, which is a known issue.
Please do not forget to past unit test and test-patch results before committing is Hudson is off.
I did not look deep into the problem, but I thought we are calling getCanonicalPath() to read edits files, which adds the drive letter. So may be the solution should be to call toURI() on the path before anything else. This should convert the path (even if it is a win path) to correct uri, but if the path is already a URI it will remain the same. Then we can start analyzing the scheme and the authority.

Konstantin Shvachko
added a comment - 02/Jul/09 23:37 Raghu is right, my comment #1 an #3 are not related to the edits patch. This should be taken care in a different jira.
I ran tests on Linux and did not see failures, except for TestBackupNode, which is a known issue.
Please do not forget to past unit test and test-patch results before committing is Hudson is off.
I did not look deep into the problem, but I thought we are calling getCanonicalPath() to read edits files, which adds the drive letter. So may be the solution should be to call toURI() on the path before anything else. This should convert the path (even if it is a win path) to correct uri, but if the path is already a URI it will remain the same. Then we can start analyzing the scheme and the authority.

Konstantin Shvachko
added a comment - 03/Jul/09 01:28 In FSNamesystem.getNamespaceDirs() I replaced
- u = new URI( "file: //" + new File(name).getAbsolutePath());
+ u = new File(name).getAbsoluteFile().toURI();
And the name-node started well.
Some additional comments:
getCanonicalFile() is preferrable to getAbsoluteFile() since the former replaces . and .. and resolves links.
You should use LOG.error(message, exception) instead of LOG.error(message + exception.getMessage()) . I saw this in 4 places.
When file directories are specified as files rather than URIs the warning should explicitly say that that this api is deprecated . Instead of
LOG.warn( "Scheme is undefined for " + name);
LOG.warn( "Please check your file system configuration in " +
"hdfs-site.xml" );
I'd rather say
LOG.warn( "Use of file paths for " + propertyName + " is deprecated. Instead URIs instead." );
The key words is deprecated, so that we could replace it with an error in next release, and URI so that users new what to do.

Luca Telloli
added a comment - 05/Jul/09 14:27 Attaching a patch for this issue.
All URI are now built using
new File(name).getCanonicalPath().toURI()
as Konstantin suggested.
LOG.error messages have been updated
from the messages above I understand that IllegalArgumentException is not an issue for this patch, correct?
I see the queue for HDFS on hudson is not working, so I won't submit the patch. I'm running the test locally and I'll post the output afterwards.

This prints two separate lines. Could you merge it? We usually have one for for that entire message so that each line is self contained.. makes parsing and grepping easy (stack traces are an exception of course). Also it might be better to suggest users to use URI in the message.

Raghu Angadi
added a comment - 06/Jul/09 06:33 Thanks for the fix.
Minor change :
+ u = new File(name).getCanonicalFile().toURI();
+ LOG.warn("Use of file paths is deprecated for property "
+ + propertyName);
+ LOG.warn("Please update your configuration");
This prints two separate lines. Could you merge it? We usually have one for for that entire message so that each line is self contained.. makes parsing and grepping easy (stack traces are an exception of course). Also it might be better to suggest users to use URI in the message.

Luca Telloli
added a comment - 06/Jul/09 18:53 Unit tests seem to have issues under Windows, and the issues don't seem all related to this patch. To prove that, I reverted HDFS-396 on a fresh trunk and the same tests fail in there.
Given that, I'm attaching a patch that has some Windows specific code which solve some unit tests issues but not all of them. In particular, the windows specific code looks like:
if(name.matches("^[a-zA-Z]:.*")){
u = new File(name).getCanonicalFile().toURI();
} else {
// process value as URI
u = new URI(name);
}
where the condition checks for a windows filename
The other unit test problems will be treated in HDFS-462

[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 3 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec]

I run the tests on windows, using trunk + this patch and trunk + reverted 396, to see whether failures are due to this patch or not. I fixed a couple of error in the patch that I'm now posting.
To find some additional issues I run each test both on a directory containing spaces in path (like c:\Documents and Settings) and a directory not containing spaces, which apparently made the difference.

The impression is that windows builds have been neglected since some time, due to the fact that some tests fail on both pre and post patch code.

The outcome is that the remaining failing tests (which fail only on windows) are not related to this patch. Details follow for public discussion:

TestReplication does not fail for me (using trunk). Could you please check this one.
TestHDFSServerPorts and TestHDFSTrash do fail on Windows only. We should file an issue if there isn't one already.
TestModTime started failing when I updated trunk, was not failing on 2 days old trunk, when you were running your tests. Don't know what it could be related to.

Konstantin Shvachko
added a comment - 17/Jul/09 03:21 TestReplication does not fail for me (using trunk). Could you please check this one.
TestHDFSServerPorts and TestHDFSTrash do fail on Windows only. We should file an issue if there isn't one already.
TestModTime started failing when I updated trunk, was not failing on 2 days old trunk, when you were running your tests. Don't know what it could be related to.

I tried these 3 configurations:
1 - trunk
2 - trunk with HDFS-456 applied
3 - trunk with HDFS-396 reverted. 396 doesn't revert fully anymore but the problem is related to CreateEditLog.java, so not relevant to this issue

My outcome is:
1 - TestReplication fails with messages such as, which I guess is expected:

Testcase: testPendingReplicationRetry took 0.469 sec
Caused an ERROR
All specified directories are not accessible or do not exist.
java.io.IOException: All specified directories are not accessible or do not exist.

Testcase: testReplicateLenMismatchedBlock took 2.375 sec
Caused an ERROR
The requested operation cannot be performed on a file with a user-mapped section open
java.io.IOException: The requested operation cannot be performed on a file with a user-mapped section open
at java.io.RandomAccessFile.setLength(Native Method)
at org.apache.hadoop.hdfs.TestDatanodeBlockScanner.changeReplicaLength(TestDatanodeBlockScanner.java:419)
at org.apache.hadoop.hdfs.TestReplication.changeBlockLen(TestReplication.java:424)
at org.apache.hadoop.hdfs.TestReplication.testReplicateLenMismatchedBlock(TestReplication.java:403)

Luca Telloli
added a comment - 17/Jul/09 11:31 Today I've got different results from yours, I'm not sure if it's related to my environment configuration.
I tried these 3 configurations:
1 - trunk
2 - trunk with HDFS-456 applied
3 - trunk with HDFS-396 reverted. 396 doesn't revert fully anymore but the problem is related to CreateEditLog.java, so not relevant to this issue
My outcome is:
1 - TestReplication fails with messages such as, which I guess is expected:
Testcase: testPendingReplicationRetry took 0.469 sec
Caused an ERROR
All specified directories are not accessible or do not exist.
java.io.IOException: All specified directories are not accessible or do not exist.
2 - TestReplication failed on testReplicateLenMismatchedBlock with error:
Testcase: testReplicateLenMismatchedBlock took 2.375 sec
Caused an ERROR
The requested operation cannot be performed on a file with a user-mapped section open
java.io.IOException: The requested operation cannot be performed on a file with a user-mapped section open
at java.io.RandomAccessFile.setLength(Native Method)
at org.apache.hadoop.hdfs.TestDatanodeBlockScanner.changeReplicaLength(TestDatanodeBlockScanner.java:419)
at org.apache.hadoop.hdfs.TestReplication.changeBlockLen(TestReplication.java:424)
at org.apache.hadoop.hdfs.TestReplication.testReplicateLenMismatchedBlock(TestReplication.java:403)
3 - TestReplication passed correctly
luca@yahoo-a170c7579 /tmp/patches/trunk-applied
$ tail -n 6 build/test/TEST-org.apache.hadoop.hdfs.TestReplication.txt
Testcase: testBadBlockReportOnTransfer took 24.61 sec
Testcase: testReplicationSimulatedStorag took 3.562 sec
Testcase: testReplication took 9.359 sec
Testcase: testPendingReplicationRetry took 23.797 sec
Testcase: testReplicateLenMismatchedBlock took 10.953 sec
so at the end everything seems fine. I'm still puzzled on why the test didn't fail today and failed yesterday

To answer my own question, I run the test on trunk + HDFS-496 30 times consecutively, for 2 times.
The first time it passed 12 times, and failed 18; the second time it failed 23 times and passed 7 times.
In the second batch I saved the test logs and they all fail for the same error below, which is the same error that makes the test fail with HDFS-396 reverted, so that I'd conclude that this patch does not impact that test. The error seems related to something left open, and I assume it's OS specific.

Testcase: testReplicateLenMismatchedBlock took 1.984 sec
Caused an ERROR
The requested operation cannot be performed on a file with a user-mapped section
open
java.io.IOException: The requested operation cannot be performed on a file with a user-mapped section open
at java.io.RandomAccessFile.setLength(Native Method)
at org.apache.hadoop.hdfs.TestDatanodeBlockScanner.changeReplicaLength(TestDatanodeBlockScanner.java:419)
at org.apache.hadoop.hdfs.TestReplication.changeBlockLen(TestReplication.java:424)
at org.apache.hadoop.hdfs.TestReplication.testReplicateLenMismatchedBlock(TestReplication.java:403)

Luca Telloli
added a comment - 17/Jul/09 16:05 To answer my own question, I run the test on trunk + HDFS-496 30 times consecutively, for 2 times.
The first time it passed 12 times, and failed 18; the second time it failed 23 times and passed 7 times.
In the second batch I saved the test logs and they all fail for the same error below, which is the same error that makes the test fail with HDFS-396 reverted, so that I'd conclude that this patch does not impact that test. The error seems related to something left open, and I assume it's OS specific.
Testcase: testReplicateLenMismatchedBlock took 1.984 sec
Caused an ERROR
The requested operation cannot be performed on a file with a user-mapped section
open
java.io.IOException: The requested operation cannot be performed on a file with a user-mapped section open
at java.io.RandomAccessFile.setLength(Native Method)
at org.apache.hadoop.hdfs.TestDatanodeBlockScanner.changeReplicaLength(TestDatanodeBlockScanner.java:419)
at org.apache.hadoop.hdfs.TestReplication.changeBlockLen(TestReplication.java:424)
at org.apache.hadoop.hdfs.TestReplication.testReplicateLenMismatchedBlock(TestReplication.java:403)

FSImage.java - when trying to process name as file, on exception, should the error say "Error while processing element as URI or File: " + name

Interpreting a path as URI, and on failure as file is repeated in multiple places. Move this into a util, instead of duplicating the code. Please add unit tests to test this method on various platforms to ensure different configuration (that is files, windows files, URIs) are handled. I would like to review the unit tests to see if we are catching all the conditions. These tests need to pass both on Windows and Linux.

In some of the tests, replace(' ', '+'); has been removed. Not sure if this is intentional?

Suresh Srinivas
added a comment - 21/Aug/09 17:48 Comments:
FSImage.getDirectories() in this method, the path for storage directories is interpreted only as File. Could this also have URI?
FSImage.java - when creating new URI please catch specific exception URISyntaxExcepion instead of blanket Exception
FSImage.java - has System.out.println - please remove it
FSImage.java - when trying to process name as file, on exception, should the error say "Error while processing element as URI or File: " + name
Interpreting a path as URI, and on failure as file is repeated in multiple places. Move this into a util, instead of duplicating the code. Please add unit tests to test this method on various platforms to ensure different configuration (that is files, windows files, URIs) are handled. I would like to review the unit tests to see if we are catching all the conditions. These tests need to pass both on Windows and Linux.
In some of the tests, replace(' ', '+'); has been removed. Not sure if this is intentional?
Please use two spaces instead of tab

Let me know if you think that the logic is solid and the unit test is good enough, thanks!

FSImage.getDirectories() in this method, the path for storage directories is interpreted only as File. Could this also have URI?

I'm not 100% sure on this. in FSImage.getDirectories() the assumption is that the list of directories should be already correctly set from the initialization methods no? What I mean is that that method is not reading any configuration property

FSImage.java - when trying to process name as file, on exception, should the error say "Error while processing element as URI or File: " + name

Updated

Interpreting a path as URI, and on failure as file is repeated in multiple places. Move this into a util, instead of duplicating the code.

Done.

Please add unit tests to test this method on various platforms to ensure different configuration (that is files, windows files, URIs) are handled. I would like to review the unit tests to see if we are catching all the conditions. These tests need to pass both on Windows and Linux

I added a unit test as requested.

In some of the tests, replace(' ', '+'); has been removed. Not sure if this is intentional?

Was intentional two patches ago. I removed it though. I think some tests on windows behave weirdly because of directories with spaces (that is, if you run them in c:\Documents and Settings the outcome is different than running them in /cygdrive/c/tmp/) but this should be the subject of a different patch in case.

Luca Telloli
added a comment - 26/Aug/09 15:12 Suresh, thanks for the detailed comments, my replies are below.
In general, after some tests, I came up with an update in the URI creation logic:
if the uri is correctly specified, the URI constructor will take care of it
if the user specified a directory/file and not a uri, then the file constructor will be invoked, and the URi constructed from there
if the scheme is file, then the authoritative part should not be a Windows like drive letter, like C:
The last one comes after reading this page: http://blogs.msdn.com/ie/archive/2006/12/06/file-uris-in-windows.aspx
Let me know if you think that the logic is solid and the unit test is good enough, thanks!
FSImage.getDirectories() in this method, the path for storage directories is interpreted only as File. Could this also have URI?
I'm not 100% sure on this. in FSImage.getDirectories() the assumption is that the list of directories should be already correctly set from the initialization methods no? What I mean is that that method is not reading any configuration property
FSImage.java - when creating new URI please catch specific exception URISyntaxExcepion instead of blanket Exception
Done
FSImage.java - has System.out.println - please remove it
Done: sorry about it
FSImage.java - when trying to process name as file, on exception, should the error say "Error while processing element as URI or File: " + name
Updated
Interpreting a path as URI, and on failure as file is repeated in multiple places. Move this into a util, instead of duplicating the code.
Done.
Please add unit tests to test this method on various platforms to ensure different configuration (that is files, windows files, URIs) are handled. I would like to review the unit tests to see if we are catching all the conditions. These tests need to pass both on Windows and Linux
I added a unit test as requested.
In some of the tests, replace(' ', '+'); has been removed. Not sure if this is intentional?
Was intentional two patches ago. I removed it though. I think some tests on windows behave weirdly because of directories with spaces (that is, if you run them in c:\Documents and Settings the outcome is different than running them in /cygdrive/c/tmp/) but this should be the subject of a different patch in case.

Tsz Wo Nicholas Sze
added a comment - 02/Sep/09 18:52 > No more audit warnings I swear!
Well done! You got the first successful build in the new hudson machine.
Really hope that this can be committed soon. Otherwise, we cannot run hdfs on Windows.

Jonathan Gray
added a comment - 23/Oct/09 21:34 Over in HBase, we have not been able to run our unit tests on Windows in 0.21 because of URI-related errors in MiniDFSCluster.
With this patch, I am now able to run our unit tests on Windows. I used to see messages like this:
2009-10-23 13:18:44,224 ERROR [main] namenode.FSNamesystem(354): Error while processing URI: C:\eclipse\HBase 0.21 Trunk SVN\build\hbase\test\dfs\name1. The error message was: Illegal character in opaque part at index 2: C:\eclipse\HBase 0.21 Trunk SVN\build\hbase\test\dfs\name1
With the patch, I now see:
2009-10-23 13:28:55,823 WARN [main] common.Util(50): Path C:\eclipse\HBase 0.21 Trunk SVN\build\hbase\test\dfs\name1 should be specified as a URI in configuration files. Reverting to the default file scheme
And tests now run as expected. +1 from me and others over in HBase.
Attached is a patch that applies cleanly to the current 0.21 branch, no other changes. Thanks Luca!

Konstantin Shvachko
added a comment - 09/Dec/09 03:00 This patch is based on the previous patches submitted by Luca, and updated by Jonathan.
I removed unused testing for incorrect URI containing windows drive letters.
Improved some message reporting and JavaDoc comments.
Replaced storage directory declaration in hdfs-default.xml with URIs for "dfs.namenode.name.dir" and "dfs.namenode.checkpoint.dir".
I fixed MiniDFSCluster so that it sets URIs as storage directories not files.
Fixed other tests, which set or use storage directories besides the mini-cluster.

I ran tests on Windows and on Linux.
Linux run was good. Windows tests fail because of HDFS-815 and HDFS-816. I think these are different issues unrelated to this patch.
In order to reproduce them though one will need to apply this patch.

Konstantin Shvachko
added a comment - 09/Dec/09 03:04 I ran tests on Windows and on Linux.
Linux run was good. Windows tests fail because of HDFS-815 and HDFS-816 . I think these are different issues unrelated to this patch.
In order to reproduce them though one will need to apply this patch.

Suresh Srinivas
added a comment - 10/Dec/09 07:03 Comments:
Some of the lines in Tests exceed 80 chars
TestHDFSServerPorts.java - when creating URI string why is File.getCanonicalFile().toURI() not used? This is also applicable to other tests.
TestGetUriFromString.testAbsolutePathAsURI - is there a reason why OS check is needed? The test works on unix for both windows and unix paths.

This addresses Suresh's comments.
I did some more refactoring of the patch: I removed unused constants. Several contsant names were not up to our code style (not all capital), and I factored out two commonly used Util methods. I guess other people may forget to use canonical file name the I did, so I used Util.fileAsURI() method.

Konstantin Shvachko
added a comment - 10/Dec/09 23:36 This addresses Suresh's comments.
I did some more refactoring of the patch: I removed unused constants. Several contsant names were not up to our code style (not all capital), and I factored out two commonly used Util methods. I guess other people may forget to use canonical file name the I did, so I used Util.fileAsURI() method.