Yanbo Liang
added a comment - 15/Nov/12 09:54 The infinite loop is caused by FileSystem.loadFileSystems(). If a user registers org.apache.hadoop.fs.FsUrlStreamHandlerFactory as the current URLStreamHandlerFactory, then calling FsUrlStreamHandlerFactory.createURLStreamHandler() invokes FileSystem.loadFileSystems(), and FileSystem.loadFileSystems() in turn invokes FsUrlStreamHandlerFactory.createURLStreamHandler() when resolving a URL. Hence the infinite loop.
The fix simply moves the body of FileSystem.loadFileSystems() into a static initializer block, so that it runs exactly once, when the class is initialized.
Looking forward to any comments.
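The "runs exactly once" guarantee the fix relies on is the JVM's class-initialization semantics. A minimal, Hadoop-free sketch (class and field names are illustrative, not Hadoop's):

```java
// Hypothetical stand-in for the patch's idea: work done in a static
// initializer runs exactly once, at class-initialization time, no matter
// how many callers touch the class afterwards.
public class StaticInitOnce {
    static int initCount = 0;

    static class Registry {
        // Runs once, the first time Registry is used.
        static {
            initCount++;
        }
        static void use() { }
    }

    public static void main(String[] args) {
        Registry.use();
        Registry.use();
        Registry.use();
        System.out.println(initCount); // prints 1
    }
}
```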

Hadoop QA
added a comment - 15/Nov/12 10:16 -1 overall . Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12553639/HADOOP-9041.patch
against trunk revision .
+1 @author . The patch does not contain any @author tags.
-1 tests included . The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javac . The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc . The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse . The patch built with eclipse:eclipse.
+1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit . The applied patch does not increase the total number of release audit warnings.
-1 core tests . The patch failed these unit tests in hadoop-common-project/hadoop-common:
org.apache.hadoop.fs.viewfs.TestViewFsURIs
org.apache.hadoop.fs.viewfs.TestChRootedFs
org.apache.hadoop.fs.TestLocalFsFCStatistics
org.apache.hadoop.fs.viewfs.TestViewFsWithAuthorityLocalFs
org.apache.hadoop.fs.TestS3_LocalFileContextURI
org.apache.hadoop.fs.TestFileContextResolveAfs
org.apache.hadoop.fs.TestLocal_S3FileContextURI
org.apache.hadoop.fs.TestLocalFSFileContextSymlink
org.apache.hadoop.fs.viewfs.TestViewFsLocalFs
org.apache.hadoop.fs.TestFileContextDeleteOnExit
+1 contrib tests . The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1749//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1749//console
This message is automatically generated.

Hadoop QA
added a comment - 16/Nov/12 10:47 -1 overall . Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12553761/HADOOP-9041.patch
against trunk revision .
+1 @author . The patch does not contain any @author tags.
-1 tests included . The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javac . The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc . The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse . The patch built with eclipse:eclipse.
+1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit . The applied patch does not increase the total number of release audit warnings.
+1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common.
+1 contrib tests . The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1760//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1760//console
This message is automatically generated.

Alejandro Abdelnur
added a comment - 17/Nov/12 06:54 Yanbo Liang, the change in the patch looks like it would not change the current behavior. Still, I'm failing to see how a call to FileSystem.loadFileSystems() gets to invoke FsUrlStreamHandlerFactory. AFAIK, this could happen only if the default constructor of the filesystem tries to create a URL with an FS scheme. Which FileSystem impl have you found doing that?

Hadoop QA
added a comment - 17/Nov/12 16:47 -1 overall . Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12553900/HADOOP-9041.patch
against trunk revision .
+1 @author . The patch does not contain any @author tags.
-1 tests included . The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javac . The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc . The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse . The patch built with eclipse:eclipse.
+1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit . The applied patch does not increase the total number of release audit warnings.
+1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common.
+1 contrib tests . The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1768//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1768//console
This message is automatically generated.

Yanbo Liang
added a comment - 17/Nov/12 17:26 Hi Alejandro,
My previous comment may not have been very clear. The detailed call stack is as follows:
If a user registers org.apache.hadoop.fs.FsUrlStreamHandlerFactory as the current URLStreamHandlerFactory before calling FileSystem.getFileSystem()->FileSystem.loadFileSystems(), an infinite loop results:
1) org.apache.hadoop.fs.FsUrlStreamHandlerFactory has been registered as the current URLStreamHandlerFactory.
2) The user calls FileSystem.getFileSystem()->FileSystem.loadFileSystems().
3) Because FileSystem.loadFileSystems() has never been called before step 2), the body of FileSystem.loadFileSystems() executes.
4) FileSystem.loadFileSystems() uses ServiceLoader to load the FileSystem providers, such as hdfs, kfs, s3, etc.
5) ServiceLoader reads the list of FileSystem providers from a resource, such as a jar file on local disk, which it addresses as a URL.
6) ServiceLoader creates the URL object and opens a stream to it.
7) The URL needs a handler for its protocol, such as "file:///", so it calls URL.getURLStreamHandler(), which indirectly calls FsUrlStreamHandlerFactory.createURLStreamHandler().
8) FsUrlStreamHandlerFactory.createURLStreamHandler() must map the scheme/protocol to one of the FileSystem providers (for a jar file on local disk, it needs the LocalFileSystem implementation). But at this point the FileSystem providers have not been loaded into memory yet, so it calls FileSystem.getFileSystem("file",conf)->FileSystem.loadFileSystems(). We jump back to step 2) and fall into the infinite loop.
Because URL handling is closely tied to the concrete FileSystem implementations, we need to load the FileSystem implementations before any URL-related operations. My proposal is to call FileSystem.getFileSystemClass("file",conf) in the constructor of FsUrlStreamHandlerFactory: the factory must know at least the FileSystem implementation for the "file:///" scheme, and then it can work correctly.
The patch is attached. Looking forward to your comments.
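The cycle in steps 2)-8) can be sketched with plain JDK types, no Hadoop required. load() and create() below are hypothetical stand-ins for FileSystem.loadFileSystems() and FsUrlStreamHandlerFactory.createURLStreamHandler(), and the 'loaded' guard plays the role the patch gives the factory: answering for the "file" scheme without re-entering the loader.

```java
import java.util.HashMap;
import java.util.Map;

// Without the 'loaded' guard in create(), the call chain
// load() -> create() -> load() would recurse forever.
public class RecursionSketch {
    static Map<String, String> schemes = new HashMap<>();
    static boolean loaded = false;

    // Steps 2)-6): load the providers; reading the service file resolves
    // a URL, which re-enters the factory (step 7).
    static void load() {
        create("file");                      // the re-entrant call
        schemes.put("file", "LocalFileSystem");
        loaded = true;
    }

    // Step 8): the factory must map schemes to providers. The guard
    // answers "unknown scheme" while loading instead of calling load().
    static String create(String scheme) {
        if (!loaded) {
            return null;  // defer to the JDK's built-in handlers
        }
        return schemes.get(scheme);
    }

    public static void main(String[] args) {
        load();  // completes, because the guard breaks the cycle
        System.out.println(create("file"));  // prints LocalFileSystem
    }
}
```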

Alejandro Abdelnur
added a comment - 19/Nov/12 18:40 Yanbo Liang, thanks for the detailed explanation; yes, it makes sense. As a side question, do you know why this has not been observed before? Also, as Tom said, the test case should be JUnit-based.

Yanbo Liang
added a comment - 20/Nov/12 16:41 @Alejandro,
There may be several reasons this has not been observed before:
1. No test case has covered this situation until now. In other words, no test case registers FsUrlStreamHandlerFactory as the current URLStreamHandlerFactory before calling FileSystem.getFileSystem()->FileSystem.loadFileSystems().
2. The case is environment-dependent. @Radim attached the JUnit test case, but trunk still passes it before this bug is fixed. I have also hit this situation and run the test case against trunk code in different environments, with different results:
- simple Java application: the test case fails, as expected.
- maven+junit: the test case passes (not expected).
- eclipse+junit: the test case fails, as expected.
I am still tracing why the results differ between environments.
Thanks to @Radim for attaching the JUnit-based test case.

Yanbo Liang
added a comment - 21/Nov/12 15:37 @Radim, the test case not failing under maven+junit is not caused by duplicate initialization, because duplicate initialization would throw an Error.
In my opinion, it is because the maven+junit framework loads the FileSystem implementations before the test case runs, so ServiceLoader is always invoked ahead of the FsUrlStreamHandlerFactory registration.
So the test case cannot reproduce the bug under maven+junit.

Yanbo Liang
added a comment - 22/Nov/12 07:36 Tracing the test run in the maven+junit environment, I found that ServiceLoader loads the service providers directly from the file "target/class/META-INF/services/org.apache.hadoop.fs.FileSystem" rather than from the jar package. Because many resources are loaded from the directory "target/class/", the classloader reuses this URL and therefore reuses the URL handler for the "file" scheme. ServiceLoader asks the classloader to load the resource from the file in this directory, so the load never goes through the handler we just registered in FsUrlStreamHandlerFactory; the reused handler works fine, and we never fall into the infinite loop.
So this bug may be difficult to reproduce in a JUnit environment, but in regular development it is dangerous and deeply hidden. I believe my patch resolves the problem; looking forward to any comments.
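For context, the reason ServiceLoader touches URL handlers at all is that provider lists live in META-INF/services/<interface> resources, which the classloader returns as URLs that ServiceLoader then opens. A self-contained sketch (the temp directory and the com.example names are invented for the demo):

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;

public class ServiceFileDemo {
    static String provider;  // what ServiceLoader would parse

    public static void main(String[] args) throws IOException {
        // Fake "classpath root" containing a provider file, like the
        // target/classes directory a build tool produces.
        Path root = Files.createTempDirectory("svc");
        Path dir = root.resolve("META-INF/services");
        Files.createDirectories(dir);
        Files.write(dir.resolve("com.example.Fs"),
                    "com.example.LocalFs\n".getBytes("UTF-8"));

        try (URLClassLoader cl =
                 new URLClassLoader(new URL[] { root.toUri().toURL() }, null)) {
            Enumeration<URL> urls =
                cl.getResources("META-INF/services/com.example.Fs");
            while (urls.hasMoreElements()) {
                // Opening this URL is the step that can re-enter a custom
                // URLStreamHandlerFactory in the Hadoop bug above.
                try (InputStream in = urls.nextElement().openStream()) {
                    provider = new String(in.readAllBytes(), "UTF-8").trim();
                }
            }
        }
        System.out.println(provider);  // prints com.example.LocalFs
    }
}
```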

Andy Isaacson
added a comment - 11/Dec/12 01:50 So this bug may be difficult to reproduce in junit environment. But in regular development, it will be dangerous and hidden deep. Until here I think my patch can resolve this problem and looking forward any comments.
Yanbo Liang could you post instructions for how to reproduce the infinite loop failure using a simple java application? I've tried to reproduce it using Eclipse+junit (using JDK 1.6.0_24 on Linux) and have not managed to cause the failure. If you could upload or paste a standalone testcase that fails, plus instructions for how to run it (on whatever platform) that will help with manual testing.

Yanbo Liang
added a comment - 11/Dec/12 03:19 Andy Isaacson I just attached TestFileSystem.java; it reproduces the infinite loop failure as a simple Java application. Add the corresponding libraries to the classpath and run this main class, and the infinite loop occurs.

Andy Isaacson
added a comment - 11/Dec/12 20:51 Thanks, that helps. I was able to reproduce the infinite recursion using trunk on Linux with
javac -classpath `hadoop classpath` TestFileSystem.java
java -classpath `pwd`:`hadoop classpath` TestFileSystem
and the stack trace
...
at java.net.URL.getURLStreamHandler(URL.java:1107)
at java.net.URL.<init>(URL.java:572)
at java.net.URL.<init>(URL.java:464)
at java.net.URL.<init>(URL.java:413)
at java.net.JarURLConnection.parseSpecs(JarURLConnection.java:161)
at java.net.JarURLConnection.<init>(JarURLConnection.java:144)
at sun.net.www.protocol.jar.JarURLConnection.<init>(JarURLConnection.java:63)
at sun.net.www.protocol.jar.Handler.openConnection(Handler.java:24)
at java.net.URL.openConnection(URL.java:945)
at java.net.URL.openStream(URL.java:1010)
at java.util.ServiceLoader.parse(ServiceLoader.java:279)
at java.util.ServiceLoader.access$200(ServiceLoader.java:164)
at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:332)
at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:415)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2232)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2243)
at org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:67)
at java.net.URL.getURLStreamHandler(URL.java:1107)
...

Andy Isaacson
added a comment - 14/Dec/12 01:10
+ FileSystem.getFileSystemClass( "file" ,conf);
missing a space after the ",".
+ public class TestFileSystemInitialization {
+
+ /**
Indentation is wonky; we do not use tabs, and we use 2-space indents.
+ * Check if FileSystem can be properly initialized if URLStreamHandlerFactory
+ * is registered.
+ */
Since this is such an obscure failure condition, please mention the JIRA in the comment so that someone who is curious does not have to dig in the VCS history to figure out what's going on.
+ try {
+ FileSystem.getFileSystemClass( "file:" , conf);
+ }
+ catch (IOException ok) {}
+ }
Please add asserts as appropriate to ensure that we follow the expected code path. For example, if we expect IOException , add assertFalse(true); after calling getFileSystemClass , with comments explaining the expected behavior. See TestLocalDirAllocator.java for an example.
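The pattern Andy is asking for can be sketched without JUnit or Hadoop: when a call is expected to throw, fail loudly if it does not, instead of letting the test silently fall through. The lookup() helper below is a hypothetical stand-in for FileSystem.getFileSystemClass("file:", conf):

```java
import java.io.IOException;

public class ExpectedExceptionPattern {
    // Hypothetical stand-in: a malformed scheme (trailing ':') is
    // expected to raise IOException.
    static Class<?> lookup(String scheme) throws IOException {
        if (scheme.endsWith(":")) {
            throw new IOException("malformed scheme: " + scheme);
        }
        return Object.class;
    }

    public static void main(String[] args) {
        try {
            lookup("file:");
            // Reaching here means the code path we meant to exercise was
            // skipped -- fail loudly instead of passing silently.
            throw new AssertionError("expected IOException was not thrown");
        } catch (IOException expected) {
            System.out.println("ok: " + expected.getMessage());
        }
    }
}
```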

Radim Kolar
added a comment - 15/Dec/12 18:51 If there is a problem in FileSystem.getFileSystemClass("file", conf); you will catch that exception later on the correct code path, where it is called, expected, and processed. Here it is just an initializer.
You might be interested in a different filesystem, "hdfs", and then throwing an exception because "file" failed to load increases the complexity of initializer exception handling in the caller.

Luke Lu
added a comment - 15/Dec/12 19:19 You missed MY point. I don't agree that throwing an exception because "file" failed to load increases the complexity of initializer exception handling in the caller. On the contrary, things might fail mysteriously much later on an unrelated code path; i.e., "hdfs" could load just fine because you swallowed the exception. I prefer to fail early when things go wrong. When you say "you will catch that exception later", you're assuming the exception is deterministic. It could be that the first exception puts the system into a wrong/undefined state, and you never catch that exception later.
I suggest that you simply convert the exception into a runtime exception.
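Luke's suggestion, sketched without Hadoop: if the eager lookup in a static initializer fails, wrap the checked exception in an unchecked one, so initialization fails immediately (surfacing as ExceptionInInitializerError) rather than leaving the class half-initialized. loadFileClass() is a hypothetical stand-in for FileSystem.getFileSystemClass("file", conf):

```java
import java.io.IOException;

public class FailFastInit {
    static boolean initialized = false;

    static {
        try {
            loadFileClass();
            initialized = true;
        } catch (IOException e) {
            // Fail early: this surfaces as ExceptionInInitializerError at
            // the first use of the class, not mysteriously later.
            throw new RuntimeException("cannot load the 'file' filesystem", e);
        }
    }

    // Hypothetical stand-in for the real lookup; here it simply succeeds.
    static void loadFileClass() throws IOException { }

    public static void main(String[] args) {
        System.out.println(initialized);  // prints true
    }
}
```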

Luke Lu
added a comment - 15/Dec/12 21:35 The logic of fsinit5 LGTM. OTOH, you should fix your IDE/editor settings: Hadoop indentation is 2 spaces, not 3 or 4. The patch has all three kinds of indentation.