Description

Unexpected: The subprocess at level 3 in the subtree is not alive before Job completion

Stacktrace

junit.framework.AssertionFailedError: Unexpected: The subprocess at level 3 in the subtree is not alive before Job completion
at org.apache.hadoop.mapred.TestKillSubProcesses.runJobAndSetProcessHandle(TestKillSubProcesses.java:221)
at org.apache.hadoop.mapred.TestKillSubProcesses.runFailingJobAndValidate(TestKillSubProcesses.java:112)
at org.apache.hadoop.mapred.TestKillSubProcesses.runTests(TestKillSubProcesses.java:327)
at org.apache.hadoop.mapred.TestKillSubProcesses.testJobKillFailAndSucceed(TestKillSubProcesses.java:310)

Sreekanth Ramakrishnan
added a comment - 15/Jun/09 05:46 Is this failure related to the patch or was it found in Trunk builds?
In the following trunk builds, the test case passed after the changes to TestKillSubProcesses were put in:
http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-trunk/864/
http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-trunk/865/
http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-trunk/866/

Ravi Gummadi
added a comment - 16/Jul/09 05:37 The issue is reproducible on trunk if we add Thread.sleep(5000) in runJobAndSetProcessHandle() before the assert statements that check whether the child processes are alive. The problem was that fs was not set in the Mappers, so the creation of signalFile was never checked, causing the map task to finish immediately (for both the failing mapper and the succeeding mapper).
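
The failure mode described above can be illustrated with a small standalone sketch. This is not the code from TestKillSubProcesses: the class name SignalFileWait, the method waitForSignal, and the use of java.nio.file in place of the Hadoop FileSystem are all assumptions made for illustration. A null path stands in for the unset fs: the wait loop is skipped, so the "task" returns at once and its subprocesses may already be gone when the test asserts that they are alive.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SignalFileWait {
    /**
     * Polls until signalFile exists or timeoutMillis elapses and returns the
     * number of polls performed. A null signalFile models the bug: with no
     * handle to check, the method returns immediately without waiting.
     */
    public static int waitForSignal(Path signalFile, long timeoutMillis, long pollMillis) {
        if (signalFile == null) {
            return 0; // buggy path: nothing is checked, the task finishes at once
        }
        int polls = 0;
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!Files.exists(signalFile) && System.currentTimeMillis() < deadline) {
            polls++;
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status and stop waiting
                break;
            }
        }
        return polls;
    }

    public static void main(String[] args) {
        // Hypothetical signal file path that is never created in this demo.
        Path signal = Paths.get(System.getProperty("java.io.tmpdir"),
                "signalFile-" + System.nanoTime());
        System.out.println("polls with no handle: " + waitForSignal(null, 200, 20));
        System.out.println("polls with a handle:  " + waitForSignal(signal, 200, 20));
    }
}
```

With a valid path the task keeps polling until the signal file appears (or the timeout expires), which is the window in which the test can safely check that the subprocesses are alive.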

Vinod Kumar Vavilapalli
added a comment - 16/Jul/09 07:16 Patch looks good. +1.
Documenting what the patch does:
Sets the FileSystem in the mapper.
Consolidates the signal file/directory related variables so that they are set up in a single place.
Explicitly sets test.build.data for the child via mapred.child.java.opts, because test.build.data is not otherwise passed to the child, and in our test the child needs access to files/dirs in this temporary directory.
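
A rough sketch of the last point, under stated assumptions: the class ChildOpts and the helper withTestBuildData are hypothetical, java.util.Properties stands in for the job configuration so the example runs without Hadoop on the classpath, and "-Xmx200m" is only an example of a pre-existing option value. The actual patch may build the option string differently.

```java
import java.util.Properties;

public class ChildOpts {
    /** Appends -Dtest.build.data=<dir> to the existing child JVM options, if any. */
    public static String withTestBuildData(String existingOpts, String testBuildData) {
        String flag = "-Dtest.build.data=" + testBuildData;
        if (existingOpts == null || existingOpts.trim().isEmpty()) {
            return flag;
        }
        return existingOpts.trim() + " " + flag;
    }

    public static void main(String[] args) {
        // Stand-in for the job configuration (an assumption for this sketch).
        Properties conf = new Properties();
        conf.setProperty("mapred.child.java.opts", "-Xmx200m"); // example existing value

        // The parent test JVM knows test.build.data; the child JVM only sees it
        // if it is forwarded explicitly as a -D system property.
        String dir = System.getProperty("test.build.data", "/tmp");
        conf.setProperty("mapred.child.java.opts",
                withTestBuildData(conf.getProperty("mapred.child.java.opts"), dir));

        System.out.println(conf.getProperty("mapred.child.java.opts"));
    }
}
```

In the child, System.getProperty("test.build.data") would then resolve to the same temporary directory that the parent test uses for the signal files.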

Ravi Gummadi
added a comment - 24/Jul/09 11:58 Unit tests passed on local machine.
ant test-patch gave:
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 3 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

Vinod Kumar Vavilapalli
added a comment - 20/Nov/09 09:36 +1 for the Y! 20 distribution patch. I could reproduce the bug on Y! distribution without the patch, and I've verified that the patch applies successfully and solves the problem with the test-case.