Olga Natkovich
added a comment - 15/Jun/11 20:26 We don't have any current plans to use 22 at Yahoo or for Yahoo engineers to do the port. But of course a patch is welcome from anybody interested in taking this project on.

Daniel Dai
added a comment - 29/Jun/11 00:33 PIG-2125-2.patch was tested against the MR-279 branch. All Pig end-to-end tests pass. Unit tests still fail, and the Zebra tests still have a few failures.
The patch borrows some ideas from PIG-924. However, a dynamic shims layer is not possible due to some static dependencies, so we use a static shims layer instead. This means Pig has to be recompiled for each Hadoop version.
Since hadoop.Next Maven artifacts are not available yet, we simulate them by copying all the libraries into lib.
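A static shims layer can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Pig's actual HadoopShims: TaskContext, TaskContextImpl and createTaskContext are hypothetical stand-ins for a Hadoop type whose construction differs between versions. The point is that Pig-side code only calls static methods on HadoopShims. Each supported Hadoop version would get its own copy of HadoopShims with identical signatures, kept in a separate source directory, and the build would compile exactly one copy. That compile-time choice is why switching Hadoop versions needs a recompile, whereas a dynamic layer would pick an implementation at runtime.

```java
/**
 * Sketch of a static shims layer (all names are hypothetical).
 * Callers depend only on HadoopShims' static methods; a copy of this
 * class with identical signatures would exist per Hadoop version,
 * and the build would compile exactly one copy.
 */
final class StaticShimsSketch {

    /** Hypothetical stand-in for a Hadoop type whose construction differs by version. */
    interface TaskContext {
        String taskId();
    }

    /** Hypothetical version-specific implementation that the shim hides from callers. */
    static final class TaskContextImpl implements TaskContext {
        private final String taskId;

        TaskContextImpl(String taskId) {
            this.taskId = taskId;
        }

        @Override
        public String taskId() {
            return taskId;
        }
    }

    /** The shim: the only place that knows how this Hadoop version builds a TaskContext. */
    static final class HadoopShims {
        private HadoopShims() {} // static utility, never instantiated

        static TaskContext createTaskContext(String taskId) {
            // Another version's copy of this class would have a different body here,
            // but exactly the same signature, so callers compile unchanged.
            return new TaskContextImpl(taskId);
        }
    }

    public static void main(String[] args) {
        // Pig-side code goes through the shim and never names a version-specific class.
        TaskContext ctx = HadoopShims.createTaskContext("task-1");
        System.out.println(ctx.taskId());
    }
}
```

Keeping the shim as a class with only static methods means callers bind to it at compile time, which matches the "static shims layer, recompile per version" approach described above.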

Alan Gates
added a comment - 12/Jul/11 19:17 In general the patch looks good. I have a few questions/comments.
How does one build a 0.23 compatible version of Pig? Do you have to download the 0.23 jar yourself? It looked like if you did this, build.xml would manage the rest properly. The answer to this question will need to be added to the release notes field of this JIRA.
Javadocs explaining what the classes do would be very helpful. PigMapReduce, PigMapBase, HadoopShims all could use class level comments. HadoopShims in particular should have extensive comments on how the shims layer works and what needs to be shimmed.
I wonder if HadoopShims should go in a separate package. I know it's just one class, but it performs a unique function. This would also allow many of the design and build comments I referenced in the last point to be covered in a package level doc. I don't feel strongly about this, so feel free to keep it where it is if you think that's better.

Daniel Dai
added a comment - 13/Jul/11 01:58 1. How does one build a 0.23 compatible version of Pig? Do you have to download the 0.23 jar yourself? It looked like if you did this, build.xml would manage the rest properly. The answer to this question will need to be added to the release notes field of this JIRA.
Users need to copy all Hadoop 23 jars and their dependencies into lib. This step should go away once Hadoop 23 publishes Maven artifacts.
2. Javadocs explaining what the classes do would be very helpful. PigMapReduce, PigMapBase, HadoopShims all could use class level comments. HadoopShims in particular should have extensive comments on how the shims layer works and what needs to be shimmed.
Sure, I will add them.
3. I wonder if HadoopShims should go in a separate package. I know it's just one class, but it performs a unique function. This would also allow many of the design and build comments I referenced in the last point to be covered in a package level doc. I don't feel strongly about this, so feel free to keep it where it is if you think that's better.
How about org.apache.pig.backend.hadoop.shims?

Daniel Dai
added a comment - 19/Jul/11 02:15 PIG-2125-4.patch is committed to both trunk and the 0.9 branch. This is the first phase of the work (end-to-end tests in MapReduce and local mode). I will continue with the next phase (unit tests).

Daniel Dai
added a comment - 17/Aug/11 23:33 PIG-2125-6.patch fixed many of the core unit test failures for Hadoop 23. There are still 20 failures left, and I am working with the Hadoop 23 team on these issues.
Running the Pig unit tests against Hadoop 23, these are the issues I currently see:
1. Stability issues in MiniMRCluster. Some failures are intermittent; others happen consistently. Here is the stack trace I saw:
java.io.FileNotFoundException: file:/home/daniel/pig/target/PigMiniCluster/PigMiniCluster-localDir/usercache/daniel/appcache/application_1313021517794_0001/container_1313021517794_0002_000001.tokens
at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:187)
at org.apache.hadoop.fs.DelegateToFileSystem.open(DelegateToFileSystem.java:150)
at org.apache.hadoop.fs.AbstractFileSystem.open(AbstractFileSystem.java:595)
at org.apache.hadoop.fs.FilterFs.open(FilterFs.java:188)
at org.apache.hadoop.fs.FileContext$6.next(FileContext.java:736)
at org.apache.hadoop.fs.FileContext$6.next(FileContext.java:733)
at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2278)
at org.apache.hadoop.fs.FileContext.open(FileContext.java:733)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:137)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:85)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:779)
When a test succeeds, this exception does not appear in the log.
2. Counter issue. Most counter tests seem to fail: JobClient.getJob(job.getAssignedJobID()) always returns null.
3. MiniMRYarnCluster.getConfig() returns the entry "mapreduce.job.hdfs-servers=${fs.default.name}", although the variable should already have been substituted.
4. MiniHBaseCluster fails to start, so all HBase tests fail.
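Issue 3 is about variable expansion. Hadoop's Configuration expands ${name} references when a property is read with get(); the sketch below is a simplified, self-contained illustration of that kind of expansion, not Hadoop's own code. The class name VarSubstSketch, the round limit, and the sample value hdfs://localhost:8020 for fs.default.name are hypothetical. It shows what the entry from issue 3 would be expected to look like once substitution has happened.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified sketch of ${name} expansion in configuration values (hypothetical class). */
final class VarSubstSketch {
    // Matches ${name}, where name contains no '$', '}' or whitespace.
    private static final Pattern VAR = Pattern.compile("\\$\\{[^\\}\\$\\s]+\\}");
    // Bound the number of expansion rounds so a self-referencing value cannot loop forever.
    private static final int MAX_ROUNDS = 20;

    static String expand(String value, Map<String, String> props) {
        String result = value;
        for (int i = 0; i < MAX_ROUNDS; i++) {
            Matcher m = VAR.matcher(result);
            if (!m.find()) {
                return result; // nothing left to substitute
            }
            String ref = m.group();
            String name = ref.substring(2, ref.length() - 1); // strip "${" and "}"
            String replacement = props.get(name);
            if (replacement == null) {
                return result; // unbound reference: leave the literal ${name} in place
            }
            result = result.substring(0, m.start()) + replacement + result.substring(m.end());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("fs.default.name", "hdfs://localhost:8020"); // hypothetical value
        props.put("mapreduce.job.hdfs-servers", "${fs.default.name}");
        // Expected after substitution: the actual file system URI, not the literal ${...}.
        System.out.println(expand(props.get("mapreduce.job.hdfs-servers"), props));
    }
}
```

Unbound references are left as-is, and nested references (a value that itself contains ${...}) are resolved over successive rounds.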

Thomas Weise
added a comment - 05/Oct/11 17:54 This patch adds support for hadoopversion=23, resolving dependencies through Maven.
Note that the MapReduce jar files are not currently available in Maven, so you will still need to copy those into lib/.

Daniel Dai
added a comment - 01/Nov/11 02:12 PIG-2125-8.patch makes Pig compile and run against both 23 and 20 using Maven artifacts. The only exception is hadoop-mapreduce.jar, which Hadoop still does not publish, so we need to copy it into lib. PIG-2125-8.patch includes all the changes in PIG-2125-7.patch, PIG-2125-buildxml-0.9.patch and PIG-2125-ivy-0.9-3.patch.

PIG-2125-10.patch and PIG-2125-10_0.9.patch fix test-patch warnings:
[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 46 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] -1 release audit. The applied patch generated 462 release audit warnings (more than the trunk's current 447 warnings).

Daniel Dai
added a comment - 02/Nov/11 20:22 After PIG-2125-10.patch, we can compile and run the end-to-end tests on 23. About 20 unit tests still fail on 23, and we are working on them.
Since all patches attached to this JIRA have been committed and this thread is becoming big and confusing, I suggest closing this ticket and creating a separate one to track the unit test fixes for Hadoop 23. Agree?