Tom White
added a comment - 19/Sep/12 15:47 Here are the compilation failures:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.3.2:compile (default-compile) on project avro-mapred: Compilation failure: Compilation failure:
[ERROR] /Users/tom/workspace/avro-trunk/lang/java/mapred/src/main/java/org/apache/hadoop/io/SequenceFileBase.java:[44,6] cannot find symbol
[ERROR] symbol : constructor BlockCompressWriter(org.apache.hadoop.fs.FileSystem,org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.lang.Class,java.lang.Class,int,short,long,org.apache.hadoop.io.compress.CompressionCodec,org.apache.hadoop.util.Progressable,org.apache.hadoop.io.SequenceFile.Metadata)
[ERROR] location: class org.apache.hadoop.io.SequenceFile.BlockCompressWriter
[ERROR] /Users/tom/workspace/avro-trunk/lang/java/mapred/src/main/java/org/apache/hadoop/io/SequenceFileBase.java:[58,6] cannot find symbol
[ERROR] symbol : constructor RecordCompressWriter(org.apache.hadoop.fs.FileSystem,org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,java.lang.Class,java.lang.Class,int,short,long,org.apache.hadoop.io.compress.CompressionCodec,org.apache.hadoop.util.Progressable,org.apache.hadoop.io.SequenceFile.Metadata)
[ERROR] location: class org.apache.hadoop.io.SequenceFile.RecordCompressWriter
[ERROR] /Users/tom/workspace/avro-trunk/lang/java/mapred/src/main/java/org/apache/avro/mapreduce/AvroMultipleOutputs.java:[425,37] org.apache.hadoop.mapreduce.TaskAttemptContext is abstract; cannot be instantiated
[ERROR] /Users/tom/workspace/avro-trunk/lang/java/mapred/src/main/java/org/apache/avro/mapreduce/AvroMultipleOutputs.java:[498,18] org.apache.hadoop.mapreduce.TaskAttemptContext is abstract; cannot be instantiated
The first two are because the constructors for SequenceFile.BlockCompressWriter and SequenceFile.RecordCompressWriter have changed between Hadoop 1 and 2. I'll file a Hadoop JIRA for this.
The second two are because of the change in TaskAttemptContext. This can be solved via reflection and separate Maven artifacts for the mapred JAR. The same problem was fixed in MRUnit, see MRUNIT-31 and MRUNIT-56 for some background.
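The reflection approach can be sketched stand-alone. This is illustrative only, not code from any patch: in the real fix the class names involved would be org.apache.hadoop.mapreduce.TaskAttemptContext and a concrete Hadoop 2 implementation such as TaskAttemptContextImpl; here JDK classes stand in so the sketch runs without Hadoop on the classpath.

```java
import java.lang.reflect.Constructor;

// Hypothetical sketch of reflection-based construction: if the API type is
// an interface (as TaskAttemptContext is in Hadoop 2), instantiate a known
// concrete implementation; if it is a concrete class (Hadoop 1),
// instantiate it directly.
public class ReflectiveFactory {
  public static Object newInstance(String apiClassName, String implClassName,
                                   Object... args) throws Exception {
    Class<?> api = Class.forName(apiClassName);
    Class<?> target = api.isInterface() ? Class.forName(implClassName) : api;
    for (Constructor<?> c : target.getConstructors()) {
      if (c.getParameterTypes().length == args.length) {
        return c.newInstance(args);
      }
    }
    throw new NoSuchMethodException("no matching constructor on " + target.getName());
  }

  public static void main(String[] args) throws Exception {
    // JDK stand-ins: CharSequence is an interface, so the factory falls
    // through to the concrete StringBuilder.
    Object o = newInstance("java.lang.CharSequence", "java.lang.StringBuilder");
    System.out.println(o.getClass().getName()); // java.lang.StringBuilder
  }
}
```

A real implementation would match constructors by parameter type rather than arity alone; matching on arity here just keeps the sketch short.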

Tom White
added a comment - 19/Sep/12 16:05 I filed HADOOP-8825 for the SequenceFile changes.
Here's a patch that uses patched Hadoop JARs with the HADOOP-8825 fix. Avro now compiles against Hadoop 2; however, there is more work to produce separate Maven artifacts (probably via profiles).

Doug Cutting
added a comment - 19/Sep/12 17:10 For the createTaskAttemptContext stuff we should add a comment explaining the purpose of this, to work around incompatible API changes.
For the pom.xml changes, note that the parent pom.xml sets the default version for Hadoop in Avro to 0.20.205.0, so the change here is only for the mapred module, not for the other modules that depend on Hadoop (tools and trevni). Perhaps that's intended, so that we end up testing against both versions of Hadoop. If so, we should add a comment to that effect. If it's not intended then the version should probably be set in the parent pom.xml. That said, we should not add a dependency on a SNAPSHOT pom, so we'll probably not commit this until there's a Hadoop release that contains the required changes or we figure out another way to fix this.

Tom White
added a comment - 19/Sep/12 17:38 Here's an updated patch that does away with SequenceFileBase and requires no changes to SequenceFile in Hadoop 2.
I've also added a comment to createTaskAttemptContext.
For the pom.xml version numbers - I'll fix that to use a single Hadoop version when I do the work to build dual mapred artifacts.

Doug Cutting
added a comment - 19/Sep/12 17:49 Nice work, Tom!
To be clear, the current patch could be committed without the changes to the pom and everything would still work with Hadoop 1, right? Would Hadoop 2 require recompilation?

Tom White
added a comment - 19/Sep/12 17:55 To be clear, the current patch could be committed without the changes to the pom and everything would still work with Hadoop 1, right?
Yes. I haven't tested that yet, but that's the idea.
Would Hadoop 2 require recompilation?
Unfortunately it would. I found this out in MRUNIT-56 - the bytecode for method invocation is different for classes and interfaces, so a separate JAR is needed for each version.

Tom White
added a comment - 20/Sep/12 15:58 Here's a new patch with the Maven changes.
For testing you can run mvn test -Dhadoop.version=2 to test the mapred module with Hadoop 2. If you don't specify the hadoop.version property it defaults to 1 like the current behaviour. The other modules (i.e. not mapred) all build against Hadoop 1 since they use APIs that are binary compatible.
For building, the idea is to create a mapred jar with a hadoop1 or hadoop2 classifier. We also create a Hadoop 1 artifact with no classifier, which is the default (for backwards compatibility). For this to work we build against Hadoop 2 first, then Hadoop 1, so that the JAR with no classifier is the last one built (the Hadoop 1 one). I've changed the top-level build script to implement this.
For deployment, the instructions at https://cwiki.apache.org/confluence/display/AVRO/How+To+Release would need to change to deploy the JARs with classifiers. Locally deploying twice worked, although I'm not sure if this would work with a repository manager like Nexus:
mvn deploy -DskipTests=true -Dhadoop.version=2 -DaltDeploymentRepository=mine::default::file:///tmp/myrepo
mvn deploy -DskipTests=true -DaltDeploymentRepository=mine::default::file:///tmp/myrepo
For consumers of the Maven artifacts, if you didn't specify a classifier in your dependency section then it would use Hadoop 1, as before:
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>1.7.2</version>
</dependency>
To use Hadoop 2, you would specify a hadoop2 classifier:
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>1.7.2</version>
  <classifier>hadoop2</classifier>
</dependency>

Doug Cutting
added a comment - 21/Sep/12 19:17 When I apply this patch and run 'mvn test -Dtest=TestAvroMultipleOutputs', it fails with "java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.task.TaskAttemptContext". When I add '-Dhadoop.version=2' then tests pass. When I change this to '-Dhadoop.version=1', compilation fails.

Tom White
added a comment - 24/Sep/12 15:04 There was a mistake in the package name which I've now fixed (org.apache.hadoop.mapreduce.TaskAttemptContext). I've also added a profile that matches when you specify '-Dhadoop.version=1', since in the previous patch this caused neither the hadoop1 nor the hadoop2 profile to match, which caused a compilation error.
You need to do a 'mvn clean' before running the tests for a different Hadoop version, otherwise the classes are not recompiled against the new Hadoop classes and you get an IncompatibleClassChangeError. Thus:
mvn clean test -Dtest=TestAvroMultipleOutputs -Dhadoop.version=1
mvn clean test -Dtest=TestAvroMultipleOutputs -Dhadoop.version=2

Doug Cutting
added a comment - 24/Sep/12 20:01 Tests now pass for me with hadoop.version unspecified or with it set to 1 or 2.
It's unfortunate that we need three profiles. I now see that your previous patch was just intended to work with either version unspecified or 2. That might be preferable, as it removes redundancies in the pom that might later result in problems. Regardless, we should probably add a comment describing the possible values.
We should update the top-level build.sh so that it builds jars with both classifiers. Also the release instructions will need updating so that we publish both types to Maven.
Where should we document these classifiers for client applications?
Thanks for all your work on this, Tom!

Tom White
added a comment - 25/Sep/12 11:50 Here's a new patch with just two profiles. I've added a comment to the top-level Java POM explaining the usage.
We should update the top-level build.sh so that it builds jars with both classifiers. Also the release instructions will need updating so that we publish both types to Maven.
I updated the top-level build.sh to build both JARs. I'm not sure exactly how the release instructions change (see earlier comment), but it should be possible to figure that out at release time. We might need to put a note in them as a reminder.
Where should we document these classifiers for client applications?
How about adding a question to the FAQ? It would look a bit like the last part of https://issues.apache.org/jira/browse/AVRO-1170?focusedCommentId=13459646&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13459646. I can add that once a release is made.
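For readers following along, the two-profile arrangement could look roughly like this. This is a sketch under assumptions, not the committed pom.xml: the profile ids, Hadoop artifact ids, and version properties are all illustrative.

```xml
<!-- Sketch only: illustrative two-profile setup, not the committed POM -->
<profiles>
  <!-- Default: Hadoop 1. activeByDefault also covers an explicit
       -Dhadoop.version=1, since it only switches off when another
       profile in this POM (hadoop2) activates. -->
  <profile>
    <id>hadoop1</id>
    <activation>
      <activeByDefault>true</activeByDefault>
    </activation>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>${hadoop1.version}</version>
      </dependency>
    </dependencies>
  </profile>
  <!-- Selected with -Dhadoop.version=2 -->
  <profile>
    <id>hadoop2</id>
    <activation>
      <property>
        <name>hadoop.version</name>
        <value>2</value>
      </property>
    </activation>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop2.version}</version>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```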

Tom White
added a comment - 27/Sep/12 14:28 I just committed this. Thanks for the reviews, Doug.
(BTW the last patch didn't include the removal of lang/java/mapred/src/main/java/org/apache/hadoop/io/SequenceFileBase.java, so I did that manually when I committed.)

Josh Spiegel
added a comment - 11/Jan/13 03:39 I downloaded both jars here:
wget http://apache.claz.org/avro/avro-1.7.3/java/avro-mapred-1.7.3-hadoop1.jar
wget http://apache.claz.org/avro/avro-1.7.3/java/avro-mapred-1.7.3-hadoop2.jar
Using either JAR with cdh3u3 (Hadoop 0.20.2) I get this error:
java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
So I think neither JAR is compatible with anything before 0.23.0-mr1-cdh4b1, which appears to be when TaskAttemptContext became an interface.

Tom White
added a comment - 11/Jan/13 16:49 I decompiled AvroKeyValueOutputFormat from both JARs with javap and indeed they both have the interface form of TaskAttemptContext:
% javap -c -classpath . org/apache/avro/mapreduce/AvroKeyValueOutputFormat
...
16: invokeinterface #5, 1; //InterfaceMethod org/apache/hadoop/mapreduce/TaskAttemptContext.getOutputKeyClass:()Ljava/lang/Class;
...
For the Hadoop 1 JAR it should be invokevirtual. I opened AVRO-1230 to fix this. Thanks for the report, Josh.