It would have been ideal if we could directly use Avro generated classes without wrapping them. Wrapper classes are not very maintainable because we need to modify at two places- schema definition and wrapper class. Any reason why we can't directly use generated classes ? From what I can think of - this is done because we want all event classes to have a base interface, constructor and field getters. Having constructor and getters should be straight forward in code generator. For base class/interface, I think Avro can generate code from a template. SpecificRecordBase methods can directly go into the generated class (to work around multiple inheritance in java). Users can define the base class, interfaces or additional methods in the template which can be used to generate Avro specific class. I understand that this may not be doable at this point but something worth considering at some point to make Avro code generation feature more compelling.

Sharad Agarwal
added a comment - 17/Sep/09 07:32 It would have been ideal if we could directly use Avro generated classes without wrapping them. Wrapper classes are not very maintainable because we need to modify at two places- schema definition and wrapper class. Any reason why we can't directly use generated classes ? From what I can think of - this is done because we want all event classes to have a base interface, constructor and field getters. Having constructor and getters should be straight forward in code generator. For base class/interface, I think Avro can generate code from a template. SpecificRecordBase methods can directly go into the generated class (to work around multiple inheritance in java). Users can define the base class, interfaces or additional methods in the template which can be used to generate Avro specific class. I understand that this may not be doable at this point but something worth considering at some point to make Avro code generation feature more compelling.

You've already cited the biggest reason: the generated classes don't provide constructors or accessors. Long-term, we could enhance Avro to generate these, but I'm not sure we'd want to directly use the generated classes even then.

The wrappers provide considerable utility, including:

Javadoc comments. We could generate these perhaps from documentation in the schema.

Visibility: The wrappers only provide public getters, not setters. We could perhaps add that to the schema and/or generator.

Type conversion: In both the version included in MAPREDUCE-157 and this version there's a fair amount of field-specific type conversion. For example, we don't directly serialize JobID instances, but rather use JobID's toString() and forName() methods to convert these to and from strings for serialization. Similarly for counters, task ids, etc. Ideally all of these would be naturally serializeable using Avro, but, until they are, the wrappers make it easy to incorporate things like these.

Compatibility: If we update the schema then Avro will handle reading old data, but, without the wrappers, we'd be unable to provide a back-compatible API for accessing the old data. So if we remove a field from the schema, with the wrappers we're able to deprecate the accessor and implement it in terms of new/remaining fields so that applications don't have to be upgraded.

So I'm not entirely convinced that using wrappers for stuff like this is a bad pattern long term.

Doug Cutting
added a comment - 17/Sep/09 19:59 > Any reason why we can't directly use generated classes ?
You've already cited the biggest reason: the generated classes don't provide constructors or accessors. Long-term, we could enhance Avro to generate these, but I'm not sure we'd want to directly use the generated classes even then.
The wrappers provide considerable utility, including:
Javadoc comments. We could generate these perhaps from documentation in the schema.
Visibility: The wrappers only provide public getters, not setters. We could perhaps add that to the schema and/or generator.
Type conversion: In both the version included in MAPREDUCE-157 and this version there's a fair amount of field-specific type conversion. For example, we don't directly serialize JobID instances, but rather use JobID's toString() and forName() methods to convert these to and from strings for serialization. Similarly for counters, task ids, etc. Ideally all of these would be naturally serializeable using Avro, but, until they are, the wrappers make it easy to incorporate things like these.
Compatibility: If we update the schema then Avro will handle reading old data, but, without the wrappers, we'd be unable to provide a back-compatible API for accessing the old data. So if we remove a field from the schema, with the wrappers we're able to deprecate the accessor and implement it in terms of new/remaining fields so that applications don't have to be upgraded.
So I'm not entirely convinced that using wrappers for stuff like this is a bad pattern long term.

My experience with generated objects (from a couple of years using protocol buffers) is that one ends up wrapping them often (preferably with composition).

The generated class is responsible for serialization and deserialization, and the wrapper class is responsible for added logic. It's hard to make the generator do something reasonable for logic (or even inheritance) cross-language. Having a wrapper also allows you to have two ways to use something, in two different contexts, where you might want different surrounding logic. (So, if you had an Avro schema for an Event, the code that generates the Event might use one wrapper, and the code that consumes it might use the raw object, or have a different object.)

Philip Zeyliger
added a comment - 17/Sep/09 21:17 My experience with generated objects (from a couple of years using protocol buffers) is that one ends up wrapping them often (preferably with composition).
The generated class is responsible for serialization and deserialization, and the wrapper class is responsible for added logic. It's hard to make the generator do something reasonable for logic (or even inheritance) cross-language. Having a wrapper also allows you to have two ways to use something, in two different contexts, where you might want different surrounding logic. (So, if you had an Avro schema for an Event, the code that generates the Event might use one wrapper, and the code that consumes it might use the raw object, or have a different object.)

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

Hadoop QA
added a comment - 18/Sep/09 07:17 -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12419913/MAPREDUCE-980.patch
against trunk revision 816454.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/101/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/101/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/101/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/101/console
This message is automatically generated.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

-1 javac. The patch appears to cause tar ant target to fail.

-1 findbugs. The patch appears to cause Findbugs to fail.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

Fixed all ivy.xml files to now refer to Avro 1.0 rather than 1.1. Avro, Jackson and Paranamer versions are now specified in library.properties, so that this should not occur again.

'ant test-patch' reports:

[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no new tests are needed for this patch.
[exec] Also please list what manual steps were performed to verify this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

No new tests are required, as MAPREDUCE-157 supplied sufficient tests.

Doug Cutting
added a comment - 18/Sep/09 17:47 Fixed all ivy.xml files to now refer to Avro 1.0 rather than 1.1. Avro, Jackson and Paranamer versions are now specified in library.properties, so that this should not occur again.
'ant test-patch' reports:
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no new tests are needed for this patch.
[exec] Also please list what manual steps were performed to verify this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
No new tests are required, as MAPREDUCE-157 supplied sufficient tests.

Jothi Padmanabhan
added a comment - 18/Sep/09 17:52 MAPREDUCE-277 will conflict with this patch – I added two more methods to the JobSubmittedEvent in that patch. Depending on which goes first, the other will have to merge.

Doug Cutting
added a comment - 18/Sep/09 18:53 > MAPREDUCE-277 will conflict with this patch
I'm happy to do the merge regardless of which is committed first.
Since MAPREDUCE-277 is a blocker, my preference would be to commit this issue first, then that one, since it is not subject to today's deadline.
I still need a committer to +1 this.

Jothi Padmanabhan
added a comment - 18/Sep/09 18:59 my preference would be to commit this issue first, then that one, since it is not subject to today's deadline.
Sorry, just saw this. In the meanwhile, MAPREDUCE-277 got committed.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

Hadoop QA
added a comment - 18/Sep/09 19:11 -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12420034/MAPREDUCE-980.patch
against trunk revision 816664.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/53/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/53/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/53/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/53/console
This message is automatically generated.

[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no new tests are needed for this patch.
[exec] Also please list what manual steps were performed to verify this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

Doug Cutting
added a comment - 18/Sep/09 19:37 'ant test-patch' on current patch:
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no new tests are needed for this patch.
[exec] Also please list what manual steps were performed to verify this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

Jothi Padmanabhan
added a comment - 18/Sep/09 19:58 Just another clarification – since the patch is using avro1.1, should we change the encoder to json instead of binary so that tools that scrape the logs instead of using EventReaders be supported?

Doug Cutting
added a comment - 18/Sep/09 20:09 > should we change the encoder to json [ ... ]
That was my initial instinct too, but Eric and Owen both indicated to me that they preferred that we use binary.
Eric's comment is:
https://issues.apache.org/jira/browse/MAPREDUCE-157?focusedCommentId=12745279&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12745279
Owen has indicated this in offline discussions. The idea is that one can easily use Avro to dump the binary as JSON, but that the binary is smaller and faster.
It's a trivial change to make if we prefer JSON instead of binary.

I do not know if I am being naive here, but should the EventWriter.Version be something different than "Avro-Binary"? Something that will help us keep track of schema evolutions, like "1.0" ? Or is the version used for a different purpose?

Jothi Padmanabhan
added a comment - 18/Sep/09 20:54 I do not know if I am being naive here, but should the EventWriter.Version be something different than "Avro-Binary"? Something that will help us keep track of schema evolutions, like "1.0" ? Or is the version used for a different purpose?

> should the EventWriter.Version be something different than "Avro-Binary"? Something that will help us keep track of schema evolutions

The entire schema is included in the file. If the schema changes, Avro can still read old data. We don't need to update the file version if we, e.g., add a field. If we make such a fundamental change to the schema that Avro's automatic versioning cannot handle it, then we could change the version string to be "Avro-Binary-v2" or something. Or we could examine the schema itself to determine which version it is.

Doug Cutting
added a comment - 18/Sep/09 21:14 > should the EventWriter.Version be something different than "Avro-Binary"? Something that will help us keep track of schema evolutions
The entire schema is included in the file. If the schema changes, Avro can still read old data. We don't need to update the file version if we, e.g., add a field. If we make such a fundamental change to the schema that Avro's automatic versioning cannot handle it, then we could change the version string to be "Avro-Binary-v2" or something. Or we could examine the schema itself to determine which version it is.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

Hadoop QA
added a comment - 18/Sep/09 22:33 -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12420061/MAPREDUCE-980.patch
against trunk revision 816735.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/54/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/54/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/54/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/54/console
This message is automatically generated.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

Hadoop QA
added a comment - 19/Sep/09 01:31 -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12420063/MAPREDUCE-980.patch
against trunk revision 816782.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/116/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/116/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/116/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/116/console
This message is automatically generated.