Added the Java annotations @AvroName, @AvroIgnore, @AvroMeta, @AvroAlias and @AvroEncode to control the behavior of Avro when using reflection.

Description

It would be great if one could control Avro with Java annotations. It is already possible to mark fields as Nullable or to have classes encoded as a String. I propose a larger set of annotations to control the behavior of Avro on fields and classes. Such annotations have proven useful in Jackson's JSON serialization and Morphia's MongoDB serialization.

Java fields with the @AvroName("alternativeName") annotation will be renamed in the induced schema. When reading an Avro file via reflection, the reflection reader will look for a field named "alternativeName" in the schema.
For example:

@AvroName("foo")
int bar;

is serialized as

{ "name" : "foo", "type" : "int" }

The @AvroAlias annotation will add a new alias to the induced schema of a record, enum or field. The space parameter is optional and defaults to the namespace of the named schema the alias is added to.

Fields with the @AvroIgnore annotation will be treated as if they had a transient modifier, i.e. they will not be written to or read from avro files.

The @AvroMeta(key="K", value="V") annotation allows you to store an arbitrary key : value pair at every node in the schema.

@AvroMeta(key="fieldKey", value="fieldValue")
int foo;

will create the following schema

{"name" : "foo", "type" : "int", "fieldKey" : "fieldValue" }

Fields can be custom encoded with the @AvroEncode(using=CustomEncoding.class) annotation. This annotation is a generalization of the @Stringable annotation, which is limited to classes with a string-argument constructor. Some classes can similarly be reduced to a smaller class or even a single primitive, but don't fit the requirements for @Stringable. A prominent example is java.util.Date, whose instances can essentially be described by a single long. Such classes can now be encoded with a CustomEncoding, which reads from and writes to the decoder/encoder directly.

One simply extends the abstract CustomEncoding class by implementing a schema, a read method and a write method. A Java field can then be annotated like this:
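As a minimal sketch of the idea behind a custom encoding (reducing java.util.Date to a single long), the following uses only the JDK. SketchEncoding and DateAsLongSketch are hypothetical stand-ins, and DataOutputStream/DataInputStream stand in for Avro's encoder and decoder; none of this is Avro's actual CustomEncoding API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Date;

// Hypothetical stand-in for a custom encoding: a schema plus read and write.
// This is NOT Avro's CustomEncoding class; it only mirrors its shape.
abstract class SketchEncoding<T> {
    abstract String schema();
    abstract void write(T datum, DataOutputStream out) throws IOException;
    abstract T read(DataInputStream in) throws IOException;
}

// Encodes a Date as the single long returned by Date#getTime().
class DateAsLongSketch extends SketchEncoding<Date> {
    @Override
    String schema() {
        return "\"long\"";
    }

    @Override
    void write(Date datum, DataOutputStream out) throws IOException {
        out.writeLong(datum.getTime());
    }

    @Override
    Date read(DataInputStream in) throws IOException {
        return new Date(in.readLong());
    }

    // Helper: serialize one Date to bytes using the encoding above.
    static byte[] encode(Date d) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            new DateAsLongSketch().write(d, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Helper: read one Date back from bytes written by encode().
    static Date decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new DateAsLongSketch().read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Date original = new Date();
        Date copy = decode(encode(original));
        System.out.println("round trip equal: " + original.equals(copy));
    }
}
```

The write and read methods must agree with the declared schema; nothing in a sketch like this enforces that, which is the risk raised in the review comments.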

Doug Cutting
added a comment - 01/Jul/13 22:07 Java's "transient" modifier is already equivalent to AvroIgnore. We might additionally observe Java's Transient annotation. With those, do we also need AvroIgnore? Perhaps.
With @AvroEncode I'm concerned that folks can easily write data that doesn't conform to its declared schema, resulting in data that cannot be read. We should add warnings to the documentation, starting it with "Expert:" and mentioning that use of ValidatingEncoder is recommended. (For example, DataFileWriter#appendEncoded is labeled "Expert" since it also bypasses type safety.)
So, with some updated javadoc, I'd be happy to commit this. Anyone object?

Vincenz Priesnitz
added a comment - 03/Jul/13 17:13 Thanks for reviewing the Patch.
I think it would be clearer to have a separate ignore annotation for Avro. Also, the general Java annotations might collide with other services that use annotations.
The @AvroEncode annotation should indeed only be used by experts; I will add such warnings.
I'd like to mention Issue AVRO-1347 here, which adds another Annotation for adding Aliases.

Vincenz Priesnitz
added a comment - 04/Jul/13 16:33 Attached is a new patch with more JavaDocs, including warnings for using custom encodings.
I also moved the @AvroAlias annotation from issue AVRO-1347 here, but excluded the controversial writer aliases and added unit tests. It is now also possible to add an alias without a namespace.
This patch still contains the @AvroIgnore annotation, which I would really like to see committed.

Doug Cutting
added a comment - 09/Jul/13 22:14 A few comments on the latest patch:
newly added files are not included in the patch
"" would more naturally represent the top-level namespace (i.e., the empty namespace) and null the default namespace
javadoc comments on non-public classes will not be seen by users so are generally not provided
the 3-line addAlias() code in ReflectData is repeated 3 times, so might go in a new private method.

Doug Cutting
added a comment - 10/Jul/13 21:56 I modified the defaulting of alias namespaces. That logic is no longer in Schema.java but instead in AvroAlias.java and ReflectData.java. I also added a space to the sentinel value so that it can never conflict with a valid namespace.
Other than that, this looks good.
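Since annotation elements cannot default to null, a sentinel string has to stand for "no namespace given". The sketch below illustrates that defaulting with a sentinel containing spaces, so it cannot collide with a valid namespace, while "" still means the top-level namespace. SketchAlias, AliasResolver and the sentinel text are hypothetical stand-ins for illustration, not the code in AvroAlias.java or ReflectData.java.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical stand-in annotation; NOT Avro's @AvroAlias.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SketchAlias {
    // Sentinel for "space not set". The embedded spaces make it an
    // impossible namespace, so it never clashes with a real one.
    String UNSET = "NOT A VALID NAMESPACE";

    String alias();

    String space() default UNSET;
}

@SketchAlias(alias = "OldName")
class DefaultSpace {}

@SketchAlias(alias = "OldName", space = "")
class TopLevelSpace {}

@SketchAlias(alias = "OldName", space = "com.example.legacy")
class ExplicitSpace {}

class AliasResolver {
    // Resolves the full alias name for an annotated class.
    // schemaNamespace is the namespace of the schema the alias is added to.
    static String fullAlias(Class<?> type, String schemaNamespace) {
        SketchAlias a = type.getAnnotation(SketchAlias.class);
        if (a == null) {
            throw new IllegalArgumentException(type + " has no @SketchAlias");
        }
        // Unset space falls back to the schema's namespace.
        String space = SketchAlias.UNSET.equals(a.space()) ? schemaNamespace : a.space();
        // The empty string denotes the top-level (empty) namespace.
        return space.isEmpty() ? a.alias() : space + "." + a.alias();
    }

    public static void main(String[] args) {
        System.out.println(fullAlias(DefaultSpace.class, "com.example"));
        System.out.println(fullAlias(TopLevelSpace.class, "com.example"));
        System.out.println(fullAlias(ExplicitSpace.class, "com.example"));
    }
}
```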

Doug Cutting
added a comment - 25/Jul/13 18:40 On second thought, I am now concerned that this could hurt performance. AVRO-1282 did a lot to improve reflect performance and we wouldn't want to harm that.
This patch adds calls to isAnnotationPresent to readField() and writeField(), which are on the critical path.
Can you please run Perf.java with and without this patch? If performance is affected then I suggest we should make isCustomEncoded and isStringable boolean fields in FieldAccessor that are set when the field is constructed so that their access cost in readField() and writeField() is negligible.
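The suggestion above can be sketched as follows: the annotation lookup happens once, when the accessor is constructed, and the hot path only reads a final boolean. SketchEncode, CachedFieldAccessor and SampleRecord are hypothetical names using plain JDK reflection; this is not Avro's FieldAccessor.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical stand-in for an encoding annotation; NOT Avro's @AvroEncode.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SketchEncode {}

// Hypothetical accessor that caches the annotation check at construction.
final class CachedFieldAccessor {
    private final Field field;
    private final boolean customEncoded;

    CachedFieldAccessor(Field field) {
        this.field = field;
        this.field.setAccessible(true);
        // The reflective lookup runs once here, not on every read or write.
        this.customEncoded = field.isAnnotationPresent(SketchEncode.class);
    }

    // Cheap enough to call on the critical path: a plain field read.
    boolean isCustomEncoded() {
        return customEncoded;
    }

    Object get(Object target) {
        try {
            return field.get(target);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}

class SampleRecord {
    @SketchEncode
    long timestamp = 1L;
    int count = 2;
}

class CachedFieldAccessorDemo {
    static CachedFieldAccessor accessorFor(String name) {
        try {
            return new CachedFieldAccessor(SampleRecord.class.getDeclaredField(name));
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(accessorFor("timestamp").isCustomEncoded());
        System.out.println(accessorFor("count").isCustomEncoded());
    }
}
```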

Doug Cutting
added a comment - 07/Aug/13 18:26 Here's a version of the patch that preserves the logic path of the fast case, so performance is not affected. It benchmarks equivalently to existing trunk.
I'll commit this soon unless there are objections.

Zhaonan Sun
added a comment - 02/Apr/15 18:22 It looks like @AvroMeta can't add reserved fields; for example, @AvroMeta("doc", "some doc") throws an exception.
Do we have an @AvroDoc annotation or something similar?