Is there a reason Avro's Hadoop serialization classes don't allow configuration of the DatumReader and DatumWriter classes?

My use-case is that I'm implementing Clojure DatumReader and -Writer classes which produce and consume Clojure's data structures directly. I'd like to then extend that to Hadoop MapReduce jobs which operate in terms of Clojure data, with Avro handling all de/serialization directly to/from that Clojure data.

Am I going about this in a backwards fashion, or would a patch to allow configuration of the Hadoop serialization DatumReader/Writers be welcome?

Making the DatumReader/Writers configurable would be a welcome addition.
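To illustrate what "configurable" would mean here, below is a minimal, self-contained sketch of the reflection-driven pattern Hadoop commonly uses for pluggable classes: a class name is looked up under a configuration key and instantiated, with a built-in default as fallback. The key name `avro.mapred.datum.reader.class`, the `DatumReader` stub, and both reader implementations are hypothetical stand-ins, not Avro's actual API.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigurableReaderSketch {

    // Stand-in for Avro's DatumReader interface (simplified).
    interface DatumReader<T> {
        T read();
    }

    // Default implementation, analogous to a SpecificDatumReader.
    static class DefaultReader implements DatumReader<String> {
        public String read() { return "default"; }
    }

    // A user-supplied implementation, e.g. a Clojure-backed reader.
    static class ClojureStyleReader implements DatumReader<String> {
        public String read() { return "clojure-data"; }
    }

    // Stand-in for Hadoop's Configuration lookup: resolve the class name
    // from the (hypothetical) key, then instantiate it by reflection.
    static DatumReader<?> newReader(Map<String, String> conf) throws Exception {
        String cls = conf.getOrDefault(
                "avro.mapred.datum.reader.class",      // hypothetical key
                DefaultReader.class.getName());
        return (DatumReader<?>) Class.forName(cls)
                .getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> conf = new HashMap<>();
        // No key set: falls back to the default reader.
        System.out.println(newReader(conf).read());

        // Key set: the user-supplied reader is instantiated instead.
        conf.put("avro.mapred.datum.reader.class",
                 ClojureStyleReader.class.getName());
        System.out.println(newReader(conf).read());
    }
}
```

The same pattern would apply symmetrically to a configurable DatumWriter on the serialization side.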

Ideally, much more of what goes on there could be:
1. configuration driven
2. pre-computed to avoid repeated work during decoding/encoding

We do some of both already. The trick is that #1 must be done without impacting performance, and #2 requires a bigger overhaul.
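As a toy illustration of the "pre-compute" point, the sketch below simulates an expensive per-schema setup step and caches its result, so repeated decode calls pay the cost only once. The names (`parse`, `decode`) are illustrative, not Avro's actual internals.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class PrecomputeSketch {
    // Counts how many times the expensive setup actually runs.
    static final AtomicInteger parses = new AtomicInteger();
    static final Map<String, String> cache = new ConcurrentHashMap<>();

    // Simulates costly work, e.g. resolving a writer/reader schema pair.
    static String parse(String schema) {
        parses.incrementAndGet();
        return schema.toUpperCase();
    }

    // Per-record hot path: reuses the pre-computed result after the
    // first call for a given schema.
    static String decode(String schema) {
        return cache.computeIfAbsent(schema, PrecomputeSketch::parse);
    }

    public static void main(String[] args) {
        decode("record");
        decode("record");
        decode("record");
        System.out.println(parses.get()); // prints 1: parsed once, reused twice
    }
}
```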

If you would like, a contribution including a Clojure-related Maven module or two that depends on the Java stuff would be a welcome addition and allow us to identify compatibility issues as we change the Java library over time.

On 5/8/13 3:33 PM, "Marshall Bockrath-Vandegrift" <[EMAIL PROTECTED]> wrote:

> Hi all:
>
> Is there a reason Avro's Hadoop serialization classes don't allow
> configuration of the DatumReader and DatumWriter classes?
>
> My use-case is that I'm implementing Clojure DatumReader and -Writer
> classes which produce and consume Clojure's data structures directly.
> I'd like to then extend that to Hadoop MapReduce jobs which operate in
> terms of Clojure data, with Avro handling all de/serialization directly
> to/from that Clojure data.
>
> Am I going about this in a backwards fashion, or would a patch to allow
> configuration of the Hadoop serialization DatumReader/Writers be
> welcome?
>
> -Marshall

> Making the DatumReader/Writers configurable would be a welcome
> addition.

Excellent!

> Ideally, much more of what goes on there could be:
> 1. configuration driven
> 2. pre-computed to avoid repeated work during decoding/encoding
>
> We do some of both already. The trick is to do #1 without impacting
> performance and #2 requires a bigger overhaul.

Which work in particular? In my pass through the AvroSerialization implementation so far, it looks like each MR task would create either one or two Serializers/Deserializers (key and value), each of which in turn would create one DatumWriter/DatumReader and Encoder/Decoder pair. Or do De/Serializers get created multiple times per task?

> If you would like, a contribution including a Clojure-related Maven
> module or two that depends on the Java stuff would be a welcome
> addition and allow us to identify compatibility issues as we change
> the Java library over time.

That sounds like a great end-goal. Right now at the company I work for (Damballa) we've just started getting our toes wet with Avro. Avro won our serialization-format bake-off, but we haven't started actually using it. I just finished an initial pass at Avro-Clojure integration and we have released it under an open source license: