Hi,
27
20
31
Dante Hicks
Exception in thread "main" org.apache.avro.AvroRuntimeException: java.io.EOFException
	at org.apache.avro.file.DataFileStream.next(DataFileStream.java:184)
	at cn.znest.test.avro.AddressBook.browseName(AddressBook.java:91)
	at cn.znest.test.avro.AddressBook.main(AddressBook.java:43)
Caused by: java.io.EOFException
	at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:163)
	at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:262)
	at org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:93)
	at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:277)
	at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:271)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:83)
	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:105)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:77)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:70)
	at org.apache.avro.file.DataFileStream.next(DataFileStream.java:195)
	at org.apache.avro.file.DataFileStream.next(DataFileStream.java:182)
	... 2 more

My code is below. In this example, I create three record schemas: person (3 fields: First, Last, Age), age (Age) and extract (First, Last). The record "age" contains the last field of "person", so AddressBook.browseAge() executes successfully. But the record "extract" does not contain the last field of "person", so executing AddressBook.browseName() causes an exception.
In avro/c, read_record (datum_read.c) loops over every field of the writer schema.
In avro/java, GenericDatumReader.readRecord loops over every field of the reader schema. I think that is the cause.
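
To make the writer-driven loop concrete, here is a minimal, self-contained Java sketch (Java 11+, no Avro dependency). It hand-encodes three person records using Avro's binary rules for int (zigzag varint) and string (length prefix followed by UTF-8 bytes), then reads them by walking the writer's fields and discarding the ones the reader schema does not ask for. This is a toy model of the idea, not the avro/c code: the class and method names (WriterDrivenRead, readAll, readField), the simplified "Age is an int, every other field is a string" typing and the sample values are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WriterDrivenRead {
    // Writer schema "person" (toy model): field names in write order.
    static final String[] WRITER = {"First", "Last", "Age"};

    // Avro binary int: zigzag, then base-128 varint.
    static void writeInt(ByteArrayOutputStream out, int n) {
        int z = (n << 1) ^ (n >> 31);
        while ((z & ~0x7F) != 0) {
            out.write((z & 0x7F) | 0x80);
            z >>>= 7;
        }
        out.write(z);
    }

    // Avro binary string: length, then UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        writeInt(out, b.length);
        out.write(b, 0, b.length);
    }

    static int readInt(InputStream in) throws IOException {
        int z = 0, shift = 0, b;
        do {
            if ((b = in.read()) < 0) throw new EOFException();
            z |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    static String readString(InputStream in) throws IOException {
        int len = readInt(in);
        byte[] b = in.readNBytes(len);
        if (b.length != len) throw new EOFException();
        return new String(b, StandardCharsets.UTF_8);
    }

    // Hypothetical typing rule: "Age" is an int, every other field is a string.
    static Object readField(InputStream in, String field) throws IOException {
        return field.equals("Age") ? (Object) readInt(in) : readString(in);
    }

    // Three person records written back to back (hypothetical sample values).
    static byte[] writePeople() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Object[][] people = {{"Dante", "Hicks", 27}, {"Ann", "Lee", 20}, {"Bo", "Kim", 31}};
        for (Object[] p : people) {
            writeString(out, (String) p[0]);
            writeString(out, (String) p[1]);
            writeInt(out, (Integer) p[2]);
        }
        return out.toByteArray();
    }

    // Writer-driven loop: every written field is consumed; fields missing
    // from the reader schema are read and thrown away.
    static List<Map<String, Object>> readAll(byte[] data, List<String> reader, int count)
            throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        List<Map<String, Object>> records = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Map<String, Object> rec = new LinkedHashMap<>();
            for (String field : WRITER) {
                Object value = readField(in, field);
                if (reader.contains(field)) rec.put(field, value);
            }
            records.add(rec);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = writePeople();
        System.out.println(readAll(data, List.of("First", "Last"), 3)); // "extract"
        System.out.println(readAll(data, List.of("Age"), 3));           // "age"
    }
}
```

Both reads return all three records, because every byte written for a record is consumed before the next record starts.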

The above may be correct: the loop is over the reader (expected) schema rather than the actual (writer) schema. However, it works fine if the reader schema's fields are at the "end" of the writer schema.
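
That observation can be reproduced with a toy model in plain Java (Java 11+, no Avro dependency). As a simplifying assumption, it models the reader-driven resolution as skipping unwanted writer fields only while looking for the next reader field, so anything after the last reader field stays in the shared stream. With "age", Age is the last writer field and nothing is left over; with "extract", the trailing Age value is still there when the next record starts. TrailingFieldBug, readRecord and readAll are hypothetical names, not Avro APIs, and the sample values are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TrailingFieldBug {
    static final String[] WRITER = {"First", "Last", "Age"}; // writer schema "person"

    static void writeInt(ByteArrayOutputStream out, int n) { // zigzag varint
        int z = (n << 1) ^ (n >> 31);
        while ((z & ~0x7F) != 0) { out.write((z & 0x7F) | 0x80); z >>>= 7; }
        out.write(z);
    }

    static void writeString(ByteArrayOutputStream out, String s) { // length + UTF-8
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        writeInt(out, b.length);
        out.write(b, 0, b.length);
    }

    static int readInt(InputStream in) throws IOException {
        int z = 0, shift = 0, b;
        do {
            if ((b = in.read()) < 0) throw new EOFException();
            z |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    static String readString(InputStream in) throws IOException {
        int len = readInt(in);
        byte[] b = in.readNBytes(len);
        if (b.length != len) throw new EOFException(); // stream ran out
        return new String(b, StandardCharsets.UTF_8);
    }

    // Hypothetical typing rule: "Age" is an int, every other field is a string.
    static Object readField(InputStream in, String field) throws IOException {
        return field.equals("Age") ? (Object) readInt(in) : readString(in);
    }

    static byte[] writePeople() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Object[][] people = {{"Dante", "Hicks", 27}, {"Ann", "Lee", 20}, {"Bo", "Kim", 31}};
        for (Object[] p : people) {
            writeString(out, (String) p[0]);
            writeString(out, (String) p[1]);
            writeInt(out, (Integer) p[2]);
        }
        return out.toByteArray();
    }

    // Assumed pre-fix behaviour: skip writer fields only up to each reader field.
    // Reader fields must be a subset of the writer's, in writer order.
    static Map<String, Object> readRecord(InputStream in, List<String> reader) throws IOException {
        Map<String, Object> rec = new LinkedHashMap<>();
        int pos = 0; // index of the next writer field in the stream
        for (String field : reader) {
            while (!WRITER[pos].equals(field)) {
                readField(in, WRITER[pos++]); // leading writer field: skipped
            }
            rec.put(field, readField(in, WRITER[pos++]));
        }
        return rec; // trailing writer fields are left unread
    }

    // A known record count read from one shared stream, one fresh readRecord per record.
    static List<Map<String, Object>> readAll(byte[] data, List<String> reader, int count)
            throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        List<Map<String, Object>> records = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Map<String, Object> rec = readRecord(in, reader);
            System.out.println(rec);
            records.add(rec);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = writePeople();
        readAll(data, List.of("Age"), 3);           // "age": nothing is left over
        readAll(data, List.of("First", "Last"), 3); // "extract": EOFException on record 2
    }
}
```

The "age" read prints all three ages; the "extract" read prints the first record and then fails with java.io.EOFException on the second one.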

I am not yet familiar enough with the resolving decoder internals to locate the bug quickly. It is cleanly reproducible.

Thiruvalluvan M. G. added a comment - 16/Apr/10 10:55

The trouble is that the ResolvingDecoder does not consume the trailing field in the underlying BinaryDecoder, so part of the data belonging to the current object is left in the BinaryDecoder. GenericDatumReader constructs a new ResolvingDecoder for the next object, so the leftover integer field is misread as the length of the next object's first string.
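
A byte-level view of that explanation, as a hedged sketch in plain Java (Java 11+, no Avro library; LeftoverByte and misreadLength are hypothetical names, sample values made up): record 1 is read with the reader fields First and Last only, so the writer's Age value 27, encoded as the single zigzag byte 0x36, is still in the stream. A reader that now expects the next record's First string interprets that byte as a string length of 27, although only the bytes of record 2 remain.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class LeftoverByte {
    static void writeInt(ByteArrayOutputStream out, int n) { // zigzag varint
        int z = (n << 1) ^ (n >> 31);
        while ((z & ~0x7F) != 0) { out.write((z & 0x7F) | 0x80); z >>>= 7; }
        out.write(z);
    }

    static void writeString(ByteArrayOutputStream out, String s) { // length + UTF-8
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        writeInt(out, b.length);
        out.write(b, 0, b.length);
    }

    static int readInt(InputStream in) throws IOException {
        int z = 0, shift = 0, b;
        do {
            if ((b = in.read()) < 0) throw new EOFException();
            z |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    static String readString(InputStream in) throws IOException {
        int len = readInt(in);
        byte[] b = in.readNBytes(len);
        if (b.length != len) throw new EOFException();
        return new String(b, StandardCharsets.UTF_8);
    }

    // Returns {string length seen by the next reader, bytes actually left}.
    static int[] misreadLength() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, "Dante"); writeString(out, "Hicks"); writeInt(out, 27); // record 1
        writeString(out, "Ann");   writeString(out, "Lee");   writeInt(out, 20); // record 2
        ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());

        readString(in); // First
        readString(in); // Last: the reader schema {First, Last} is satisfied here
        // Age = 27 is still in the stream; reading the next record's First
        // string starts by decoding that byte as a length.
        int length = readInt(in);
        return new int[] {length, in.available()};
    }

    public static void main(String[] args) throws IOException {
        int[] r = misreadLength();
        System.out.println("string length read = " + r[0] + ", bytes left = " + r[1]);
    }
}
```

Running it prints `string length read = 27, bytes left = 9`, so reading that "string" runs past the end of the data.
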
This problem would not occur if the same ResolvingDecoder were used for all the objects, but that approach requires quite a few changes to GenericDatumReader. So I added a new method, drain(), to ResolvingDecoder which, if called after the entire record has been read according to the reader's schema, drains the remaining unused portions.
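
The idea behind drain() can be sketched in a toy model: after the reader's fields have been read, skip whatever writer fields are still pending for the current record, so the next record starts at the right byte. This is not the actual drain() patch, whose code is not shown here; DrainSketch, readRecord, readAll and the drain flag are hypothetical names. The sketch is plain Java 11+ without the Avro library, uses Avro's binary encoding for int and string, treats "Age" as the only int field, and uses made-up sample values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DrainSketch {
    static final String[] WRITER = {"First", "Last", "Age"}; // writer schema "person"

    static void writeInt(ByteArrayOutputStream out, int n) { // zigzag varint
        int z = (n << 1) ^ (n >> 31);
        while ((z & ~0x7F) != 0) { out.write((z & 0x7F) | 0x80); z >>>= 7; }
        out.write(z);
    }

    static void writeString(ByteArrayOutputStream out, String s) { // length + UTF-8
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        writeInt(out, b.length);
        out.write(b, 0, b.length);
    }

    static int readInt(InputStream in) throws IOException {
        int z = 0, shift = 0, b;
        do {
            if ((b = in.read()) < 0) throw new EOFException();
            z |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    static String readString(InputStream in) throws IOException {
        int len = readInt(in);
        byte[] b = in.readNBytes(len);
        if (b.length != len) throw new EOFException();
        return new String(b, StandardCharsets.UTF_8);
    }

    static Object readField(InputStream in, String field) throws IOException {
        return field.equals("Age") ? (Object) readInt(in) : readString(in);
    }

    static byte[] writePeople() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Object[][] people = {{"Dante", "Hicks", 27}, {"Ann", "Lee", 20}, {"Bo", "Kim", 31}};
        for (Object[] p : people) {
            writeString(out, (String) p[0]);
            writeString(out, (String) p[1]);
            writeInt(out, (Integer) p[2]);
        }
        return out.toByteArray();
    }

    // Reader fields must be a subset of the writer's, in writer order.
    static Map<String, Object> readRecord(InputStream in, List<String> reader, boolean drain)
            throws IOException {
        Map<String, Object> rec = new LinkedHashMap<>();
        int pos = 0; // index of the next writer field in the stream
        for (String field : reader) {
            while (!WRITER[pos].equals(field)) readField(in, WRITER[pos++]); // skip unwanted
            rec.put(field, readField(in, WRITER[pos++]));
        }
        if (drain) {
            while (pos < WRITER.length) readField(in, WRITER[pos++]); // drain trailing fields
        }
        return rec;
    }

    static List<Map<String, Object>> readAll(byte[] data, List<String> reader, int count,
            boolean drain) throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        List<Map<String, Object>> records = new ArrayList<>();
        for (int i = 0; i < count; i++) records.add(readRecord(in, reader, drain));
        return records;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = writePeople();
        System.out.println(readAll(data, List.of("First", "Last"), 3, true));
        System.out.println(readAll(data, List.of("Age"), 3, true));
    }
}
```

With drain set to true, the "extract" read returns all three First/Last pairs; with it set to false, the same read fails with java.io.EOFException on the second record.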