Hello,
I'm using Avro as the serialization framework for my Hadoop MapReduce jobs, and my mappers
emit GenericRecord/null as their key/value pairs. Having looked at the code, I see that the
reducer only treats two keys (i.e. my records) as distinct if calling .equals() on them
reports a difference. However, if the schemas are the same (which they are for most of my
mappers), then .equals() calls .compare(), which in turn depends on the "order" attributes
set on the fields. This seems to mean that if I have no sort order defined in my schema, all
records are treated as equal to one another. Have I understood this correctly, and if so, is
that not a violation of the equals() contract? (For one thing, it would mean that
GenericRecord objects will often cause confusion when used with maps and other containers.)
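
To make the concern concrete, here is a toy model in plain Java. It is not Avro's code:
ToyRecord, Order and distinctCount are hypothetical names of my own, and the assumption that
fields without a sort order get skipped by the comparison is exactly the part I have not
confirmed. It only shows what happens to a HashSet when equals() is defined as
"compare() == 0" and the comparison skips every field.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: hypothetical classes, not Avro's GenericRecord implementation.
final class EqualsViaCompareSketch {

    // Assumed per-field ordering flag; IGNORE fields are skipped when comparing.
    enum Order { ASCENDING, IGNORE }

    static final class ToyRecord {
        final Order[] orders; // one entry per field, shared "schema"
        final int[] values;   // field values, same length as orders

        ToyRecord(Order[] orders, int... values) {
            this.orders = orders;
            this.values = values;
        }

        // Field-by-field comparison that skips fields marked IGNORE.
        int compareTo(ToyRecord that) {
            for (int i = 0; i < values.length; i++) {
                if (orders[i] == Order.IGNORE) continue;
                int c = Integer.compare(values[i], that.values[i]);
                if (c != 0) return c;
            }
            return 0;
        }

        // Equality defined purely in terms of the comparison above.
        @Override
        public boolean equals(Object o) {
            return o instanceof ToyRecord && compareTo((ToyRecord) o) == 0;
        }

        // Hash only the compared fields so hashCode stays consistent with equals.
        @Override
        public int hashCode() {
            int h = 1;
            for (int i = 0; i < values.length; i++) {
                if (orders[i] != Order.IGNORE) h = 31 * h + values[i];
            }
            return h;
        }
    }

    // Number of records a HashSet considers distinct under the given field orders.
    static int distinctCount(Order[] orders, int[][] rows) {
        Set<ToyRecord> seen = new HashSet<>();
        for (int[] row : rows) seen.add(new ToyRecord(orders, row));
        return seen.size();
    }

    public static void main(String[] args) {
        int[][] rows = {{1, 10}, {2, 20}, {3, 30}};
        Order[] compared = {Order.ASCENDING, Order.ASCENDING};
        Order[] skipped = {Order.IGNORE, Order.IGNORE};
        System.out.println(distinctCount(compared, rows)); // prints 3
        System.out.println(distinctCount(skipped, rows));  // prints 1
    }
}
```

If Avro really behaves like the second case, three different records would collapse into a
single key in a set or map, and presumably into a single group in the reducer.
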
Regards,
Andrew