We're using Avro as the storage format for database records, and schema evolution is a key feature for us. I have a question regarding the deletion of fields from a record when a schema is changed.

Let's say a field X that is present in v1 of the schema, but does not define a default value, is deleted in v2 of the schema. There can be a mix of v1 and v2 records in the database, and a mix of v1 and v2 client apps (apps that use v1 or v2 as their writer and reader schema).

If a v1 app reads a v2 record (written by a v2 app), an exception will be thrown because the reader schema contains field X, the record being deserialized does not contain field X, and the reader schema does not contain a default value for field X.
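To make the failure concrete, here is a rough sketch of the field-matching rule in plain Python. It is a simplified model, not the Avro library: the names `resolve_record`, `ResolutionError`, and `_MISSING` are made up for illustration, and only top-level record fields are considered.

```python
# Simplified model of how a reader schema is matched against a writer schema
# for record fields. This is NOT the Avro library; all names are hypothetical.

_MISSING = object()  # sentinel meaning "no default declared" (None can be a real default)


class ResolutionError(Exception):
    """Raised when the reader needs a field it cannot obtain."""


def resolve_record(writer_fields, reader_fields, datum):
    """Build the record the reader sees.

    writer_fields: list of field names in the writer schema
    reader_fields: dict of field name -> default value, or _MISSING
    datum:         dict holding the values as written
    """
    result = {}
    for name, default in reader_fields.items():
        if name in writer_fields:
            result[name] = datum[name]      # field was written: take its value
        elif default is not _MISSING:
            result[name] = default          # field absent: fall back to the default
        else:
            raise ResolutionError(
                f"reader field {name!r} is not in the writer schema and has no default"
            )
    # Fields present only in the writer schema are simply ignored.
    return result


# v1 has field X without a default; v2 has deleted X.
V1_WRITER_FIELDS = ["id", "X"]
V2_WRITER_FIELDS = ["id"]
V1_READER_FIELDS = {"id": _MISSING, "X": _MISSING}
V2_READER_FIELDS = {"id": _MISSING}

# A v2 app reading a v1 record works: X is skipped.
v2_reads_v1 = resolve_record(V1_WRITER_FIELDS, V2_READER_FIELDS, {"id": 1, "X": "abc"})

# A v1 app reading a v2 record fails: X is required but was never written.
try:
    resolve_record(V2_WRITER_FIELDS, V1_READER_FIELDS, {"id": 1})
    v1_reads_v2_error = None
except ResolutionError as exc:
    v1_reads_v2_error = str(exc)
```

In this model only the second direction fails, which is the case described above: the old reader asks for X, the new writer never wrote it, and there is no default to fall back on.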

Therefore, our conclusion is that a default value must be defined for each field in a schema, in order to support deletion of that field from the schema at a future time.

To delete a field that does not define a default value, the only possibility would be to upgrade all clients to v2 before using the v2 schema for writing. This is usually impractical in a large distributed system.

Yes, I think you have this right. If you ever wish to delete a field from a record and maintain both forward and backward compatibility, then you should specify a default value for that field. Similarly, if you add a field then you should supply a default value so that you can read old data that does not contain that field using the new schema.
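A minimal sketch of that rule, assuming schemas written as Python dicts in Avro's JSON form. The `resolve` and `field_map` helpers are hypothetical and only model top-level record fields; they are not calls into the Avro library.

```python
# Simplified model of reader/writer resolution with defaults.
# Hypothetical helpers for illustration; this is not the Avro library API.

def field_map(schema):
    """Index a record schema's fields by name."""
    return {f["name"]: f for f in schema["fields"]}


def resolve(writer_schema, reader_schema, datum):
    """Return the datum as seen through reader_schema."""
    writer = field_map(writer_schema)
    out = {}
    for f in reader_schema["fields"]:
        name = f["name"]
        if name in writer:
            out[name] = datum[name]        # written by the writer
        elif "default" in f:
            out[name] = f["default"]       # absent, so use the reader's default
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return out                             # writer-only fields are dropped


# v1 declares X WITH a default; v2 deletes X.
V1 = {"type": "record", "name": "Rec", "fields": [
    {"name": "id", "type": "long"},
    {"name": "X", "type": "string", "default": ""},
]}
V2 = {"type": "record", "name": "Rec", "fields": [
    {"name": "id", "type": "long"},
]}

# Old reader, new data: X is filled in from the default.
old_reads_new = resolve(V2, V1, {"id": 7})

# New reader, old data: X is ignored.
new_reads_old = resolve(V1, V2, {"id": 7, "X": "hello"})
```

Seen from the other side, the same pair of schemas covers the "add a field" case: going from V2 to V1 adds X, and the default on X is what lets a V1 reader consume data written with V2.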

Doug

On Mon, Oct 1, 2012 at 9:28 AM, Mark Hayes <[EMAIL PROTECTED]> wrote:
> Hi,
>
> We're using Avro as the storage format for database records, and schema
> evolution is a key feature for us. I have a question regarding the deletion
> of fields from a record, when a schema is changed.
>
> Let's say a field X that is present in v1 of the schema, but does not define
> a default value, is deleted in v2 of the schema. There can be a mix of v1
> and v2 records in the database, and a mix of v1 and v2 client apps (apps
> that use v1 or v2 as their writer and reader schema).
>
> If a v1 app reads a v2 record (written by a v2 app), an exception will be
> thrown because the reader schema contains field X, the record being
> deserialized does not contain field X, and the reader schema does not
> contain a default value for field X.
>
> Therefore, our conclusion is that a default value must be defined for each
> field in a schema, in order to support deletion of that field from the
> schema at a future time.
>
> To delete a field that does not define a default value, the only possibility
> would be to upgrade all clients to v2 before using the v2 schema for
> writing. This is usually impractical in a large distributed system.
>
> My question is: Does this make sense -- have I got it right?
>
> Thanks in advance,
> --mark