I have another question: The data I'm working with is stored as Avro files in Hadoop. When I try to glob them everything works just perfectly. But when I change the schema of a single data file while the others remain unchanged, everything gets wrecked:

"currently we assume all avro files under the same "location" share the same schema and will throw exception if not."

(e.g. I add a new data field) Expected behavior for me would be: if I'm globbing several files with slightly different schemas, the result of the LOAD would either be the intersection of all valid fields that are common to both schemas, or the atoms of the missing fields would be nulled.
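To make the two expected behaviors concrete, here is a minimal sketch in plain Python. It does not use Pig or the Avro library; schemas are reduced to lists of field names and records to dicts, and all names (`common_fields`, `all_fields`, `project`) are made up for illustration.

```python
def common_fields(schemas):
    """Option 1: keep only the field names present in every schema."""
    shared = set(schemas[0]).intersection(*schemas[1:])
    # Preserve the field order of the first schema.
    return [name for name in schemas[0] if name in shared]


def all_fields(schemas):
    """Option 2: keep every field name seen in any schema, first-seen order."""
    seen = []
    for schema in schemas:
        for name in schema:
            if name not in seen:
                seen.append(name)
    return seen


def project(record, fields):
    """Shape a record to the given fields; missing fields become None (null)."""
    return {name: record.get(name) for name in fields}


# Hypothetical example: the newer file adds an "email" field.
old_schema = ["id", "name"]
new_schema = ["id", "name", "email"]
old_record = {"id": 1, "name": "a"}
new_record = {"id": 2, "name": "b", "email": "b@example.org"}

intersection_view = [project(r, common_fields([old_schema, new_schema]))
                     for r in (old_record, new_record)]
nulled_view = [project(r, all_fields([old_schema, new_schema]))
               for r in (old_record, new_record)]
```

With the intersection, the new `email` field disappears from every record; with the union, records from the older file carry `email` as null.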

I'm assuming you are using Pig's AvroStorage function. It appears that it does not support schema migration, but it certainly could do so. A collection of Avro files can be 'viewed' as if they were all of one schema, provided they can all resolve to it. I have several tools that do this successfully with MapReduce/Pig/Hive.
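As a rough illustration of what "resolving to one schema" means, the sketch below mimics the basic record rules of Avro schema resolution in plain Python: a field the reader expects but the writer lacks takes the reader's default, a field the writer has but the reader does not is ignored, and a missing field without a default is an error. This is not the Avro or AvroStorage API; `resolve`, `_NO_DEFAULT`, and the sample fields are assumptions made up for the example, and records are modeled as already-decoded dicts.

```python
_NO_DEFAULT = object()  # sentinel: the reader field declares no default


def resolve(record, reader_fields):
    """Resolve one decoded record (a dict) to the reader's fields.

    reader_fields is a list of (name, default) pairs; use _NO_DEFAULT
    when the reader schema gives no default for that field.
    """
    resolved = {}
    for name, default in reader_fields:
        if name in record:
            resolved[name] = record[name]      # present in the written data
        elif default is not _NO_DEFAULT:
            resolved[name] = default           # filled from the reader default
        else:
            raise ValueError(f"cannot resolve: no value or default for {name!r}")
    return resolved                            # writer-only fields are dropped


# Hypothetical reader schema: "email" is new and defaults to null.
reader = [("id", _NO_DEFAULT), ("name", _NO_DEFAULT), ("email", None)]

from_old_file = resolve({"id": 1, "name": "a"}, reader)
from_new_file = resolve({"id": 2, "name": "b", "email": "b@example.org"}, reader)
```

Records from both files come out with the same three fields, so downstream code sees a single schema. In real Avro the same effect comes from reading each file with its own writer schema and one shared reader schema.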

The Pig AvroStorage tool is maintained by the Apache Pig project; you will need to inquire there to get more details.

-Scott

On 3/20/12 2:27 AM, "Markus Resch" <[EMAIL PROTECTED]> wrote:

>Hi guys,
>
>Thanks again for your awesome hint about sqoop.
>
>I have another question: The Data I'm working with is stored as AVRO
>Files in the Hadoop. When I try to glob them everything works just
>perfectly. But. When I add the schema of a single data file while the
>others remain everything gets wrecked:
>
>"currently we assume all avro files under the same "location"
> * share the same schema and will throw exception if not."
>
>(e.g. I add a new data field) Expected behavior for me would be: If I'm
>globbing several files with slightly different schema the result of the
>LOAD would be either return an intersection of all valid fields that are
>common to both schemes or the atoms of the missing fields are nulled.
>
>How could I handle this properly?
>
>Thanks
>
>Markus

