I am writing a unit test but I have a doubt. My understanding is that a complete record is a tuple. So is the record "a b {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X Xxxxxx)} {(OCCUP:xxxxxxx xxxxx),(AGE:55 ),(MARITAL:Married)}", which is one line in a file, a tuple? Somehow I feel that's not right. Could someone please clarify?

Below is the code. My test is incomplete, but I am pasting it to show how I am constructing this tuple.

TupleFactory mTupleFactory = TupleFactory.getInstance();
BagFactory mBagFactory = BagFactory.getInstance();

Additional details: when I try to build a tuple from a line in a file using the code below from my previous email, I get . It looks like I need to define a schema somehow. I wonder how others test this. I am trying to test a UDF, and I need to pass a line from a file, build a tuple, and pass it to the eval func.

> I am writing unit test but I had a doubt. My understanding is that
> complete record is a tuple. So record "a b
> {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X
> Xxxxxx)} {(OCCUP:xxxxxxx xxxxx),(AGE:55 ),(MARITAL:Married)}"
> which is one line in a file is a tuple? But I somehow feel it's not right.
> Could someone please clarify?
>
> Below is the code, my test is incomplete but just pasting it to show how I
> am constructing this tuple.
>
>
> TupleFactory mTupleFactory = TupleFactory.getInstance();
> BagFactory mBagFactory = BagFactory.getInstance();
>
> @Test
> public void evalFuncTest() throws IOException{
> String record = "a b
> {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X
> Xxxxxx)} {(OCCUP:xxxxxxx xxxxx),(AGE:55 ),(MARITAL:Married)}";
> Tuple t = mTupleFactory.newTuple();
> DataInput in = new DataInputStream(new
> ByteArrayInputStream(record.getBytes()));
> t.readFields(in);
> }
>

You are trying to read a string that represents a tuple using binary deserialization.

Pig has an abstraction called LoadFunc that knows how to read data off disk and turn it into tuples (yes, records are tuples). PigStorage is one such LoadFunc, and it reads data represented as strings such as what you are trying to feed in. There are other load funcs that know how to read other serializations and interpret the data in very different ways (json, avro, thrift, records from a database, xml...). There is no way for Tuple.readFields to know what format you are trying to feed into it. Tuple serialization is used for intermediate serialization between MR jobs and is not intended for the end-user.
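To illustrate why feeding a text line into a binary reader fails, here is a minimal sketch that uses only java.io and does not touch Pig at all. The two-field layout (a writeUTF string followed by an int) and the class and method names are made up for illustration; this is not Pig's actual intermediate format. It only shows the general point: a DataInput-based reader expects the exact bytes its matching writer produced, so a tab-separated text line is not something it can interpret.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Pig.
class BinaryVsText {

    // Serialize a two-field record in a made-up binary layout:
    // a length-prefixed string (writeUTF) followed by a 4-byte int.
    static byte[] writeBinary(String name, int age) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(name);
            out.writeInt(age);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Read the record back; this only works on bytes produced by writeBinary.
    static String readBinary(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            String name = in.readUTF();
            int age = in.readInt();
            return name + ":" + age;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Try to interpret a plain text line as the binary layout above.
    static boolean textIsReadableAsBinary(String line) {
        try {
            readBinary(line.getBytes(StandardCharsets.UTF_8));
            return true;
        } catch (UncheckedIOException e) {
            // The first two text bytes get misread as a string length,
            // and the reader runs off the end of the input.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(readBinary(writeBinary("a", 55)));
        System.out.println(textIsReadableAsBinary("a\tb\t{(ST:NC),(ZIP:28613)}"));
    }
}
```

The round trip works because the reader and writer agree on the layout. The text line fails because its first two characters are taken as a length prefix that is far larger than the input.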

You should be using the appropriate LoadFunc to create tuples (PigStorage in this case?), or create them in code as I demonstrated earlier.

> You might find ReadToEndLoader, which wraps a real loadfunc and helps
> with some details of instantiating input formats, getting splits, etc,
> helpful:
> http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/impl/io/ReadToEndLoader.html
>
> But really, you should just create the tuples you want in code rather
> than involve all of this machinery.
>
> D
>
>
> On Sun, Apr 22, 2012 at 9:56 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> > Could someone help me answer this question if records (each line) = tuples?
> >
> > On Fri, Apr 20, 2012 at 4:22 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> >
> >> I am writing unit test but I had a doubt. My understanding is that
> >> complete record is a tuple. So record "a b
> >> {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X
> >> Xxxxxx)} {(OCCUP:xxxxxxx xxxxx),(AGE:55 ),(MARITAL:Married)}"
> >> which is one line in a file is a tuple? But I somehow feel it's not right.
> >> Could someone please clarify?
> >>
> >> Below is the code, my test is incomplete but just pasting it to show how I
> >> am constructing this tuple.
> >>
> >>
> >> TupleFactory mTupleFactory = TupleFactory.getInstance();
> >> BagFactory mBagFactory = BagFactory.getInstance();
> >>
> >> @Test
> >> public void evalFuncTest() throws IOException{
> >> String record = "a b
> >> {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X
> >> Xxxxxx)} {(OCCUP:xxxxxxx xxxxx),(AGE:55 ),(MARITAL:Married)}";
> >> Tuple t = mTupleFactory.newTuple();
> >> DataInput in = new DataInputStream(new
> >> ByteArrayInputStream(record.getBytes()));
> >> t.readFields(in);
> >> }
> >>
