Hi,

I'm currently working on trying to load lzos that contain some JSON elements. This is of the form:

item1 item2 {'thing1':'1','thing2':'2'}
item3 item4 {'thing3':'1','thing27':'2'}
item5 item6 {'thing5':'1','thing19':'2'}

I was thinking I could use LzoJsonLoader for this, but it keeps throwing me errors like:

ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo without native-hadoop

This is despite the fact that I can load normal lzos just fine using LzoTokenizedLoader('\\t'). So, now I'm at a bit of a standstill. What should I do to go about loading these files? Does anyone have any ideas?

Cheers,
Eli

Hmm, I'm loading up hadoop-lzo.*.jar, elephant-bird.*.jar, guava-*.jar, and piggybank.jar, and then trying to use that UDF, but getting the following error:

ERROR 2998: Unhandled internal error. org/json/simple/parser/ParseException

java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:456)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:508)
    at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:531)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFuncSpec(QueryParser.java:5462)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:5291)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:5187)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:5133)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:5042)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:4968)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:4934)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:4861)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:4747)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:4704)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:4030)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:3433)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1464)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:1013)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:800)
    etc...

Any ideas? I've verified that it recognizes the function itself, and that the data it's running on is valid json. Not sure what else I can check.

Eli

On 9/9/11 7:13 PM, Dmitriy Ryaboy wrote:
> They derive from the same classes as far as lzo handling goes, so I suspect
> something's up with your environment or inputs if you get LzoTokenizedLoader
> to work, but LzoJsonStorage does not.
>
> Note that LzoTokenizedLoader is deprecated -- just use LzoPigStorage.
>
> JsonLoader wouldn't work for you because it expects the complete input line
> to be json, not part of it. You want to load with LzoPigStorage, and then
> apply the JsonStringToMap udf to the third field.
>
> -D

Hmmm, now it gets past my mention of the function, but when I run a dump on generated information, I get:

2011-09-12 14:48:12,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: java.lang.ClassCastException: *org.apache.pig.data.DataByteArray cannot be cast to java.lang.String*

Thanks for all the help so far!

Eli

On 9/12/11 2:42 PM, Dmitriy Ryaboy wrote:
> You also want json-simple-1.1.jar
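The NoClassDefFoundError for org/json/simple/parser/ParseException reported earlier in the thread indicates that the json-simple classes were not on Pig's classpath when JsonStringToMap was instantiated. A minimal sketch of the corresponding REGISTER statements follows. The /path/to/ locations and the unversioned file names are placeholders, since the thread elides the versions; only json-simple-1.1.jar is named exactly.

```pig
-- Hypothetical paths: substitute the real locations and versioned file names
-- of the jars on your machine (the thread does not give them).
REGISTER /path/to/hadoop-lzo.jar;
REGISTER /path/to/elephant-bird.jar;
REGISTER /path/to/guava.jar;
REGISTER /path/to/piggybank.jar;
-- json-simple supplies org.json.simple.parser.ParseException, the class
-- that was missing in the stack trace.
REGISTER /path/to/json-simple-1.1.jar;
```

Registering the jar this way also ships it with the job, so the class should be available on the task side as well as in the grunt shell.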

Well, it's not throwing me errors anymore. Now it's just discarding the field. When I run it on two records where I've verified a field exists in the json, I get:

Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 2 time(s).

More specifically, my json is of the following form:

{"foo":0,"bar":"hi"}

On that, I'm running:

initial = LOAD 'some_file.lzo' USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t') AS (col1, col2, col3, json_data);
extracted = FOREACH initial GENERATE (chararray) json_data#'type' AS type;
dump extracted;

Which gives me the above warning along with:

()
()

I also tried it without the cast to chararray, but received the same results. Should I be casting json_data as some other data type when I load it initially? Seems by default it's cast to a bytearray when I describe initial. Would that be a problem?

Thanks for all the help so far!

Eli

On 9/12/11 9:26 PM, Dmitriy Ryaboy wrote:
> Ah yeah that's my favorite thing about Pig maps (prior to pig 0.9,
> theoretically).
> The values are bytearrays. You are probably trying to treat them as strings.
> You have to do stuff like this:
>
> x = foreach myrelation generate
>     (chararray) mymap#'foo' as foo,
>     (chararray) mymap#'bar' as bar;

Correction: I forgot to run the JsonStringToMap function when writing my last email, when I run that, I get the same error as before (*org.apache.pig.data.DataByteArray cannot be cast to java.lang.String*).

My full workflow is as follows:

initial = LOAD 'some_file.lzo' USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t') AS (col1, col2, col3, json_data);
map = FOREACH initial GENERATE com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data) AS mapped_json_data;
extracted = FOREACH map GENERATE (chararray) mapped_json_data#'type' AS type;
dump extracted;

Any ideas?

Eli

Sweet! Just got this working! For anyone with the same problem in the future: apparently JsonStringToMap() *does not* like bytearrays. If you simply cast your json as a chararray when you're loading, the error disappears!

Eli
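For reference, here is a minimal sketch of what the working script might look like. It assumes that "cast your json as a chararray when you're loading" means declaring the type in the LOAD schema, and that the hadoop-lzo, elephant-bird, guava, piggybank and json-simple jars have already been registered. The alias `mapped` is a substitute for the `map` alias used in the thread, because `map` is also a Pig type name and some Pig versions may reject it as an alias.

```pig
-- Declare json_data as chararray so JsonStringToMap receives a String
-- rather than a DataByteArray (untyped fields default to bytearray).
initial = LOAD 'some_file.lzo'
    USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
    AS (col1, col2, col3, json_data:chararray);

-- Parse the JSON column into a Pig map.
mapped = FOREACH initial GENERATE
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data) AS mapped_json_data;

-- Map values still need an explicit cast before being used as strings.
extracted = FOREACH mapped GENERATE (chararray) mapped_json_data#'type' AS type;

DUMP extracted;
```

Casting at the call site, as in JsonStringToMap((chararray) json_data), should have the same effect if you would rather leave the LOAD schema untyped, though the thread only confirms the cast at load time.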

