I can call listdir to read from local filesystem in a python UDF. Did you implement your function as a proper UDF?

________________________________________
From: Haider [[EMAIL PROTECTED]]
Sent: Monday, December 02, 2013 5:22 AM
To: [EMAIL PROTECTED]
Subject: listdir() python function is not wokring on hadoop

Hi all

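Has anyone successfully used the listdir() function to retrieve files one by one from HDFS in a Python script?

if __name__ == '__main__':
    for filename in os.listdir("/user/hdmaster/XML2"):
        print filename

ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201312020139_0025_m_000000
13/12/02 05:20:50 INFO streaming.StreamJob: killJob...

My intention is to take the files one by one and parse them.

Any help or suggestion on this would be very helpful to me.

Thanks
Haider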

I am trying to read from HDFS, not from the local file system, so would that be possible through listdir()? Or is there any way to read HDFS files one by one and pass each of them to a function?

On Fri, Dec 6, 2013 at 4:20 AM, Yigitbasi, Nezih <[EMAIL PROTECTED]> wrote:

> I can call listdir to read from local filesystem in a python UDF. Did you
> implement your function as a proper UDF?
> ________________________________________
> From: Haider [[EMAIL PROTECTED]]
> Sent: Monday, December 02, 2013 5:22 AM
> To: [EMAIL PROTECTED]
> Subject: listdir() python function is not wokring on hadoop
>
> Hi all
>
> Has anyone successfully used the listdir() function to retrieve files
> one by one from HDFS in a Python script?
>
> if __name__ == '__main__':
>     for filename in os.listdir("/user/hdmaster/XML2"):
>         print filename
>
> ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks
> exceeded allowed limit. FailedCount: 1. LastFailedTask:
> task_201312020139_0025_m_000000
> 13/12/02 05:20:50 INFO streaming.StreamJob: killJob...
>
> My intention is to take the files one by one and parse them.
>
> Any help or suggestion on this would be very helpful to me.
>
> Thanks
> Haider

Haider, you cannot use Python system-level functions on Hadoop directly. You may want to take a look at the Pydoop project if you want those features.

On Fri, Dec 6, 2013 at 2:22 PM, shashwat shriparv <[EMAIL PROTECTED]> wrote:


Thanks for your suggestions.

But in my case I have thousands of small files and I want to read them one by one. I think that is only possible by using listdir().

Following Nitin's comment, I tried to install Pydoop, but it is throwing some strange error and I am not finding any information about Pydoop on Google.

> Haider,
> You can use TextLoader to read a file in HDFS line by line, and then you
> can pass those lines to your python UDF. Something like the following
> should work:
>
> x = load '/tmp/my_file_on_hdfs' using TextLoader() as (line:chararray);
> y = foreach x generate my_udf(line);
>
> -----Original Message-----
> From: Haider [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, December 5, 2013 10:12 PM
> To: [EMAIL PROTECTED]
> Subject: Re: listdir() python function is not wokring on hadoop
>
> I am trying to read from HDFS, not from the local file system, so would
> that be possible through listdir()? Or is there any way to read HDFS files
> one by one and pass each of them to a function?
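
A note on the "one file at a time" question raised above: os.listdir() only sees the local filesystem of the machine the script runs on, so an HDFS path such as /user/hdmaster/XML2 will most likely not exist there. One possible workaround, without installing Pydoop, is to shell out to the hadoop command-line client and parse its listing. The following is only a minimal sketch under some assumptions: Python 3, the `hadoop` command available on the PATH of a client node, and the usual eight-column output of `hadoop fs -ls`. The helper names (parse_ls_output, list_hdfs_files, read_hdfs_file) are made up for this illustration and are not part of any library.

```python
import subprocess


def parse_ls_output(ls_output):
    """Extract plain-file paths from the text printed by `hadoop fs -ls`."""
    paths = []
    for line in ls_output.splitlines():
        # Expected columns (assumption): permissions, replication, owner,
        # group, size, date, time, path. Split at most 7 times so that a
        # path containing spaces stays in one piece.
        fields = line.split(None, 7)
        # Skip the "Found N items" header and directory entries ('d...').
        if len(fields) == 8 and fields[0].startswith("-"):
            paths.append(fields[7])
    return paths


def list_hdfs_files(hdfs_dir):
    """Return the paths of the plain files directly under hdfs_dir."""
    out = subprocess.check_output(["hadoop", "fs", "-ls", hdfs_dir])
    return parse_ls_output(out.decode("utf-8"))


def read_hdfs_file(path):
    """Return the raw bytes of one HDFS file via `hadoop fs -cat`."""
    return subprocess.check_output(["hadoop", "fs", "-cat", path])


def main():
    # Directory taken from the original post.
    for path in list_hdfs_files("/user/hdmaster/XML2"):
        data = read_hdfs_file(path)
        print(path, len(data))  # replace with the real parsing step
```

Calling main() on a client node would walk the directory one file at a time. Each `hadoop fs` call starts a new JVM, so with thousands of small files this is likely to be slow; it is meant for a driver script on a client machine, not for use inside a streaming map task.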