I have a very small input (a few kB), but processing it to produce the output takes several minutes. Is there a way to say: the file has 100 lines, I need 10 mappers, and each mapper node should process 10 lines of the input file?


I tried this approach, but the job is not distributed among 10 mapper nodes. Hadoop seems to ignore this property :(

My first thought is that the small file size is the problem and Hadoop doesn't care about splitting it properly.

Thanks for any ideas.

On 06/16/2012 11:27 AM, Bejoy KS wrote:
> Hi Ondrej
>
> You can use NLineInputFormat with n set to 10.
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.

No. The number of lines is not known at planning time. All you know is the size of the blocks. You want to look at mapred.max.split.size.
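For context, a minimal sketch of how that property might be set in a job driver (this is a configuration fragment, not a complete runnable job; the classic mapred-era property name is used, as in Hadoop of that vintage):

```java
// Configuration sketch: cap the split size so that even a small file
// breaks into multiple splits (and therefore multiple map tasks).
org.apache.hadoop.conf.Configuration conf =
        new org.apache.hadoop.conf.Configuration();
conf.setLong("mapred.max.split.size", 1024L); // at most ~1 KB of input per split
```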

On Sat, Jun 16, 2012 at 5:31 AM, Ondřej Klimpera <[EMAIL PROTECTED]> wrote:
> I tried this approach, but the job is not distributed among 10 mapper nodes.
> Seems Hadoop ignores this property :(

While NLineInputFormat will indeed give you N lines per task, it does not guarantee that the N map tasks that come out for a file from it will all be sent to different nodes. Which one is your need exactly - simply having N lines per map task, or N wider-distributed maps?
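For reference, wiring NLineInputFormat into a driver might look like the following sketch (new-API class and method names, assuming a Hadoop 2-style Job; the surrounding setup and the conf object are illustrative, not a complete job):

```java
// Driver fragment: give each map task exactly 10 input lines.
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

Job job = Job.getInstance(conf, "n-line-demo");   // conf assumed defined
job.setInputFormatClass(NLineInputFormat.class);
NLineInputFormat.setNumLinesPerSplit(job, 10);    // 10 lines per split/map task
```

Note this only controls how many lines each task receives; where those tasks actually run is up to the scheduler.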

On Sat, Jun 16, 2012 at 3:01 PM, Ondřej Klimpera <[EMAIL PROTECTED]> wrote:
> I tried this approach, but the job is not distributed among 10 mapper nodes.
> Seems Hadoop ignores this property :(

Hi, I made some progress: the combination of NLineInputFormat and mapred.max.split.size seems to work, but it is hard to set the exact byte value. Input lines range from roughly 64 to 1024 bytes.
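Rather than guessing a byte value, one could derive mapred.max.split.size from the file's total size and the desired number of map tasks. A minimal, self-contained sketch of that arithmetic (the class and method names here are made up for illustration):

```java
public class SplitSizeCalc {
    // Returns a max-split-size (in bytes) that yields roughly `mappers`
    // splits for an input of `totalBytes`, using ceiling division so the
    // result never produces more splits than requested.
    static long maxSplitSize(long totalBytes, int mappers) {
        return (totalBytes + mappers - 1) / mappers;
    }

    public static void main(String[] args) {
        // 100 lines averaging ~500 bytes each -> ~50 kB of input.
        long total = 50_000L;
        System.out.println(maxSplitSize(total, 10)); // prints 5000
    }
}
```

Because split boundaries fall on record boundaries, the actual task count can still be off by one or two when line lengths vary this much.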

What I need is to have as many mappers as possible (to use the full potential of the cluster), where each receives N input lines.

On 06/17/2012 05:02 AM, Harsh J wrote:
> Ondřej,
>
> While NLineInputFormat will indeed give you N lines per task, it does
> not guarantee that the N map tasks that come out for a file from it
> will all be sent to different nodes. Which one is your need exactly -
> Simply having N lines per map task, or N wider distributed maps?


Thanks & Regards

Sachin Aggarwal
7760502772
