Shing Hing Man 2012-10-02, 18:17

I am running Hadoop 1.0.3 in pseudo-distributed mode. When I submit a map/reduce job to process a file of about 16 GB, job.xml shows the following:

mapred.map.tasks = 242
mapred.min.split.size = 0
dfs.block.size = 67108864 (64 MB)

I would like to reduce mapred.map.tasks to see whether it improves performance. I have tried doubling dfs.block.size, but mapred.map.tasks remains unchanged. Is there a way to reduce mapred.map.tasks?

Thanks in advance for any assistance!
Shing

On Tue, Oct 2, 2012 at 10:31 PM, Bejoy Ks <[EMAIL PROTECTED]> wrote:

Hi

You need to set mapred.max.split.size to a value larger than your block size to get fewer map tasks than the default.

Bejoy Ks added in a follow-up:

Also, just for changing the number of map tasks, you don't need to modify the HDFS block size.
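One caveat worth noting: with the stock FileInputFormat the split size works out to max(mapred.min.split.size, min(mapred.max.split.size, dfs.block.size)), so to get splits larger than one block the minimum usually has to be raised alongside the maximum. A minimal sketch of passing both at submission time, assuming a driver that parses generic options (as the bundled examples do); the jar and path names here are illustrative only:

    # Pin the split size at 128 MB (2x the 64 MB block size); a ~15-16 GB
    # input then yields roughly 120 map tasks instead of 242.
    hadoop jar hadoop-examples-1.0.3.jar wordcount \
        -D mapred.min.split.size=134217728 \
        -D mapred.max.split.size=134217728 \
        /user/shing/input /user/shing/output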


Chris wrote:

When you doubled dfs.block.size, how did you accomplish that? Typically, the block size is selected at file write time, with a default value from the system configuration used if none is specified. Did you "hadoop fs -put" the file with the new block size, or was it something else?

Thank you,
--Chris
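Since FsShell runs through ToolRunner, the write-time block size can be passed as a generic option when re-uploading. A sketch, where the 128 MB value and the local and HDFS paths are hypothetical:

    # dfs.block.size is read at write time, so it only affects this new copy
    hadoop fs -D dfs.block.size=134217728 -put bigfile.txt /user/shing/bigfile.txt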

Bejoy KS replied:

This doesn't change the block size of existing files in HDFS; only new files written to HDFS will be affected. To get the new block size applied to old files, you need to re-copy them, at least within HDFS: hadoop fs -cp src destn.

Regards
Bejoy KS

Sent from handheld, please excuse typos.
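A sketch of that re-copy, again passing the new block size as a generic option, followed by an fsck to confirm how the destination file was blocked (paths and the 128 MB value are illustrative):

    # The copy is written fresh, so it picks up the block size from the conf
    hadoop fs -D dfs.block.size=134217728 -cp /user/shing/bigfile.txt /user/shing/bigfile-128m.txt

    # List the file's blocks to verify the new block size took effect
    hadoop fsck /user/shing/bigfile-128m.txt -files -blocks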




