I am looking into the MR workflow and want to know more details about the reduce output data copy.

Here is my question.

Take the DFSIO test or some other MR job. Each reduce task runs on a TT and generates files in a directory named like this: "XXX//_temporary/_attempt_201305101045_0005_r_000000_0/". There will also be a result file named part-00000.

After the reducer has finished the task, the reducer output data part-00000 should be moved from the local disk to HDFS.

My question is: is it when the reducer finishes the task that part-00000 is copied to HDFS? Who makes this file copy happen? The reducer child? The TaskTracker that runs the reduce task? Or the JobTracker?

On Fri, May 10, 2013 at 8:49 AM, Ling Kun <[EMAIL PROTECTED]> wrote:
> Dear all,
>
> I am looking into the MR work flow, and want to know more details about
> the reduce output data copy .
>
> Here is my question.
>
> For the DFSIO test or some other MR jobs. Each reduce task will run on a
> TT, and generate files to some dirs named like this:
> "XXX//_temporary/_attempt_201305101045_0005_r_000000_0/", there will also be
> a result file named part-00000.
>
> After the reducer done the task. the reducer output data part-00000 should
> be moved from the local disk to the HDFS.
>
> My question is: Is that the time that when reducer finish the task that
> part-00000 will be copied to the HDFS? Who make this file copy happen? The
> Reducer child? The TaskTracker which run the reduce task? Or the JobTracker?
>
> Thanks,
>
> yours,
> Kun Ling
>
> --
> http://www.lingcc.com

-- Harsh J


Harsh J 2013-05-10, 05:26


Re: When and who move the reduce output file part-0000X to the final output directory

> The task itself moves it when it receives a commitTask message. See
> the OutputCommitter class:
>
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/OutputCommitter.html#commitTask(org.apache.hadoop.mapred.TaskAttemptContext)
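To make the commitTask step a bit more concrete, here is a minimal sketch of what "the task moves it" could look like. It is not Hadoop's code: it uses only java.nio.file on a local temp directory as a stand-in for the job's output filesystem, and the class name CommitTaskSketch and the helpers attemptDir, needsTaskCommit, commitTask and demo are hypothetical names chosen for illustration. The only things taken from the thread are the "_temporary/_attempt_..." layout, the part-00000 file name, and the idea that the attempt's files are promoted to the final output directory when the task is told to commit.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustration only: a local-filesystem stand-in for the task commit step.
public class CommitTaskSketch {

    // Hypothetical helper: <output>/_temporary/_<attemptId>, mirroring the
    // directory layout quoted in the question.
    static Path attemptDir(Path outputDir, String attemptId) {
        return outputDir.resolve("_temporary").resolve("_" + attemptId);
    }

    // In this sketch a task needs a commit only if its attempt directory exists.
    static boolean needsTaskCommit(Path outputDir, String attemptId) {
        return Files.isDirectory(attemptDir(outputDir, attemptId));
    }

    // Move every file written by the attempt into the final output directory,
    // then delete the now-empty attempt directory.
    static void commitTask(Path outputDir, String attemptId) {
        Path work = attemptDir(outputDir, attemptId);
        try {
            List<Path> files;
            try (Stream<Path> listing = Files.list(work)) {
                files = listing.collect(Collectors.toList());
            }
            for (Path file : files) {
                Files.move(file, outputDir.resolve(file.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
            }
            Files.delete(work);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Simulates one reduce attempt: write part-00000 into the attempt
    // directory, commit, and report whether the file was promoted.
    static boolean demo() {
        try {
            Path outputDir = Files.createTempDirectory("mr-output");
            String attemptId = "attempt_201305101045_0005_r_000000_0";
            Path work = attemptDir(outputDir, attemptId);
            Files.createDirectories(work);
            Files.write(work.resolve("part-00000"),
                    "key\tvalue\n".getBytes(StandardCharsets.UTF_8));

            if (needsTaskCommit(outputDir, attemptId)) {
                commitTask(outputDir, attemptId);
            }
            return Files.exists(outputDir.resolve("part-00000"))
                    && !Files.exists(work);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("part-00000 promoted: " + demo());
    }
}
```

The point of the sketch is that the promotion is a move from the attempt's private directory into the output directory, performed by the process that runs the task. For the real behaviour (which filesystem the rename happens on, and how the commit is approved), the OutputCommitter javadoc linked above is the place to check.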