Sure you can, as we provide pluggable code points via the API. Just write a custom record reader that doubles the work (first round reads actual input, second round reads your known output and reiterates). In the mapper, separate the first and second logic via a flag.
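[The suggestion above can be sketched in plain Java. This is not the real Hadoop `RecordReader`/`Mapper` API — the class and method names below are illustrative, and for simplicity the second pass replays the records the reader saw rather than reading a job's known output:]

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the "doubled" record reader idea: the reader first
// emits the actual input, then replays the records it saw, attaching a
// phase flag the map logic can branch on. Names here (TwoPassReader,
// Record) are illustrative, not Hadoop's API.
public class TwoPassReader {
    public static final class Record {
        final boolean secondPass;  // the flag the mapper branches on
        final String value;
        Record(boolean secondPass, String value) {
            this.secondPass = secondPass;
            this.value = value;
        }
    }

    private final List<String> input;
    private final List<String> replay = new ArrayList<>();
    private int pos = 0;
    private boolean secondPass = false;

    TwoPassReader(List<String> input) { this.input = input; }

    /** Returns the next record, switching to a replay pass once the input is exhausted. */
    Record next() {
        if (!secondPass) {
            if (pos < input.size()) {
                String v = input.get(pos++);
                replay.add(v);             // remember for the second pass
                return new Record(false, v);
            }
            secondPass = true;             // input done: start the replay pass
            pos = 0;
        }
        return pos < replay.size() ? new Record(true, replay.get(pos++)) : null;
    }

    public static void main(String[] args) {
        TwoPassReader reader = new TwoPassReader(List.of("a", "b"));
        for (Record r; (r = reader.next()) != null; )
            System.out.println((r.secondPass ? "pass2:" : "pass1:") + r.value);
        // prints pass1:a, pass1:b, pass2:a, pass2:b
    }
}
```

[In the real API the same branching would live in the `map()` method, keyed off whatever flag the custom reader emits alongside each record.]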

Thanks for the fast reply, but I don't see how a custom record reader will help. Consider again the k-means: the mappers need to stand by until all the reducers finish calculating the new clusters' centers. Only then, after the reducers finish their work, do the stand-by mappers come back to life and perform their work.

On Sun, Aug 5, 2012 at 7:49 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Sure you can, as we provide pluggable code points via the API. Just write a custom record reader that doubles the work (first round reads actual input, second round reads your known output and reiterates). In the mapper, separate the first and second logic via a flag.
>
> On Sun, Aug 5, 2012 at 4:17 PM, Yaron Gonen <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>> Is there a way to keep a map-task alive after it has finished its work, to later perform another task on its same input?
>> For example, consider the k-means clustering algorithm (k-means description <http://en.wikipedia.org/wiki/K-means_clustering> and hadoop implementation <http://codingwiththomas.blogspot.co.il/2011/05/k-means-clustering-with-mapreduce.html>). The only thing changing between iterations is the clusters' centers. All the input points remain the same. Keeping the mapper alive, and performing the next round of map-tasks on the same node, will save a lot of communication cost.
>>
>> Thanks,
>> Yaron
>
> --
> Harsh J
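[The k-means structure being discussed can be seen in a minimal single-machine sketch (1-D points, k = 2, purely illustrative). The point of the sketch is the one the question makes: between iterations only the centers change, while the full set of input points is re-scanned unchanged every round — which in MapReduce means re-reading and re-shipping the same input per iteration:]

```java
import java.util.Arrays;

// Minimal single-machine k-means sketch. Only `centers` carries state
// between iterations; `points` is identical every round, which is the
// redundant per-iteration input cost the thread is about.
public class KMeansSketch {
    /** One k-means iteration: assign each point to its nearest center, then recompute centers. */
    static double[] iterate(double[] points, double[] centers) {
        double[] sum = new double[centers.length];
        int[] count = new int[centers.length];
        for (double p : points) {
            int best = 0;
            for (int c = 1; c < centers.length; c++)
                if (Math.abs(p - centers[c]) < Math.abs(p - centers[best])) best = c;
            sum[best] += p;
            count[best]++;
        }
        double[] next = centers.clone();
        for (int c = 0; c < centers.length; c++)
            if (count[c] > 0) next[c] = sum[c] / count[c];
        return next;
    }

    public static void main(String[] args) {
        double[] points = {1.0, 2.0, 9.0, 10.0};   // fixed across all iterations
        double[] centers = {0.0, 5.0};             // the only state that changes
        for (int i = 0; i < 10; i++)
            centers = iterate(points, centers);    // in MR: one full job per loop turn
        System.out.println(Arrays.toString(centers));  // prints [1.5, 9.5]
    }
}
```

[In the Hadoop implementation linked above, each turn of that loop is a complete job launch: the assignment step is the map phase, the center recomputation is the reduce phase, and the new centers are fed to the next job.]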

There currently isn't a way to do this with the existing MR framework and APIs. A Reducer is initiated upon map completion and the Task JVM is canned away after the Maps end. Perhaps you can use YARN to write something of what you desire?


Thanks. As I see it, it cannot be done in the MapReduce 1 framework without changing the TaskTracker and JobTracker. Problem is, I'm not familiar at all with YARN... it might be possible there. Thanks again!

On Mon, Aug 6, 2012 at 1:21 AM, Harsh J <[EMAIL PROTECTED]> wrote:

> Ah, my bad - I skipped over the K-Means part of your original post.
>
> There currently isn't a way to do this with the existing MR framework and APIs. A Reducer is initiated upon map completion and the Task JVM is canned away after the Maps end. Perhaps you can use YARN to write something of what you desire?
>
> --
> Harsh J
