The general practice is to install your deps into a custom location such as /opt/john-jars, and extend YARN_CLASSPATH to include the jars, while also configuring the classes under the aux-services list. You need to take care of deploying jar versions to /opt/john-jars/ contents across the cluster though.

I think it may be a neat idea to have jars be placed on HDFS or any other DFS, and the yarn-site.xml indicating the location plus class to load. Similar to HBase co-processors. But I'll defer to Vinod on if this would be a good thing to do.
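A hypothetical sketch of the load-from-DFS idea above (this feature does not exist in YARN; the class and method names here are invented for illustration): the NM would resolve the configured location to a URL and load the aux-service class through a URLClassLoader, similar in spirit to how HBase co-processors are loaded.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical loader sketch: resolve a configured jar location to a URL
// and load the configured class through a URLClassLoader. A real version
// would first localize the jar from HDFS to local disk (or read it via a
// Hadoop FileSystem-backed stream).
public class DfsServiceLoader {
    public static Class<?> load(URL jarLocation, String className) throws Exception {
        // Parent delegation still applies: classes visible to the NM's own
        // classloader are found there first.
        URLClassLoader loader = new URLClassLoader(new URL[] { jarLocation },
                DfsServiceLoader.class.getClassLoader());
        return Class.forName(className, true, loader);
    }
}
```

The hot-code-upgrade problem mentioned below is exactly what makes this loader non-trivial: a long-running NM would need to discard and recreate such classloaders safely.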

(I know the right next thing with such an ability people will ask for is hot-code-upgrades…)

On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <[EMAIL PROTECTED]> wrote:
> Are there recommended conventions for adding additional code to a stock
> Hadoop install?
>
> It would be nice if we could piggyback on whatever mechanisms are used to
> distribute hadoop itself around the cluster.
>
> john
>
> From: Vinod Kumar Vavilapalli [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, August 22, 2013 6:25 PM
> To: [EMAIL PROTECTED]
> Subject: Re: yarn-site.xml and aux-services
>
> Auxiliary services are essentially administer-configured services. So, they
> have to be set up at install time - before NM is started.
>
> +Vinod
>
> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley <[EMAIL PROTECTED]> wrote:
>> Following up on this, how exactly does one *install* the jar(s) for an
>> auxiliary service? Can it be shipped out with the LocalResources of an AM?
>> MapReduce's aux-service is presumably installed with Hadoop and is just
>> sitting there in the right place, but if one wanted to make a whole new
>> aux-service that belonged with an AM, how would one do it?
>> John
>>
>> -----Original Message-----
>> From: John Lilley [mailto:[EMAIL PROTECTED]]
>> Sent: Wednesday, June 05, 2013 11:41 AM
>> To: [EMAIL PROTECTED]
>> Subject: RE: yarn-site.xml and aux-services
>>
>> Wow, thanks. Is this documented anywhere other than the code? I hate to
>> waste y'all's time on things that can be RTFMed.
>> John
>>
>> -----Original Message-----
>> From: Harsh J [mailto:[EMAIL PROTECTED]]
>> Sent: Wednesday, June 05, 2013 9:35 AM
>> To: <[EMAIL PROTECTED]>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> John,
>>
>> The format is ID and sub-config based:
>>
>> First, you define an ID as a service, like the string "foo". This is the ID
>> the applications may look up in their container responses map we discussed
>> over another thread (around the shuffle handler).
>>
>> <property>
>>   <name>yarn.nodemanager.aux-services</name>
>>   <value>foo</value>
>> </property>
>>
>> Then you define an actual implementation class for that ID "foo", like so:
>>
>> <property>
>>   <name>yarn.nodemanager.aux-services.foo.class</name>
>>   <value>com.mypack.MyAuxServiceClassForFoo</value>
>> </property>
>>
>> If you have multiple services foo and bar, then it would appear like the
>> below (comma-separated IDs and individual configs):
>>
>> <property>
>>   <name>yarn.nodemanager.aux-services</name>
>>   <value>foo,bar</value>
>> </property>
>> <property>
>>   <name>yarn.nodemanager.aux-services.foo.class</name>
>>   <value>com.mypack.MyAuxServiceClassForFoo</value>
>> </property>
>> <property>
>>   <name>yarn.nodemanager.aux-services.bar.class</name>
>>   <value>com.mypack.MyAuxServiceClassForBar</value>
>> </property>
>>
>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley <[EMAIL PROTECTED]> wrote:
>>> Good, I was hoping that would be the case. But what are the mechanics of
>>> it? Do I just add another entry? And what exactly is "mapreduce.shuffle"?
>>> A scoped class name? Or a key string into some map elsewhere?
>>>
>>> e.g. like:
>>>
>>> <property>
>>>   <name>yarn.nodemanager.aux-services</name>
>>>   <value>mapreduce.shuffle</value>
>>> </property>
>>> <property>
>>>   <name>yarn.nodemanager.aux-services</name>
>>>   <value>myauxserviceclassname</value>
>>> </property>
>>>
>>> Concerning auxiliary services -- do they communicate with NodeManager via

On Thu, Sep 5, 2013 at 11:41 PM, John Lilley <[EMAIL PROTECTED]> wrote:
> Harsh,
>
> Thanks as usual for your sage advice. I was hoping to avoid actually installing anything on individual Hadoop nodes and finessing the service by spawning it from a task using LocalResources, but this is probably fraught with trouble.
>
> FWIW, I would vote to be able to load YARN services from HDFS. What is the appropriate forum to file a request like that?
>
> Thanks
> John
>
> -----Original Message-----
> From: Harsh J [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, September 04, 2013 12:05 AM
> To: <[EMAIL PROTECTED]>
> Subject: Re: yarn-site.xml and aux-services
>
>> Thanks for the clarification. I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain and cache those on local disk.
>
> We could class-load directly from HDFS, like HBase Co-Processors do.
>
>> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>
> Isn't this more complex than just running a dedicated service all the time, and/or implementing a way to spawn/end a dedicated service temporarily? I'd pick trying to implement such a thing rather than have my containers implement more logic.
>
> On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <[EMAIL PROTECTED]> wrote:
>> Harsh,
>>
>> Thanks for the clarification. I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain and cache those on local disk.
>>
>> What about having the tasks themselves start the per-node service as a child process? I've been told that the NM kills the process group, but won't setgrp() circumvent that?
>>
>> Even given that, would the child process of one task have proper environment and permission to act on behalf of other tasks?
>> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>> 1) AM spawns "mapper-like" tasks around the cluster.
>> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
>> 3) Each mapper-like task writes one or more output files, and informs the service of those files (along with AM-id, Task-id, etc.).
>> 4) AM spawns "reducer-like" tasks around the cluster.
>> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to the services on those nodes to read the data.
>>
>> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task's lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
>>
>> John
>>
>> -----Original Message-----
>> From: Harsh J [mailto:[EMAIL PROTECTED]]
>> Sent: Friday, August 23, 2013 11:00 AM
>> To: <[EMAIL PROTECTED]>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> The general practice is to install your deps into a custom location such as /opt/john-jars, and extend YARN_CLASSPATH to include the jars, while also configuring the classes under the aux-services list. You need to take care of deploying jar versions to /opt/john-jars/ contents across the cluster though.
>>
>> I think it may be a neat idea to have jars be placed on HDFS or any other DFS, and the yarn-site.xml indicating the location plus class to load. Similar to HBase co-processors. But I'll defer to Vinod on if this would be a good thing to do.
>>
>> (I know the right next thing with such an ability people will ask for
>> is hot-code-upgrades...)
>>
>> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <[EMAIL PROTECTED]> wrote:
>>> Are there recommended conventions for adding additional code to a
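The five-step mapper/reducer-like protocol quoted above hinges on a per-node registry inside the persistent service. A minimal in-process sketch of that registry (all names hypothetical; a real service would expose this over RPC or HTTP, and steps 1, 2, and 4 happen outside it):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical per-node registry for the "persistent service" in the quoted
// scenario: mapper-like tasks register output files under their AM id
// (step 3); reducer-like tasks look them up by AM id (step 5); the AM-exit
// cleanup the text identifies as a missing detail is modeled as cleanup().
public class NodeOutputRegistry {
    private final ConcurrentMap<String, List<String>> filesByAm = new ConcurrentHashMap<>();

    // Step 3: a mapper-like task reports its output files with AM/task ids.
    public void register(String amId, String taskId, List<String> files) {
        List<String> list = filesByAm.computeIfAbsent(
                amId, k -> Collections.synchronizedList(new ArrayList<>()));
        for (String f : files) {
            list.add(taskId + ":" + f);
        }
    }

    // Step 5: a reducer-like task asks which files exist for its AM on this node.
    public List<String> lookup(String amId) {
        return filesByAm.getOrDefault(amId, Collections.emptyList());
    }

    // On AM exit: drop the bookkeeping (file deletion would go here too).
    public void cleanup(String amId) {
        filesByAm.remove(amId);
    }
}
```

This sketch deliberately ignores the other open question in the thread: how reducers learn *which* nodes hold data, which in MR proper the AM tracks.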


Harsh J
