// In the driver
JobConf conf = new JobConf(getConf(), WordCount.class);
...
DistributedCache.addCacheFile(new Path(filename).toUri(), conf);

// In the mapper
Path[] myCacheFiles = DistributedCache.getLocalCacheFiles(job);
...


+1 vote

The preferred way of using DistributedCache for YARN/MapReduce 2 is as follows:

In your driver, use Job.addCacheFile():

public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    Job job = Job.getInstance(conf, "MyJob");
    job.setMapperClass(MyMapper.class);
    // ...

    // Mind the # sign after the absolute file location.
    // You will be using the name after the # sign as your
    // file name in your Mapper/Reducer.
    job.addCacheFile(new URI("/user/yourname/cache/some_file.json#some"));
    job.addCacheFile(new URI("/user/yourname/cache/other_file.json#other"));

    return job.waitForCompletion(true) ? 0 : 1;
}
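The part after the # sign is a URI fragment, and YARN uses it as the symlink name in the task's working directory. A minimal pure-JDK sketch of how that fragment is extracted (the class and method names here are illustrative, not part of the Hadoop API):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class CacheUriFragment {
    // Returns the part after '#', which YARN uses as the symlink
    // name for the cached file in the task's working directory.
    public static String symlinkName(String cacheUri) throws URISyntaxException {
        return new URI(cacheUri).getFragment();
    }
}
```

So for `/user/yourname/cache/some_file.json#some`, the fragment is `some`, which is why the Mapper/Reducer opens `./some` rather than the original HDFS path.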

And in your Mapper/Reducer, override the setup(Context context) method:

@Override
protected void setup(Mapper<LongWritable, Text, Text, Text>.Context context)
        throws IOException, InterruptedException {
    if (context.getCacheFiles() != null && context.getCacheFiles().length > 0) {
        File some_file = new File("./some");
        File other_file = new File("./other");
        // Do things with these two files, like read them
        // or parse them as JSON.
    }
    super.setup(context);
}
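Because the cached files appear as plain symlinks in the working directory, they can be read with ordinary java.io/java.nio calls inside setup(). A hedged sketch, assuming a line-oriented cache file; the helper name `readCachedLines` is illustrative and not part of the Hadoop API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class CacheFileReader {
    // Illustrative helper: reads a cached file via the symlink name
    // created from the '#' fragment, e.g. readCachedLines("./some")
    // called from setup() in a Mapper or Reducer.
    public static List<String> readCachedLines(String symlink) throws IOException {
        return Files.readAllLines(Paths.get(symlink));
    }
}
```

From there you can hand the lines to whatever JSON parser your job already bundles.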