Question

I am using DistributedCache, but there are no files in the cache after the code runs. I have read other similar questions, but the answers do not solve my issue.

Please find the code below:

   Configuration conf = new Configuration();
   Job job1 = new Job(conf, "distributed cache");
   Configuration conf1 = job1.getConfiguration();
   DistributedCache.addCacheFile(new Path("File").toUri(), conf1);
   System.out.println("distributed cache file "+DistributedCache.getLocalCacheFiles(conf1));

This prints null.

The same call inside the mapper also returns null. Please let me know your suggestions.

Thanks

Solution

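Try getCacheFiles() instead of getLocalCacheFiles(). getCacheFiles() returns the URIs registered with addCacheFile(), whereas getLocalCacheFiles() returns the localized paths, which are only populated on the task side after the framework has copied the files there.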

OTHER TIPS

I believe this is (at least partly) due to what Chris White wrote here:

After you create your Job object, you need to pull back the Configuration object, because Job makes a copy of it. Setting values on the original Configuration after you create the job will have no effect on the job itself. Try this:

Job job = new Job(new Configuration());
Configuration conf2 = job.getConfiguration();
job.setJobName("Join with Cache");
DistributedCache.addCacheFile(new URI("hdfs://server:port/FilePath/part-r-00000"), conf2);

I guess if it still does not work, there is another problem somewhere, but that doesn't mean that Chris White's point is not correct.

When distributing, don't forget the local link name, preferably using a relative path:

URI is of the form hdfs://host:port/absolute-path#local-link-name
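To make the URI form concrete, here is a minimal sketch using only java.net.URI. The host namenode, port 8020, the path, and the link name lookup.txt are made-up values, and CacheUriDemo with its helper methods is a hypothetical class for illustration, not part of Hadoop. The part after # is what would become the local link name.

```java
import java.net.URI;

class CacheUriDemo {

    // Appends "#linkName" to an HDFS URI that does not have a fragment yet.
    static URI withLinkName(URI hdfsFile, String linkName) {
        if (hdfsFile.getFragment() != null) {
            throw new IllegalArgumentException("URI already has a link name: " + hdfsFile);
        }
        return URI.create(hdfsFile.toString() + "#" + linkName);
    }

    // The fragment (text after '#') is the local link name.
    static String linkName(URI cacheUri) {
        return cacheUri.getFragment();
    }

    // The path component is the absolute path of the file inside HDFS.
    static String hdfsPath(URI cacheUri) {
        return cacheUri.getPath();
    }

    public static void main(String[] args) {
        // Hypothetical host, port, and path; replace with your own cluster's values.
        URI plain = URI.create("hdfs://namenode:8020/data/lookup/part-r-00000");
        URI withLink = withLinkName(plain, "lookup.txt");

        System.out.println("cache URI : " + withLink);
        System.out.println("HDFS path : " + hdfsPath(withLink));
        System.out.println("link name : " + linkName(withLink));
    }
}
```

A URI built this way is what you would pass to DistributedCache.addCacheFile(uri, conf) or job.addCacheFile(uri).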

When reading:

  • if you do not use the distributed cache, read the file through HDFS's FileSystem API at hdfs://host:port/absolute-path
  • if you use the distributed cache, read the file with standard Java file utilities through local-link-name
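The second case can be sketched with plain java.nio, since the link is just a file in the task's working directory. This is only an illustration under assumptions: the class LocalCacheReader, the link name lookup.txt, and the tab-separated key/value format are all hypothetical. In a real job, loadFromLink would typically be called from the mapper's setup() method.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

class LocalCacheReader {

    // Parses "key<TAB>value" lines into a map; lines without a tab are skipped.
    // The file format is an assumption made for this example.
    static Map<String, String> parse(Reader source) {
        Map<String, String> lookup = new HashMap<>();
        new BufferedReader(source).lines().forEach(line -> {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) {
                lookup.put(parts[0], parts[1]);
            }
        });
        return lookup;
    }

    // Opens the cached file by its relative link name, which resolves
    // against the current working directory of the process.
    static Map<String, String> loadFromLink(String linkName) {
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(linkName), StandardCharsets.UTF_8)) {
            return parse(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for the cached file, so the sketch runs without a cluster.
        Map<String, String> demo = parse(new StringReader("id1\tAlice\nid2\tBob\n"));
        System.out.println(demo);
        // Inside a task, with the file cached as ...#lookup.txt, you would call:
        // Map<String, String> lookup = loadFromLink("lookup.txt");
    }
}
```

Note that no hdfs:// prefix or FileSystem call appears here; the relative name is enough because the file is expected to be linked into the task's working directory.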

The cache file needs to be in the Hadoop FileSystem. You can do this:

    void copyFileToHDFS(JobConf jobConf, String from, String to){
        try {
            FileSystem aFS = FileSystem.get(jobConf);
            aFS.copyFromLocalFile(false, true, new Path(from), new Path(to));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

Once the files are copied you can add them to the cache, like so:

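    // fromLocation and toLocation are assumed to be String fields defined elsewhere.
    void fillCache(JobConf jobConf) throws IOException, URISyntaxException {
        // Upload first: the cache file has to be in HDFS before it is registered.
        copyFileToHDFS(jobConf, fromLocation, toLocation);
        // Job copies the configuration, so register the file on the job itself.
        Job job = Job.getInstance(jobConf);
        job.addCacheFile(new URI(toLocation));
        // If a JobConf is still needed, build it from the job's own configuration.
        JobConf newJobConf = new JobConf(job.getConfiguration());
    }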
Licensed under: CC-BY-SA with attribution