Reading a HAR file from the DistributedCache in MapReduce

Question

protected InputStream getInputStreamToDistCacheFile() throws IOException{
        InputStream inputStream;
        String cachedDatafileName = System.getProperty(DIST_CACHE_FILE_NAME);
        LOG.info(String.format("Looking for[%s]=[%s] in DistributedCache",DIST_CACHE_FILE_NAME, cachedDatafileName));

        URI[] uris = DistributedCache.getCacheArchives(getContext().getConfiguration());
        URI uriToCachedDatafile = null;
        for(URI uri : uris){
            if(uri.toString().endsWith(cachedDatafileName)){
                uriToCachedDatafile = uri;
                break;
            }
        }
        if(uriToCachedDatafile == null){
            throw new RuntimeConfigurationException(String.format("Looking for[%s]=[%s] in DistributedCache failed. There is no such file",
                    DIST_CACHE_FILE_NAME, cachedDatafileName));
        }

        //Path pathToFile = new Path(uriToCachedDatafile +"/stf/db_bts_stf.txt");
        Path pathToFile = new Path("har:///"+"home/ssa/devel/megalabs/kyc-solution/kyc-mrjob/target/test-classes/GSMCellSubscriberHomeIntersectionJobDescriptionClusterMRTest/in/gsm_cell_location_stf.har" +"/stf/db_bts_stf.txt");
        //Path pathToFile = new Path(("har://home/ssa/devel/megalabs/kyc-solution/kyc-mrjob/target/test-classes/GSMCellSubscriberHomeIntersectionJobDescriptionClusterMRTest/in/gsm_cell_location_stf.har"));

        LOG.info(String.format("[%s] has been found. Uri is: [%s]. The path is:[%s]",cachedDatafileName, uriToCachedDatafile, pathToFile));
        FileSystem harFileSystem = pathToFile.getFileSystem(getContext().getConfiguration());
        FSDataInputStream fin = harFileSystem.open(pathToFile);
        LOG.info("fin: " + fin);
//        FileSystem fileSystem =  pathToFile.getFileSystem(getContext().getConfiguration());
//        HarFileSystem harFileSystem = new HarFileSystem(fileSystem);
//        harFileSystem.exists(new Path("har://home/ssa/devel/mycompany/my-solution/my-mrjob/target/test-classes/HomeJobDescriptionClusterMRTest/in/locations.har"));
//        LOG.info("harFileSystem.exists(pathToFile):"+ harFileSystem.exists(pathToFile));
//        harFileSystem.initialize(uriToCachedDatafile, context.getConfiguration());



        // List the archive root; the path must carry the har:/// scheme only once.
        FileStatus[] statuses = harFileSystem.listStatus(new Path("har:///"+"home/ssa/devel/mycompany/my-solution/my-mrjob/target/test-classes/HomeJobDescriptionClusterMRTest/in/locations.har"));
        for(FileStatus fileStatus : statuses){
            LOG.info("fileStatus isDir"+fileStatus.isDirectory() +" len:" + fileStatus.getLen());
        }

//        String tmpPathToFile = "har:///"+pathToFile.toString(); //+"/stf/db_bts_stf.txt";
//        Path tmpPath = new Path(tmpPathToFile);
//        LOG.info("KILL ME PATH TO FILE IN ARCHIVE: " +tmpPath);
//        inputStream = harFileSystem.open(tmpPath);
//        return inputStream;
        return fin;
    }

As you can see, it's terrible. You have to manually read the index file stored inside the archive and reconstruct the paths from the index file's metadata. If you know the exact name of a file stored in the archive (as in my example), you can construct the path manually.
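A minimal sketch of that manual path construction, assuming the archive sits on the default file system so that the `har:///` form applies. The class name `HarPaths`, the helper `entryInArchive`, and the shortened archive path are made up for illustration and are not part of Hadoop.

```java
class HarPaths {
    // Joins an archive location and an entry name into a har:/// path string.
    // Assumption: the archive lives on the default file system, so no
    // underlying scheme or host has to be encoded after "har://".
    static String entryInArchive(String archivePath, String entryName) {
        // Drop a leading slash so the result does not contain "har:////".
        String archive = archivePath.startsWith("/") ? archivePath.substring(1) : archivePath;
        String entry = entryName.startsWith("/") ? entryName.substring(1) : entryName;
        return "har:///" + archive + "/" + entry;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened archive location used only for the demo.
        System.out.println(entryInArchive("/home/ssa/in/gsm_cell_location_stf.har", "stf/db_bts_stf.txt"));
    }
}
```

The resulting string can be wrapped in `new Path(...)` and opened through `pathToFile.getFileSystem(conf).open(pathToFile)`, as the method above does.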

It's not convenient. I expected something like Zip -> ZipEntry, where you can iterate over the entries of an archive without knowing its structure.
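For comparison, this is the kind of entry iteration I mean, sketched with the JDK's `java.util.zip` rather than with Hadoop. The class name `ZipEntryListing` and the in-memory sample archive are hypothetical and exist only to keep the example self-contained; nothing here is a HAR API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipEntryListing {
    // Walks every entry of a zip stream without knowing its layout in advance.
    static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    // Builds a tiny one-entry archive in memory (illustrative data only).
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("stf/db_bts_stf.txt"));
            out.write("sample".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        for (String name : listEntries(new ByteArrayInputStream(sampleZip()))) {
            System.out.println(name);
        }
    }
}
```

The caller never has to read an index or know a file name up front, which is exactly what I was hoping to get for a HAR.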