Question

This may be a basic question, but in my MapReduce program I would like to read the names of all the files in the input folder rather than their contents, and send those file names to my mapper class.

    Configuration conf=new Configuration();
    Job job=new Job(conf,"Analysis");
    job.setInputFormatClass(KeyValueTextInputFormat.class);
    //Path pa =new Path("hdfs://localhost:54310/home/aparajith");
    //pa.

    FileInputFormat.addInputPath(job,new Path("/hduser/"));
    FileOutputFormat.setOutputPath(job, new Path("/CrawlerOutput23/"));

    job.setJarByClass(mapper.Mapper1.class);

    job.setMapperClass(mapper.Mapper1.class);
    job.setReducerClass(mapper.Reducer1.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    System.exit(job.waitForCompletion(true) ? 0 : -1);

This is my main class, and I can't seem to figure it out.

Solution 2

The easiest solution is to write the names of all the files in that directory to a file, and give that file to the job as its input file.
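
As a rough illustration of that idea, the sketch below builds such a file of names with plain `java.nio.file`. It assumes the input directory is reachable on the local filesystem (for HDFS you would need the Hadoop `FileSystem` API instead), and the class name `ManifestWriter` and method `writeManifest` are made up for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hadoop: lists a local directory
// and writes the file names it finds to a manifest file.
class ManifestWriter {

    // Writes the name of every regular file directly inside inputDir
    // (subdirectories are skipped) to manifest, one name per line, sorted.
    // The manifest should live outside inputDir so that it does not list itself.
    static List<String> writeManifest(Path inputDir, Path manifest) throws IOException {
        List<String> names;
        // Files.list returns a lazy stream that must be closed after use.
        try (Stream<Path> entries = Files.list(inputDir)) {
            names = entries
                    .filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
        Files.write(manifest, names, StandardCharsets.UTF_8);
        return names;
    }

    // Usage: java ManifestWriter <inputDir> <manifestFile>
    public static void main(String[] args) throws IOException {
        List<String> names = writeManifest(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Wrote " + names.size() + " file names to " + args[1]);
    }
}
```

The resulting file could then be copied into HDFS (for example with `hadoop fs -put`) and passed to the job as its only input path, so each mapper call receives one file name per line.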

OTHER TIPS

If you want the name of the file that the keys and values in your mapper come from:

In your mapper you could simply ignore the key and value that are passed in and do something like the following. (With the default TextInputFormat, the key is the position in the file as a LongWritable and the value is the line content as Text; with the KeyValueTextInputFormat from the question, the key is a Text instead.)

@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    // FileSplit here is org.apache.hadoop.mapreduce.lib.input.FileSplit (the new API)
    String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
    // insert remaining mapper logic here
}

This gets the file name from which the current key and value in the mapper were read.


If you just want the file names in your directory as input to your mapper:

You could iterate over the files in your input directory (yourInputDirPath) and write a new file containing their filenames (inputDirFilenamesPath) like so:

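    // fs is the FileSystem holding the input directory, e.g. FileSystem.get(conf).
    // inputDirFilenamesPath should lie outside yourInputDirPath, otherwise the
    // newly created file shows up in its own listing.
    // try-with-resources closes the stream even if the listing fails.
    try (FSDataOutputStream stream = fs.create(inputDirFilenamesPath)) {
        RemoteIterator<LocatedFileStatus> it = fs.listFiles(yourInputDirPath, false);
        while (it.hasNext()) {
            // requires: import java.nio.charset.StandardCharsets;
            stream.write(it.next().getPath().toString().getBytes(StandardCharsets.UTF_8));
            stream.write('\n');
        }
    }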

Then you can simply use FileInputFormat.addInputPath(job, inputDirFilenamesPath); to add this file to your input to the MR job.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow