I am writing a Hadoop app where I want to read the input file as a whole and send it to many mappers, letting each mapper do part of the job. Here is my FileInputFormat. I have to make isSplitable return false so that I can read the whole file. However, this means only one mapper will be initialized. Can anyone tell me how to read the input file as a whole and send it to more than one mapper to process?

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class WholeFileInputFormat extends FileInputFormat<PairWritable, BytesWritable> {
    // Returning false keeps the framework from splitting the file,
    // so each split covers the entire file.
    @Override
    protected boolean isSplitable(FileSystem fs, Path filename) {
        return false;
    }

    @Override
    public RecordReader<PairWritable, BytesWritable> getRecordReader(
            InputSplit split, JobConf job, Reporter reporter) throws IOException {
        return new WholeFileRecordReader((FileSplit) split, job);
    }
}

Solution

Add to WholeFileInputFormat an implementation of getSplits that returns as many duplicates of each whole-file split as you want. Every duplicate split launches its own mapper over the same file.
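A minimal sketch of that override, using the old org.apache.hadoop.mapred API from the question; the wholefile.num.copies property name is made up here for illustration, not a standard Hadoop setting:

    // Inside WholeFileInputFormat: duplicate every whole-file split so that
    // several mappers each receive the same (unsplit) file.
    @Override
    public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
        // Because isSplitable returns false, the superclass yields exactly
        // one FileSplit per input file.
        InputSplit[] wholeFileSplits = super.getSplits(job, numSplits);

        // Number of mappers per file; "wholefile.num.copies" is a
        // hypothetical property name, defaulting to the requested split count.
        int copies = job.getInt("wholefile.num.copies", numSplits);

        InputSplit[] duplicated = new InputSplit[wholeFileSplits.length * copies];
        for (int i = 0; i < wholeFileSplits.length; i++) {
            for (int c = 0; c < copies; c++) {
                duplicated[i * copies + c] = wholeFileSplits[i];
            }
        }
        return duplicated;
    }

Inside the mapper you still need a way to tell the duplicate instances apart; one option with the old API is to read the mapred.task.partition property in configure() and use that index to decide which share of the work the instance should handle.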
