Add to WholeFileInputFormat an implementation of getSplits that returns as many duplicate splits as you want; each duplicate split is then handed to its own mapper.
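The duplication idea can be sketched without Hadoop on the classpath. In the old org.apache.hadoop.mapred API used below, the override would be getSplits(JobConf job, int numSplits), returning numSplits references to the single whole-file split that super.getSplits produces when isSplitable is false. The FileSplitStub type and duplicate helper here are hypothetical stand-ins for Hadoop's InputSplit, used only to illustrate the trick:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SplitDuplication {
    // Hypothetical stand-in for Hadoop's InputSplit: one whole-file split.
    record FileSplitStub(String path, long length) {}

    // Return `copies` references to the single whole-file split, so that
    // `copies` mappers each receive the entire file -- the same thing a
    // getSplits override in WholeFileInputFormat would do.
    static List<FileSplitStub> duplicate(FileSplitStub whole, int copies) {
        return new ArrayList<>(Collections.nCopies(copies, whole));
    }

    public static void main(String[] args) {
        FileSplitStub whole = new FileSplitStub("/data/input.bin", 4096L);
        List<FileSplitStub> splits = duplicate(whole, 4);
        System.out.println(splits.size());
        System.out.println(splits.get(0).path());
    }
}
```

Each mapper can then use its task index (e.g. from the job configuration) to decide which part of the work it is responsible for.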
Hadoop: reading a file as a whole and sending it to many mappers
Question
I am writing a Hadoop application in which I want to read the input file as a whole, send it to many mappers, and let each mapper do part of the job. My FileInputFormat is shown below. I have to make isSplitable
return false so that I can read the whole file, but this means that only one mapper
is launched. Can anyone tell me how to read the input file as a whole and send it to more than one mapper for processing?
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapred.*;

// PairWritable and WholeFileRecordReader are this application's own classes (not shown).
public class WholeFileInputFormat extends FileInputFormat<PairWritable, BytesWritable> {
    // Prevent the framework from splitting the file, so one split covers the whole file.
    @Override
    protected boolean isSplitable(FileSystem fs, Path filename) {
        return false;
    }

    @Override
    public RecordReader<PairWritable, BytesWritable> getRecordReader(
            InputSplit split, JobConf job, Reporter reporter) throws IOException {
        return new WholeFileRecordReader((FileSplit) split, job);
    }
}
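The WholeFileRecordReader referenced above is not shown; its core job is to read the entire file into a single value (here a BytesWritable). The whole-file read itself can be sketched in plain Java with java.nio, no Hadoop dependency, using a temporary file as a stand-in for the job input:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class WholeFileRead {
    public static void main(String[] args) throws IOException {
        // Create a small sample file to stand in for the job input.
        Path tmp = Files.createTempFile("whole-file", ".bin");
        Files.write(tmp, new byte[]{1, 2, 3, 4, 5});

        // Read the entire file in one call -- the same thing a whole-file
        // RecordReader does before wrapping the bytes in a BytesWritable.
        byte[] contents = Files.readAllBytes(tmp);
        System.out.println(contents.length);

        Files.deleteIfExists(tmp);
    }
}
```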
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow