I need to implement below Functionality using Hadoop Map-Reduce?

1) I am reading one input for a mapper from one source & another input from another different input source.

2) I need to pass both output of mapper into a single reducer for further process.

Is there any to do the above requirement in Hadoop Map-Reduce

有帮助吗?

解决方案

MultipleInputs.addInputPath is what you are looking for. This is how your configuration would look like. Make sure both AnyMapper1 and AnyMapper2 write the same output expected by MergeReducer

JobConf conf = new JobConf(Merge.class);
conf.setJobName("merge");

conf.setOutputKeyClass(IntWritable.class); 
conf.setOutputValueClass(Text.class); 
conf.setReducerClass(MergeReducer.class);
conf.setOutputFormat(TextOutputFormat.class);

MultipleInputs.addInputPath(conf, inputDir1, SequenceFileInputFormat.class, AnyMapper1.class);
MultipleInputs.addInputPath(conf, inputDir2, TextInputFormat.class, AnyMapper2.class);

FileOutputFormat.setOutputPath(conf, outputPath);

其他提示

You can create a custom writable. You can populate the same in the Mapper. Later in the Reducer you can get the Custom writable Object and do the necessary business operation.

许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top