Question

I am having a strange problem with a Hadoop MapReduce job. The job submits correctly and runs, but produces strange, incorrect results; it seems as if the mapper and reducer are not run at all. The input file is transformed from:

12
16
132
654
132
12

to

0   12
4   16
8   132
13  654
18  132
23  12

I assume the first column contains the keys generated for each pair before the mapper runs, but neither the mapper nor the reducer seems to run at all. The job ran fine when I used the old API.

Source for the job is provided below. I am using Hortonworks as the platform.

public class HadoopAnalyzer
{
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable>
    {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
        {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens())
            {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
        {
            int sum = 0;
            for (IntWritable val : values)
            {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception
    {
        JobConf conf = new JobConf(HadoopAnalyzer.class);
        conf.setJobName("wordcount");
        conf.set("mapred.job.tracker", "192.168.229.128:50300");
        conf.set("fs.default.name", "hdfs://192.168.229.128:8020");
        conf.set("fs.defaultFS", "hdfs://192.168.229.128:8020");
        conf.set("hbase.master", "192.168.229.128:60000");
        conf.set("hbase.zookeeper.quorum", "192.168.229.128");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        System.out.println("Executing job.");
        Job job = new Job(conf, "job");
        job.setInputFormatClass(InputFormat.class);
        job.setOutputFormatClass(OutputFormat.class);
        job.setJarByClass(HadoopAnalyzer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        TextInputFormat.addInputPath(job, new Path("/user/usr/in"));
        TextOutputFormat.setOutputPath(job, new Path("/user/usr/out"));
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        job.waitForCompletion(true);
        System.out.println("Done.");
    }
}

Maybe I am missing something obvious, but can anyone shed some light on what might be going wrong here?

Solution

The output you see is actually the expected behavior, because you used the following:

job.setMapperClass(Mapper.class);
job.setReducerClass(Reducer.class);

when it should have been:

job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);

You extended the Mapper and Reducer classes with your own Map and Reduce classes, but never registered them with the job. The base Mapper and Reducer are identity implementations: each writes every (key, value) pair through unchanged, so the job's output is simply each input line prefixed with the byte-offset key that TextInputFormat generates.
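That also explains the first column in your output: TextInputFormat keys each record with the byte offset at which the line starts, and the identity Mapper writes those keys straight through. A small standalone sketch (plain Java, no Hadoop dependency; it assumes CRLF line endings, which is what matches the offsets you posted) reproduces the key sequence:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class OffsetDemo {
    // Returns the byte offset at which each line starts -- the same value
    // TextInputFormat supplies as the LongWritable key for that line.
    static List<Long> lineOffsets(String input) {
        List<Long> offsets = new ArrayList<>();
        long pos = 0;
        // Split after each '\n' but keep the terminator in the piece,
        // so each piece's byte length includes its line ending.
        for (String line : input.split("(?<=\n)")) {
            offsets.add(pos);
            pos += line.getBytes(StandardCharsets.UTF_8).length;
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Hypothetical reconstruction of the input file with CRLF endings:
        String input = "12\r\n16\r\n132\r\n654\r\n132\r\n12\r\n";
        System.out.println(lineOffsets(input)); // [0, 4, 8, 13, 18, 23]
    }
}
```

Note that the offsets 0, 4, 8, 13, 18, 23 only line up if each line ends in `\r\n` (two bytes); with Unix `\n` endings the keys would have been 0, 3, 6, 10, 14, 18 instead.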

Licensed under: CC-BY-SA with attribution