Question

I am facing this exception when trying to run my first program on Hadoop (I am using the new Hadoop API on version 0.20.2). I searched on the web, and it looks like most people who faced this problem had not set the MapperClass and ReducerClass in the configuration logic. But I checked, and the code looks OK to me. I would really appreciate it if someone could help me out.

java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:871)

package com.test.wc;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    public void Map(LongWritable key, Text value, Context ctx) throws IOException, InterruptedException {
        String line = value.toString();
        for (String word : line.split("\\W+")) {
            if (word.length() > 0) {
                ctx.write(new Text(word), new IntWritable(1));
            }
        }
    }
}


package com.test.wc;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context ctx) throws IOException, InterruptedException {
        int wordCount = 0;
        for (IntWritable value : values) {
            wordCount += value.get();
        }
        ctx.write(key, new IntWritable(wordCount));
    }
}


package com.test.wc;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCountJob {

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        if (args.length != 2) {
            System.out.println("invalid usage");
            System.exit(-1);
        }

        Job job = new Job();
        job.setJarByClass(WordCountJob.class);
        job.setJobName("WordCountJob");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        //job.setCombinerClass(WordCountReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Solution

Your Map() method is not able to override Mapper's map() method due to your use of a capital M in place of a lower case m.

As such, the default identity map method is being used, which writes the same key/value pair it received as input straight out as output. Because your mapper declares extends Mapper<LongWritable,Text,Text,IntWritable>, the framework expects a Text key and an IntWritable value from the map, so emitting the input's LongWritable key and Text value causes the type-mismatch exception.
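
For reference, the inherited map() in the new-API Mapper base class is (roughly, simplified here) an identity function, which is why the input pair passes straight through:

// Default map() in org.apache.hadoop.mapreduce.Mapper (simplified sketch):
// it writes the input key/value pair out unchanged.
protected void map(KEYIN key, VALUEIN value, Context context)
        throws IOException, InterruptedException {
    context.write((KEYOUT) key, (VALUEOUT) value);
}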

Changing your Map() method to map() and adding the @Override annotation should do the trick. If you're using an IDE, I'd highly suggest using its built-in method-overriding functionality to avoid errors like this.
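
For clarity, here is the mapper from the question with the method renamed and @Override added; with the annotation in place, the compiler will flag the method if its signature ever stops matching the base class:

package com.test.wc;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value, Context ctx)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Split on non-word characters and emit (word, 1) for each token
        for (String word : line.split("\\W+")) {
            if (word.length() > 0) {
                ctx.write(new Text(word), new IntWritable(1));
            }
        }
    }
}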

OTHER TIPS

Just edit your mapper method from

public void Map(LongWritable key, Text value, Context ctx)

to

public void map(LongWritable key, Text value, Context ctx)

This works for me.

Hadoop version: 1.0.3
