It looks like you are not using the correct classes for the output.
From one of the MapReduce Tutorials:
The key and value classes have to be serializable by the framework and hence need to implement the Writable interface. Additionally, the key classes have to implement the WritableComparable interface to facilitate sorting by the framework.
Therefore you should replace String.class with Text.class and Integer.class with IntWritable.class.
I hope that fixes your problem.
Why can't I use the basic String or Integer classes?
Integer and String implement Java's standard Serializable interface, as seen in the docs. The problem is that MapReduce does not use this standard interface to serialize and deserialize values. It uses its own interface instead, which is called Writable.
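To make the contract concrete, here is a minimal sketch that runs without Hadoop on the classpath. SimpleWritable is a local stand-in that mirrors the two methods of Hadoop's Writable interface, and SimpleIntWritable and WritableRoundTrip are hypothetical names invented for this illustration. They are not the real Hadoop classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Local stand-in for org.apache.hadoop.io.Writable (assumption: declared here
// only so the sketch compiles without the Hadoop dependency).
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical int wrapper, similar in spirit to IntWritable.
class SimpleIntWritable implements SimpleWritable {
    private int value;

    SimpleIntWritable() {}
    SimpleIntWritable(int value) { this.value = value; }

    int get() { return value; }

    // Writes only the four payload bytes, with no type information.
    @Override
    public void write(DataOutput out) throws IOException { out.writeInt(value); }

    // The reader already knows the type, so it just reads the payload back.
    @Override
    public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}

class WritableRoundTrip {
    static byte[] serialize(SimpleWritable w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        }
        return bytes.toByteArray();
    }

    static void deserialize(SimpleWritable target, byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            target.readFields(in);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = serialize(new SimpleIntWritable(42));
        SimpleIntWritable copy = new SimpleIntWritable();
        deserialize(copy, data);
        System.out.println(data.length + " bytes, value " + copy.get()); // prints "4 bytes, value 42"
    }
}
```

A plain Integer or String has no write or readFields method, so a framework built around this contract has no way to serialize it.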
So why don't they just use the basic Java Interface?
Short answer: Because it is more efficient. The Writable interface omits the type definition when serializing, because you already define the input and output types in your MapReduce code. As your code already knows what's coming, instead of serializing a String like this:
String: "theStringItself"
It could be serialized like:
theStringItself
As you can see, this saves bytes on every record, which adds up quickly across the many records a job serializes.
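The size difference can be checked with plain Java, no Hadoop required. The sketch below compares standard Java serialization (ObjectOutputStream) with a type-free encoding written through DataOutputStream. SerializedSizeDemo and its methods are hypothetical names for this illustration, and writeUTF and writeInt are only stand-ins for a compact encoding. Hadoop's Text and IntWritable use their own formats, so the exact byte counts will differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

class SerializedSizeDemo {
    // Standard Java serialization: stream header and type information
    // are written together with the payload.
    static int javaSerializedSize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.size();
    }

    // Type-free encoding of a string: a length prefix plus the encoded characters.
    static int rawStringSize(String value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(value);
        }
        return bytes.size();
    }

    // Type-free encoding of an int: just the four payload bytes.
    static int rawIntSize(int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        String text = "theStringItself";
        System.out.println("String, Java serialization: " + javaSerializedSize(text) + " bytes");
        System.out.println("String, type-free encoding: " + rawStringSize(text) + " bytes");
        System.out.println("Integer, Java serialization: " + javaSerializedSize(Integer.valueOf(42)) + " bytes");
        System.out.println("int, type-free encoding: " + rawIntSize(42) + " bytes");
    }
}
```

The gap is larger for Integer than for String, because Java serialization has to write a class description for the Integer object while the type-free version writes only the value.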
Long answer: Read this awesome blog post.