Question

I am working on a project to integrate Apache Avro into my MapReduce program. However, I am confused by the usage of the new mapreduce packages compared to the mapred ones. The latter comes with detailed instructions for different situations, while much less information is given for the new packages. What I do know is that they correspond to the new and old Hadoop interfaces.

Does anyone have any experience or examples of using the mapreduce interfaces for a job whose input is non-Avro data (for example, read with TextInputFormat) and whose output is an Avro file?

Solution

The two packages provide the input/output formats and the Mapper and Reducer base classes for the corresponding Hadoop mapred and mapreduce APIs.

So if your job uses the old (mapred) API, then you should use the corresponding Avro classes from the mapred package.
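
For illustration, with the old API the Avro helper classes operate on a JobConf. Here is a minimal sketch (the class name OldApiAvroSetup is a placeholder; the Avro calls are the same ones used in Avro's own word count example):

    import org.apache.avro.Schema;
    import org.apache.avro.mapred.AvroJob;
    import org.apache.avro.mapred.Pair;
    import org.apache.hadoop.mapred.JobConf;

    public class OldApiAvroSetup {
      // Configure an old-API (mapred) job to write Avro Pair<String, Integer> records.
      public static JobConf configure(JobConf conf) {
        conf.setJobName("wordcount-avro-old-api");
        // org.apache.avro.mapred.AvroJob works on JobConf (the old API).
        AvroJob.setOutputSchema(conf, Pair.getPairSchema(
            Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.INT)));
        return conf;
      }
    }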

Avro has an example word count adaptation that uses an Avro output format, which should be easy to modify for the newer mapreduce API:

http://svn.apache.org/viewvc/avro/trunk/doc/examples/mr-example/src/main/java/example/AvroWordCount.java?view=markup

Here's a gist with the modifications: https://gist.github.com/chriswhite199/6755242
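
For reference, here is a minimal sketch along those lines: a new-API (mapreduce) job that reads plain text through TextInputFormat and writes Avro key/value pairs through org.apache.avro.mapreduce.AvroKeyValueOutputFormat. It assumes Hadoop 2's Job.getInstance; the class names TextToAvroWordCount, TokenizeMapper and AvroSumReducer are placeholders, so treat it as an outline rather than the exact contents of the gist.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.avro.Schema;
    import org.apache.avro.mapred.AvroKey;
    import org.apache.avro.mapred.AvroValue;
    import org.apache.avro.mapreduce.AvroJob;
    import org.apache.avro.mapreduce.AvroKeyValueOutputFormat;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class TextToAvroWordCount extends Configured implements Tool {

      // Map side stays on plain Writables: the input is ordinary text.
      public static class TokenizeMapper
          extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reduce side switches to Avro wrappers so the output is an Avro container file.
      public static class AvroSumReducer
          extends Reducer<Text, IntWritable, AvroKey<CharSequence>, AvroValue<Integer>> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) {
            sum += v.get();
          }
          context.write(new AvroKey<CharSequence>(key.toString()),
                        new AvroValue<Integer>(sum));
        }
      }

      @Override
      public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "text-to-avro-wordcount");
        job.setJarByClass(TextToAvroWordCount.class);

        // Non-Avro input: plain TextInputFormat.
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));

        job.setMapperClass(TokenizeMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Avro output: declare the schemas via the new-API AvroJob helper.
        job.setReducerClass(AvroSumReducer.class);
        job.setOutputFormatClass(AvroKeyValueOutputFormat.class);
        AvroJob.setOutputKeySchema(job, Schema.create(Schema.Type.STRING));
        AvroJob.setOutputValueSchema(job, Schema.create(Schema.Type.INT));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
      }

      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new TextToAvroWordCount(), args));
      }
    }

The point to notice is that only the output side is Avro-aware: the mapper keeps the usual Writable types, while the reducer and the job setup are the only places that mention AvroKey/AvroValue and the output schemas.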

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow