
I'm trying to use the CombineFileInputFormat class with Yelp's MrJob tool on EMR. The jobflow is created with Hadoop streaming, and MrJob's documentation indicates that the CombineFileInputFormat class must be bundled in a customized hadoop-streaming.jar.

For context, please follow this question.

Specifically my question is: Where should the concrete class CombinedInputFormat.class be bundled or referenced within the hadoop-streaming.jar?

I have tried bundling the CombinedInputFormat.class by adding it to a directory org/apache/hadoop/streaming and executing:

jar uvf my-hadoop-streaming.jar org/apache/hadoop/streaming

If I do that, the streaming jobflow starts with the option -inputformat CombinedInputFormat, but the job breaks in its first step with this error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/streaming/CombinedInputFormat (wrong name: CombinedInputFormat)
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClassCond(
        at java.lang.ClassLoader.defineClass(

If I instead add it at the root of the jar with:

jar uvf my-hadoop-streaming.jar CombinedInputFormat.class

The error I get is:

-inputformat : class not found : CombinedInputFormat
Streaming Job Failed!

How should I bundle CombinedInputFormat.class so that it is picked up correctly and the NoClassDefFoundError goes away?



The class CombinedInputFormat explained here extends CombineFileInputFormat and does not ship with Hadoop. So, in the same package as your mapper/reducer job class, you have to create a class containing the code given in the earlier question. Then build the jar and it should run normally.

So basically, you need to write your own implementation of CombineFileInputFormat (which I did for you), and you can name it anything you want, say ABCClass, instead of CombinedInputFormat as I named it.
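
As a side note on the "wrong name" error in the question: the JVM raises it when a class file sits at a jar path that implies one package while the class itself was compiled under another. Here the file was found under org/apache/hadoop/streaming but declares the name CombinedInputFormat, so it was probably compiled without a package statement. The sketch below only illustrates the mapping between a fully qualified class name and the jar entry path the class loader expects. JarEntryCheck and its two methods are hypothetical helpers written for this answer; they are not part of Hadoop or MrJob.

```java
// Hypothetical helper for illustration only; not part of Hadoop or MrJob.
final class JarEntryCheck {

    // Jar entry path the class loader looks up for a fully qualified class name.
    static String expectedEntry(String fullyQualifiedName) {
        return fullyQualifiedName.replace('.', '/') + ".class";
    }

    // Class name implied by a jar entry path. This only matches the real class
    // if the source was compiled with the corresponding package declaration.
    static String classNameForEntry(String entryPath) {
        String suffix = ".class";
        if (!entryPath.endsWith(suffix)) {
            throw new IllegalArgumentException("not a class entry: " + entryPath);
        }
        String withoutSuffix = entryPath.substring(0, entryPath.length() - suffix.length());
        return withoutSuffix.replace('/', '.');
    }

    public static void main(String[] args) {
        // Class compiled with "package org.apache.hadoop.streaming;"
        System.out.println(expectedEntry("org.apache.hadoop.streaming.CombinedInputFormat"));
        // Class compiled without a package statement (default package)
        System.out.println(expectedEntry("CombinedInputFormat"));
        // Name the JVM will insist on for a file stored under the streaming directory
        System.out.println(classNameForEntry("org/apache/hadoop/streaming/CombinedInputFormat.class"));
    }
}
```

In other words, if the class is stored under org/apache/hadoop/streaming inside the jar, its source most likely needs a matching package declaration. If it has no package declaration, it belongs at the root of the jar. Whether the streaming job then finds it also depends on how the jar ends up on the job's classpath, which this sketch does not cover.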

Other suggestions

This is another easy way I found to get a custom jar built and run on local Hadoop or EMR.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow