How to build a job jar for a Hadoop MapReduce job on AWS

https://stackoverflow.com/questions/22687546

22-06-2023
Question

I'm trying to run a MapReduce code example on AWS. This is the link to the code sample: https://github.com/ScaleUnlimited/wikipedia-ngrams

However, I'm pretty new to these things. The README says that I should build a job jar file from the code sample, but I still don't understand how to build a job jar.

I'm also following these videos, which explain how to run a job on EMR: http://www.youtube.com/watch?v=cAZur5maWZE&list=PL080E1DEBCE5388F3

But they also don't explain how to get this important jar file to start the work.

Any help would be appreciated.

Solution 2

You can create the Java files in Eclipse, add Hadoop to the build path, then export the project as a jar. See "6.1 Creating the Jar file" in this tutorial for details: Introduction to Amazon Web Services and MapReduce Jobs

There are two ways to launch the job flow: through the console or through the CLI. See sections 6.2 and 6.3 of the tutorial above.
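For the CLI route, a rough sketch of what launching might look like with the current AWS CLI is shown here. This is an assumption: the tutorial may use a different command-line tool, and the bucket name, cluster id, step name, and job arguments are hypothetical placeholders. The helper functions `run` and `submit_job_jar` are illustrative, not part of any AWS tooling. With `DRY_RUN=1` the commands are only printed, which is a cheap way to check them before touching a real cluster.

```shell
#!/bin/sh
# Sketch: upload a job jar to S3 and submit it as a custom-JAR step on an
# existing EMR cluster. Assumes the AWS CLI is installed and configured.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# submit_job_jar JAR_PATH S3_JAR_URI CLUSTER_ID [ARGS_CSV]
# ARGS_CSV is a comma-separated list of arguments passed to the job.
submit_job_jar() {
    jar_path=$1
    s3_jar_uri=$2
    cluster_id=$3
    step_args=$4

    # Describe the step in the AWS CLI shorthand syntax.
    step="Type=CUSTOM_JAR,Name=JobJarStep,ActionOnFailure=CONTINUE,Jar=$s3_jar_uri"
    if [ -n "$step_args" ]; then
        step="$step,Args=[$step_args]"
    fi

    # Copy the jar to S3 so the cluster can read it, then add the step.
    run aws s3 cp "$jar_path" "$s3_jar_uri" || return 1
    run aws emr add-steps --cluster-id "$cluster_id" --steps "$step" || return 1
}
```

A call might look like `submit_job_jar wordcount.jar s3://example-bucket/wordcount.jar j-EXAMPLE s3://example-bucket/input,s3://example-bucket/output`, where every value is a placeholder to replace with your own.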

OTHER TIPS

The same way as for a normal Java program (http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html):

$ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d wordcount_classes WordCount.java 
$ jar -cvf /usr/joe/wordcount.jar -C wordcount_classes/ .

or, if it is a Maven project:

$ mvn clean package

or, specifically for https://github.com/ScaleUnlimited/wikipedia-ngrams (see the README):

$ ant clean job
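The plain javac and jar steps above can be wrapped in a small script, which may help when the job has more than one source file. This is only a sketch: the function names `run` and `build_job_jar` are hypothetical, and it assumes all `.java` files sit in one directory and that you pass in the Hadoop classpath yourself. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: compile Hadoop job sources and package the classes into a job jar.
# Assumes a JDK (javac, jar) is on the PATH.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_job_jar CLASSPATH SRC_DIR CLASSES_DIR JAR_PATH
build_job_jar() {
    classpath=$1
    src_dir=$2
    classes_dir=$3
    jar_path=$4

    # javac -d needs the output directory to exist.
    run mkdir -p "$classes_dir" || return 1
    # Compile every .java file in SRC_DIR against the Hadoop jar(s).
    run javac -classpath "$classpath" -d "$classes_dir" "$src_dir"/*.java || return 1
    # Package the compiled classes; -C changes into CLASSES_DIR before adding ".".
    run jar -cvf "$jar_path" -C "$classes_dir" . || return 1
}
```

To mirror the commands above, you could call `build_job_jar "${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar" . wordcount_classes /usr/joe/wordcount.jar`. On installations where the `hadoop` command is available, the output of `hadoop classpath` is another option for the first argument.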

Licensed under: CC-BY-SA with attribution
