Question

I'm trying to run a MapReduce code example on AWS. The code sample is here: https://github.com/ScaleUnlimited/wikipedia-ngrams

However, I'm pretty new to this. The README says I should build a job jar file from the code sample, but I still don't understand how to build a job jar.

I'm also following these videos, which explain how to run a job on EMR: http://www.youtube.com/watch?v=cAZur5maWZE&list=PL080E1DEBCE5388F3

But they don't explain how to get this important jar file to start with, either.

Any help would be appreciated.


Solution 2

You can create the Java files in Eclipse, add the Hadoop jars to the build path, and then export the project as a jar. See "6.1 Creating the Jar file" in this tutorial for details: Introduction to Amazon Web Services and MapReduce Jobs

There are two ways to launch the job flow, through the console or through the CLI; see sections 6.2 and 6.3 of the same tutorial.

OTHER TIPS

You build it the same way as a normal Java program (see http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html):

$ mkdir wordcount_classes
$ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d wordcount_classes WordCount.java
$ jar -cvf /usr/joe/wordcount.jar -C wordcount_classes/ .

Or, if it is a Maven project:

$ mvn clean package

Or, specifically for https://github.com/ScaleUnlimited/wikipedia-ngrams (see its README):

$ ant clean job
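
Once the build finishes, you still need to find the jar it produced so you can upload it to S3 for EMR. Below is a minimal sketch of that step. The helper name find_job_jars is hypothetical, and the output directory name "build" is an assumption, not something taken from the project; check the project's build.xml or the Ant output for the real location.

```shell
#!/bin/sh
# Hypothetical helper: list every .jar file under a build output directory.
# The default directory name "build" is an assumption, not taken from the project.
find_job_jars() {
    dir="${1:-build}"
    find "$dir" -type f -name '*.jar' | sort
}

# Typical usage (shown as comments so nothing runs when this file is sourced):
#   git clone https://github.com/ScaleUnlimited/wikipedia-ngrams
#   cd wikipedia-ngrams
#   ant clean job
#   find_job_jars build
```

The path it prints is the file you would upload to S3 and point the EMR custom jar step at.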
Licensed under: CC-BY-SA with attribution