Question

I am trying to stream a sequence file generated by one of the Mahout examples to see its contents:

    hadoop jar hadoop-streaming-0.20.2-cdh3u0.jar \
    -input /tmp/mahout-work-me/20news-bydate/bayes-test-input-output/ \
    -output /tmp/me/mm \
    -mapper "cat" \
    -reducer "wc -l" \
    -inputformat SequenceFileAsTextInputFormat

The job starts successfully and eventually dies with:

11/11/30 21:08:39 INFO streaming.StreamJob:  map 0%  reduce 0%
11/11/30 21:09:17 INFO streaming.StreamJob:  map 100%  reduce 100%
java.lang.RuntimeException: java.io.IOException: WritableName can't load class: org.apache.mahout.common.StringTuple

I wonder if something is wrong with my streaming jar file, if I I need to point explicitly to the Mahout jar that has this class (tried setting HADOOP_CLASSPATH to the location of mahout-core-0.5-cdh3u2.jar but did not work), or maybe even something else?

Any help is appreciated. Thanks.

Was it helpful?

Solution

Add this option:

-libjars mahout-core-0.5-cdh3u2.jar
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top