Question

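I'm relatively new to Giraph, and I'm trying to get a Giraph edit-compile-deploy loop working for our code. I'm following the Cloudera post at http://blog.cloudera.com/blog/2014/02/how-to-write-and-run-giraph-jobs-on-hadoop/ but get a ClassNotFoundException when running a modified version of the SimpleShortestPathsVertex Giraph example. I've tried various combinations of -libjars and HADOOP_CLASSPATH, but I'm out of ideas and would really appreciate your help. Details follow.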

Versions

  • Hadoop: Hadoop 2.0.0-cdh4.4.0
  • Giraph: giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar

Running PageRankBenchmark works:
$ hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.benchmark.PageRankBenchmark \
-Dgiraph.zkList=<myhost>:2181 \
-e 1 -s 3 -v -V 50 -w 1

...
14/08/01 11:42:44 INFO mapred.JobClient: Job complete: job_201407291058_0015
...
(full output is below)

Running SimpleShortestPathsVertex via GiraphRunner also works:
$ hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=<myhost>:2181 \
org.apache.giraph.examples.SimpleShortestPathsVertex \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip ginput/tiny_graph.txt \
-of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op goutput/shortestpathsC2 \
-ca SimpleShortestPathsVertex.source=2 \
-w 1

...
14/08/01 11:47:46 INFO mapred.JobClient: Job complete: job_201407291058_0017
...
(full output is below)

Bonus: the results are correct:

$ hadoop fs -cat goutput/shortestpathsC2/p*
0   1.0
2   2.0
1   0.0
3   1.0
4   5.0

My modified version of SimpleShortestPathsVertex gets a ClassNotFoundException:

The jar containing the modified vertex (KdlSimpleShortestPathsVertex, no package):

$ jar -tf ~/kdl_hadoop_play.jar
META-INF/MANIFEST.MF
KdlSimpleShortestPathsVertex.class
META-INF/
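As a quick way to rule out the jar itself, here is a minimal sketch that checks a `jar -tf` listing for the entry a class name should map to. The helper names `class_entry` and `listing_has_class` are hypothetical, not part of Hadoop or Giraph. It assumes the usual JVM convention that a class `a.b.C` is stored as `a/b/C.class`, so a class with no package, like KdlSimpleShortestPathsVertex, should sit at the jar root, as it does in the listing above.

```shell
#!/bin/sh

# Hypothetical helper: print the jar entry a binary class name maps to,
# e.g. org.example.Foo -> org/example/Foo.class
class_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Hypothetical helper: read a jar listing (one entry per line, as printed
# by `jar -tf`) on stdin and succeed if it contains the class's entry.
listing_has_class() {
  grep -q -x -F "$(class_entry "$1")"
}
```

Usage, assuming the jar path from the question: `jar -tf ~/kdl_hadoop_play.jar | listing_has_class KdlSimpleShortestPathsVertex && echo found`. If this succeeds, the packaging is probably fine and the exception points at the classpath instead.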

But my run fails:

$ hadoop jar $GIRAPH_HOME/giraph-core/target/giraph-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=<myhost>:2181 \
-libjars ~/kdl_hadoop_play.jar \
KdlSimpleShortestPathsVertex \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /user/cornell/ginput/tiny_graph.txt \
-of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /user/cornell/goutput/shortestpathsC2 \
-ca KdlSimpleShortestPathsVertex.source=2 \
-w 1

Exception in thread "main" java.lang.ClassNotFoundException: KdlSimpleShortestPathsVertex
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.giraph.utils.ConfigurationUtils.populateGiraphConfiguration(ConfigurationUtils.java:210)
at org.apache.giraph.utils.ConfigurationUtils.parseArgs(ConfigurationUtils.java:147)
at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:74)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

My best guess...

...after looking around, is that GiraphRunner doesn't honor -libjars, per http://grepalex.com/2013/02/25/hadoop-libjars/ ("make sure your code uses GenericOptionsParser"). Browsing the Giraph source, I don't see that class being used anywhere. I set HADOOP_CLASSPATH to my jar, but that didn't fix the problem.
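The stack trace shows Class.forName being called from GiraphRunner.main (via ConfigurationUtils.parseArgs), so the lookup appears to happen in the client JVM started by `hadoop jar`, before any job is submitted. A minimal sketch for checking that the jar is literally one of the entries on a colon-separated classpath such as HADOOP_CLASSPATH; `classpath_contains` is a hypothetical helper name:

```shell
#!/bin/sh

# Hypothetical helper: succeed if ENTRY exactly matches one of the
# colon-separated entries in CLASSPATH_STRING.
# Usage: classpath_contains CLASSPATH_STRING ENTRY
classpath_contains() {
  cp=$1
  entry=$2
  # Pad both ends with ':' so the first and last entries match too;
  # the quoted part of the pattern is matched literally.
  case ":$cp:" in
    *":$entry:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `classpath_contains "$HADOOP_CLASSPATH" "$HOME/kdl_hadoop_play.jar" && echo present`. The same check can be pointed at the output of `hadoop classpath`, although entries there may be wildcards such as `.../lib/*`, which this exact-match test does not expand.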

Any help would be great!

PageRankBenchmark output

14/08/01 11:42:27 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/08/01 11:42:28 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/01 11:42:28 WARN bsp.BspOutputFormat: checkOutputSpecs: ImmutableOutputCommiter will not check anything
14/08/01 11:42:29 INFO mapred.JobClient: Running job: job_201407291058_0015
14/08/01 11:42:30 INFO mapred.JobClient:  map 0% reduce 0%
14/08/01 11:42:40 INFO mapred.JobClient:  map 50% reduce 0%
14/08/01 11:42:41 INFO mapred.JobClient:  map 100% reduce 0%
14/08/01 11:42:44 INFO mapred.JobClient: Job complete: job_201407291058_0015
14/08/01 11:42:44 INFO mapred.JobClient: Counters: 39
14/08/01 11:42:44 INFO mapred.JobClient:   File System Counters
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of bytes read=0
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of bytes written=369846
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of read operations=0
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of large read operations=0
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of write operations=0
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of bytes read=88
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of bytes written=0
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of read operations=2
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of large read operations=0
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of write operations=1
14/08/01 11:42:44 INFO mapred.JobClient:   Job Counters 
14/08/01 11:42:44 INFO mapred.JobClient:     Launched map tasks=2
14/08/01 11:42:44 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=15772
14/08/01 11:42:44 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient:   Map-Reduce Framework
14/08/01 11:42:44 INFO mapred.JobClient:     Map input records=2
14/08/01 11:42:44 INFO mapred.JobClient:     Map output records=0
14/08/01 11:42:44 INFO mapred.JobClient:     Input split bytes=88
14/08/01 11:42:44 INFO mapred.JobClient:     Spilled Records=0
14/08/01 11:42:44 INFO mapred.JobClient:     CPU time spent (ms)=2230
14/08/01 11:42:44 INFO mapred.JobClient:     Physical memory (bytes) snapshot=411357184
14/08/01 11:42:44 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2428895232
14/08/01 11:42:44 INFO mapred.JobClient:     Total committed heap usage (bytes)=806027264
14/08/01 11:42:44 INFO mapred.JobClient:   Giraph Stats
14/08/01 11:42:44 INFO mapred.JobClient:     Aggregate edges=50
14/08/01 11:42:44 INFO mapred.JobClient:     Aggregate finished vertices=50
14/08/01 11:42:44 INFO mapred.JobClient:     Aggregate vertices=50
14/08/01 11:42:44 INFO mapred.JobClient:     Current master task partition=0
14/08/01 11:42:44 INFO mapred.JobClient:     Current workers=1
14/08/01 11:42:44 INFO mapred.JobClient:     Last checkpointed superstep=0
14/08/01 11:42:44 INFO mapred.JobClient:     Sent messages=0
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep=4
14/08/01 11:42:44 INFO mapred.JobClient:   Giraph Timers
14/08/01 11:42:44 INFO mapred.JobClient:     Input superstep (milliseconds)=238
14/08/01 11:42:44 INFO mapred.JobClient:     Setup (milliseconds)=2903
14/08/01 11:42:44 INFO mapred.JobClient:     Shutdown (milliseconds)=68
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep 0 (milliseconds)=77
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep 1 (milliseconds)=64
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep 2 (milliseconds)=45
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep 3 (milliseconds)=43
14/08/01 11:42:44 INFO mapred.JobClient:     Total (milliseconds)=3442

SimpleShortestPathsVertex output

14/08/01 11:47:37 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
14/08/01 11:47:37 INFO utils.ConfigurationUtils: Setting custom argument [SimpleShortestPathsVertex.source] to [2] in GiraphConfiguration
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
14/08/01 11:47:37 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/08/01 11:47:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/01 11:47:38 INFO mapred.JobClient: Running job: job_201407291058_0017
14/08/01 11:47:39 INFO mapred.JobClient:  map 0% reduce 0%
14/08/01 11:47:44 INFO mapred.JobClient:  map 50% reduce 0%
14/08/01 11:47:45 INFO mapred.JobClient:  map 100% reduce 0%
14/08/01 11:47:46 INFO mapred.JobClient: Job complete: job_201407291058_0017
14/08/01 11:47:46 INFO mapred.JobClient: Counters: 39
14/08/01 11:47:46 INFO mapred.JobClient:   File System Counters
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of bytes read=0
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of bytes written=367068
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of read operations=0
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of large read operations=0
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of write operations=0
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of bytes read=200
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of bytes written=30
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of read operations=5
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of large read operations=0
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of write operations=2
14/08/01 11:47:46 INFO mapred.JobClient:   Job Counters 
14/08/01 11:47:46 INFO mapred.JobClient:     Launched map tasks=2
14/08/01 11:47:46 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=8538
14/08/01 11:47:46 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient:   Map-Reduce Framework
14/08/01 11:47:46 INFO mapred.JobClient:     Map input records=2
14/08/01 11:47:46 INFO mapred.JobClient:     Map output records=0
14/08/01 11:47:46 INFO mapred.JobClient:     Input split bytes=88
14/08/01 11:47:46 INFO mapred.JobClient:     Spilled Records=0
14/08/01 11:47:46 INFO mapred.JobClient:     CPU time spent (ms)=1590
14/08/01 11:47:46 INFO mapred.JobClient:     Physical memory (bytes) snapshot=341344256
14/08/01 11:47:46 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2363527168
14/08/01 11:47:46 INFO mapred.JobClient:     Total committed heap usage (bytes)=504758272
14/08/01 11:47:46 INFO mapred.JobClient:   Giraph Stats
14/08/01 11:47:46 INFO mapred.JobClient:     Aggregate edges=12
14/08/01 11:47:46 INFO mapred.JobClient:     Aggregate finished vertices=5
14/08/01 11:47:46 INFO mapred.JobClient:     Aggregate vertices=5
14/08/01 11:47:46 INFO mapred.JobClient:     Current master task partition=0
14/08/01 11:47:46 INFO mapred.JobClient:     Current workers=1
14/08/01 11:47:46 INFO mapred.JobClient:     Last checkpointed superstep=0
14/08/01 11:47:46 INFO mapred.JobClient:     Sent messages=0
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep=4
14/08/01 11:47:46 INFO mapred.JobClient:   Giraph Timers
14/08/01 11:47:46 INFO mapred.JobClient:     Input superstep (milliseconds)=181
14/08/01 11:47:46 INFO mapred.JobClient:     Setup (milliseconds)=313
14/08/01 11:47:46 INFO mapred.JobClient:     Shutdown (milliseconds)=128
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep 0 (milliseconds)=57
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep 1 (milliseconds)=54
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep 2 (milliseconds)=36
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep 3 (milliseconds)=35
14/08/01 11:47:46 INFO mapred.JobClient:     Total (milliseconds)=805

Solution

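OK, after looking at the hadoop script along with the Hadoop and Giraph sources, I think I've figured it out. The big hint was the output from running `hadoop jar`, specifically this line: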

WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

The cause is that GiraphRunner parses its arguments itself with Commons CLI to get an org.apache.commons.cli.CommandLine, instead of using the recommended org.apache.hadoop.util.GenericOptionsParser.getCommandLine(), which respects the '-libjars' option. That left me falling back on Hadoop's generic classpath handling: CLASSPATH and/or HADOOP_CLASSPATH. Here's what worked:

  • Set HADOOP_CLASSPATH to include your application jar, using colon separators.
  • Pass that same classpath via -libjars, using comma separators.
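The two steps above amount to spelling one list of jars two ways. A minimal POSIX sh sketch, using a hypothetical `join_jars` helper; the paths are placeholders (`/home/me/...` stands in for `/home/<me>/...`), not values from a real install:

```shell
#!/bin/sh

# Hypothetical helper: print the jar arguments joined by SEP.
# Usage: join_jars SEP JAR...
join_jars() {
  sep=$1
  shift
  out=""
  first=1
  for jar in "$@"; do
    if [ "$first" -eq 1 ]; then
      out=$jar
      first=0
    else
      out="$out$sep$jar"
    fi
  done
  printf '%s\n' "$out"
}

# Placeholder paths (assumptions): adjust to your own jars.
GIRAPH_HOME=${GIRAPH_HOME:-/share/apps/giraph}
APP_JAR=/home/me/kdl_hadoop_play.jar

# HADOOP_CLASSPATH is colon-separated; keep any existing entries at the end.
HADOOP_CLASSPATH="$(join_jars : "$APP_JAR" "$GIRAPH_HOME/giraph-ex.jar")${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"
# -libjars takes the same jars, comma-separated.
LIBJARS=$(join_jars , "$APP_JAR" "$GIRAPH_HOME/giraph-ex.jar")
export HADOOP_CLASSPATH LIBJARS
```

Building both variables from a single list keeps the two settings from drifting apart; any pre-existing HADOOP_CLASSPATH is appended after the new entries, the same order the author uses.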

For example, on my machine:

$ export GIRAPH_HOME=/share/apps/giraph
$ export HADOOP_CLASSPATH=/home/<me>/kdl_hadoop_play.jar:$GIRAPH_HOME/giraph-ex.jar:$HADOOP_CLASSPATH
$ export LIBJARS=/home/<me>/kdl_hadoop_play.jar,$GIRAPH_HOME/giraph-core.jar
$ hadoop fs -rm -R goutput/shortestpathsC2
$ hadoop jar $GIRAPH_HOME/giraph-ex.jar org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=<myhost>:2181 \
-libjars ${LIBJARS} \
KdlSimpleShortestPathsVertex \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /user/cornell/ginput/tiny_graph.txt \
-of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /user/cornell/goutput/shortestpathsC2 \
-ca SimpleShortestPathsVertex.source=2 \
-w 1
...
$ hadoop fs -cat goutput/shortestpathsC2/p*

This gives the expected output and results.

More generally, it would help if the Giraph team changed the code to use the (apparently) more standard parser.

Hope that helps!

Other tips

I'm not sure why this isn't working, but here's a quick and dirty way around it. Try putting your code in the giraph-examples/src/main/java/org/apache/giraph/examples/ directory (where SimpleShortestPathsVertex lives). Then rebuild the giraph-examples jar. After that, just replace SimpleShortestPathsVertex in the command with your own file name. I hope that helps.
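If you take this route, note that a class placed in that directory would normally also declare `package org.apache.giraph.examples;`, and GiraphRunner then needs its fully qualified name, just as the working run above passes `org.apache.giraph.examples.SimpleShortestPathsVertex`. A small sketch, using a hypothetical `source_to_class` helper, that derives that name from the source path:

```shell
#!/bin/sh

# Hypothetical helper: turn a source path under src/main/java into the
# fully qualified class name to pass to GiraphRunner.
# Usage: source_to_class PATH/TO/src/main/java/pkg/Name.java
source_to_class() {
  rel=${1#*src/main/java/}   # drop everything up to the source root
  rel=${rel%.java}           # drop the .java suffix
  printf '%s\n' "$rel" | tr '/' '.'
}
```

For example, `source_to_class giraph-examples/src/main/java/org/apache/giraph/examples/KdlSimpleShortestPathsVertex.java` prints `org.apache.giraph.examples.KdlSimpleShortestPathsVertex`.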

License: CC BY-SA (attribution required)