Question

I'm relatively new to Giraph and am trying to get a Giraph edit-compile-deploy loop working for our code. I'm able to run various examples, inspired by http://blog.cloudera.com/blog/2014/02/how-to-write-and-run-giraph-jobs-on-hadoop/ , but I'm stuck on a ClassNotFoundException when I run my modified version of the SimpleShortestPathsVertex Giraph example. I've tried various combinations of -libjars and HADOOP_CLASSPATH, but I'm out of ideas and would really appreciate your help. Details follow.

Versions

  • Hadoop: Hadoop 2.0.0-cdh4.4.0
  • Giraph: giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar

PageRankBenchmark runs fine

$ hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.benchmark.PageRankBenchmark \
-Dgiraph.zkList=<myhost>:2181 \
-e 1 -s 3 -v -V 50 -w 1

...
14/08/01 11:42:44 INFO mapred.JobClient: Job complete: job_201407291058_0015
...
(full output is below)

GiraphRunner with SimpleShortestPathsVertex also runs fine

$ hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=<myhost>:2181 \
org.apache.giraph.examples.SimpleShortestPathsVertex \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip ginput/tiny_graph.txt \
-of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op goutput/shortestpathsC2 \
-ca SimpleShortestPathsVertex.source=2 \
-w 1

...
14/08/01 11:47:46 INFO mapred.JobClient: Job complete: job_201407291058_0017
...
(full output is below)

Bonus: the results are correct:

$ hadoop fs -cat goutput/shortestpathsC2/p*
0   1.0
2   2.0
1   0.0
3   1.0
4   5.0

But my modified version of SimpleShortestPathsVertex gets a ClassNotFoundException

The jar with the modified vertex (KdlSimpleShortestPathsVertex, no package) looks fine:

$ jar -tf ~/kdl_hadoop_play.jar
META-INF/MANIFEST.MF
KdlSimpleShortestPathsVertex.class
META-INF/
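Because KdlSimpleShortestPathsVertex has no package, the name passed to GiraphRunner has to match a jar entry at the root of the jar. A small bash sketch for checking that mechanically; the helper names class_to_jar_entry and listing_has_class are hypothetical, not part of Hadoop or Giraph:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Hadoop or Giraph).

# Map a fully qualified class name to the path it must have inside a jar,
# e.g. org.apache.giraph.examples.SimpleShortestPathsVertex
#   -> org/apache/giraph/examples/SimpleShortestPathsVertex.class
class_to_jar_entry() {
  printf '%s.class\n' "${1//.//}"
}

# Read a jar listing (one entry per line, as printed by `jar -tf`) on stdin
# and succeed only if it contains the entry for class name $1.
listing_has_class() {
  local entry
  entry=$(class_to_jar_entry "$1")
  grep -qxF -- "$entry"
}
```

For the listing shown above, this should print found:

$ jar -tf ~/kdl_hadoop_play.jar | listing_has_class KdlSimpleShortestPathsVertex && echo found

If the class is later moved into a package, both this check and the argument to GiraphRunner need the fully qualified name.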

But my run barfs:

$ hadoop jar $GIRAPH_HOME/giraph-core/target/giraph-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=<myhost>:2181 \
-libjars ~/kdl_hadoop_play.jar \
KdlSimpleShortestPathsVertex \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /user/cornell/ginput/tiny_graph.txt \
-of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /user/cornell/goutput/shortestpathsC2 \
-ca KdlSimpleShortestPathsVertex.source=2 \
-w 1

Exception in thread "main" java.lang.ClassNotFoundException: KdlSimpleShortestPathsVertex
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.giraph.utils.ConfigurationUtils.populateGiraphConfiguration(ConfigurationUtils.java:210)
at org.apache.giraph.utils.ConfigurationUtils.parseArgs(ConfigurationUtils.java:147)
at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:74)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

My best guess ...

... after looking around, is that GiraphRunner may not be handling -libjars correctly, as hinted at by http://grepalex.com/2013/02/25/hadoop-libjars/ ("Make sure your code uses GenericOptionsParser"). Browsing the Giraph source, I don't see that class being used. I tried setting HADOOP_CLASSPATH to my jar, but that didn't solve the problem.
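Regarding the HADOOP_CLASSPATH attempt: on Hadoop 2.x the launcher scripts should add HADOOP_CLASSPATH to the client JVM's classpath, and hadoop classpath should print that classpath, so one way to confirm the jar really got there is to look for it in that output. A minimal bash sketch; classpath_contains is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if jar path $2 is one of the entries of the
# colon-separated classpath $1. Only exact entries are matched; a jar that is
# covered by a wildcard entry such as /usr/lib/hadoop/lib/* is not detected.
classpath_contains() {
  local jar=$2 entry
  local -a entries
  # Split on ':' without glob-expanding wildcard entries.
  IFS=':' read -r -a entries <<< "$1"
  for entry in "${entries[@]}"; do
    [[ "$entry" == "$jar" ]] && return 0
  done
  return 1
}
```

For example:

$ export HADOOP_CLASSPATH=~/kdl_hadoop_play.jar
$ classpath_contains "$(hadoop classpath)" "$HOME/kdl_hadoop_play.jar" && echo "on the client classpath"

The comparison is a plain string match, so the path has to be given exactly as it was put into HADOOP_CLASSPATH.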

Any help would be great!

PageRankBenchmark output

14/08/01 11:42:27 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/08/01 11:42:28 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/01 11:42:28 WARN bsp.BspOutputFormat: checkOutputSpecs: ImmutableOutputCommiter will not check anything
14/08/01 11:42:29 INFO mapred.JobClient: Running job: job_201407291058_0015
14/08/01 11:42:30 INFO mapred.JobClient:  map 0% reduce 0%
14/08/01 11:42:40 INFO mapred.JobClient:  map 50% reduce 0%
14/08/01 11:42:41 INFO mapred.JobClient:  map 100% reduce 0%
14/08/01 11:42:44 INFO mapred.JobClient: Job complete: job_201407291058_0015
14/08/01 11:42:44 INFO mapred.JobClient: Counters: 39
14/08/01 11:42:44 INFO mapred.JobClient:   File System Counters
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of bytes read=0
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of bytes written=369846
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of read operations=0
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of large read operations=0
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of write operations=0
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of bytes read=88
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of bytes written=0
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of read operations=2
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of large read operations=0
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of write operations=1
14/08/01 11:42:44 INFO mapred.JobClient:   Job Counters 
14/08/01 11:42:44 INFO mapred.JobClient:     Launched map tasks=2
14/08/01 11:42:44 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=15772
14/08/01 11:42:44 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient:   Map-Reduce Framework
14/08/01 11:42:44 INFO mapred.JobClient:     Map input records=2
14/08/01 11:42:44 INFO mapred.JobClient:     Map output records=0
14/08/01 11:42:44 INFO mapred.JobClient:     Input split bytes=88
14/08/01 11:42:44 INFO mapred.JobClient:     Spilled Records=0
14/08/01 11:42:44 INFO mapred.JobClient:     CPU time spent (ms)=2230
14/08/01 11:42:44 INFO mapred.JobClient:     Physical memory (bytes) snapshot=411357184
14/08/01 11:42:44 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2428895232
14/08/01 11:42:44 INFO mapred.JobClient:     Total committed heap usage (bytes)=806027264
14/08/01 11:42:44 INFO mapred.JobClient:   Giraph Stats
14/08/01 11:42:44 INFO mapred.JobClient:     Aggregate edges=50
14/08/01 11:42:44 INFO mapred.JobClient:     Aggregate finished vertices=50
14/08/01 11:42:44 INFO mapred.JobClient:     Aggregate vertices=50
14/08/01 11:42:44 INFO mapred.JobClient:     Current master task partition=0
14/08/01 11:42:44 INFO mapred.JobClient:     Current workers=1
14/08/01 11:42:44 INFO mapred.JobClient:     Last checkpointed superstep=0
14/08/01 11:42:44 INFO mapred.JobClient:     Sent messages=0
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep=4
14/08/01 11:42:44 INFO mapred.JobClient:   Giraph Timers
14/08/01 11:42:44 INFO mapred.JobClient:     Input superstep (milliseconds)=238
14/08/01 11:42:44 INFO mapred.JobClient:     Setup (milliseconds)=2903
14/08/01 11:42:44 INFO mapred.JobClient:     Shutdown (milliseconds)=68
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep 0 (milliseconds)=77
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep 1 (milliseconds)=64
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep 2 (milliseconds)=45
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep 3 (milliseconds)=43
14/08/01 11:42:44 INFO mapred.JobClient:     Total (milliseconds)=3442

SimpleShortestPathsVertex output

14/08/01 11:47:37 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
14/08/01 11:47:37 INFO utils.ConfigurationUtils: Setting custom argument [SimpleShortestPathsVertex.source] to [2] in GiraphConfiguration
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
14/08/01 11:47:37 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/08/01 11:47:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/01 11:47:38 INFO mapred.JobClient: Running job: job_201407291058_0017
14/08/01 11:47:39 INFO mapred.JobClient:  map 0% reduce 0%
14/08/01 11:47:44 INFO mapred.JobClient:  map 50% reduce 0%
14/08/01 11:47:45 INFO mapred.JobClient:  map 100% reduce 0%
14/08/01 11:47:46 INFO mapred.JobClient: Job complete: job_201407291058_0017
14/08/01 11:47:46 INFO mapred.JobClient: Counters: 39
14/08/01 11:47:46 INFO mapred.JobClient:   File System Counters
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of bytes read=0
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of bytes written=367068
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of read operations=0
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of large read operations=0
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of write operations=0
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of bytes read=200
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of bytes written=30
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of read operations=5
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of large read operations=0
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of write operations=2
14/08/01 11:47:46 INFO mapred.JobClient:   Job Counters 
14/08/01 11:47:46 INFO mapred.JobClient:     Launched map tasks=2
14/08/01 11:47:46 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=8538
14/08/01 11:47:46 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient:   Map-Reduce Framework
14/08/01 11:47:46 INFO mapred.JobClient:     Map input records=2
14/08/01 11:47:46 INFO mapred.JobClient:     Map output records=0
14/08/01 11:47:46 INFO mapred.JobClient:     Input split bytes=88
14/08/01 11:47:46 INFO mapred.JobClient:     Spilled Records=0
14/08/01 11:47:46 INFO mapred.JobClient:     CPU time spent (ms)=1590
14/08/01 11:47:46 INFO mapred.JobClient:     Physical memory (bytes) snapshot=341344256
14/08/01 11:47:46 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2363527168
14/08/01 11:47:46 INFO mapred.JobClient:     Total committed heap usage (bytes)=504758272
14/08/01 11:47:46 INFO mapred.JobClient:   Giraph Stats
14/08/01 11:47:46 INFO mapred.JobClient:     Aggregate edges=12
14/08/01 11:47:46 INFO mapred.JobClient:     Aggregate finished vertices=5
14/08/01 11:47:46 INFO mapred.JobClient:     Aggregate vertices=5
14/08/01 11:47:46 INFO mapred.JobClient:     Current master task partition=0
14/08/01 11:47:46 INFO mapred.JobClient:     Current workers=1
14/08/01 11:47:46 INFO mapred.JobClient:     Last checkpointed superstep=0
14/08/01 11:47:46 INFO mapred.JobClient:     Sent messages=0
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep=4
14/08/01 11:47:46 INFO mapred.JobClient:   Giraph Timers
14/08/01 11:47:46 INFO mapred.JobClient:     Input superstep (milliseconds)=181
14/08/01 11:47:46 INFO mapred.JobClient:     Setup (milliseconds)=313
14/08/01 11:47:46 INFO mapred.JobClient:     Shutdown (milliseconds)=128
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep 0 (milliseconds)=57
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep 1 (milliseconds)=54
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep 2 (milliseconds)=36
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep 3 (milliseconds)=35
14/08/01 11:47:46 INFO mapred.JobClient:     Total (milliseconds)=805

Solution

OK, after looking at the Hadoop scripts along with the Hadoop and Giraph source, I think I figured it out. The big clue came from "Using the libjars option with Hadoop", together with this line of the output:

WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

The cause seems to be that GiraphRunner uses its own ConfigurationUtils.parseArgs() to obtain the org.apache.commons.cli.CommandLine, instead of the recommended org.apache.hadoop.util.GenericOptionsParser.getCommandLine(), which honors the -libjars option. This led me to fall back on Hadoop's generic classpath-handling tools: CLASSPATH and/or HADOOP_CLASSPATH. Here's what works:

  • Set HADOOP_CLASSPATH to include your application jar and the Giraph core jar, using a colon as the separator.
  • Pass -libjars the same list of jars, but using a comma as the separator.
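The two rules above amount to writing the same jar list twice with different separators. A minimal bash sketch that derives both from one list so they cannot drift apart; join_by and set_giraph_jars are hypothetical helper names, and HADOOP_CLASSPATH is overwritten rather than appended to:

```shell
#!/usr/bin/env bash
# Hypothetical helper: join all arguments after the first, using the first
# argument (a single character) as the separator.
join_by() {
  local IFS=$1
  shift
  printf '%s\n' "$*"
}

# Hypothetical helper: from one list of jars, export HADOOP_CLASSPATH
# (colon-separated, read by the client JVM) and set LIBJARS
# (comma-separated, for -libjars). Replaces any previous HADOOP_CLASSPATH.
set_giraph_jars() {
  HADOOP_CLASSPATH=$(join_by ':' "$@")
  export HADOOP_CLASSPATH
  LIBJARS=$(join_by ',' "$@")
}
```

A possible invocation, reusing the arguments from the question:

$ GIRAPH_CORE_JAR=$GIRAPH_HOME/giraph-core/target/giraph-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar
$ set_giraph_jars ~/kdl_hadoop_play.jar "$GIRAPH_CORE_JAR"
$ hadoop jar "$GIRAPH_CORE_JAR" org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=<myhost>:2181 \
-libjars "$LIBJARS" \
KdlSimpleShortestPathsVertex \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /user/cornell/ginput/tiny_graph.txt \
-of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /user/cornell/goutput/shortestpathsC2 \
-ca KdlSimpleShortestPathsVertex.source=2 \
-w 1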

Doing this on my machine gave the expected output and results.

In general, it would help if the Giraph team changed the code to use the (apparently) more standard parser.

Hope this helps!

Other tips

I don't know why that doesn't work, but there is a quick and simple workaround. Try putting your code in the giraph-examples/src/main/java/org/apache/giraph/examples/ directory (where SimpleShortestPath lives). Then build a giraph-examples jar by running mvn -DskipTests --projects giraph-examples --also-make package. Then just run your program the same way you ran SimpleShortestPath, replacing SimpleShortestPath with the name of your class. I hope this helps.
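The workaround can be scripted. The sketch below only prints the commands (a dry run) so they can be reviewed first; the helper name examples_workaround_steps and its two parameters are hypothetical, and the Maven command is the one given above:

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print (do not execute) the workaround steps.
#   $1  Giraph source checkout (the question's $GIRAPH_HOME)
#   $2  vertex source file, e.g. KdlSimpleShortestPathsVertex.java
examples_workaround_steps() {
  local giraph_home=$1 src=$2
  local dest=$giraph_home/giraph-examples/src/main/java/org/apache/giraph/examples/
  # Step 1: put the vertex next to the bundled SimpleShortestPathsVertex.
  printf 'cp %q %q\n' "$src" "$dest"
  # Step 2: rebuild giraph-examples and the modules it depends on, skipping tests.
  printf 'cd %q && mvn -DskipTests --projects giraph-examples --also-make package\n' "$giraph_home"
}
```

For example, the following prints the two steps; piping its output into bash would run them:

$ examples_workaround_steps "$GIRAPH_HOME" KdlSimpleShortestPathsVertex.java

The rebuilt jar is presumably the same giraph-examples jar-with-dependencies used in the working runs from the question. If the copied file declares package org.apache.giraph.examples, GiraphRunner needs the fully qualified name, org.apache.giraph.examples.KdlSimpleShortestPathsVertex.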

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow