Question

I'm relatively new to Giraph and am trying to get my Giraph edit-deploy-run cycle working for our code. I'm able to run various examples inspired by http://blog.cloudera.com/blog/2014/02/how-to-write-and-run-giraph-jobs-on-hadoop/ , but I'm stuck on a ClassNotFoundException when running my modified version of Giraph's SimpleShortestPathsVertex example. I've tried various combinations of -libjars and HADOOP_CLASSPATH, but I'm out of ideas and would really appreciate your help. Details follow.

Versions

  • Hadoop: Hadoop 2.0.0-cdh4.4.0
  • Giraph: giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar

PageRankBenchmark runs OK

$ hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.benchmark.PageRankBenchmark \
-Dgiraph.zkList=<myhost>:2181 \
-e 1 -s 3 -v -V 50 -w 1

...
14/08/01 11:42:44 INFO mapred.JobClient: Job complete: job_201407291058_0015
...
(full output is below)

GiraphRunner with SimpleShortestPathsVertex also runs OK

$ hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=<myhost>:2181 \
org.apache.giraph.examples.SimpleShortestPathsVertex \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip ginput/tiny_graph.txt \
-of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op goutput/shortestpathsC2 \
-ca SimpleShortestPathsVertex.source=2 \
-w 1

...
14/08/01 11:47:46 INFO mapred.JobClient: Job complete: job_201407291058_0017
...
(full output is below)

Bonus: the results are correct:

$ hadoop fs -cat goutput/shortestpathsC2/p*
0   1.0
2   2.0
1   0.0
3   1.0
4   5.0

But my modified version of SimpleShortestPathsVertex gets a ClassNotFoundException

The jar containing the modified vertex (KdlSimpleShortestPathsVertex, no package) is OK:

$ jar -tf ~/kdl_hadoop_play.jar
META-INF/MANIFEST.MF
KdlSimpleShortestPathsVertex.class
META-INF/
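When chasing a ClassNotFoundException it can help to confirm that the exact name given on the command line maps to an entry in the jar. A minimal sketch, assuming a POSIX shell; class_in_listing is a hypothetical helper, not part of Hadoop or Giraph. It converts a fully qualified class name into the path the JVM looks for (dots become slashes, plus .class) and searches a `jar -tf` listing for that exact line:

```shell
# class_in_listing CLASSNAME: hypothetical helper.
# Reads a `jar -tf` listing on stdin and succeeds only if CLASSNAME
# (e.g. org.example.Foo, or just Foo for the default package)
# appears at the matching path (org/example/Foo.class or Foo.class).
class_in_listing() {
  entry="$(printf '%s' "$1" | tr '.' '/').class"
  # -x: match the whole line, -F: fixed string, -q: report via exit status only
  grep -qxF -- "$entry"
}
```

For example, `jar -tf ~/kdl_hadoop_play.jar | class_in_listing KdlSimpleShortestPathsVertex && echo found`. Since the class has no package, the bare name is the right one to pass to GiraphRunner; if this check succeeds, the jar content is consistent with the name and the remaining question is whether the jar reaches the classpath.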

But my run pukes:

$ hadoop jar $GIRAPH_HOME/giraph-core/target/giraph-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=<myhost>:2181 \
-libjars ~/kdl_hadoop_play.jar \
KdlSimpleShortestPathsVertex \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /user/cornell/ginput/tiny_graph.txt \
-of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /user/cornell/goutput/shortestpathsC2 \
-ca KdlSimpleShortestPathsVertex.source=2 \
-w 1

Exception in thread "main" java.lang.ClassNotFoundException: KdlSimpleShortestPathsVertex
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.giraph.utils.ConfigurationUtils.populateGiraphConfiguration(ConfigurationUtils.java:210)
at org.apache.giraph.utils.ConfigurationUtils.parseArgs(ConfigurationUtils.java:147)
at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:74)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

My best guess...

...after looking around, is that maybe GiraphRunner isn't processing -libjars correctly, as hinted at by http://grepalex.com/2013/02/25/hadoop-libjars/ ("Make sure your code is using GenericOptionsParser"). Browsing the Giraph source, I don't see that class being used. I tried setting HADOOP_CLASSPATH to include my jar, but that didn't fix the problem.
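The source search described above can be repeated with standard tools. A sketch assuming POSIX find and grep; find_gop_users is a hypothetical helper name, and $GIRAPH_HOME is assumed to point at a Giraph source checkout:

```shell
# find_gop_users [DIR]: hypothetical helper.
# Prints every .java file under DIR (default: $GIRAPH_HOME) whose text
# mentions GenericOptionsParser by its simple class name.
find_gop_users() {
  find "${1:-$GIRAPH_HOME}" -name '*.java' -exec grep -l 'GenericOptionsParser' {} +
}
```

An empty result only means that no Giraph source file names the class directly; ToolRunner, which appears in the stack trace above, uses GenericOptionsParser internally.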

Any help would be great!

PageRankBenchmark output

14/08/01 11:42:27 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/08/01 11:42:28 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/01 11:42:28 WARN bsp.BspOutputFormat: checkOutputSpecs: ImmutableOutputCommiter will not check anything
14/08/01 11:42:29 INFO mapred.JobClient: Running job: job_201407291058_0015
14/08/01 11:42:30 INFO mapred.JobClient:  map 0% reduce 0%
14/08/01 11:42:40 INFO mapred.JobClient:  map 50% reduce 0%
14/08/01 11:42:41 INFO mapred.JobClient:  map 100% reduce 0%
14/08/01 11:42:44 INFO mapred.JobClient: Job complete: job_201407291058_0015
14/08/01 11:42:44 INFO mapred.JobClient: Counters: 39
14/08/01 11:42:44 INFO mapred.JobClient:   File System Counters
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of bytes read=0
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of bytes written=369846
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of read operations=0
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of large read operations=0
14/08/01 11:42:44 INFO mapred.JobClient:     FILE: Number of write operations=0
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of bytes read=88
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of bytes written=0
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of read operations=2
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of large read operations=0
14/08/01 11:42:44 INFO mapred.JobClient:     HDFS: Number of write operations=1
14/08/01 11:42:44 INFO mapred.JobClient:   Job Counters 
14/08/01 11:42:44 INFO mapred.JobClient:     Launched map tasks=2
14/08/01 11:42:44 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=15772
14/08/01 11:42:44 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient:   Map-Reduce Framework
14/08/01 11:42:44 INFO mapred.JobClient:     Map input records=2
14/08/01 11:42:44 INFO mapred.JobClient:     Map output records=0
14/08/01 11:42:44 INFO mapred.JobClient:     Input split bytes=88
14/08/01 11:42:44 INFO mapred.JobClient:     Spilled Records=0
14/08/01 11:42:44 INFO mapred.JobClient:     CPU time spent (ms)=2230
14/08/01 11:42:44 INFO mapred.JobClient:     Physical memory (bytes) snapshot=411357184
14/08/01 11:42:44 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2428895232
14/08/01 11:42:44 INFO mapred.JobClient:     Total committed heap usage (bytes)=806027264
14/08/01 11:42:44 INFO mapred.JobClient:   Giraph Stats
14/08/01 11:42:44 INFO mapred.JobClient:     Aggregate edges=50
14/08/01 11:42:44 INFO mapred.JobClient:     Aggregate finished vertices=50
14/08/01 11:42:44 INFO mapred.JobClient:     Aggregate vertices=50
14/08/01 11:42:44 INFO mapred.JobClient:     Current master task partition=0
14/08/01 11:42:44 INFO mapred.JobClient:     Current workers=1
14/08/01 11:42:44 INFO mapred.JobClient:     Last checkpointed superstep=0
14/08/01 11:42:44 INFO mapred.JobClient:     Sent messages=0
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep=4
14/08/01 11:42:44 INFO mapred.JobClient:   Giraph Timers
14/08/01 11:42:44 INFO mapred.JobClient:     Input superstep (milliseconds)=238
14/08/01 11:42:44 INFO mapred.JobClient:     Setup (milliseconds)=2903
14/08/01 11:42:44 INFO mapred.JobClient:     Shutdown (milliseconds)=68
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep 0 (milliseconds)=77
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep 1 (milliseconds)=64
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep 2 (milliseconds)=45
14/08/01 11:42:44 INFO mapred.JobClient:     Superstep 3 (milliseconds)=43
14/08/01 11:42:44 INFO mapred.JobClient:     Total (milliseconds)=3442

SimpleShortestPathsVertex output

14/08/01 11:47:37 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
14/08/01 11:47:37 INFO utils.ConfigurationUtils: Setting custom argument [SimpleShortestPathsVertex.source] to [2] in GiraphConfiguration
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
14/08/01 11:47:37 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/08/01 11:47:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/01 11:47:38 INFO mapred.JobClient: Running job: job_201407291058_0017
14/08/01 11:47:39 INFO mapred.JobClient:  map 0% reduce 0%
14/08/01 11:47:44 INFO mapred.JobClient:  map 50% reduce 0%
14/08/01 11:47:45 INFO mapred.JobClient:  map 100% reduce 0%
14/08/01 11:47:46 INFO mapred.JobClient: Job complete: job_201407291058_0017
14/08/01 11:47:46 INFO mapred.JobClient: Counters: 39
14/08/01 11:47:46 INFO mapred.JobClient:   File System Counters
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of bytes read=0
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of bytes written=367068
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of read operations=0
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of large read operations=0
14/08/01 11:47:46 INFO mapred.JobClient:     FILE: Number of write operations=0
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of bytes read=200
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of bytes written=30
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of read operations=5
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of large read operations=0
14/08/01 11:47:46 INFO mapred.JobClient:     HDFS: Number of write operations=2
14/08/01 11:47:46 INFO mapred.JobClient:   Job Counters 
14/08/01 11:47:46 INFO mapred.JobClient:     Launched map tasks=2
14/08/01 11:47:46 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=8538
14/08/01 11:47:46 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient:   Map-Reduce Framework
14/08/01 11:47:46 INFO mapred.JobClient:     Map input records=2
14/08/01 11:47:46 INFO mapred.JobClient:     Map output records=0
14/08/01 11:47:46 INFO mapred.JobClient:     Input split bytes=88
14/08/01 11:47:46 INFO mapred.JobClient:     Spilled Records=0
14/08/01 11:47:46 INFO mapred.JobClient:     CPU time spent (ms)=1590
14/08/01 11:47:46 INFO mapred.JobClient:     Physical memory (bytes) snapshot=341344256
14/08/01 11:47:46 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2363527168
14/08/01 11:47:46 INFO mapred.JobClient:     Total committed heap usage (bytes)=504758272
14/08/01 11:47:46 INFO mapred.JobClient:   Giraph Stats
14/08/01 11:47:46 INFO mapred.JobClient:     Aggregate edges=12
14/08/01 11:47:46 INFO mapred.JobClient:     Aggregate finished vertices=5
14/08/01 11:47:46 INFO mapred.JobClient:     Aggregate vertices=5
14/08/01 11:47:46 INFO mapred.JobClient:     Current master task partition=0
14/08/01 11:47:46 INFO mapred.JobClient:     Current workers=1
14/08/01 11:47:46 INFO mapred.JobClient:     Last checkpointed superstep=0
14/08/01 11:47:46 INFO mapred.JobClient:     Sent messages=0
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep=4
14/08/01 11:47:46 INFO mapred.JobClient:   Giraph Timers
14/08/01 11:47:46 INFO mapred.JobClient:     Input superstep (milliseconds)=181
14/08/01 11:47:46 INFO mapred.JobClient:     Setup (milliseconds)=313
14/08/01 11:47:46 INFO mapred.JobClient:     Shutdown (milliseconds)=128
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep 0 (milliseconds)=57
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep 1 (milliseconds)=54
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep 2 (milliseconds)=36
14/08/01 11:47:46 INFO mapred.JobClient:     Superstep 3 (milliseconds)=35
14/08/01 11:47:46 INFO mapred.JobClient:     Total (milliseconds)=805

Solution

OK, after looking through the Hadoop scripts along with the Hadoop and Giraph source, I think I've figured it out. The big hint came from using the -libjars option with hadoop, together with this line from the output:

WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

The cause seems to be that GiraphRunner uses its own ConfigurationUtils.parseArgs() to get the org.apache.commons.cli.CommandLine instead of org.apache.hadoop.util.GenericOptionsParser.getCommandLine(), which honors the -libjars option. That led me to fall back on Hadoop's generic classpath-handling mechanisms: CLASSPATH and/or HADOOP_CLASSPATH. Here's what worked:

  • Set HADOOP_CLASSPATH to include the application jar and the Giraph core jar, using a colon delimiter.
  • Pass -libjars with the same classpath, but using a comma delimiter.
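The two settings carry the same jar list in two formats: colon-separated for HADOOP_CLASSPATH, comma-separated for -libjars. A minimal sketch that derives both from one list so they cannot drift apart. The jar paths are assumptions modelled on the bullets (application jar plus Giraph core jar), join_with is a hypothetical helper, and GIRAPH_HOME falls back to the author's /share/apps/giraph only if it is not already set:

```shell
# join_with DELIM ITEM...: hypothetical helper that prints the ITEMs joined
# by the single character DELIM. The subshell body keeps the IFS change local.
join_with() (
  IFS=$1
  shift
  printf '%s\n' "$*"   # "$*" joins the arguments with the first character of IFS
)

# Assumed install location and jar names; adjust to your machine.
: "${GIRAPH_HOME:=/share/apps/giraph}"
set -- "$HOME/kdl_hadoop_play.jar" "$GIRAPH_HOME/giraph-core.jar"

# Colon-separated for the client JVM classpath; any existing value is kept at the end.
HADOOP_CLASSPATH="$(join_with : "$@")${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"
export HADOOP_CLASSPATH
# Comma-separated for the generic -libjars option.
LIBJARS="$(join_with , "$@")"
export LIBJARS
```

The result is then passed as `-libjars ${LIBJARS}` to GiraphRunner, as in the author's command.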

For example, on my machine:

$ export GIRAPH_HOME=/share/apps/giraph
$ export HADOOP_CLASSPATH=/home/<me>/kdl_hadoop_play.jar:$GIRAPH_HOME/giraph-ex.jar:$HADOOP_CLASSPATH
$ export LIBJARS=/home/<me>/kdl_hadoop_play.jar,$GIRAPH_HOME/giraph-core.jar
$ hadoop fs -rm -R goutput/shortestpathsC2
$ hadoop jar $GIRAPH_HOME/giraph-ex.jar org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=<myhost>:2181 \
-libjars ${LIBJARS} \
KdlSimpleShortestPathsVertex \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /user/cornell/ginput/tiny_graph.txt \
-of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /user/cornell/goutput/shortestpathsC2 \
-ca SimpleShortestPathsVertex.source=2 \
-w 1
...
$ hadoop fs -cat goutput/shortestpathsC2/p*

which gives the expected output and results.

More generally, it would be helpful if the Giraph team changed the code to use the (apparently) more standard parser.

Hope that helps!

Other suggestions

I don't know why this doesn't work, but there is a quick-and-dirty way around it. Put your code in the giraph-examples/src/main/java/org/apache/giraph/examples/ directory (where SimpleShortestPathsVertex lives), then build the giraph-examples jar by running:

$ mvn -DskipTests --projects giraph-examples --also-make package

Then run the program just as you did for SimpleShortestPathsVertex, substituting your class name for SimpleShortestPathsVertex. Hope that helps.
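The copy-and-rebuild steps of this workaround can be scripted roughly as follows. This is a hypothetical sketch: it assumes it is run from the root of a Giraph source checkout, run and rebuild_examples_with are illustrative helper names, and setting DRY_RUN to any non-empty value only prints the commands instead of executing them:

```shell
# Paths taken from the question; run from the root of a Giraph source checkout.
EXAMPLES_DIR=giraph-examples/src/main/java/org/apache/giraph/examples
EXAMPLES_JAR=giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar

# run CMD ARGS...: print the command, then execute it unless DRY_RUN is non-empty.
run() {
  printf '+ %s\n' "$*"
  [ -n "${DRY_RUN:-}" ] || "$@"
}

# rebuild_examples_with FILE.java: hypothetical helper that copies FILE.java
# next to SimpleShortestPathsVertex, then rebuilds the giraph-examples module
# together with the modules it depends on (--also-make).
rebuild_examples_with() {
  run cp "$1" "$EXAMPLES_DIR/" &&
    run mvn -DskipTests --projects giraph-examples --also-make package
}
```

For instance, `rebuild_examples_with ~/KdlSimpleShortestPathsVertex.java` (a hypothetical source location), followed by the GiraphRunner command from the question run with `hadoop jar "$EXAMPLES_JAR"` and your class name in place of SimpleShortestPathsVertex.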

Licensed under: CC-BY-SA with attribution