Giraph shortest paths example: ClassNotFoundException

https://stackoverflow.com//questions/10700853

13-12-2019

Question

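I am trying to run the shortest paths example from the Giraph incubator (https://cwiki.apache.org/confluence/display/giraph/shortest+Paths+Example). However, instead of running the example from giraph-*-jar-with-dependencies.jar, I created my own job jar. When I create a single job file as presented in the example, I get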

java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.test.giraph.Test$SimpleShortestPathsVertexInputFormat
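The $ in the reported name is simply how the JVM spells a static nested class (its binary name), so the name itself is well-formed; the class still has to be visible to whichever class loader resolves it. A minimal, self-contained sketch of that lookup, where BinaryNames and the nested Test class are hypothetical stand-ins for the asker's code:

```java
public class BinaryNames {
    // Hypothetical stand-in for the asker's outer class org.test.giraph.Test.
    static class Test {
        // Static nested class, like the original SimpleShortestPathsVertexInputFormat.
        static class SimpleShortestPathsVertexInputFormat { }
    }

    /** Returns the binary name, i.e. the string a framework would typically store and later pass to Class.forName. */
    static String binaryName(Class<?> cls) {
        return cls.getName();
    }

    /** Tries to resolve a class by name through a specific loader, without initializing it. */
    static boolean canLoad(String name, ClassLoader loader) {
        try {
            Class.forName(name, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = BinaryNames.class.getClassLoader();
        String name = binaryName(Test.SimpleShortestPathsVertexInputFormat.class);
        System.out.println(name);                                    // ends with Test$SimpleShortestPathsVertexInputFormat
        System.out.println(canLoad(name, loader));                   // true
        System.out.println(canLoad(name.replace('$', '.'), loader)); // false: source-style dots are not a binary name
    }
}
```

So nesting alone does not make the name unresolvable; the answers below trace the failure to the job jar not being visible to the task.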

Then I moved the inner classes (SimpleShortestPathsVertexInputFormat and SimpleShortestPathsVertexOutputFormat) into separate files and renamed them (SimpleShortestPathsVertexInputFormat_v2 and SimpleShortestPathsVertexOutputFormat_v2); the classes are no longer static. This solved the ClassNotFoundException for SimpleShortestPathsVertexInputFormat_v2, but I still get the same error for SimpleShortestPathsVertexOutputFormat_v2. Below is my stack trace.

INFO mapred.JobClient: Running job: job_201205221101_0003
INFO mapred.JobClient:  map 0% reduce 0%
INFO mapred.JobClient: Task Id : attempt_201205221101_0003_m_000005_0, Status : FAILED
    java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.test.giraph.utils.SimpleShortestPathsVertexOutputFormat_v2
            at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:898)
            at org.apache.giraph.graph.BspUtils.getVertexOutputFormatClass(BspUtils.java:134)
            at org.apache.giraph.bsp.BspOutputFormat.getOutputCommitter(BspOutputFormat.java:56)
            at org.apache.hadoop.mapred.Task.initialize(Task.java:490)
            at org.apache.hadoop.mapred.MapTask.run(MapTask.java:352)
            at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:415)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
            at org.apache.hadoop.mapred.Child.main(Child.java:253)
    Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.test.giraph.utils.SimpleShortestPathsVertexOutputFormat_v2
            at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:866)
            at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:890)
            ... 9 more
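The trace shows a class name stored in the job configuration being resolved by Configuration.getClass inside the map task and failing there, with the ClassNotFoundException wrapped in RuntimeExceptions. The following is a simplified model of that pattern, not Hadoop's implementation: MiniConf, the configuration key and MyOutputFormat are hypothetical, and an empty URLClassLoader stands in for a task JVM whose classpath does not include the job jar.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

public class ConfClassLookup {
    /** Hypothetical, much-simplified stand-in for a Hadoop-style Configuration. */
    static class MiniConf {
        private final Map<String, String> props = new HashMap<>();
        private final ClassLoader loader;

        MiniConf(ClassLoader loader) {
            this.loader = loader;
        }

        // Only the class name is stored, so whoever reads it later must be able to load it.
        void setClass(String key, Class<?> cls) {
            props.put(key, cls.getName());
        }

        // Resolves the stored name and wraps a missing class in a RuntimeException.
        Class<?> resolveClass(String key) {
            try {
                return Class.forName(props.get(key), true, loader);
            } catch (ClassNotFoundException e) {
                throw new RuntimeException(e);
            }
        }
    }

    // Hypothetical user class that, in the real job, would live only in the job jar.
    static class MyOutputFormat { }

    static String tryLookup(ClassLoader loader) {
        MiniConf conf = new MiniConf(loader);
        conf.setClass("example.vertex.output.format", MyOutputFormat.class);
        try {
            return conf.resolveClass("example.vertex.output.format").getName();
        } catch (RuntimeException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) throws Exception {
        // "Client" side: the loader that defined MyOutputFormat can see it.
        System.out.println(tryLookup(ConfClassLookup.class.getClassLoader())); // ...$MyOutputFormat
        // "Task" side: no URLs and only the bootstrap loader as parent, so user classes are invisible.
        try (URLClassLoader isolated = new URLClassLoader(new URL[0], null)) {
            // java.lang.RuntimeException: java.lang.ClassNotFoundException: ...$MyOutputFormat
            System.out.println(tryLookup(isolated));
        }
    }
}
```

The same name resolves through one loader and fails through the other, which is consistent with all classes being present in the jar while the task still cannot find them.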

I have checked my job jar and all the classes are there. I am using Hadoop 0.20.203 in pseudo-distributed mode. I launch the job as shown below.

hadoop jar giraphJobs.jar org.test.giraph.Test -libjars /path/to/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar /path/to/input /path/to/output 0 3

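I have also added giraph-*-jar-with-dependencies.jar to HADOOP_CLASSPATH. I can run the PageRankBenchmark example without problems (directly from giraph-*-jar-with-dependencies.jar), and the shortest paths example also works (also directly from giraph-*-jar-with-dependencies.jar). Other Hadoop jobs run without problems (I ran them because I read that this is how to test whether my "cluster" works correctly). Has anyone encountered a similar problem? Any help will be appreciated.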

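Solution (sorry for posting it here, but I cannot answer my own question for a few more hours)
To solve this, I had to add my job jar to -libjars as well (HADOOP_CLASSPATH was left unchanged). The command to launch the job now looks like this: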

hadoop jar giraphJobs.jar org.test.giraph.Test -libjars /path/to/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar,/path/to/job.jar /path/to/input /path/to/output 0 3

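The list of jars must be comma-separated. Although this solved my problem, I am still curious why I have to pass my job jar as a "classpath" argument. Can someone explain the reasoning behind this? I find it strange (to say the least) to launch my job jar and then pass it again as a "classpath" jar. I would really like to understand it.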
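Because -libjars takes a single comma-separated value, it can help to build that value programmatically when the command is generated by a launcher script or tool. A minimal sketch; LibJarsArgs and its methods are hypothetical names, and only the argument layout is taken from the working command above (generic options such as -libjars placed before the job's own arguments).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LibJarsArgs {
    /** Joins jar paths into the single comma-separated value that -libjars takes. */
    static String libJarsValue(List<String> jars) {
        for (String jar : jars) {
            // A comma inside a path would be read as a separator, so reject it early.
            if (jar.isEmpty() || jar.contains(",")) {
                throw new IllegalArgumentException("bad jar path: '" + jar + "'");
            }
        }
        return String.join(",", jars);
    }

    /** Builds the argument list for "hadoop jar", with -libjars before the job's own arguments. */
    static List<String> hadoopJarCommand(String jobJar, String mainClass,
                                         List<String> libJars, List<String> jobArgs) {
        List<String> cmd = new ArrayList<>(Arrays.asList(
                "hadoop", "jar", jobJar, mainClass, "-libjars", libJarsValue(libJars)));
        cmd.addAll(jobArgs);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = hadoopJarCommand(
                "giraphJobs.jar", "org.test.giraph.Test",
                Arrays.asList("/path/to/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar", "/path/to/job.jar"),
                Arrays.asList("/path/to/input", "/path/to/output", "0", "3"));
        System.out.println(String.join(" ", cmd));
    }
}
```

The printed line is the working command from the question; passing the list to java.lang.ProcessBuilder instead of a shell avoids quoting problems.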

Solution

I found an alternative, programmatic solution to the problem. The run() method needs to be modified as follows:

...
@Override
public int run(String[] argArray) throws Exception {
    Preconditions.checkArgument(argArray.length == 4,
        "run: Must have 4 arguments <input path> <output path> " +
        "<source vertex id> <# of workers>");

    GiraphJob job = new GiraphJob(getConf(), getClass().getName());
    // This is the addition: it makes Hadoop look for other classes in the same jar that contains this class
    job.getInternalJob().setJarByClass(getClass());
    job.setVertexClass(getClass());
    ...
}

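setJarByClass() makes Hadoop use the jar that contains the class returned by getClass() as the job jar, so missing classes are looked up in that same jar and we do not need to add the job jar name separately to the -libjars option.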
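Roughly speaking, setJarByClass() locates the jar on the client's classpath that contains the given class and records it as the job jar, which Hadoop then distributes to the tasks. The sketch below illustrates only that lookup step; ContainingJar and its methods are hypothetical names, not Hadoop's code, and it assumes the class was loaded from a local jar through an ordinary class loader.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Enumeration;

public class ContainingJar {
    private static final String PREFIX = "jar:file:";

    /**
     * Extracts the jar path from a URL such as "jar:file:/path/to/job.jar!/org/test/giraph/Test.class".
     * Returns null for other URLs, e.g. a class loaded from a directory. Percent-encoding is left as-is.
     */
    static String jarPathFromUrl(String url) {
        int bang = url.indexOf("!/");
        if (!url.startsWith(PREFIX) || bang < 0) {
            return null;
        }
        return url.substring(PREFIX.length(), bang);
    }

    /** Returns the local jar the class's .class file comes from, or null if it is not in a jar. */
    static String findContainingJar(Class<?> cls) throws IOException {
        ClassLoader loader = cls.getClassLoader();
        if (loader == null) {
            return null; // loaded by the bootstrap loader (JDK classes)
        }
        // Nested classes keep their '$' here, matching the .class file name.
        String resource = cls.getName().replace('.', '/') + ".class";
        Enumeration<URL> urls = loader.getResources(resource);
        while (urls.hasMoreElements()) {
            String path = jarPathFromUrl(urls.nextElement().toString());
            if (path != null) {
                return path;
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(jarPathFromUrl("jar:file:/path/to/job.jar!/org/test/giraph/Test.class")); // /path/to/job.jar
        // Prints a jar path when run from a jar, or null when run from a classes directory.
        System.out.println(findContainingJar(ContainingJar.class));
    }
}
```

Since the job is started with hadoop jar giraphJobs.jar, getClass() in run() refers to a class inside that jar, which is why the extra -libjars entry becomes unnecessary.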

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow