Difficulties getting Spark SVMModel running with Java - java.lang.IncompatibleClassChangeError

StackOverflow https://stackoverflow.com/questions/23384047

12-07-2023

Question

I've been trying to get Spark running using Java (in case it matters, I'm using IntelliJ as my IDE). After quite a bit of fiddling, I successfully ran the calculate-pi example found here: https://spark.apache.org/examples.html

At first I had the following error: "java.lang.ClassNotFoundException: [B", but I fixed it by adding the VM flag "-Dsun.lang.ClassLoader.allowArraySyntax=true".

Now I'm trying to build, train, and run an SVMModel, as described here: http://spark.apache.org/docs/0.9.0/mllib-guide.html, but I'm struggling with the small amount of Java documentation available.

I'm trying to run the following code:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.classification.SVMModel;
import org.apache.spark.mllib.classification.SVMWithSGD;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.jblas.DoubleMatrix;

import java.util.ArrayList;
import java.util.List;

public class MinimalBugCode {

    public static void main(String[] args) throws Exception {
        try {
            JavaSparkContext spark = new JavaSparkContext("local","MinimalBugCode");

            int numberOfIterations = 10000;
            double[] theOutputData = { 1.0, 0.0, 0.0, 1.0, 1.0};
            double[][] theInputData = { { 2.2, 3.1, 1.7 }, { 1.9, 2.1, 0.6 }, { 4.7, 0.5, 1.3 }, { 2.6, 2.9, 2.2 }, { 1.5, 4.1, 1.5 }};

            int lengthOfData = theOutputData.length;

            List<LabeledPoint> thePointList = new ArrayList<LabeledPoint>();

            for ( int i=0; i<lengthOfData; i++ ) {
                LabeledPoint thisLabelledPoint = new LabeledPoint( theOutputData[i], theInputData[i] );
                thePointList.add(thisLabelledPoint);
            }

            JavaRDD<LabeledPoint> theSparkRDD = spark.parallelize( thePointList );

            SVMModel theInnerModel = SVMWithSGD.train(theSparkRDD.rdd(), numberOfIterations);

            DoubleMatrix weights = new DoubleMatrix( theInnerModel.weights() );


        } catch (Exception e) {
            e.printStackTrace();
        }
        System.exit(0);
    }
}

It gets as far as the line

SVMModel theInnerModel = SVMWithSGD.train(theSparkRDD.rdd(), numberOfIterations);

at which point it breaks with the following console output:

Exception in thread "main" java.lang.IncompatibleClassChangeError: class scala.reflect.ManifestFactory$$anon$6 has interface scala.reflect.AnyValManifest as super class
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:455)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:367)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at scala.reflect.ManifestFactory$.<init>(Manifest.scala:88)
    at scala.reflect.ManifestFactory$.<clinit>(Manifest.scala)
    at scala.reflect.ClassManifestFactory$.<init>(ClassManifestDeprecatedApis.scala:150)
    at scala.reflect.ClassManifestFactory$.<clinit>(ClassManifestDeprecatedApis.scala)
    at scala.reflect.package$.<init>(package.scala:34)
    at scala.reflect.package$.<clinit>(package.scala)
    at scala.reflect.ClassTag$.<init>(ClassTag.scala:114)
    at scala.reflect.ClassTag$.<clinit>(ClassTag.scala)
    at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:139)
    at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:123)
    at org.apache.spark.mllib.classification.SVMWithSGD$.train(SVM.scala:133)
    at org.apache.spark.mllib.classification.SVMWithSGD$.train(SVM.scala:173)
    at org.apache.spark.mllib.classification.SVMWithSGD.train(SVM.scala)
    at fiddle.spark.MinimalBugCode.main(MinimalBugCode.java:37)

I'm finding these error messages rather impenetrable! I'm quite new to Spark, but I have a fair amount of Java experience.

Many thanks in advance!
-StackG


ps. As an aside, when I run the debugger over the JavaRDD object, it appears in the debugging terminal and I can drill down and see the vector of (label, features) pairs and the actual numbers that I put into it further up. However, the label on the "data" property itself says "Method threw 'java.lang.NoSuchMethodError' exception. Cannot evaluate scala.collection.JavaConversions$JListWrapper.toString()" - is this purely a cosmetic issue, or symptomatic of something deeper going wrong?


Solution

I've now solved my own problem and thought I'd leave it here in case other people encounter similar errors.


What I did:

1) Upgraded to the latest Spark, 1.0.0, which came out last week. I now include the following libraries in my project:

  • org.apache.spark:spark-core_2.10:1.0.0
  • org.apache.spark:spark-mllib_2.10:1.0.0

and removed the old ones.

2) Initially this gave errors, because the MLlib data types changed between 0.9 and 1.0 (LabeledPoint now takes a Vector rather than a double[]), as explained here: http://spark.apache.org/docs/latest/mllib-guide.html

I adjusted my double[] objects to Vector by wrapping them in Vectors.dense(...), and converted my JavaRDD<double[]> objects to JavaRDD<Vector> objects using the line below (written in Scala-style shorthand; a plain Java sketch of the same conversion follows after these steps):

newRdd = oldRdd.map( r => Vectors.dense(r) )

3) Now my JavaSparkContext line threw an error saying "Signer information does not match signer information of other classes in the same package". After some digging around, this turned out to be caused by an old library, "servlet-api-2.5.0.jar", that was in my library directory. I removed it and replaced it with "javax.servlet-api-3.0.1.jar".
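
For reference, here is roughly what the step 2 conversion looks like in plain Java (7-style anonymous functions) under the 1.0.0 API. It reuses the arrays from my question code above, and the doubleArrayRdd/vectorRdd names are just illustrative:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;

import java.util.ArrayList;
import java.util.List;

// Build the labelled points against the new Vector-based API instead of double[]:
List<LabeledPoint> thePointList = new ArrayList<LabeledPoint>();
for (int i = 0; i < lengthOfData; i++) {
    thePointList.add(new LabeledPoint(theOutputData[i], Vectors.dense(theInputData[i])));
}

// Or, to convert an existing JavaRDD<double[]> into a JavaRDD<Vector>:
JavaRDD<Vector> vectorRdd = doubleArrayRdd.map(new Function<double[], Vector>() {
    public Vector call(double[] r) {
        return Vectors.dense(r);
    }
});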


After these changes, the programme compiles and runs successfully, and seems to produce the results I expect. It has also cleared up the other issue mentioned in my postscript: I can now see the RDD data in the debugger, and the labels appear properly.
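
One further consequence of the same API change, for anyone adapting my original code above: in 1.0.0 the model's weights() returns an MLlib Vector rather than a double[], so the jblas line needs a toArray() call, something like:

// The training call itself is unchanged...
SVMModel theInnerModel = SVMWithSGD.train(theSparkRDD.rdd(), numberOfIterations);

// ...but weights() now comes back as an MLlib Vector, so convert it before handing it to jblas.
DoubleMatrix weights = new DoubleMatrix(theInnerModel.weights().toArray());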

I hope this will be of help to someone in the future, and many thanks to all of the developers who have been working on Spark - it's been a fun experience so far.

-StackG

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow