I've been trying to get Spark running using Java (in case it matters, I'm using IntelliJ as my IDE). I ran the calculate-pi code found here https://spark.apache.org/examples.html successfully after quite a bit of fiddling.
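For context, that example is just a Monte Carlo estimate of pi; the sequential logic it parallelizes looks roughly like this (a plain-Java sketch of my own, not the Spark version from the examples page — the class name and sample count are arbitrary):

```java
import java.util.Random;

public class PiEstimate {
    public static void main(String[] args) {
        int numSamples = 1000000;
        Random rng = new Random(42); // fixed seed for reproducibility
        int inside = 0;
        for (int i = 0; i < numSamples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            // count points that fall inside the unit quarter-circle
            if (x * x + y * y < 1.0) {
                inside++;
            }
        }
        // inside/numSamples approximates pi/4
        System.out.println("Pi is roughly " + 4.0 * inside / numSamples);
    }
}
```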
I had the following error at first: "java.lang.ClassNotFoundException: [B"
But I fixed it by adding the following JVM option: "-Dsun.lang.ClassLoader.allowArraySyntax=true".
Now I'm trying to build, train and run an SVMModel, as described here http://spark.apache.org/docs/0.9.0/mllib-guide.html but I'm struggling with the small amount of Java documentation available.
I'm trying to run the following code:
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.mllib.classification.SVMModel;
    import org.apache.spark.mllib.classification.SVMWithSGD;
    import org.apache.spark.mllib.regression.LabeledPoint;
    import org.jblas.DoubleMatrix;

    import java.util.ArrayList;
    import java.util.List;

    public class MinimalBugCode {
        public static void main(String[] args) throws Exception {
            try {
                JavaSparkContext spark = new JavaSparkContext("local", "MinimalBugCode");

                int numberOfIterations = 10000;
                double[] theOutputData = { 1.0, 0.0, 0.0, 1.0, 1.0 };
                double[][] theInputData = { { 2.2, 3.1, 1.7 }, { 1.9, 2.1, 0.6 }, { 4.7, 0.5, 1.3 }, { 2.6, 2.9, 2.2 }, { 1.5, 4.1, 1.5 } };
                int lengthOfData = theOutputData.length;

                // Wrap each (label, features) pair as a LabeledPoint
                List<LabeledPoint> thePointList = new ArrayList<LabeledPoint>();
                for (int i = 0; i < lengthOfData; i++) {
                    LabeledPoint thisLabelledPoint = new LabeledPoint(theOutputData[i], theInputData[i]);
                    thePointList.add(thisLabelledPoint);
                }

                JavaRDD<LabeledPoint> theSparkRDD = spark.parallelize(thePointList);

                // Train the SVM on the underlying Scala RDD, then pull out the weight vector
                SVMModel theInnerModel = SVMWithSGD.train(theSparkRDD.rdd(), numberOfIterations);
                DoubleMatrix weights = new DoubleMatrix(theInnerModel.weights());
            } catch (Exception e) {
                e.printStackTrace();
            }
            System.exit(0);
        }
    }
It gets as far as the line
    SVMModel theInnerModel = SVMWithSGD.train(theSparkRDD.rdd(), numberOfIterations);
at which point it breaks with the following console output:
    Exception in thread "main" java.lang.IncompatibleClassChangeError: class scala.reflect.ManifestFactory$$anon$6 has interface scala.reflect.AnyValManifest as super class
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:455)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:367)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at scala.reflect.ManifestFactory$.<init>(Manifest.scala:88)
        at scala.reflect.ManifestFactory$.<clinit>(Manifest.scala)
        at scala.reflect.ClassManifestFactory$.<init>(ClassManifestDeprecatedApis.scala:150)
        at scala.reflect.ClassManifestFactory$.<clinit>(ClassManifestDeprecatedApis.scala)
        at scala.reflect.package$.<init>(package.scala:34)
        at scala.reflect.package$.<clinit>(package.scala)
        at scala.reflect.ClassTag$.<init>(ClassTag.scala:114)
        at scala.reflect.ClassTag$.<clinit>(ClassTag.scala)
        at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:139)
        at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:123)
        at org.apache.spark.mllib.classification.SVMWithSGD$.train(SVM.scala:133)
        at org.apache.spark.mllib.classification.SVMWithSGD$.train(SVM.scala:173)
        at org.apache.spark.mllib.classification.SVMWithSGD.train(SVM.scala)
        at fiddle.spark.MinimalBugCode.main(MinimalBugCode.java:37)
I'm finding these error messages rather impenetrable! I'm quite new to Spark, but I have a fair amount of Java experience.
Many thanks in advance!
-StackG
ps. As an aside: when I inspect the JavaRDD object in the debugger, I can drill down and see the vector of (label, features) pairs containing the actual numbers I put into it further up. However, the label on the "data" property itself reads "Method threw 'java.lang.NoSuchMethodError' exception. Cannot evaluate scala.collection.JavaConversions$JListWrapper.toString()". Is this purely a cosmetic issue, or is it symptomatic of something deeper going wrong?