Question

I have a Spark project which I can run from sbt console. However, when I try to run it from the command line, I get Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkContext. This is expected, because the Spark libraries are marked as "provided" in build.sbt.
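For reference, a build.sbt along these lines (project name and versions are illustrative, not taken from the question) reproduces the setup being described: Spark is on the classpath in sbt console but excluded from the packaged jar.

    // build.sbt - hypothetical project; "provided" keeps Spark out of the packaged jar
    name := "my-spark-job"

    version := "0.1"

    scalaVersion := "2.10.4"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"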

How do I configure things so that I can run the JAR from the command line, without having to use sbt console?


Solution

To run Spark stand-alone you need to build a Spark assembly. Run sbt/sbt assembly in the Spark root directory. This will create: assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar

Then build your job jar together with its dependencies (either with sbt-assembly or the maven-shade-plugin).
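A minimal sketch of the sbt-assembly route (the plugin version here is an assumption; with 0.11.x you also add the plugin's assemblySettings to build.sbt as its README describes):

    // project/plugins.sbt - enables the sbt-assembly plugin for the job project
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

Running sbt assembly in the job project then produces a fat jar such as target/scala-2.10/my-spark-job-assembly-0.1.jar, which plays the role of job-jar-with-dependencies.jar in the command below.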

You can use the resulting jars to run your Spark job from the command line:

ADD_JARS=job-jar-with-dependencies.jar SPARK_LOCAL_IP=<IP> java -cp spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar:job-jar-with-dependencies.jar com.example.jobs.SparkJob
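Here com.example.jobs.SparkJob stands for your own main class. Purely as an illustration (not taken from the original answer), such an entry point might look like:

    package com.example.jobs

    import org.apache.spark.{SparkConf, SparkContext}

    object SparkJob {
      def main(args: Array[String]): Unit = {
        // The master could also come from the MASTER env var / spark.master instead of being hard-coded
        val conf = new SparkConf().setAppName("SparkJob").setMaster("local[*]")
        val sc = new SparkContext(conf)

        // Trivial computation just to prove the classpath and the context work
        val histogram = sc.parallelize(1 to 1000).map(_ % 10).countByValue()
        histogram.toSeq.sortBy(_._1).foreach { case (k, v) => println(s"$k -> $v") }

        sc.stop()
      }
    }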

Note: if you need a different HDFS/Hadoop version, you need to follow additional steps before building the assembly; see "About Hadoop Versions" in the Spark documentation.
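For example, Spark builds of that era let you target another Hadoop version via an environment variable at assembly time, roughly SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly (the exact variables depend on your Spark version, so check that doc page).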

Other tips

Using the sbt-assembly plugin, you can create a single jar that bundles your job together with its dependencies. After doing that, you can simply run it with the java -jar command.
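For that route the Spark dependency must not be marked "provided", so Spark itself ends up inside the fat jar, and the jar's manifest needs a Main-Class. A sketch of the extra build.sbt settings, using sbt-assembly 0.11.x key names (with that version you also need its import AssemblyKeys._ / assemblySettings boilerplate near the top of build.sbt, and possibly a merge strategy for conflicting META-INF entries):

    // extra build.sbt settings for a self-contained jar (hypothetical project)
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"

    mainClass in assembly := Some("com.example.jobs.SparkJob")

After sbt assembly, the job then runs with java -jar target/scala-2.10/my-spark-job-assembly-0.1.jar.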

For more details, refer to the sbt-assembly documentation.
