Question

I have a Python script written with SparkContext and I want to run it. I tried to integrate IPython with Spark, but I could not get that working. So I added the Spark path [installation folder/bin] as an environment variable and called the spark-submit command in the cmd prompt. I believe it is finding the Spark context, but it produces a really big error. Can someone please help me with this issue?

Environment variable path: C:/Users/Name/Spark-1.4;C:/Users/Name/Spark-1.4/bin

After that, in the cmd prompt: spark-submit script.py
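For context, a minimal sketch of what a SparkContext-based script.py might look like (the file name comes from the question, but the job body and data here are hypothetical, since the original script is not shown):

from pyspark import SparkConf, SparkContext

# Hypothetical minimal job; the actual script.py from the question is not shown.
conf = SparkConf().setAppName("ExampleApp")
sc = SparkContext(conf=conf)

rdd = sc.parallelize([1, 2, 3, 4])
print(rdd.map(lambda x: x * 2).collect())  # expected: [2, 4, 6, 8]

sc.stop()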

[screenshot of the error output]


Solution 4

Finally, I resolved the issue. I had to add the pyspark location to the PATH variable and the py4j-0.8.2.1-src.zip location to the PYTHONPATH variable.
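In Python terms, the same effect can be had per session by putting those two locations on sys.path; a minimal sketch, assuming the Spark 1.4 install path from the question (adjust both paths to your installation):

import os
import sys

# Hypothetical install path taken from the question; adjust to your system.
spark_home = r'C:\Users\Name\Spark-1.4'
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python', 'lib', 'py4j-0.8.2.1-src.zip'))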

Other tips

I'm fairly new to Spark, and have figured out how to integrate it with IPython on Windows 10 and 7. First, check your environment variables for Python and Spark. Here are mine: SPARK_HOME: C:\spark-1.6.0-bin-hadoop2.6\. I use Enthought Canopy, so Python is already integrated in my system path. Next, launch Python or IPython and use the following code. If you get an error, check what you get for 'spark_home'. Otherwise, it should run just fine.

import os
import sys

spark_home = os.environ.get('SPARK_HOME', None)
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')
sys.path.insert(0, os.path.join(spark_home, 'python'))
# May need to adjust on your system, depending on which Spark version you're using and where you installed it.
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.9-src.zip'))
execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))
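If that runs without errors, shell.py should have initialized the interactive PySpark session in the current namespace, leaving a SparkContext bound to sc. A quick sanity check (the numbers are arbitrary):

print(sc.version)
print(sc.parallelize(range(10)).sum())  # expected: 45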

Check if this link could help you out: pySpark on IPython

Johnnyboycurtis's answer works for me. If you are using Python 3, use the code below; his code doesn't work in Python 3. The main change is the last line of his code.

import os
import sys


spark_home = os.environ.get('SPARK_HOME', None)
print(spark_home)
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')
sys.path.insert(0, os.path.join(spark_home, 'python'))
# May need to adjust on your system, depending on which Spark version you're using and where you installed it.
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.9-src.zip'))


filename = os.path.join(spark_home, 'python/pyspark/shell.py')
exec(compile(open(filename, "rb").read(), filename, 'exec'))
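The reason for the change: execfile was removed in Python 3, and exec(compile(open(filename, "rb").read(), filename, 'exec')) is the usual equivalent. Passing the real filename to compile keeps tracebacks pointing at shell.py, which makes any remaining path problems easier to diagnose.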
License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange