It turns out that running ./bin/pyspark interactively AUTOMATICALLY LOADS A SPARKCONTEXT. Here is what I see when I start pyspark:
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 0.9.1
      /_/
Using Python version 2.6.6 (r266:84292, Feb 22 2013 00:00:18)
Spark context available as sc.
...so you can either run "del sc" at the beginning, or just go ahead and use the automatically defined "sc".
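A third option is to guard the context creation so the same script works both interactively (where the shell predefines sc) and standalone. This is only a sketch: make_context() here is a hypothetical stand-in for SparkContext("local", "Simple App"), not a real PySpark call.

```python
# Sketch: let one script work both under ./bin/pyspark (sc predefined)
# and as a standalone app (script must create its own context).
# make_context() is a hypothetical stand-in for SparkContext("local", "Simple App").
def make_context():
    return "local-context"  # placeholder; a real script would build a SparkContext

try:
    sc  # succeeds inside the interactive pyspark shell, where sc exists
except NameError:
    sc = make_context()  # standalone run: define the context ourselves
```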
The other problem with the example is that it appears to reference a file on the regular (NFS) filesystem, whereas Spark is actually resolving the path against HDFS. I had to upload the README.md file from $SPARK_HOME into HDFS with "hadoop fs -put README.md README.md" before running the code.
Here is the modified example program that I ran interactively:
# No import or SparkContext needed: the interactive shell already provides sc
logFile = "README.md"  # resolved against HDFS, not the local filesystem
logData = sc.textFile(logFile).cache()
numAs = logData.filter(lambda s: 'a' in s).count()  # lines containing the letter 'a'
numBs = logData.filter(lambda s: 'b' in s).count()  # lines containing the letter 'b'
print "Lines with a: %i, lines with b: %i" % (numAs, numBs)
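Spark aside, the filter/count pattern above is just per-line predicate counting. Here is a plain-Python sketch of what the two lambdas compute, using hypothetical stand-in lines rather than the real README.md contents:

```python
# Plain-Python sketch of the filter/count logic above (no Spark required).
# These sample lines are hypothetical stand-ins for README.md content.
lines = [
    "Apache Spark is a fast cluster computing system.",
    "It can be used interactively from the Python shell.",
    "Run the examples with ./bin/run-example.",
]

# Same predicates as the PySpark lambdas: count lines containing each letter
num_as = sum(1 for s in lines if 'a' in s)
num_bs = sum(1 for s in lines if 'b' in s)

print("Lines with a: %i, lines with b: %i" % (num_as, num_bs))
# -> Lines with a: 3, lines with b: 2
```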
and here is the modified version of the stand-alone python file:
"""SimpleApp.py"""
from pyspark import SparkContext
logFile = "README.md" # Should be some file on your system
sc = SparkContext("local", "Simple App")
logData = sc.textFile(logFile).cache()
numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()
print "Lines with a: %i, lines with b: %i" % (numAs, numBs)
which I can now execute with $SPARK_HOME/bin/pyspark SimpleApp.py.