Workarounds when a string is too long for a .join. OverflowError occurs
Question
I'm working through some python problems on pythonchallenge.com to teach myself python and I've hit a roadblock, since the string I am to be using is too large for python to handle. I receive this error:
my-macbook:python owner1$ python singleoccurrence.py
Traceback (most recent call last):
File "singleoccurrence.py", line 32, in <module>
myString = myString.join(line)
OverflowError: join() result is too long for a Python string
What alternatives do I have for this issue? My code looks like such...
#open file testdata.txt
#for each character, check if already exists in array of checked characters
#if so, skip.
#if not, character.count
#if count > 1, repeat recursively with first character stripped off of page.
# if count = 1, add to valid character array.
#when string = 0, print valid character array.
valid = []
checked = []
myString = ""
def recursiveCount(bigString):
if len(bigString) == 0:
print "YAY!"
return valid
myChar = bigString[0]
if myChar in checked:
return recursiveCount(bigString[1:])
if bigString.count(myChar) > 1:
checked.append(myChar)
return recursiveCount(bigString[1:])
checked.append(myChar)
valid.append(myChar)
return recursiveCount(bigString[1:])
fileIN = open("testdata.txt", "r")
line = fileIN.readline()
while line:
line = line.strip()
myString = myString.join(line)
line = fileIN.readline()
myString = recursiveCount(myString)
print "\n"
print myString
Solution
string.join doesn't do what you think. join is used to combine a list of words into a single string with the given seperator. Ie:
>>> ",".join(('foo', 'bar', 'baz'))
'foo,bar,baz'
The code snippet you posted will attempt to insert myString between every character in the variable line. You can see how that will get big quickly :-). Are you trying to read the entire file into a single string, myString? If so, the way you want to concatenate the strings is like this:
myString = myString + line
While I'm here... since you're learning Python here are some other suggestions.
There are easier ways to read an entire file into a variable. For instance:
fileIN = open("testdata.txt", "r")
myString = fileIN.read()
(This won't have the exact behaviour of your existing strip() code, but may in fact do what you want.)
Also, I would never recommend practical Python code use recursion to iterate over a string. Your code will make a function call (and a stack entry) for every character in the string. Also I'm not sure Python will be very smart about all the uses of bigString[1:]: it may well create a second string in memory that's a copy of the original without the first character. The simplest way to process every character in a string is:
for mychar in bigString:
... do your stuff ...
Finally, you are using the list named "checked" to see if you've ever seen a particular character before. But the membership test on lists ("if myChar in checked") is slow. In Python you're better off using a dictionary:
checked = {}
...
if not checked.has_key(myChar):
checked[myChar] = True
...
This exercise you're doing is a great way to learn several Python idioms.