Question

I'm ashamed to resort to asking for help again, but I'm stuck.

I have a Spanish novel (in plain text), and I have a Python script that's supposed to put translations for difficult words in parentheses, using a custom dictionary in another text file.

After a lot of trial and error, I've managed to have the script run, and write the novel to a new text file as it's supposed to do.

The only problem is that no changes have been made to the text of the novel; that is, the translations haven't been inserted. The dictionary is a plain text file, and it's formatted like this:

[spanish word] [english translation]
[spanish word] [english translation]

and so on. Note that the words aren't really enclosed in brackets. There's a single space between the two words on each line, and there aren't spaces anywhere else in the file.

Here's the offending code:

bookin = (open("novel.txt")).read()
subin = open("dictionary.txt")
for line in subin.readlines():
    ogword, meaning = line.split(" ")
    subword = ogword + "(meaning)"
    bookin.replace(ogword, subword)
    ogword = ogword.capitalize()
    subword = ogword + "(meaning)"
    bookin.replace(ogword, subword)
subin.close()
bookout = open("output.txt", "w")
bookout.write(bookin)
bookout.close()

Advice would be greatly appreciated.

Edit: The MemoryError is solved now, there were errors in the dictionary I thought I'd fixed. Thank you so much to those who helped me with this stupid problem!


Solution

Change:

bookin.replace(ogword, subword)

to

bookin = bookin.replace(ogword, subword)

Explanation: replace does not change the string in place. Strings are immutable, so it returns a new string instead, and you have to assign that result back to the variable.
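A minimal sketch of the difference, using a made-up sample sentence (the variable names are only for illustration, and print is called with parentheses so the snippet runs on Python 3):

```python
# Strings are immutable: replace() builds and returns a new string.
text = "el gato duerme"

text.replace("gato", "gato (cat)")   # the returned string is thrown away
unchanged = text                     # still "el gato duerme"

text = text.replace("gato", "gato (cat)")  # rebind the name to the result
changed = text                       # now "el gato (cat) duerme"

print(unchanged)
print(changed)
```

The same holds for other string methods such as upper() and strip(): they return a new string and leave the original untouched.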

OTHER TIPS

As @David Robinson pointed out, the problem was your use of replace. It should have been:

 bookin = bookin.replace(ogword, subword)

I was up last night when you posted your question (and I upvoted both the question and the answer; I didn't get to post in time myself), but the question stuck with me. Even though an answer has been posted and accepted, I wanted to offer the following advice, because I believe that if you can write code like that shown above, you can quite likely track down most of your problems on your own.

What I would suggest for this sort of problem is to create small data files, say 10 records/lines each, and use them to trace the data through your program by peppering it with diagnostic print statements. I am showing a version of this below. It's not complete, but I hope the intention is clear.

The basic idea is to verify that everything you expect to happen is actually happening at each step by looking at the output your "debugging print statements" generate. In this case you would have seen that bookin did not get modified.

bookin = (open("novel.txt")).read()
subin = open("dictionary.txt")

print 'bookin =', bookin  # verify that you read the information 

for line in subin.readlines():
    print 'line = ', line # verify line read

    ogword, meaning = line.split(" ")
    print 'ogword, meaning = ', ogword, meaning # verify ...

    subword = ogword + "(meaning)"
    print 'subword =', subword # verify ...

    bookin.replace(ogword, subword)
    print 'bookin post replace =', bookin # verify ... etc

    ogword = ogword.capitalize()
    subword = ogword + "(meaning)"
    bookin.replace(ogword, subword)

subin.close() 

print 'final bookin =', bookin # make sure final output is good ...
bookout = open("output.txt", "w")
bookout.write(bookin)
bookout.close()
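The small-file idea can also be turned into an automatic check. What follows is only a sketch under some assumptions: annotate is a hypothetical helper (not part of the original script), the two tiny input files are invented for the test, and the code is written for Python 3. It applies the replacement with the result assigned back, then compares the output with what you would expect to see by hand.

```python
import os
import shutil
import tempfile

def annotate(novel_path, dict_path):
    """Return the novel text with ' (translation)' added after each dictionary word."""
    with open(novel_path) as f:
        text = f.read()
    with open(dict_path) as f:
        for line in f:
            word, meaning = line.split()  # split() also drops the trailing newline
            text = text.replace(word, word + " (" + meaning + ")")
    return text

# Build two tiny throwaway input files in a temporary directory.
tmpdir = tempfile.mkdtemp()
novel_path = os.path.join(tmpdir, "novel.txt")
dict_path = os.path.join(tmpdir, "dictionary.txt")
with open(novel_path, "w") as f:
    f.write("el perro ve la luna\n")
with open(dict_path, "w") as f:
    f.write("perro dog\nluna moon\n")

result = annotate(novel_path, dict_path)
print(result)  # el perro (dog) ve la luna (moon)

shutil.rmtree(tmpdir)  # clean up the throwaway files
```

With inputs this small you can work out the expected output in your head, so any mismatch points straight at the step that misbehaves.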

Finally, one additional plus that Python has over many other languages is that you can work with it interactively. I frequently verify my understanding of functions and their behavior in the interpreter (I'm often too lazy to look at the documentation, which is actually not a good thing). So in your case, since the problem was with replace (my debugging print statements would have shown me this), I would have tried the following sequence in the interpreter:

 s = 'this is a test'
 print s
 s.replace('this', 'that')
 print s

and would have seen that s didn't change, in which case I'd have looked at the documentation, or simply tried s = s.replace('this', 'that').

I hope this is helpful. This basic debugging technique can often help pinpoint a problem area and is a good first step. Down the line, debuggers and similar tools are quite useful.

PS: I'm new to SO, so I hope this sort of additional answer is not frowned upon.

Apart from the MemoryError, which is astonishing, given the size of your files, you still have several things that could be improved; see comments below:

bookin = open("novel.txt").read() # don't need extra ()
subin = open("dictionary.txt")
# for line in subin.readlines():
# readlines() reads the whole file, you don't need that
for line in subin:
    # ogword, meaning = line.split(" ")
    # the above will leave a newline on the end of "meaning"
    ogword, meaning = line.split()
    # subword = ogword + "(meaning)"
    # if ogword is "gato" and meaning is "cat",
    # you want "gato (cat)"
    # but you will get "gato(meaning)"
    subword = ogword + " (" + meaning + ")"
    bookin = bookin.replace(ogword, subword)
    ogword = ogword.capitalize()
    subword = ogword + " (" + meaning + ")"  # same fix as above
    bookin = bookin.replace(ogword, subword)  # keep the result here too
    print len(bookin) # help debug your MemoryError
subin.close()
bookout = open("output.txt", "w")
bookout.write(bookin)
bookout.close()
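Since the MemoryError was traced back to errors in the dictionary file, it may be worth checking the dictionary before using it. The question does not say what those errors were, so the following is only a sketch under assumptions: load_dictionary is a hypothetical helper, the sample lines are invented, and the code is written for Python 3. It keeps lines with exactly two fields and reports everything else. The last line shows one way a bad entry could make the text balloon: replacing an empty string inserts the replacement around every character.

```python
def load_dictionary(lines):
    """Split dictionary lines into (word, meaning) pairs plus a list of bad lines."""
    pairs = []
    problems = []
    for number, line in enumerate(lines, 1):
        fields = line.split()
        if len(fields) == 2:
            pairs.append((fields[0], fields[1]))
        elif fields:  # completely blank lines are silently ignored
            problems.append((number, line.rstrip("\n")))
    return pairs, problems

# Invented sample: two good lines, one with a missing meaning,
# one blank line, and one with too many fields.
sample = ["gato cat\n", "perro\n", "\n", "casa de campo house\n", "luna moon\n"]
pairs, problems = load_dictionary(sample)
print(pairs)     # [('gato', 'cat'), ('luna', 'moon')]
print(problems)  # [(2, 'perro'), (4, 'casa de campo house')]

# Why an empty "word" is dangerous: the replacement lands around every character.
ballooned = "abc".replace("", "-")
print(ballooned)  # -a-b-c-
```

Passing an open file object instead of the sample list works the same way, because the helper only iterates over lines.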

You need to follow the advice of @Levon and try your code on some small test data files so that you can see what is happening.

After using this one-line dictionary:

gato cat

with this one-line novel:

El gato se sirvió un poco de Gatorade para el "alligator".

you may wish to reconsider your high-level strategy.
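To make the problem concrete, here is a sketch (written for Python 3) that runs plain replace on that sentence and then tries one possible alternative: matching whole words only, using the re module's \b word boundary. The function annotate_whole_words is a hypothetical name, and word boundaries are just one strategy among several, not necessarily the one intended above.

```python
import re

novel = 'El gato se sirvió un poco de Gatorade para el "alligator".'

# Plain substring replacement also hits "Gatorade" and "alligator".
naive = novel.replace("gato", "gato (cat)")
naive = naive.replace("Gato", "Gato (cat)")
print(naive)

def annotate_whole_words(text, word, meaning):
    """Append ' (meaning)' to whole-word, case-insensitive matches of word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    # A function as the replacement keeps the original capitalisation
    # and avoids any escaping issues in the meaning.
    return pattern.sub(lambda m: m.group(0) + " (" + meaning + ")", text)

fixed = annotate_whole_words(novel, "gato", "cat")
print(fixed)
```

The whole-word version leaves "Gatorade" and "alligator" alone, and re.IGNORECASE covers a capitalised "Gato" at the start of a sentence without a second pass. It will not match inflected forms such as "gatos", so the dictionary would need an entry for each form you care about.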

You can get this information by typing any of these in the interpreter:

>>> help(str.replace)  
>>> help('a'.replace)  
>>> s = 'a'  
>>> help(s.replace)  
>>> import string  
>>> help(string.replace)
Licensed under: CC-BY-SA with attribution