Question

I want to apply stemming to the contents of a file. Stemming a single word in the terminal works fine, but when I apply it to a text file it does not work. Terminal code:

print PorterStemmer().stem_word('complications')

Function code:

def stemming_text_1():
    with open('test.txt', 'r') as f:
        text = f.read()
        print text
        singles = []    

        stemmer = PorterStemmer() #problem from HERE
        for plural in text:
            singles.append(stemmer.stem(plural))
        print singles

Input test.txt

126211 crashes bookmarks runs error logged debug core bookmarks
126262 manual change crashes bookmarks propagated ion view bookmarks

Desired/expected output

126211 crash bookmark runs error logged debug core bookmark
126262 manual change crash bookmark propagated ion view bookmark

Any suggestion will be greatly appreciated, thanks :)

Solution

You need to split the text into words for the stemmer to work. Currently, the variable text holds the whole file as one big string, so the loop for plural in text: assigns one character of text to plural at a time, and the stemmer is called on single characters rather than on words.

Try for plural in text.split(): instead.
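
To see the difference without involving the stemmer at all, here is a minimal sketch. The sample string is made up for illustration; it only contrasts iterating over a string directly with iterating over the result of split():

```python
text = "crashes bookmarks"

# Iterating over a string yields one character per step.
chars = [piece for piece in text]

# split() with no arguments breaks on runs of whitespace and yields whole words.
words = text.split()

print(chars[:3])  # ['c', 'r', 'a']
print(words)      # ['crashes', 'bookmarks']
```

A stemmer given 'c', 'r', 'a', ... has nothing to strip, which is why the original loop appears to do nothing.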

[EDIT] To get the output in the format you want, you need to read the file line by line instead of reading it all at once:

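from nltk.stem import PorterStemmer

def stemming_text_1():
    stemmer = PorterStemmer()  # create the stemmer once, not once per line
    with open('test.txt', 'r') as f:
        for line in f:
            singles = []
            # split() breaks the line into words; stem each word on its own
            for plural in line.split():
                singles.append(stemmer.stem(plural))
            # re-join the stemmed words so each input line gives one output line
            print(' '.join(singles))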
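
If you want to reuse or test the line-by-line logic, one option is to pass the stemming function in as an argument. This is only a sketch: stem_lines and strip_plural_s are hypothetical names invented for this example, and strip_plural_s is a toy stand-in, not the Porter algorithm.

```python
def stem_lines(lines, stem):
    """Return each line with every whitespace-separated word passed through stem."""
    result = []
    for line in lines:
        words = line.split()
        result.append(' '.join(stem(word) for word in words))
    return result


def strip_plural_s(word):
    # Toy stand-in for a real stemmer (hypothetical, NOT the Porter algorithm):
    # drops "es" after "sh", otherwise drops a single trailing "s".
    if word.endswith('shes'):
        return word[:-2]
    if word.endswith('s'):
        return word[:-1]
    return word


sample = ["126211 crashes bookmarks", "126262 manual change"]
stemmed = stem_lines(sample, strip_plural_s)
print(stemmed)
```

With NLTK installed, you would pass PorterStemmer().stem as the stem argument and an open file object as lines. Note that the real Porter stemmer will likely shorten more words than the desired output in the question shows (for example runs or logged), so the result may not match it exactly.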
Licensed under: CC-BY-SA with attribution