Question

I want to search a file of sentences and extract the sentences that contain certain words. I wrote this code to do that:

def finding(q):
    matches = []
    for item in sentences:
        # find() does a substring search, so 'appletree' also matches 'apple'
        if item.lower().find(q.lower()) != -1:
            matches.append(item)

    for sentence in matches:
        outfile.write(sentence + '\r\n')

finding('apple')
finding('banana')

The problem is that this finds substrings instead of whole words. For example, the sentence 'the appletree is big' would also get extracted.
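A quick way to see the difference is to compare a substring test against a test on the list of words. This sketch uses the example sentence from the question; the variable names are only for illustration.

```python
sentence = 'the appletree is big'

# Substring test: 'apple' occurs inside 'appletree', so this is True.
substring_hit = 'apple' in sentence.lower()

# Whole-word test: check membership in the list of individual words instead.
word_hit = 'apple' in sentence.lower().split()

print(substring_hit, word_hit)  # True False
```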


Solution

Split the line into words; the simplest way is to use str.split():

for line in sentences:
    if any(q.lower() == word.lower() for word in line.split()):
        outfile.write(line + '\n')

You could also call .strip('?!."()') on each word to remove the most common punctuation.
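Here is a minimal sketch of the split-based check with that punctuation stripping applied to each word. The helper name `contains_word` and the `PUNCTUATION` constant are hypothetical; the character set is the one suggested above.

```python
# Characters stripped from both ends of every word (hypothetical constant).
PUNCTUATION = '?!."()'

def contains_word(line, q):
    """Return True if q appears in line as a whole word, ignoring case and
    any PUNCTUATION characters at the start or end of each word."""
    q = q.lower()
    return any(word.strip(PUNCTUATION).lower() == q for word in line.split())

print(contains_word('I ate an apple.', 'apple'))       # True
print(contains_word('The appletree is big', 'apple'))  # False
```

Commas, colons and semicolons are not in that set, so 'apple,' would still be missed; passing string.punctuation from the standard library to strip() is a broader option.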

Note that on Windows, a Python file opened in text mode already writes \r\n when you write \n, so there is no need to add the \r yourself. The code above also writes matching lines directly to the output file instead of collecting them in a list first.
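A minimal end-to-end sketch that reads sentences from one file and appends whole-word matches to another. The three-argument `finding(q, in_path, out_path)` signature, the file names and the UTF-8 encoding are assumptions for illustration, not part of the original code.

```python
import os
import tempfile

def finding(q, in_path, out_path):
    """Append every line of in_path that contains q as a whole word
    (case-insensitive) to out_path. Hypothetical signature."""
    q = q.lower()
    with open(in_path, encoding='utf-8') as infile, \
         open(out_path, 'a', encoding='utf-8') as outfile:
        for line in infile:
            line = line.rstrip('\n')  # text mode already turned \r\n into \n
            if any(word.lower() == q for word in line.split()):
                # Write a plain '\n'; text mode converts it to os.linesep.
                outfile.write(line + '\n')

# Usage with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, 'sentences.txt')
    out_path = os.path.join(tmp, 'matches.txt')
    with open(in_path, 'w', encoding='utf-8') as f:
        f.write('the appletree is big\nI like apple pie\nbanana split\n')
    finding('apple', in_path, out_path)
    finding('banana', in_path, out_path)
    with open(out_path, encoding='utf-8') as f:
        result = f.read().splitlines()

print(result)  # ['I like apple pie', 'banana split']
```

Opening the output in append mode ('a') lets the two calls add to the same file instead of the second call overwriting the first; that is a choice made for this sketch.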

Alternatively, use a regular expression to find matches:

import re

def finding(q, sentences, outfile):
    pattern = re.compile(r'\b{}\b'.format(re.escape(q)), flags=re.IGNORECASE)
    for line in sentences:
        # search() scans the whole line; match() would only test the start
        if pattern.search(line):
            outfile.write(line + '\n')

re.IGNORECASE makes the match case-insensitive, \b marks a word boundary, and re.escape() escapes any regular-expression metacharacters in the query so they are matched literally.
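The pieces of that pattern can be checked in isolation. This sketch wraps the pattern construction in a hypothetical helper, `make_word_pattern`, and shows case-insensitivity, the word boundary, why search() is used rather than match(), and what re.escape() does to a metacharacter.

```python
import re

def make_word_pattern(q):
    # Hypothetical helper: escape q, then surround it with word boundaries.
    return re.compile(r'\b{}\b'.format(re.escape(q)), flags=re.IGNORECASE)

pattern = make_word_pattern('apple')
print(bool(pattern.search('An Apple a day')))        # True: case is ignored
print(bool(pattern.search('the appletree is big')))  # False: no boundary after 'apple'
print(bool(pattern.match('An Apple a day')))         # False: match() only tries position 0
print(re.escape('a.b'))                              # a\.b
```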

OTHER TIPS

An alternative:

sentences = [
    'this has a banana',
    'this one does not',
    'bananatree should not be here',
    'go go banana go'
]

import re

# filter() returns a lazy iterator in Python 3; wrap it in list() to get a list.
found = list(filter(re.compile(r'\bbanana\b', flags=re.I).search, sentences))
# ['this has a banana', 'go go banana go']
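Since the question searches for several words, one option is to combine them into a single alternation pattern instead of calling the search once per word. This is a sketch of that variation, not part of the original answers; `find_any` is a hypothetical name.

```python
import re

def find_any(words, sentences):
    """Return the sentences containing at least one of words as a whole word."""
    # Join the escaped words into one alternation, e.g. \b(?:apple|banana)\b
    alternation = '|'.join(re.escape(w) for w in words)
    pattern = re.compile(r'\b(?:{})\b'.format(alternation), flags=re.I)
    return [s for s in sentences if pattern.search(s)]

sentences = [
    'this has a banana',
    'the appletree is big',
    'an apple a day',
    'bananatree should not be here',
]
print(find_any(['apple', 'banana'], sentences))
# ['this has a banana', 'an apple a day']
```

A side effect of the single pass is that a sentence containing both words is returned only once, whereas calling a per-word search twice would write it twice.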
Licensed under: CC-BY-SA with attribution