Question

I want to search a file of sentences and extract the sentences that contain certain words. I wrote this code to do that:

def finding(q):
    for item in sentences:
        if item.lower().find(q.lower()) != -1:
            list.append(item)

        for sentence in list:
            outfile.write(sentence+'\r\n')

finding('apple')
finding('banana')

The problem is that this finds substrings instead of whole words. For example, the sentence 'the appletree is big' would also be extracted.

Solution

Split the line into words; the simplest is to use str.split():

for line in sentences:
    if any(q.lower() == word.lower() for word in line.split()):
        outfile.write(line + '\n')

You can also call .strip('?!."()') on each word to remove the most common punctuation before comparing.
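As a minimal sketch of the split-and-strip idea, the loop could be wrapped in a function like the one below. The PUNCTUATION constant, the sample sentences, and the use of io.StringIO as a stand-in for the output file are assumptions made for illustration, not part of the original code.

```python
import io

# Assumed set of punctuation to strip from each word; extend as needed.
PUNCTUATION = '?!."\'(),;:'

def finding(q, sentences, outfile):
    """Write every sentence that contains q as a whole word to outfile."""
    q = q.lower()
    for line in sentences:
        # Split on whitespace, then strip surrounding punctuation from each word.
        words = (word.strip(PUNCTUATION).lower() for word in line.split())
        if q in words:
            outfile.write(line + '\n')

# Hypothetical sample data.
sentences = [
    'I ate an apple.',
    'the appletree is big',
    'Is that an "Apple"?',
]
out = io.StringIO()  # stands in for a file opened in text mode
finding('apple', sentences, out)
print(out.getvalue(), end='')
```

Only the first and third sentences are written, because 'appletree' is compared as a whole word and no longer matches.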

Note that Python files, opened in text mode, will already use \r\n on Windows if you write out a \n. The code above also directly writes the matched lines to the output file.
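The newline translation can be checked on any platform with a small sketch like this one. It passes newline='\r\n' to open() to force the translation that Windows applies by default; the temporary directory and file name are assumptions for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'out.txt')  # hypothetical output file
    # newline='\r\n' makes text mode translate each '\n' written to '\r\n'.
    with open(path, 'w', newline='\r\n') as outfile:
        outfile.write('an apple a day\n')
    # Read the raw bytes back to see what actually landed on disk.
    with open(path, 'rb') as raw:
        data = raw.read()

print(data)  # b'an apple a day\r\n'
```

Writing '\r\n' explicitly in that situation would produce '\r\r\n' on disk, which is why writing a plain '\n' is enough.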

Alternatively, use a regular expression to find matches:

import re

def finding(q, sentences, outfile):
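    pattern = re.compile(r'\b{}\b'.format(re.escape(q)), flags=re.IGNORECASE)
    for line in sentences:
        if pattern.search(line):
            outfile.write(line + '\n')

re.IGNORECASE makes the match case-insensitive, \b adds word boundaries, and re.escape() escapes any regular-expression metacharacters in the input query so they are matched literally. pattern.search() is used rather than pattern.match() because match() only tests at the start of the line.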
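To see how the word boundaries and the escaping behave, here is a small sketch. The helper name make_word_pattern and the sample strings are hypothetical, and the flag is spelled re.IGNORECASE (or re.I) as in the standard library.

```python
import re

def make_word_pattern(q):
    # Hypothetical helper: whole-word, case-insensitive pattern for q.
    return re.compile(r'\b{}\b'.format(re.escape(q)), flags=re.IGNORECASE)

pattern = make_word_pattern('apple')

# search() scans the whole string, so a word in the middle is found.
found_word = pattern.search('I like Apple pie') is not None
# \b stops 'apple' from matching inside 'appletree'.
found_substring = pattern.search('the appletree is big') is not None
# match() only tests at the start of the string, so it misses this one.
found_with_match = pattern.match('I like Apple pie') is not None

# re.escape() makes the '.' literal instead of "any character".
dotted = make_word_pattern('a.b')
found_literal_dot = dotted.search('call a.b now') is not None
found_any_char = dotted.search('call axb now') is not None

print(found_word, found_substring, found_with_match)  # True False False
print(found_literal_dot, found_any_char)              # True False
```

One caveat: \b only marks a boundary between a word character and a non-word character, so a query that starts or ends with punctuation may not match the way you expect.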

Other tips

An alternative:

sentences = [
    'this has a banana',
    'this one does not',
    'bananatree should not be here',
    'go go banana go'
]

import re
# list() is needed on Python 3, where filter() returns a lazy iterator
found = list(filter(re.compile(r'\bbanana\b', flags=re.I).search, sentences))
# ['this has a banana', 'go go banana go']
Licensed under: CC-BY-SA with attribution