SequenceMatcher: Recording no match just once?
06-03-2021
Question
I am using SequenceMatcher to find a set of words within a group of texts. The problem I am having is that I need to record when it does not find a match, but only once per text. If I try an if statement, it gives me a result each time the comparison to another word fails.
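from difflib import SequenceMatcher

names = ['JOHN', 'LARRY', 'PETER', 'MARY']
files = [...]  # paths or links
for file in files:
    for name in names:
        if SequenceMatcher(None, name, file).ratio() > .9:
            pass  # do something
        else:
            # runs for every failed comparison, which is the problem
            print(name + ' not found')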
I have also tried re.match and re.find and I encounter the same problem.
The code above is a simple version of what I am doing. I'm new to Python too.
Thank you very much!
Answer
If I interpret your comment to the question correctly (but I am not 100% sure!), this might illustrate the general mechanism you can follow:
>>> text = 'If JOHN would be married to PETER, then MARY would probably be unhappy'
>>> names = ['JOHN', 'LARRY', 'PETER', 'MARY']
>>> [text.find(name) for name in names]
[3, -1, 28, 40]  # this list is always as long as the names list; -1 means "not found"
What I mean by "mechanism you can follow" is that SequenceMatcher (which I replaced here with the built-in string method find) should not only work as a True/False test but should already output the information you want to store.
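Here is a minimal sketch of the same idea with SequenceMatcher itself, assuming each name is compared word by word against the text. The helper names best_ratio and missing_names, the punctuation stripping, and the 0.9 default threshold are illustrative assumptions, not part of the original code:

```python
from difflib import SequenceMatcher


def best_ratio(name, text):
    """Return the highest similarity between name and any word in text."""
    # Strip common punctuation so that 'PETER,' compares as 'PETER'.
    words = (word.strip('.,;:!?') for word in text.split())
    return max(
        (SequenceMatcher(None, name, word).ratio() for word in words),
        default=0.0,  # an empty text matches nothing
    )


def missing_names(names, text, threshold=0.9):
    """Return the names whose best match in text does not exceed threshold."""
    return [name for name in names if best_ratio(name, text) <= threshold]


text = 'If JOHN would be married to PETER, then MARY would probably be unhappy'
names = ['JOHN', 'LARRY', 'PETER', 'MARY']

missing = missing_names(names, text)
if missing:
    # One message per text, no matter how many comparisons failed.
    print(', '.join(missing) + ' not found')
```

Because the result is collected into one list per text, the "not found" message is printed at most once per text, however many individual comparisons fail.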
HTH!
Other answers
The simple way would be to keep track of the names that have already been reported as not found, and skip the message for those:
from difflib import SequenceMatcher

seen = set()  # names already reported as not found
for file in files:
    for name in names:
        if SequenceMatcher(None, name, file).ratio() > .9:
            pass  # do something
        elif name not in seen:
            seen.add(name)
            print(name + ' not found')
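If the goal is instead a single record per text when none of the names match, one possible reading of "one time per text", the built-in any() can collapse all comparisons for a text into one True/False result. This is only a sketch under that assumption: the has_match helper, the texts dictionary with its labels and contents, and the word-by-word comparison are all made up for illustration.

```python
from difflib import SequenceMatcher


def has_match(names, text, threshold=0.9):
    """Return True if any name is similar enough to any word in text."""
    words = [word.strip('.,;:!?') for word in text.split()]
    return any(
        SequenceMatcher(None, name, word).ratio() > threshold
        for name in names
        for word in words
    )


names = ['JOHN', 'LARRY', 'PETER', 'MARY']
# Hypothetical texts keyed by a label; real code would read them from files.
texts = {
    'a.txt': 'JOHN met PETER yesterday',
    'b.txt': 'Nobody we know was there',
}

# Each text appears in this list at most once.
unmatched = [label for label, text in texts.items() if not has_match(names, text)]
for label in unmatched:
    print(label + ': no match found')
```

Unlike the seen set above, nothing has to be remembered across texts here, so each text is judged independently.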