Question

I am using SequenceMatcher to find a set of words within a group of texts. The problem is that I need to record when it does not find a match, but only once per text. If I use an if statement, I get a result every time the comparison with another word fails.

from difflib import SequenceMatcher

names = ['JOHN', 'LARRY', 'PETER', 'MARY']
files = [...]  # paths or links to the texts

for file in files:
    for name in names:
        if SequenceMatcher(None, name, file).ratio() > .9:
            pass  # do something
        else:
            print(name + ' not found')

I have also tried re.match and re.find and I encounter the same problem. The code above is a simple version of what I am doing. I'm new to Python too. Thank you very much!

Solution

If I interpret your comment to the question correctly (but I am not 100% sure!), this might illustrate the general mechanism you can follow:

>>> text = 'If JOHN would be married to PETER, then MARY would probably be unhappy'
>>> names = ['JOHN', 'LARRY', 'PETER', 'MARY']
>>> [text.find(name) for name in names]
[3, -1, 28, 40]  # This list is always as long as the names list

By "mechanism you can follow" I mean that SequenceMatcher (which I replaced here with the built-in string method find) should not serve only as a True/False test; its result should already carry the information you want to store.
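
As a minimal sketch of that idea, the results can be stored first and the "not found" report derived from them afterwards, once per text. The names positions and missing are illustrative and not part of the original answer, and str.find is used here in place of SequenceMatcher, as above:

```python
text = 'If JOHN would be married to PETER, then MARY would probably be unhappy'
names = ['JOHN', 'LARRY', 'PETER', 'MARY']

# Map each name to the index of its first occurrence (-1 means absent).
positions = {name: text.find(name) for name in names}

# Build the report from the stored results, so it is printed once per text
# rather than once per failed comparison.
missing = [name for name in names if positions[name] == -1]
if missing:
    print('not found: ' + ', '.join(missing))
```

For several texts, the same two steps would run inside the loop over the files, producing one report per text.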

HTH!

Other tips

The simple way would be to keep track of the names already reported as not found, and skip the message for any name that has been printed before:

from difflib import SequenceMatcher

seen = set()  # names already reported as not found
for file in files:
    for name in names:
        if SequenceMatcher(None, name, file).ratio() > .9:
            pass  # do something
        elif name not in seen:
            seen.add(name)
            print(name + ' not found')
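
If "once per text" is read as one report for each text, a possible variation collects the unmatched names per text and prints them together. This is only a sketch under assumptions: the helper missing_names, the sample texts, and the word-by-word comparison are hypothetical and do not come from the question, which compares each name against the whole file.

```python
from difflib import SequenceMatcher

def missing_names(text, names, threshold=0.9):
    """Return the names that no word in `text` matches closely enough."""
    words = text.split()
    missing = []
    for name in names:
        # any() stops at the first word whose similarity exceeds the threshold.
        found = any(SequenceMatcher(None, name, word).ratio() > threshold
                    for word in words)
        if not found:
            missing.append(name)
    return missing

# Hypothetical sample data standing in for the contents of the files.
texts = {
    'a.txt': 'JOHN met MARY yesterday',
    'b.txt': 'PETER and LARRY went home',
}
names = ['JOHN', 'LARRY', 'PETER', 'MARY']

for label, text in texts.items():
    missing = missing_names(text, names)
    if missing:
        # One line per text, listing every name that was not found in it.
        print(label + ': not found: ' + ', '.join(missing))
```

Collecting the misses before printing keeps the matching logic separate from the reporting, so the same list can also be written to a log or stored.
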
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow