Question

I am using SequenceMatcher to find a set of words within a group of texts. The problem I am having is that I need to record when it does not find a match, but only once per text. If I try an if statement, it gives me a result every time the comparison with another word fails.

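from difflib import SequenceMatcher

names = ['JOHN', 'LARRY', 'PETER', 'MARY']
files = ['path or link']  # placeholder: paths or links to the texts

for file in files:
    for name in names:
        if SequenceMatcher(None, name, file).ratio() > .9:
            pass  # do something
        else:
            # runs for every name that fails, not once per text
            print(name + ' not found')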

I have also tried re.match and re.find and I encounter the same problem. The code above is a simple version of what I am doing. I'm new to Python too. Thank you very much!

Solution

If I interpret your comment to the question correctly (but I am not 100% sure!), this might illustrate the general mechanism you can follow:

>>> text = 'If JOHN would be married to PETER, then MARY would probably be unhappy'
>>> names = ['JOHN', 'LARRY', 'PETER', 'MARY']
>>> [text.find(name) for name in names]
[3, -1, 28, 40]  # This list is always as long as the names list

What I mean by "mechanism you can follow" is that SequenceMatcher (which I replaced here with the built-in string method find) should not just work as a True/False test; it should directly produce the information you want to store, one entry per name.
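As a minimal sketch of that mechanism, the per-name results can be collected first and the missing names reported afterwards in a single message. The helper name find_positions is hypothetical and not part of the original question, and str.find stands in for SequenceMatcher here as in the session above.

```python
def find_positions(text, names):
    """Map each name to its first index in text, or -1 if it is absent."""
    return {name: text.find(name) for name in names}


text = 'If JOHN would be married to PETER, then MARY would probably be unhappy'
names = ['JOHN', 'LARRY', 'PETER', 'MARY']

positions = find_positions(text, names)

# Derive the names that were not found from the stored results.
missing = [name for name, pos in positions.items() if pos == -1]
if missing:
    # One message per text, listing every missing name.
    print('not found: ' + ', '.join(missing))
```

Because the results are stored before anything is printed, the reporting step is decoupled from the matching loop and runs only once for the text.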

HTH!

Other tips

The simple way would be to keep track of the names already reported as not found, and skip the message if it has already been printed for that name:

seen = set()  # names already reported as not found
for file in files:
    for name in names:
        if SequenceMatcher(None, name, file).ratio() > .9:
            pass  # do something
        elif name not in seen:
            seen.add(name)
            print(name + ' not found')
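If the goal is instead one record per text when none of the names match, a possible sketch is to reduce the inner loop to a single yes/no answer with any(). This assumes the texts are already loaded as strings and are compared word by word; the helper has_match and the sample texts are hypothetical, and the 0.9 threshold is taken from the question.

```python
from difflib import SequenceMatcher


def has_match(name, text, threshold=0.9):
    """True if any whitespace-separated word in text is similar enough to name."""
    return any(
        SequenceMatcher(None, name, word).ratio() > threshold
        for word in text.split()
    )


names = ['JOHN', 'LARRY', 'PETER', 'MARY']
# Hypothetical in-memory texts standing in for the contents of the files.
texts = {
    'first': 'JOHN met MARY yesterday',
    'second': 'nobody we know was there',
}

unmatched_texts = []
for label, text in texts.items():
    if not any(has_match(name, text) for name in names):
        unmatched_texts.append(label)  # recorded once per text
        print(label + ': no name found')
```

The comparison is case-sensitive, so lowercase words do not match the uppercase names; normalizing case before comparing would be a separate design choice.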
License: CC-BY-SA with attribution
Not affiliated with StackOverflow