Question

Let's say that we have two examples:

one:

a = "Six d.o.g.s."
b = "six d.o.g.s"

two:

c = "Death Disco"
d = "deathdisco"
e = "deathdisco666"

The strings in each example differ only slightly. In the first, a has one more dot than b; in the second, d and e have no space between the words. Some are also lowercase.


Objective:

  • For a and b, we want a comparison like a.lower() == b.lower() to return True as long as the strings contain no more than two characters of "error".

  • For c and d, it should return True, since the only "error" is one space.

  • For c and e, it should not: although e is only two characters longer than c, three characters differ.

How can I do this in Python? With a regex, or is there a library for this purpose?


Solution

Following minitech's comment, here is the code I found:

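def levenshtein(seq1, seq2):
    """Return the Levenshtein (edit) distance between two sequences."""
    # The extra last element of each row is a sentinel holding the left
    # border value; it is reached through index -1 when y == 0.
    thisrow = list(range(1, len(seq2) + 1)) + [0]
    for x in range(len(seq1)):
        oneago, thisrow = thisrow, [0] * len(seq2) + [x + 1]
        for y in range(len(seq2)):
            delcost = oneago[y] + 1
            addcost = thisrow[y - 1] + 1
            subcost = oneago[y - 1] + (seq1[x] != seq2[y])
            thisrow[y] = min(delcost, addcost, subcost)
    return thisrow[len(seq2) - 1]


# a and b are the strings from the question. Compared case-insensitively,
# they are one edit apart (the trailing dot), so this prints True.
print(levenshtein(a.lower(), b.lower()) < 2)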
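
To make the three cases from the question concrete, here is a minimal, self-contained sketch. It assumes a tolerance of two edits, taken from the "two letters of error" in the question, and lowercases both strings before comparing. The helper name is_close and its max_errors parameter are illustrative and not part of the original answer, and the distance function is a plain textbook formulation rather than the exact code above.

```python
def levenshtein(s1, s2):
    """Edit distance computed row by row with dynamic programming."""
    previous = list(range(len(s2) + 1))
    for i, ch1 in enumerate(s1, start=1):
        current = [i]
        for j, ch2 in enumerate(s2, start=1):
            current.append(min(
                previous[j] + 1,                 # deletion
                current[j - 1] + 1,              # insertion
                previous[j - 1] + (ch1 != ch2),  # substitution (free on a match)
            ))
        previous = current
    return previous[-1]


def is_close(s1, s2, max_errors=2):
    # Hypothetical helper: case-insensitive match allowing up to max_errors edits.
    return levenshtein(s1.lower(), s2.lower()) <= max_errors


a, b = "Six d.o.g.s.", "six d.o.g.s"
c, d, e = "Death Disco", "deathdisco", "deathdisco666"

print(is_close(a, b))  # True: one edit (the trailing dot)
print(is_close(c, d))  # True: one edit (the space)
print(is_close(c, e))  # False: four edits (the space plus "666")
```

Note that the edit distance between c and e is four, not three: removing the space counts as one edit in addition to the three appended characters. The threshold may need adjusting for other data.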
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow