Question

I am about to tackle a programming problem about the Levenshtein Distance. And according to the definition given on my sheet, it states that the Lenveshtein distances is equal to the number of substitutions, insertions and deletions between two strings. However wouldn't a subsitution just be a deletion and then an insertion? What am I missing here?

Was it helpful?

Solution

You can achieve the effect of a substitution using an insertion plus a deletion, yes. But if you limit yourself to insertions and deletions only, each such "substitution" you create this way will cost you 2 instead of 1. That may be realistic for some applications, but sometimes it's more plausible to suppose that a substitution costs the same/is as likely as an insertion or a deletion, rather than twice as costly/half as likely.

[EDIT]

In fact it's possible and sometimes useful to invent much more general edit distances than the standard Levenshtein distance. You can give any arbitrary costs to insertions, deletions and substitutions. You can even extend the set of operations to also include transpositions (changing ab to ba for some fixed cost) or block operations ("insert a copy of the length-j substring starting at position i" for some fixed cost). The effect of a transposition is of course achievable without a special "transposition" move using a deletion plus an insertion, but again this results in the move costing more than either a deletion or an insertion alone. If your application is that you want to find the English word that a person most likely "meant" when they type a word that's not in the dictionary, you would probably prefer to use a distance where a transposition has low cost, because this is a common typing error.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top