Frage

I am confused about Levenshtein distance and triangle inequality. Wikipedia and other articles say that Levenshtein distance follows triangle inequality.

Triangle inequality states x+y>z, but for Levenshtein distance, it appears to me that x+y can be equal to z. For example, kitten-> sitting=3, kitten->sittin=2 and sittin->sitting=1. What am I missing here?

EDIT

The triangle inequality is not in eucledian space, but metric space. In a metric space, the triangle inequality is d(x,z)<= d(x,y)+d(y,z)

War es hilfreich?

Lösung

Triangle inequality states x+y>=z.

Andere Tipps

I would like to mention that for a lot of task is used the normalized Levenshtein distance or vice versa similarity. Like distance 2 between words with length 4 and 10 means more 50% and 80% similarity. The normalized Levenshtein distance doesn't satisfy triangle inequality in lot of cases. Therefore is not a metric from mathematical point of view.

However is possible to achieve triangle inequality with the correct normalization of the Levenshtein distance.

Let be Generic Levenshtein Distance (GLD) is normalized Levenshtein distance. The GLD is a metric valued in [0, 1]. Given two strings X and Y with over a finite alphabet with the length |X| and |Y|. The equation for normalization is following:

GLD(X,Y) = 2*d(X,Y) / (|X|,|Y| + d(X,Y))

where d(X,Y) is Levenshtein Distance.

This normalization satisfies triangle ineqaulity as results from the article [1].

I used also in my created Nuget Package - BlueSimilarity[2] common normalization which violates triangle inequality (1 - d(X,Y)/max(|X|,|Y|)).

Your Example

X = "kitten", Y = "sitting", Z = "sittin"

GLD(X,Z) <= GLD(X,Y)+ GLD(Y,Z)

2* d(X,Y) / (|X|,|Y| + d(X,Y)) + 2* d(Y,Z) / (|Y|+|Z| + d(Y,Z)) >= 2*d(X,Z) / (|X|,|Z| + d(X,Z))

23 /(6+7+3) + 1 - 21/(6+7+1) >= 2*2/(6+6+2)

0.375 + 0.143 >= 0.286

References:

[1] Li Yujian, Liu Bo; A Normalized Levenshtein distance metric; IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007

[2] Rozinek, Ondrej; BlueSimilarity; https://www.nuget.org/packages/BlueSimilarity/

Lizenziert unter: CC-BY-SA mit Zuschreibung
Nicht verbunden mit StackOverflow
scroll top