Question

If we have three strings a, b, and c, and we already know (or have already calculated) edit_distance(a,b) and edit_distance(b,c), can we efficiently calculate edit_distance(a,c) without actually comparing a and c?

*edit_distance(a,b) = the minimum number of character insertions, deletions, and replacements required to convert a into b.*

Solution

In general, no. For example, take

  • a = CAP
  • b = CAT
  • c = CAR

Here, edit_distance(a, b) = 1 and edit_distance(b, c) = 1. Moreover, edit_distance(a, c) = 1.

However, we could also have

  • a = CAP
  • b = CAT
  • c = RAT

Here, edit_distance(a, b) = 1 and edit_distance(b, c) = 1, but edit_distance(a, c) = 2. The same pair of known distances is therefore consistent with two different answers, so edit_distance(a, b) and edit_distance(b, c) alone cannot determine edit_distance(a, c).
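
To check the two examples above, here is a minimal sketch of the textbook dynamic-programming Levenshtein distance. The function name matches the one used in the question; the implementation itself is an assumption for illustration and is not taken from the question or the answer.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    # prev[j] holds the distance from the current prefix of s to t[:j];
    # initially the prefix of s is empty, so the distance is j insertions.
    prev = list(range(len(t) + 1))
    for i, sc in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty string: i deletions
        for j, tc in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete sc
                curr[j - 1] + 1,           # insert tc
                prev[j - 1] + (sc != tc),  # replace sc by tc (free if equal)
            ))
        prev = curr
    return prev[-1]


# Both triples share d(a, b) = 1 and d(b, c) = 1, yet d(a, c) differs.
for a, b, c in [("CAP", "CAT", "CAR"), ("CAP", "CAT", "RAT")]:
    print(a, b, c, edit_distance(a, b), edit_distance(b, c), edit_distance(a, c))
# CAP CAT CAR 1 1 1
# CAP CAT RAT 1 1 2
```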

However, we do know that edit_distance(a, c) ≤ edit_distance(a, b) + edit_distance(b, c), since you can always apply the two sequences of edits one after the other, turning a into b and then b into c. More generally, edit distance is a metric on strings, and that property is the basis of the BK-tree data structure.
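
So although the exact value cannot be recovered, the two known distances do confine edit_distance(a, c) to a range. The sketch below computes that range. The upper bound is the inequality above; the lower bound is an addition not stated in the answer, and follows from applying the same inequality to the other two orderings together with the symmetry of edit distance. The helper name `edit_distance_bounds` is hypothetical.

```python
from typing import Tuple


def edit_distance_bounds(d_ab: int, d_bc: int) -> Tuple[int, int]:
    """Range that edit_distance(a, c) must fall in, given the other two distances.

    Upper bound: triangle inequality, d(a, c) <= d(a, b) + d(b, c).
    Lower bound: reverse triangle inequality, |d(a, b) - d(b, c)| <= d(a, c).
    """
    return abs(d_ab - d_bc), d_ab + d_bc


# For both examples in this answer, d(a, b) = 1 and d(b, c) = 1.
low, high = edit_distance_bounds(1, 1)
print(low, high)  # 0 2
```

The observed values 1 (CAP vs. CAR) and 2 (CAP vs. RAT) both lie in this range. A BK-tree uses the same kind of bound to skip subtrees that cannot contain a close enough match.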

Hope this helps!

Licensed under: CC-BY-SA with attribution