Question

I'm wondering whether there's an efficient data structure to perform "Retrieve all strings with levenshtein distance less than X".

Few things I'm interested in:

  • Explanation of the algorithm.
  • Is there an existing implementation in existing database / programming langauge?
  • Paper / article that I can refer to?
Was it helpful?

Solution

this is nearest neighborer search in a metric space with levenshtein distance as the metric (or distance) function

a VP-tree is one of the ways of solving that problem

this Python VP-tree implementation is a working demo that shows how a VP-tree works run it on say a word list it provides an interactive shell where you type a word and it returns the words in that list that are no more then X distance from the word you typed

OTHER TIPS

Sounds like a simple breadth-first search with each generation being just one 'edit' away from the previous - and with checks in place to ensure that a string appears in one-and-only-one level.

This would be easily implemented using a couple of hashsets / hashtables in a pair of loops.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top