Question

I'm sure you've all heard of the "Word game", where you try to change one word to another by changing one letter at a time, and only going through valid English words. I'm trying to implement an A* Algorithm to solve it (just to flesh out my understanding of A*) and one of the things that is needed is a minimum-distance heuristic.

That is, the minimum number of one of these three mutations that can turn an arbitrary string a into another string b: 1) Change one letter for another 2) Add one letter at a spot before or after any letter 3) Remove any letter

Examples

aabca => abaca:
aabca
abca
abaca
= 2

abcdebf => bgabf:
abcdebf
bcdebf
bcdbf
bgdbf
bgabf
= 4

I've tried many algorithms out; I can't seem to find one that gives the actual answer every time. In fact, sometimes I'm not sure if even my human reasoning is finding the best answer.

Does anyone know any algorithm for such purpose? Or maybe can help me find one?

(Just to clarify, I'm asking for an algorithm that can turn any arbitrary string to any other, disregarding their English validity-ness.)

Was it helpful?

Solution

You want the minimum edit distance (or Levenshtein distance):

The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character. It is named after Vladimir Levenshtein, who considered this distance in 1965.

And one algorithm to determine the editing sequence is on the same page here.

OTHER TIPS

An excellent reference on "Edit distance" is section 6.3 of the Algorithms textbook by S. Dasgupta, C. H. Papadimitriou, and U. V. Vazirani, a draft of which is available freely here.

If you have a reasonably sized (small) dictionary, a breadth first tree search might work.

So start with all words your word can mutate into, then all those can mutate into (except the original), then go down to the third level... Until you find the word you are looking for.

You could eliminate divergent words (ones further away from the target), but doing so might cause you to fail in a case where you must go through some divergent state to reach the shortest path.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top