How to find all strings at a given edit distance from a given string

https://stackoverflow.com/questions/12886997

07-07-2021
Question

We have all seen that if we type a query into Google and make a typo, Google suggests a saner version of the query (which is correct more often than not). How do they do it? One possible way I can think of is to find all other strings at an edit distance of 1 from the given string, and if any one of them has a higher value for a 'searched' attribute than the given string (the value might come from a back-end DB, where each indexed query term has a weight based on how frequently that term crops up in queries), that string is suggested. If none is found, then strings at an edit distance of 2 are searched, and so on, until, say at 5, the search engine decides that maybe the original string is the one the user is looking for, and returns the corresponding search results.

Now is it possible at all to find strings at a given edit distance from a given string? How efficient would that be for this process? Is there any cool algorithm to do this?

Solution

Peter Norvig's article "How to Write a Spelling Corrector" is an interesting discussion of how "Did you mean" might work.
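To make the question's idea concrete, here is a minimal Python sketch in the spirit of the approach that article describes: generate every string one edit away (deletion, adjacent transposition, replacement, insertion), widen the search one edit at a time, and pick the most frequent known candidate. The names `edits_within`, `suggest`, and the `counts` table are hypothetical, and the alphabet is assumed to be lowercase English letters; this is an illustration, not a description of what Google actually does.

```python
import string

# Assumption: queries use lowercase English letters only.
ALPHABET = string.ascii_lowercase


def edits1(word):
    """Return the set of strings reachable from `word` with one edit."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def edits_within(word, max_distance):
    """Yield (distance, strings first reached at that distance), nearest first."""
    seen = {word}
    frontier = {word}
    for distance in range(1, max_distance + 1):
        # Expand every string in the previous layer by one more edit,
        # then drop anything already reached with fewer edits.
        frontier = {e for w in frontier for e in edits1(w)} - seen
        seen |= frontier
        yield distance, frontier


def suggest(word, counts, max_distance=2):
    """Return the most frequent known string near `word`, or `word` itself.

    `counts` is a hypothetical mapping from query term to how often it
    was searched; a candidate wins only if it beats the typed word.
    """
    baseline = counts.get(word, 0)
    for _, candidates in edits_within(word, max_distance):
        better = [c for c in candidates if counts.get(c, 0) > baseline]
        if better:
            return max(better, key=counts.get)
    return word
```

For a word of length n and a 26-letter alphabet, `edits1` builds at most 54n + 25 strings before deduplication (n deletions, n - 1 transpositions, 26n replacements, 26(n + 1) insertions), so each extra unit of distance multiplies the candidate set by roughly that factor. That growth is why generating candidates much beyond distance 2 quickly becomes impractical, and why filtering against a table of known terms at each step matters.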

Other tips

This is of course speculation, but Google certainly has a vast statistical foundation for guessing the correct word. Context, which depends on the other words in the query, can be another factor.

So my guess is that the algorithm they use first determines the probable context based on all the words, and then statistically looks up typo variants of the correct word in that context. If there is no context (a single word), they probably look up anything that could be similar.

In addition, if the terms were stored in a MySQL-based database, one could also use the SOUNDS LIKE operator, which compares SOUNDEX values and so matches words that sound similar.

License: CC-BY-SA with attribution

Not affiliated with StackOverflow