There are a few things to say.
The model you are using to predict the most likely correction is a simple, cascaded probability model: there is a probability for `W` to be entered by the user, and a conditional probability for the misspelling `X` to appear when `W` was meant. The correct terminology for P(X|W) is conditional probability, not likelihood. (A likelihood is used when estimating how well a candidate probability model matches given data. So it plays a role when you machine-learn a model, not when you apply a model to predict a correction.)

If you were to use Levenshtein distance for P(X|W), you would get integers between 0 and the length of the longer of `W` and `X`. This would not be suitable, because you are supposed to use a probability, which has to be between 0 and 1. Even worse, the value grows the more the candidate differs from the input. That's the opposite of what you want.

Fortunately, `SequenceMatcher.ratio()` is not actually an implementation of Levenshtein distance. It's an implementation of a similarity measure and returns values between 0 and 1. The closer to 1, the more similar the two strings are. So this makes sense.

Strictly speaking, you would have to verify that `SequenceMatcher.ratio()` is actually suitable as a probability measure. For this, you'd have to check whether the ratios you get for all possible misspellings of `W` sum to 1. This is certainly not the case with `SequenceMatcher.ratio()`, so it is not in fact a mathematically valid choice.

However, it will still give you reasonable results, and I'd say it can be used for a practical prototype implementation of a spell-checker. There is a performance concern, though: since `SequenceMatcher.ratio()` is applied to a pair of strings (a candidate `W` and the user input `X`), you might have to apply it to a huge number of candidates from the dictionary to select the best match. That will be very slow when your dictionary is large. To improve this, you'll need to implement your dictionary using a data structure that has approximate string search built into it. You may want to look at this existing post for inspiration (it's for Java, but the answers include suggestions of general algorithms).
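
To make the cascaded model concrete, here is a minimal sketch that ranks candidates by the product of a word prior and `SequenceMatcher.ratio()`. The word counts and the helper names (`WORD_COUNTS`, `prior`, `similarity`, `score`, `best_correction`) are invented for illustration, and the ratio is only a stand-in for P(X|W), not a true probability.

```python
from difflib import SequenceMatcher

# Hypothetical word frequencies; a real checker would derive these from a corpus.
WORD_COUNTS = {"spelling": 50, "spilling": 5, "selling": 20, "smelling": 8}
TOTAL = sum(WORD_COUNTS.values())


def prior(word):
    """P(W): relative frequency of the candidate word."""
    return WORD_COUNTS[word] / TOTAL


def similarity(candidate, typed):
    """Stand-in for P(X|W): a similarity in [0, 1], not a real probability."""
    return SequenceMatcher(None, candidate, typed).ratio()


def score(candidate, typed):
    """Combined score: prior times similarity."""
    return prior(candidate) * similarity(candidate, typed)


def best_correction(typed):
    """Return the dictionary word with the highest combined score."""
    return max(WORD_COUNTS, key=lambda w: score(w, typed))
```

Calling `best_correction("speling")` picks the candidate whose frequency and similarity together score highest. Summing `similarity(w, "speling")` over the candidates gives more than 1, which is the reason the ratio cannot be read as a proper probability.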
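
One data structure with approximate search built in is a BK-tree, which indexes words by edit distance and uses the triangle inequality to skip much of the dictionary during a lookup. The sketch below is only one possible choice and is not taken from the linked post; the names `levenshtein` and `BKTree` are illustrative. It uses a plain Levenshtein distance to narrow down the candidates, which could then be ranked with whatever scoring function you prefer.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


class BKTree:
    """Metric tree: children are keyed by their distance to the parent word."""

    def __init__(self, words):
        self.root = None
        for word in words:
            self.add(word)

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node, children = self.root
        while True:
            d = levenshtein(word, node)
            if d == 0:
                return  # word already present
            if d not in children:
                children[d] = (word, {})
                return
            node, children = children[d]

    def search(self, query, max_distance):
        """Return sorted (distance, word) pairs within max_distance of query."""
        if self.root is None:
            return []
        results, stack = [], [self.root]
        while stack:
            node, children = stack.pop()
            d = levenshtein(query, node)
            if d <= max_distance:
                results.append((d, node))
            # Triangle inequality: only subtrees keyed in [d - k, d + k] can match.
            for key, child in children.items():
                if d - max_distance <= key <= d + max_distance:
                    stack.append(child)
        return sorted(results)
```

A typical use would be to build the tree once from the dictionary, call something like `tree.search(user_input, 2)` to get a short candidate list, and apply `SequenceMatcher.ratio()` only to those candidates instead of the whole dictionary.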