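Using bi-grams, you are modeling the string as a first-order Markov chain: each character depends only on the character immediately before it, so P(test) is approximated as p(t) * p(e|t) * p(s|e) * p(t|s). Your answer will be as accurate as that model allows. The results are surprisingly good for such a simple model, but of course you can do even better with more expressive models, such as higher-order n-grams. For instance, in language modeling, Hidden Markov Models (HMMs) are very often used.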
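A minimal sketch of the bi-gram approach in Python, in case it helps. The function names, the `^` start marker, and the add-alpha smoothing are assumptions for illustration, not something taken from the question. Log-probabilities are summed rather than multiplying raw probabilities, because the product of many small numbers underflows quickly on longer strings, and smoothing keeps a single unseen bi-gram from forcing the whole probability to zero.

```python
import math
from collections import Counter

START = "^"  # hypothetical start-of-string marker, assumed absent from the data


def train_bigram_counts(strings):
    """Count character bi-grams and their left-hand contexts."""
    bigrams = Counter()
    contexts = Counter()
    alphabet = set()
    for s in strings:
        padded = START + s
        alphabet.update(s)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, alphabet


def log_prob(s, bigrams, contexts, alphabet, alpha=1.0):
    """Log P(s) under a first-order Markov model with add-alpha smoothing."""
    vocab_size = len(alphabet)
    padded = START + s
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        # Smoothed estimate of p(cur | prev); alpha is an assumed constant.
        num = bigrams[(prev, cur)] + alpha
        den = contexts[prev] + alpha * vocab_size
        total += math.log(num / den)
    return total


# Toy corpus standing in for the 100k training strings.
corpus = ["test", "text", "tent", "rest"]
model = train_bigram_counts(corpus)
print(math.exp(log_prob("test", *model)))
```

Comparing `log_prob` values directly is usually enough for ranking strings; converting back with `math.exp` is only needed if you want the probability itself.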
Calculating the probability of a string
03-06-2022
Question
I want to calculate the probability of characters occurring in a string. For example, given the string "test", I want to get P(test).
P(test) = p(t) * p(e|t) * p(s|te) * p(t|es)
I have counted the bi-gram frequencies in more than 100k strings and calculated the probabilities of their occurrence. My question is: by just multiplying the probabilities of the n-grams in a string, will I get an accurate answer, or is there a better way of finding it?
Any help is highly appreciated.
Solution
Licensed under: CC-BY-SA with attribution