In the following I consider a trigram as three random variables A, B, C. So dog cat horse would be A=dog, B=cat, C=horse.
Using the chain rule: P(A,B,C) = P(A,B) * P(C|A,B). Now you're stuck if you want to stay exact.
What you can do is assume that C is independent of A given B. Then it holds that P(C|A,B) = P(C|B), and P(C|B) = P(C,B) / P(B), which you should be able to compute from your trigram frequencies. Note that in your case P(C|B) should really be the probability of C following a B, so it's the probability of a BC divided by the probability of a B*, where B* stands for B followed by any word.
So to sum it up, when using the conditional independence assumption:
P(ABC) = P(AB) * P(BC) / P(B*)
And to compute P(B*) you have to sum up the probabilities for all trigrams beginning with B.
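As a rough sketch of how that could look in code: the snippet below assumes your trigram frequencies are stored in a `Counter` keyed by `(w1, w2, w3)` tuples. The function name `approx_trigram_prob` and the toy counts are made up for illustration. It also assumes that P(AB) and P(BC) are estimated the same way as P(B*), that is, by summing over all trigrams that begin with the given words. If you have separate bigram counts, you would probably use those instead.

```python
from collections import Counter


def approx_trigram_prob(trigram_counts, a, b, c):
    """Approximate P(ABC) as P(AB) * P(BC) / P(B*).

    Assumes C is independent of A given B. All probabilities are
    estimated from trigram counts only (an assumption of this sketch).
    """
    total = sum(trigram_counts.values())
    if total == 0:
        return 0.0

    # P(AB): share of trigrams whose first two words are a, b
    p_ab = sum(n for (x, y, _), n in trigram_counts.items()
               if (x, y) == (a, b)) / total
    # P(BC): share of trigrams whose first two words are b, c
    p_bc = sum(n for (x, y, _), n in trigram_counts.items()
               if (x, y) == (b, c)) / total
    # P(B*): share of trigrams beginning with b
    p_b = sum(n for (x, _, _), n in trigram_counts.items()
              if x == b) / total

    if p_b == 0:
        return 0.0  # b never starts a trigram, so there is nothing to estimate from
    return p_ab * p_bc / p_b


# Hypothetical toy counts, not real data
counts = Counter({
    ("dog", "cat", "horse"): 2,
    ("dog", "cat", "cow"): 1,
    ("cat", "horse", "dog"): 2,
    ("cat", "cow", "dog"): 1,
    ("horse", "dog", "cat"): 2,
})

print(approx_trigram_prob(counts, "dog", "cat", "horse"))  # 0.25 for these toy counts
```

Returning 0 when B never starts a trigram is just one way to handle the division by zero; with real data you would likely want some form of smoothing instead.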