Question

Given bigram probabilities for words in a text, how would one compute trigram probabilities?

For example, if we know that P(dog cat) = 0.3 and P(cat mouse) = 0.2, how do we find P(dog cat mouse)?

Thank you!

Solution

In the following I consider a trigram as three random variables A, B, C. So dog cat mouse would be A=dog, B=cat, C=mouse.

Using the chain rule: P(A,B,C) = P(A,B) * P(C|A,B). Now you're stuck if you want to stay exact, because bigram probabilities alone do not determine P(C|A,B).

What you can do is assume that C is independent of A given B. Then P(C|A,B) = P(C|B), and P(C|B) = P(B,C) / P(B), which you should be able to compute from your bigram frequencies. Note that in your case P(C|B) should really be the probability of C following B, so it is the probability of the bigram BC divided by the probability of B*, i.e. of any bigram that begins with B.

So to sum it up, when using the conditional independence assumption:

P(ABC) = P(AB) * P(BC) / P(B*)

And to compute P(B*) you have to sum up the probabilities of all bigrams beginning with B.
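
As a rough illustration, here is a minimal Python sketch of that approximation. The function name `trigram_probability` and the bigram table are made up for this example: only P(dog cat) = 0.3 and P(cat mouse) = 0.2 come from the question, and the other two entries are assumed values added so that the table sums to 1.

```python
def trigram_probability(bigram_probs, a, b, c):
    """Approximate P(a b c) as P(a b) * P(b c) / P(b *).

    Relies on the conditional independence assumption described above:
    c is independent of a given b.
    """
    # P(b *): total probability of all bigrams whose first word is b.
    p_b_star = sum(p for (first, _), p in bigram_probs.items() if first == b)
    if p_b_star == 0:
        # b never starts a bigram, so "b c" cannot occur.
        return 0.0
    p_ab = bigram_probs.get((a, b), 0.0)
    p_bc = bigram_probs.get((b, c), 0.0)
    return p_ab * p_bc / p_b_star


# Hypothetical bigram table. Only the first two values are from the question;
# the last two are assumptions chosen so that the probabilities sum to 1.
bigram_probs = {
    ("dog", "cat"): 0.3,
    ("cat", "mouse"): 0.2,
    ("cat", "dog"): 0.2,    # assumed
    ("mouse", "dog"): 0.3,  # assumed
}

p = trigram_probability(bigram_probs, "dog", "cat", "mouse")
print(p)
```

With these assumed numbers P(cat *) = 0.2 + 0.2 = 0.4, so the estimate is 0.3 * 0.2 / 0.4, roughly 0.15. With a different table the result changes, which is why the two probabilities in the question are not enough on their own.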

Licensed under: CC-BY-SA with attribution