First of all, you don't pick the word with the highest probability. You pick a random word, but not uniformly: each word is drawn with the probability assigned to it by the model.
So, if the model has 2 words, "yes" and "no", with probability distribution 2/3 "yes" and 1/3 "no", then the generated text may look like this:
yes no no yes yes no yes yes yes no yes yes yes
I.e., you'll have approximately 2/3 "yes" in the text and 1/3 "no".
EDIT
Here's a simple way to sample from the distribution:
- Generate a random number from 0 to 1.
- Iterate over all words in the model, accumulating their probability weights. As soon as the accumulated sum exceeds the generated number, emit the current word.
Here's an example. Suppose you've generated 0.8. You start with "yes": the accumulated probability weight is 0.67, which is not greater than 0.8, so you move on to the next word, "no". Now the accumulated weight is 1.0, which is greater than 0.8, so you emit "no".
Suppose next time you generate 0.5: the accumulated weight after "yes" is already 0.67, which is greater than 0.5, so you emit "yes".