The easiest way to achieve this is to use a special [start] token. It is always the first token of a sequence, so the model learns the transitions from [start] to the other words in the same way it learns every other transition.
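As a rough illustration of the [start] token idea, here is a minimal sketch that assumes a first-order (bigram) word model trained on whitespace-tokenized sentences. The names `train_bigram` and `generate` and the toy corpus are hypothetical and chosen only for this example.

```python
import random
from collections import Counter, defaultdict

START = "[start]"  # special token prepended to every training sentence


def train_bigram(sentences):
    """Estimate transition probabilities, including those out of [start]."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = [START] + sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    # Normalise the raw counts of each row into probabilities.
    return {
        prev: {word: c / sum(nxts.values()) for word, c in nxts.items()}
        for prev, nxts in counts.items()
    }


def generate(model, max_len=10, rng=random):
    """Sample a word sequence, always beginning from the [start] token."""
    token, out = START, []
    for _ in range(max_len):
        successors = model.get(token)
        if not successors:  # no observed successor for this word: stop
            break
        words = list(successors)
        weights = [successors[w] for w in words]
        token = rng.choices(words, weights=weights)[0]
        out.append(token)
    return out


corpus = ["the cat sat", "the dog ran", "a cat ran"]  # made-up toy data
model = train_bigram(corpus)
```

Here `model[START]` holds the learned distribution over first words, so sampling a sentence needs no special case: generation simply begins in the [start] state and follows the learned transitions.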
The stationary distribution of the Markov chain is the marginal distribution of $P$.