Question

I recently had a homework assignment in my computational biology class in which I had to apply an HMM. Although I think I understand HMMs, I couldn't manage to apply them in my code. Basically, I had to adapt the Fair Bet Casino problem to CpG islands in DNA--using observed outcomes to predict hidden states via a transition matrix. I managed to come up with a solution that worked correctly...but unfortunately in exponential time. My solution pseudocode looked like this:

*notes: trans_matrix contains both the probabilities of going from one state to the other state, and the probabilities of going from one observation to the other observation.

HMM(trans_matrix, observations):
  movements = {}
  for i in range(0, len(observations)):
    movements[i] = #list of 4 probabilities, all possible state changes
  poss_paths = []
  for m in movements.keys():
    #add two new paths for every path currently in poss_paths
    #  (state either changes or doesn't)
    #store a running probability for each path
  correct_path = poss_paths[index_of_highest_probability]
  return correct_path

I realize that I go into exponential time at the "add two new paths for every path currently in poss_paths" step, where I end up looking at all the possible paths. I'm not sure how to find the path of highest probability without looking at all the paths, though.
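For concreteness, here is roughly what my approach boils down to as runnable Python (the two-state island/background model and all the probability values are made-up examples, not from the assignment):

```python
from itertools import product

# Hypothetical two-state CpG model: 'I' = island, 'B' = background.
STATES = ['I', 'B']
START = {'I': 0.5, 'B': 0.5}
TRANS = {'I': {'I': 0.9, 'B': 0.1}, 'B': {'I': 0.1, 'B': 0.9}}
EMIT = {'I': {'A': 0.1, 'C': 0.4, 'G': 0.4, 'T': 0.1},
        'B': {'A': 0.3, 'C': 0.2, 'G': 0.2, 'T': 0.3}}

def brute_force(obs, states, start_p, trans_p, emit_p):
    """Score every possible state path and keep the best one."""
    best_path, best_p = None, 0.0
    # product() enumerates all 2**len(obs) paths -- the exponential blowup
    for path in product(states, repeat=len(obs)):
        p = start_p[path[0]] * emit_p[path[0]][obs[0]]
        for k in range(1, len(obs)):
            p *= trans_p[path[k - 1]][path[k]] * emit_p[path[k]][obs[k]]
        if p > best_p:
            best_path, best_p = list(path), p
    return best_path

brute_force(['C', 'G', 'C', 'G'], STATES, START, TRANS, EMIT)
# -> ['I', 'I', 'I', 'I']
```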

Thank you!

EDIT: I did find a lot of info about the Viterbi algorithm while doing the assignment, but I was confused about how it actually gives the best answer. It seems like Viterbi (I was looking at the forward algorithm specifically, I think) looks at a specific position, moves forward a position or two, and then decides the "correct" next path increment having looked at only a few subsequent probabilities. I may be understanding this wrong; is this how Viterbi works? Pseudocode is helpful. Thank you!


Solution

One benefit of Hidden Markov Models is that you can generally do what you need without considering all possible paths one by one. What you are trying to do looks like an expensive way of finding the single most probable path, which you can do by dynamic programming under the name of the Viterbi algorithm - see e.g. http://cs.brown.edu/research/ai/dynamics/tutorial/Documents/HiddenMarkovModels.html. Documents like this also cover related but distinct problems, such as working out the probability of the hidden state at a single position, or at every position. Very often this involves something called alpha and beta passes, which are a good search term along with "Hidden Markov Models".

There is a thorough description at http://en.wikipedia.org/wiki/Viterbi_algorithm, with mathematical pseudo-code and what I think is Python as well. Like most of these algorithms, it uses the Markov property: once you know the hidden state at a point, you know everything you need to answer questions about that point in time - you don't need to know the past history.

As in dynamic programming generally, you work from left to right along the data, using answers computed for position k-1 to work out answers for position k. What you want at position k, for each state j, is the probability of the observed data up to and including that point, along the most likely path that ends up in state j at position k. That probability is the product of three factors: the probability of the observed data at k given state j, times the probability of the transition from some previous state at time k-1 to j, times the probability of all the observed data up to and including k-1 given that you ended up in that previous state - this last factor is exactly what you computed for time k-1. You consider all possible previous states and pick the one that gives the highest combined probability. That gives you the answer for state j at time k, and you save which previous state produced it.

This may look like you are just fiddling around with positions k and k-1, but you now have an answer for time k that reflects all the data up to and including time k. Carry this on until k is the last position in your data, at which point you have the probabilities of each final state given all the data. Pick the final state with the highest probability, then trace all the way back using the info you saved about which state at k-1 you used to compute the best probability for each state j at time k.
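The steps above can be sketched in runnable Python. The two-state CpG model ('I' = island, 'B' = background) and all its probability values are hypothetical, chosen only for illustration:

```python
# Hypothetical two-state CpG model with made-up probabilities.
STATES = ['I', 'B']
START = {'I': 0.5, 'B': 0.5}
TRANS = {'I': {'I': 0.9, 'B': 0.1}, 'B': {'I': 0.1, 'B': 0.9}}
EMIT = {'I': {'A': 0.1, 'C': 0.4, 'G': 0.4, 'T': 0.1},
        'B': {'A': 0.3, 'C': 0.2, 'G': 0.2, 'T': 0.3}}

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for `obs`."""
    # best[k][s]: probability of the best path ending in state s at time k
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]  # back[k][s]: predecessor of s on that best path
    for k in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # try every previous state, keep the highest combined probability
            prev, p = max(
                ((r, best[k - 1][r] * trans_p[r][s] * emit_p[s][obs[k]])
                 for r in states),
                key=lambda x: x[1])
            best[k][s] = p
            back[k][s] = prev
    # pick the best final state, then trace back through saved predecessors
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for k in range(len(obs) - 1, 0, -1):
        path.append(back[k][path[-1]])
    path.reverse()
    return path

viterbi(['C', 'G', 'C', 'G'], STATES, START, TRANS, EMIT)
# -> ['I', 'I', 'I', 'I']
```

Because each position only consults the previous position's table, this runs in time proportional to len(obs) times the square of the number of states, instead of exponential in len(obs).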

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow