Question

I am learning about Hidden Markov Models for classifying motion in a sequence of t image frames.

Assume I have m dimensions of feature from each frame. Then I cluster it into a symbol (for observable symbol). And I create k different HMM model for k class.

Then, how do I determine the number of hidden states for each model to optimise prediction ?

Btw, is my approach correct? If I misunderstood how to use it, please correct me:)

Thanks :)

Was it helpful?

Solution

"is my approach already correct?"

Your current approach is correct. I have done the same thing some weeks ago and had asked the same questions. I have built a gesture recognition tool.

You say you have k classes you want to recognize, so yes, you will train k HMM. For each HMM you run Forward algorithm and receive P(HMM|observation) for each hidden markov model (alternatively Viterbi decoding is also possible). Then you take the one with the highest probability.

It's also correct to see the m-dimensional feature vector as a single observation symbol. Depending on what your vector looks like, you might want to use a continuous hidden markov model or a discrete hidden markov model. Working with discrete ones is often easier and easier to train with little training data. So in case your feature vector space is continuous, you might want to consider discretization to make all values discrete (e.g. through uniform classes). The question about discreteness is: How many classes of observations will you have?

"how to determine the number of hidden state for each model to get optimal prediction?"

However, I cannot fully answer your actual question about the number of hidden states. From what I have been taught in other areas, it seems like it's a lot of benchmarking and testing. E.g. in speech recongition we use 3 HMM states for each phonem (human sound), because sounds sound different at the beginning, in the middle and at the end. And then each different phonem gets one triple. But that was of course engineering.

In my own application I have thought like this: I wanted to define gestures and associate them with directions. Like open_firefox = [UP, RIGHT]. So I decided to use four hidden states for all four directions. I guess finding out the best number of states is a lot about engineering and trying out different things.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top