Observation sequences format for HMM in speech recognition

https://stackoverflow.com/questions/16868014

30-05-2022

Question

I am trying to develop a system to separate garbage from non-garbage in speech recognition. I am using the jahmm implementation of Hidden Markov Models. I'm confused about the format in which I should provide the training data to the system as observation sequences. And what is each state in the HMM composed of? I tried reading the manual but couldn't understand it. Thank you.

Solution

I'm confused about the format I should provide the training data to the system as the observation sequence.

To understand the format, you can just read the source. The signature of the library's learn method,

public <O extends Observation> Hmm<O>
    learn(Hmm<O> initialHmm, List<? extends List<? extends O>> sequences)

suggests that the input data must be a list of observation sequences, where each observation sequence is itself a list of observations. If you don't know what a list is, a good introduction to computer science will help.
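To make the nested shape concrete, here is a minimal, self-contained sketch in plain Java. It assumes the usual speech setup where each observation is a feature vector computed for one audio frame (for example MFCCs) and each sequence is one utterance. FrameFeatures and buildTrainingSet are hypothetical names used only for illustration; with jahmm you would use one of the library's own Observation subclasses (such as ObservationVector) instead of FrameFeatures, so check the Javadoc for the exact type your learner expects.

```java
import java.util.ArrayList;
import java.util.List;

public class ObservationFormatSketch {

    // Hypothetical stand-in for an observation: one feature vector per audio frame.
    // With jahmm you would use a library Observation subclass instead.
    static final class FrameFeatures {
        final double[] values;

        FrameFeatures(double... values) {
            this.values = values.clone();
        }

        int dimension() {
            return values.length;
        }
    }

    // Builds the nested structure a learner expects:
    // outer list = training sequences (here, one per utterance),
    // inner list = observations of that sequence, in time order.
    static List<List<FrameFeatures>> buildTrainingSet(double[][][] utterances) {
        List<List<FrameFeatures>> sequences = new ArrayList<>();
        for (double[][] utterance : utterances) {
            List<FrameFeatures> sequence = new ArrayList<>();
            for (double[] frame : utterance) {
                sequence.add(new FrameFeatures(frame));
            }
            sequences.add(sequence);
        }
        return sequences;
    }

    public static void main(String[] args) {
        // Two made-up utterances with 2-dimensional features (values are illustrative only).
        double[][][] utterances = {
            { {0.1, 0.2}, {0.3, 0.1}, {0.2, 0.4} },
            { {0.9, 0.8}, {0.7, 0.6} }
        };
        List<List<FrameFeatures>> sequences = buildTrainingSet(utterances);
        System.out.println(sequences.size() + " sequences");         // prints "2 sequences"
        for (List<FrameFeatures> sequence : sequences) {
            System.out.println("  " + sequence.size() + " observations"); // 3, then 2
        }
    }
}
```

Sequences may have different lengths; what matters is that every observation in every sequence has the same type (and, for vectors, the same dimension).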

And what is each state in the HMM composed of?

The states of an HMM are just elements of a mathematical structure; they are not composed of anything. Each state has a probability distribution associated with it (the distribution of the observations emitted while in that state). You can find more details in any HMM tutorial, which you had better read before you start working with HMMs.
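As an illustration of "a state is just an index with a distribution attached", the sketch below represents a small HMM entirely as arrays: initial probabilities pi, transition probabilities a, and one emission distribution per state in b. It then scores a sequence with the textbook forward algorithm. This is not jahmm's code: DiscreteHmmSketch is a hypothetical class, the emissions are discrete symbols for simplicity (speech models usually use continuous densities such as Gaussian mixtures), and the parameter values are made up.

```java
public class DiscreteHmmSketch {
    // pi[i]: probability of starting in state i
    // a[i][j]: probability of moving from state i to state j
    // b[i][k]: probability that state i emits symbol k (the distribution attached to state i)
    final double[] pi;
    final double[][] a;
    final double[][] b;

    DiscreteHmmSketch(double[] pi, double[][] a, double[][] b) {
        this.pi = pi;
        this.a = a;
        this.b = b;
    }

    // Forward algorithm: probability of the observation sequence given the model,
    // summed over all possible state paths.
    double sequenceProbability(int[] observations) {
        int n = pi.length;
        double[] alpha = new double[n];
        for (int i = 0; i < n; i++) {
            alpha[i] = pi[i] * b[i][observations[0]];
        }
        for (int t = 1; t < observations.length; t++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int i = 0; i < n; i++) {
                    sum += alpha[i] * a[i][j];
                }
                next[j] = sum * b[j][observations[t]];
            }
            alpha = next;
        }
        double total = 0.0;
        for (double value : alpha) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        DiscreteHmmSketch hmm = new DiscreteHmmSketch(
            new double[] {0.6, 0.4},
            new double[][] {{0.7, 0.3}, {0.4, 0.6}},
            new double[][] {{0.5, 0.5}, {0.1, 0.9}});
        System.out.println("P(O | model) = " + hmm.sequenceProbability(new int[] {0, 1}));
    }
}
```

The states themselves never appear as objects; they exist only as the indices i and j. For long sequences this unscaled version underflows, which is why practical implementations scale alpha at each step or work in log space.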

The library's documentation also describes everything well:

http://jahmm.googlecode.com/svn/javadoc/0.6.2/index.html

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow