Question

I'm building a Hidden Markov Model to identify whether someone is saying "Yes" or "No". I have developed the model, and I came across this tutorial:

http://www.cslu.ogi.edu/tutordemos/nnet_recog/recog.html

And in this tutorial it says:

This figure traces the search paths for "yes" and "no" through a hypothetical matrix of probabilities. Even though the score for "no" is very low, it is still possible to find the most probable path for this word, if "yes" had not been in our vocabulary. The Viterbi search can be understood by reading through the following pseudo-code algorithm (with notation borrowed from Rabiner's paper, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition):

I have read through both of the papers and I am still confused by where they say:

through a hypothetical matrix of probabilities

My question is: where does this matrix of probabilities come from? For example, I have done the following:

  • Read in the Audio File
  • Stripped the Audio signals that do not warrant consideration
  • Split the signals that warrant consideration into blocks

This leaves me with blocks that contain the phonemes. I have computed the zero-crossings of the data, which brings me to my point:

For "No" the data from this is very low,

For "Yes" the data from this is very high.

So in the example (given above) it says:

Even though the score for "no" is very low,

So could I just pass in the results from the zero-crossings as my probabilities? I'm confused and hope someone can help me with this.


Solution

In a philosophical sense, this matrix of probabilities comes from nature. More seriously, the matrix represents the idea of the transition matrix, which can be estimated with the Baum-Welch algorithm on sampled data when one does not "know" nature's true distribution (and no one does). That is why they call it hypothetical.

With respect to your second question, you cannot pass the zero-crossing results in directly as probabilities. You need to obtain the transition matrix (the probabilities) by applying Baum-Welch to your zero-crossing samples. (I'm not sure what zero-crossing samples are; MFCCs are usually used for this sort of thing.)
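To make the Baum-Welch step concrete, here is a minimal NumPy sketch of the re-estimation loop for a discrete-observation HMM. It is an illustration under stated assumptions, not the tutorial's method: it assumes each block's zero-crossing measurement has already been quantized into a symbol (0 = low, 1 = high), that two hidden states are enough, and the names `forward_backward`, `baum_welch`, and the sample sequence `obs` are made up for this example.

```python
import numpy as np


def forward_backward(obs, A, B, pi):
    """Scaled forward-backward pass for a discrete-observation HMM."""
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    scale = np.zeros(T)

    # Forward pass, normalizing each time step to avoid underflow.
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, reusing the forward scaling factors.
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]

    return alpha, beta, scale


def baum_welch(obs, n_states, n_symbols, n_iter=20, seed=0):
    """Estimate transition (A), emission (B) and initial (pi) probabilities."""
    obs = np.asarray(obs)
    rng = np.random.default_rng(seed)
    # Random row-stochastic starting points (an arbitrary choice).
    A = rng.random((n_states, n_states))
    A /= A.sum(axis=1, keepdims=True)
    B = rng.random((n_states, n_symbols))
    B /= B.sum(axis=1, keepdims=True)
    pi = np.full(n_states, 1.0 / n_states)

    log_likelihoods = []
    for _ in range(n_iter):
        alpha, beta, scale = forward_backward(obs, A, B, pi)
        log_likelihoods.append(np.log(scale).sum())

        # gamma[t, i]: probability of being in state i at time t.
        gamma = alpha * beta
        # xi[t, i, j]: probability of the transition i -> j at time t.
        xi = (alpha[:-1, :, None] * A[None, :, :]
              * (B[:, obs[1:]].T * beta[1:])[:, None, :]
              / scale[1:, None, None])

        # Re-estimate the parameters from the expected counts.
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for k in range(n_symbols):
            B[:, k] = gamma[obs == k].sum(axis=0)
        B /= gamma.sum(axis=0)[:, None]

    return A, B, pi, log_likelihoods


# Hypothetical quantized zero-crossing symbols for one utterance.
obs = [0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0]
A, B, pi, log_likelihoods = baum_welch(obs, n_states=2, n_symbols=2)
print("transition matrix:\n", A)
```

In practice you would train one such model per word ("Yes" and "No") on several recordings each, then score a new utterance against both models and pick the one with the higher likelihood. A single short sequence like the one above is only enough to show the mechanics.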

Let me know if more clarification is required or if I am misunderstanding something.

Licensed under: CC-BY-SA with attribution