Hidden Markov Models - Identifying Phonemes

https://stackoverflow.com/questions/13418472

29-11-2021

Question

I'm developing a project that identifies phonemes in order to determine whether someone is saying "Yes" or "No".

So far in the project, I have used zero-crossings to identify what the person is saying; this works really well and seems simple enough to understand. The project, however, needs a few enhancements and has to be developed using a Hidden Markov Model.

My question is this:

I want to develop a Hidden Markov Model without discarding the work I have already completed. That is, I already strip out the data that do not warrant consideration by counting the number of zero-crossings and summing each block.
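The block-stripping step described above might look roughly like this. It is a minimal sketch, not the asker's code: it assumes non-overlapping 256-sample blocks and a hypothetical rule that a block is discarded when the sum of its absolute sample values falls below a threshold. The function names, block size, and threshold are all illustrative.

```python
import numpy as np


def frame_signal(x, frame_len=256):
    """Split a 1-D signal into non-overlapping blocks, dropping any incomplete tail."""
    x = np.asarray(x, dtype=float)
    n_frames = len(x) // frame_len
    return x[: n_frames * frame_len].reshape(n_frames, frame_len)


def zero_crossings(frames):
    """Count sign changes between consecutive samples within each block."""
    negative = np.signbit(frames)
    return np.count_nonzero(negative[:, 1:] != negative[:, :-1], axis=1)


def block_sums(frames):
    """Sum of absolute sample values per block, used as a crude energy measure."""
    return np.abs(frames).sum(axis=1)


def strip_silence(x, frame_len=256, sum_threshold=1.0):
    """Drop blocks whose summed magnitude is below a (hypothetical) threshold.

    Returns the kept blocks and their zero-crossing counts.
    """
    frames = frame_signal(x, frame_len)
    keep = block_sums(frames) > sum_threshold
    kept = frames[keep]
    return kept, zero_crossings(kept)
```

For a mono recording loaded as a NumPy array `x`, `strip_silence(x)` returns the kept blocks together with one zero-crossing count per block.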

I do not understand what data I would need to train the HMM to identify these phonemes. For example:

Using zero-crossings, I have identified that:

Yes - the zero-crossing count starts low and then increases.

No - the zero-crossing count starts low and does not increase.
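The two patterns above can be turned into a simple rule. This is a minimal sketch with several assumptions. The input is taken to be the per-block zero-crossing counts left after silence stripping. "Increases" is taken to mean that the average of the last third of the track exceeds the average of the first third by a hypothetical factor of 1.5. This is still the existing zero-crossing heuristic, not an HMM. The rising pattern presumably comes from the fricative /s/ at the end of "Yes", which has a high zero-crossing rate.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def zcr_rises(zcr, ratio=1.5):
    """True if the last third of a ZCR track averages clearly above the first third.

    `ratio` is a hypothetical tuning parameter, not a value from the original post.
    """
    if not zcr:
        raise ValueError("empty ZCR track")
    third = max(1, len(zcr) // 3)
    return mean(zcr[-third:]) > ratio * mean(zcr[:third])


def yes_or_no(zcr):
    """Heuristic decision: a rising zero-crossing count is read as "Yes"."""
    return "Yes" if zcr_rises(zcr) else "No"
```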

Could I train my HMM algorithm so that it interprets these values?

Or could anyone suggest a method by which I can train the HMM to identify the word spoken in the sample?

Hope someone can help :)!

Solution

Could I train my HMM algorithm so that it interprets these values?

Yes, definitely.

Or could anyone suggest a method by which I can train the HMM to identify the word spoken in the sample?

You just need to put the zero-crossing rate into a feature file together with the MFCC features (for example, as a 14th feature), then use any standard HMM training toolkit, such as CMUSphinx or HTK, to train the HMM and decode with it. For more information, see:

http://cmusphinx.sourceforge.net/wiki/mfcformat

http://speech-research.com/htkSearch/index.php?ID=297039

http://speech-research.com/SRTxt2User/index.html
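The pipeline this answer describes (stack the features per frame, train one model per word, then decode) can be sketched as below. This is a minimal sketch under several assumptions:

* The 13 MFCCs per frame come from an external front end and share the ZCR's frame rate.
* Each word is modeled as a left-to-right HMM with diagonal-Gaussian emissions.
* The model parameters are hand-set, hypothetical values standing in for what Baum-Welch training in HTK or CMUSphinx would produce.

All function names are illustrative, and the code does not read or write either toolkit's feature-file format. Decoding scores the sequence against each word's HMM with the log-domain forward algorithm and picks the highest likelihood.

```python
import numpy as np


def append_zcr(mfcc, zcr):
    """Append a per-frame ZCR as an extra column after the MFCCs: (T, 13) -> (T, 14)."""
    mfcc = np.asarray(mfcc, dtype=np.float32)
    zcr = np.asarray(zcr, dtype=np.float32)
    if mfcc.ndim != 2 or zcr.shape != (mfcc.shape[0],):
        raise ValueError("need exactly one ZCR value per MFCC frame")
    return np.column_stack([mfcc, zcr])


def log_gaussian_diag(obs, means, variances):
    """Per-frame, per-state log density of diagonal Gaussians: (T, D) x (N, D) -> (T, N)."""
    diff = obs[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(np.log(2 * np.pi * variances)[None, :, :]
                         + diff ** 2 / variances[None, :, :], axis=2)


def forward_log_likelihood(obs, start, trans, means, variances):
    """log P(obs | model) via the forward algorithm, computed in the log domain."""
    obs = np.asarray(obs, dtype=float)
    with np.errstate(divide="ignore"):  # log(0) -> -inf for forbidden transitions
        log_start = np.log(start)
        log_trans = np.log(trans)
    log_b = log_gaussian_diag(obs, means, variances)
    log_alpha = log_start + log_b[0]
    for t in range(1, len(obs)):
        # Sum over previous state i of alpha[i] * trans[i, j], for every state j.
        log_alpha = np.logaddexp.reduce(log_alpha[:, None] + log_trans, axis=0) + log_b[t]
    return np.logaddexp.reduce(log_alpha)


def left_to_right(means, variances, stay=0.6):
    """Hypothetical left-to-right HMM: each state loops or moves one step right."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    n = len(means)
    start = np.zeros(n)
    start[0] = 1.0
    trans = np.zeros((n, n))
    for i in range(n - 1):
        trans[i, i] = stay
        trans[i, i + 1] = 1.0 - stay
    trans[-1, -1] = 1.0
    return start, trans, means, variances


def classify(obs, models):
    """Return the word whose HMM gives the highest log-likelihood, plus all scores."""
    scores = {word: forward_log_likelihood(obs, *params) for word, params in models.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # One-dimensional features (ZCR only) and hand-set parameters, for illustration only.
    models = {
        "Yes": left_to_right([[5.0], [20.0]], [[4.0], [4.0]]),
        "No": left_to_right([[5.0], [5.0]], [[4.0], [4.0]]),
    }
    word, scores = classify([[5], [5], [6], [19], [21], [20]], models)
    print(word, scores)
```

Replacing the sum over previous states (`np.logaddexp.reduce`) with a maximum would turn this scoring into Viterbi decoding, which is what toolkit decoders typically use.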

Other tips

Automated phoneme segmentation is a tough problem, so I'll provide some of my favorite resources that touch on the topic at various levels of detail.

This paper: http://www.seas.upenn.edu/~jan/Files/Iscas99Speech.pdf

This paper: http://www.ll.mit.edu/publications/journal/pdf/vol08_no2/8.2.1.languageidentification.pdf

This resource is very good: http://research.microsoft.com/pubs/118769/Book-Chap-HuangDeng2010.pdf

This book gives some good examples for phoneme identification: http://www.amazon.com/Speech-Recognition-Theory-C-Implementation/dp/0471977306/

This book is pretty good, too: http://www.amazon.com/Statistical-Methods-Recognition-Language-Communication/dp/0262100665/

The books are expensive, but in my opinion they are worth it.

Licensed under: CC-BY-SA with attribution