Question

I've finally managed to build and run pocketsphinx (pocketsphinx_continuous). The problem I'm running into is how to improve accuracy. From what I understand, you can specify a dictionary file (-dict test.dic). So I took the default dictionary file and added some more pronunciations of the same words, for example:

pencil P EH N S AH L
pencil(2) P EH N S IH L

spaghetti S P AH G EH T IY
spaghetti(2) S P UH G EH T IY

Yet pocketsphinx still does not recognize either word at all. I know there is a JSGF file you can specify as well, but that seems more for phrases and grammar. How can I get pocketsphinx to recognize common words such as pencil and spaghetti?

thanks

-Mike


Solution

With something like this, you can't be certain, but I can offer the following suggestions:

  1. Perhaps the language model assigns low probabilities to "spaghetti" and "pencil". As you suggested, you could use a JSGF grammar to test this: instead of the N-gram model, give it a simple grammar of about twenty words, including spaghetti and pencil, so that all words are treated as equally likely. If the words are recognized under the grammar, then it is the language model that makes them hard to recognize.

  2. Perhaps your pronunciation of these words does not match the dictionary entries, even with the alternative pronunciations. Try either (A) testing with other people's voices, or (B) adapting the acoustic model to your voice (see http://cmusphinx.sourceforge.net/wiki/tutorialam).

  3. Also, what does it recognize these words as when it fails? If possible, remove the words it produces instead from the dictionary.

Again, for overall accuracy, only three things are really going to help you: restricting the grammar, adapting the acoustic model, and perhaps getting higher-quality recording input.
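As a minimal sketch of the grammar test in suggestion 1, the following Python snippet builds a JSGF grammar in which every listed word is an equally weighted alternative. The function name build_jsgf, the grammar name, and the word list are made up for illustration. Every word in the list also has to exist in the dictionary passed with -dict, spelled the same way.

```python
def build_jsgf(words, grammar_name="wordtest", rule_name="word"):
    """Return JSGF text for a grammar that accepts exactly one word from `words`."""
    cleaned = []
    for w in words:
        w = w.strip()
        # Skip blanks and duplicates, but keep the caller's spelling and case,
        # because grammar words must match the dictionary entries exactly.
        if w and w not in cleaned:
            cleaned.append(w)
    if not cleaned:
        raise ValueError("need at least one word")
    alternatives = " | ".join(cleaned)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <{rule_name}> = {alternatives};\n"
    )


if __name__ == "__main__":
    # Hypothetical test vocabulary; extend it to about twenty words.
    print(build_jsgf(["pencil", "spaghetti", "table", "window", "paper"]))
```

Saving the printed text as, say, wordtest.gram and passing it with the -jsgf option instead of an N-gram model should show whether "pencil" and "spaghetti" are recognized once the language model is out of the picture.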

OTHER TIPS

To improve accuracy you may want to try adapting the acoustic model to your voice. http://cmusphinx.sourceforge.net/wiki/tutorialadapt

To learn how to add new words: http://ghatage.com/tech/2012/12/13/Make-Pocketsphinx-recognize-new-words/

Make sure you put a tab (not a space) after the word and before the start of the pronunciation.
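A small script can tidy up hand-edited dictionary lines. This sketch rewrites each entry with a single tab after the word, as suggested above, and flags phone symbols that are not in the 39-phone CMUdict set. That phone set is an assumption on my part: the authoritative list is whatever phone set your acoustic model was trained with. The function names are illustrative.

```python
# Phone symbols used by CMUdict (without stress markers). Assumption: the
# acoustic model in use was trained with this same phone set.
CMUDICT_PHONES = set(
    "AA AE AH AO AW AY B CH D DH EH ER EY F G HH IH IY JH K L M N NG "
    "OW OY P R S SH T TH UH UW V W Y Z ZH".split()
)


def normalize_line(line):
    """Return 'word<TAB>phones' for a dictionary line, or None if blank or malformed."""
    parts = line.split()
    if len(parts) < 2:
        return None
    # parts[0] is the word, possibly with an alternate marker such as "(2)".
    return parts[0] + "\t" + " ".join(parts[1:])


def unknown_phones(line, phoneset=CMUDICT_PHONES):
    """List the phone symbols on a dictionary line that are not in `phoneset`."""
    parts = line.split()
    return [p for p in parts[1:] if p not in phoneset]


if __name__ == "__main__":
    for entry in ["pencil(2)  P EH N S IH L", "spaghetti(2) S P UH G EH T IY"]:
        print(repr(normalize_line(entry)), unknown_phones(entry))
```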

Maybe the problem is with Pocketsphinx itself. I was not getting good results with Pocketsphinx either, but I was getting very good accuracy with Sphinx4 (for a US speaker with a noise-cancelling microphone). So I compared the two using the same audio recordings. For Pocketsphinx I used pocketsphinx_batch with the WSJ acoustic model and a small-vocabulary language model and dictionary (created online with the CMU Cambridge language modelling toolkit). For Sphinx4 I wrote a small Java program using the Sphinx4 library. The result was that Sphinx4 was much more accurate. All the gory details are at http://www.jaivox.com/pocketsphinx.html.

To achieve good accuracy with pocketsphinx:

  • Important! Check that your mic, audio device, and audio files support 16 kHz, because the general model is trained on 16 kHz acoustic examples.
  • You should create your own limited dictionary; you cannot use cmusphinx-voxforge-de.dic as is, because accuracy drops dramatically with it.
  • You should create your own language model.
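The first check in the list above can be done without any Sphinx tooling. The sketch below uses Python's standard wave module to report whether a WAV recording matches what the model expects. The 16 kHz rate comes from the list above; the mono and 16-bit expectations are my assumption about the stock models, so adjust them if your model's documentation says otherwise. The names check_wav and make_silence are illustrative.

```python
import io
import wave


def check_wav(source, expected_rate=16000, expected_channels=1, expected_bits=16):
    """Return a list of mismatches for a WAV file (a path or a binary file object)."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        bits = wav.getsampwidth() * 8
    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"{channels} channel(s), expected {expected_channels}")
    if bits != expected_bits:
        problems.append(f"{bits}-bit samples, expected {expected_bits}-bit")
    return problems


def make_silence(rate, channels=1, seconds=0.1):
    """Build a short silent 16-bit WAV in memory, only to demonstrate the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds) * channels)
    return buf.getvalue()


if __name__ == "__main__":
    # A 44.1 kHz stereo clip is reported as a mismatch on rate and channels.
    for problem in check_wav(io.BytesIO(make_silence(44100, channels=2))):
        print(problem)
```

For a real recording, pass the file path instead of the in-memory buffer, for example check_wav("recording.wav"), where the file name is a placeholder.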

You can search for the Jasper project on GitLab to see how it's implemented. Also, please check the documentation.

This is from the CMUSphinx website:

"There are various phonesets to represent phones, such as IPA or SAMPA. CMUSphinx does not yet require you to use any well-known phoneset, moreover, it prefers to use letter-only phone names without special symbols. This requirement simplifies some processing algorithms, for example, you can create files with phone names as part of the filenames without any violating of the OS filename requirements.

A dictionary should contain all the words you are interested in, otherwise the recognizer will not be able to recognize them. However, it is not sufficient to have the words in the dictionary. The recognizer looks for a word in both the dictionary and the language model. Without the language model, a word will not be recognized, even if it is present in the dictionary." https://cmusphinx.github.io/wiki/tutorialdict/
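Since a word has to appear in both places, it can be worth checking the two files against each other. The sketch below assumes the language model is a plain-text ARPA file (not a binary one), collects the words from its \1-grams: section, and lists dictionary words that are missing from it. The function names and the sample data are made up for illustration.

```python
def lm_unigrams(arpa_text):
    """Collect the words listed in the \\1-grams: section of an ARPA language model."""
    words = set()
    in_unigrams = False
    for raw in arpa_text.splitlines():
        line = raw.strip()
        if line.startswith("\\"):
            # Section markers look like \data\, \1-grams:, \2-grams:, \end\
            in_unigrams = (line == "\\1-grams:")
            continue
        fields = line.split()
        # Unigram lines are: log-probability, word, optional backoff weight.
        if in_unigrams and len(fields) >= 2:
            words.add(fields[1])
    return words


def dict_words(dict_text):
    """Collect base words from a dictionary, folding 'word(2)' into 'word'."""
    words = set()
    for line in dict_text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            head = parts[0]
            base = head.split("(", 1)[0] if head.endswith(")") else head
            words.add(base)
    return words


def missing_from_lm(dict_text, arpa_text):
    """Return dictionary words that have no unigram entry in the language model."""
    return sorted(dict_words(dict_text) - lm_unigrams(arpa_text))


if __name__ == "__main__":
    sample_arpa = (
        "\\data\\\nngram 1=3\n\n"
        "\\1-grams:\n-1.0 <s> -0.3\n-1.0 </s>\n-0.5 pencil -0.3\n\n"
        "\\end\\\n"
    )
    sample_dict = "pencil P EH N S AH L\nspaghetti S P AH G EH T IY\n"
    print(missing_from_lm(sample_dict, sample_arpa))
```

Any word this reports would need to be added to the text the language model is built from, and the model rebuilt, before the recognizer can output it.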

Licensed under: CC-BY-SA with attribution