Speech Recognition using CMU Sphinx, JSAPI and Google Speech API

https://stackoverflow.com/questions/8664726

08-04-2021

Question

Speech recognition is one of the many features of my current project, which will most probably be developed in J2EE (other languages are also welcome if the choice is justified).

Most of the links on Google and on SO suggest the three options mentioned above: Sphinx 4, JSAPI used directly, and the Google Speech API (making a server call to Google and then getting the result back as text).

What other freely available options are there? And if I use Sphinx-4, how do I get a language model for general English to use with it?

Solution

Yes, there are.

You can use a wrapper around the Google Speech Recognizer, which takes basically one line of code. You send speech audio in FLAC or Speex format and receive the recognized text and a confidence score. The only problem is that Google can shut the API down, as it did with Google Translate.
Another option is to use Sphinx (Sphinx4 or PocketSphinx).
You can also use HTK (http://htk.eng.cam.ac.uk/) with HVite (the HTK decoder), or another decoder such as Julius (http://julius.sourceforge.jp/en/). There are other options that use HTK to train acoustic models and/or language models and grammars.

Voxforge has acoustic and language models for HTK and Sphinx (http://voxforge.org/).

Other tips

And if I use Sphinx-4, how do I get the language model for general English to be used with it?

You can download them from the CMUSphinx website and from other places. You can also build them yourself. One possible source is:

http://www.keithv.com/software/csr/
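
Once you have downloaded a model, it can help to check what you actually got before wiring it into a recognizer. The sketch below assumes the file is in the plain-text ARPA n-gram format (binary model formats are not handled) and only reads the "\data\" header to report how many n-grams of each order the model contains. The class name ArpaHeader and the file name passed on the command line are made up for illustration; nothing here is part of Sphinx-4 itself.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper (not part of Sphinx-4): summarizes an ARPA-format language model.
final class ArpaHeader {

    // Reads the "\data\" header and returns a map of n-gram order -> declared count.
    static Map<Integer, Long> readCounts(Iterator<String> lines) {
        Map<Integer, Long> counts = new TreeMap<>();
        boolean inData = false;
        while (lines.hasNext()) {
            String line = lines.next().trim();
            if (line.equals("\\data\\")) {
                inData = true;
            } else if (inData && line.startsWith("ngram ")) {
                // Header lines look like "ngram 2=12345".
                String[] parts = line.substring("ngram ".length()).split("=");
                counts.put(Integer.parseInt(parts[0].trim()), Long.parseLong(parts[1].trim()));
            } else if (inData && line.startsWith("\\")) {
                break; // reached the first "\N-grams:" section, so the header is done
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the downloaded model, e.g. "model.arpa" (assumed name).
        // ISO-8859-1 is used so that any byte sequence decodes without errors.
        try (Stream<String> lines = Files.lines(Paths.get(args[0]), StandardCharsets.ISO_8859_1)) {
            readCounts(lines.iterator())
                .forEach((order, n) -> System.out.println(order + "-grams: " + n));
        }
    }
}
```

Because the header sits at the top of the file and reading stops at the first n-gram section, this stays fast even for large models. A model that only lists 1-grams and 2-grams is a bigram model; general-English dictation usually calls for a trigram model with a large vocabulary.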

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow