Question

I want to create a voice-controlled slideshow using MIT's WAMI API (or another speech recognition API) and impress.js. I want to include simple commands like "next page" or "go back".

Would it be possible? How could I do it?


Solution

There is another post on Stack Overflow that asks a very similar question, except that the asker wanted to use Google's Speech Recognition API. It has a pretty good answer.

There is also a new Speech API in Chrome that could be used. The problem with this solution is that you have to click on an icon to tell the speech recognition engine (ASR) to start listening, and your users are restricted to a specific version of Chrome. Most of these solutions work one utterance at a time: once the ASR has recognized a command, you have to click on the icon again to make it listen for the next one. For an application with a very limited command set (e.g. "next" and "back") there is not much value in this, since it would be just as easy for the user to click on a button that moves the slideshow forward or back.

It looks like the WAMI API lets you start the recognition process programmatically, which is a better alternative. It is a JavaScript API that you include in your web pages to start listening for user input. The documentation for this API provides good examples of how to develop a multimodal speech recognition application. You will need to learn how to write grammars that tell the speech engine which utterances you are looking for in your application. WAMI uses the JSpeech Grammar Format (JSGF). Once you get a recognition of either "next" or "back" from the ASR, you would just move to the next or previous slide using JavaScript.
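As a rough sketch of that last step, here is what a small JSGF grammar and the command handling could look like. The grammar text and the helper names `commandFor` and `handleUtterance` are made up for illustration, and the WAMI wiring itself is left out (check the WAMI documentation for how to register a grammar and receive results). `deck` is assumed to be the object returned by `impress()`, which exposes `next()` and `prev()`.

```javascript
// Illustrative JSGF grammar for the two commands from the question.
// The rule names and phrases are assumptions; adjust them to your needs.
const SLIDESHOW_GRAMMAR = [
  '#JSGF V1.0;',
  'grammar slideshow;',
  'public <command> = <next> | <back>;',
  '<next> = next [page | slide];',
  '<back> = go back | back | previous [page | slide];',
].join('\n');

// Hypothetical helper: map a recognized utterance to an action name,
// or null if the utterance is not one of the known commands.
function commandFor(utterance) {
  const text = String(utterance).trim().toLowerCase();
  if (/^next( (page|slide))?$/.test(text)) return 'next';
  if (/^(go back|back|previous( (page|slide))?)$/.test(text)) return 'prev';
  return null;
}

// Hypothetical helper: call this from the recognizer's result callback.
// `deck` is expected to behave like the impress() API object.
function handleUtterance(utterance, deck) {
  const command = commandFor(utterance);
  if (command === 'next') deck.next();
  else if (command === 'prev') deck.prev();
  return command;
}
```

Keeping the utterance-to-action mapping separate from the recognizer means the same handler can be reused if you later switch to a different speech API.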

OTHER TIPS

I would use the SpeechRecognition API in the browser.
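A minimal sketch of how that might look, assuming a browser that exposes `SpeechRecognition` (or the prefixed `webkitSpeechRecognition`) and a page where `impress().init()` has already been called. The helper names `actionFromTranscript` and `startVoiceControl` are invented for this example.

```javascript
// Hypothetical helper: pick a slide action out of a free-form transcript.
function actionFromTranscript(transcript) {
  const text = String(transcript).toLowerCase();
  if (/\bnext\b/.test(text)) return 'next';
  if (/\b(back|previous)\b/.test(text)) return 'prev';
  return null;
}

// Hypothetical helper: start continuous recognition and drive the deck.
// Returns the recognition object, or null if the API is unavailable.
function startVoiceControl(deck) {
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  if (!Recognition) return null;

  const recognition = new Recognition();
  recognition.continuous = true; // keep listening after each result
  recognition.lang = 'en-US';

  recognition.onresult = (event) => {
    // Only act on results that are new since the last event and are final.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      if (!event.results[i].isFinal) continue;
      const action = actionFromTranscript(event.results[i][0].transcript);
      if (action === 'next') deck.next();
      else if (action === 'prev') deck.prev();
    }
  };

  recognition.start();
  return recognition;
}

// Only wire things up in a browser page that has impress.js loaded.
if (typeof window !== 'undefined' && typeof impress === 'function') {
  startVoiceControl(impress());
}
```

The browser will ask the user for microphone permission, and it may stop listening after a period of silence, so a real page would probably want to handle the `onend` and `onerror` events as well.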

For an easy way to do this with JavaScript, check out annyang, which is a library that makes dealing with speech recognition super-easy.
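With annyang the setup could be as short as the sketch below. It assumes annyang and impress.js are both loaded on the page and that `impress().init()` has already run; `buildCommands` is a made-up helper name, and the phrases are only examples.

```javascript
// Hypothetical helper: build an annyang command map for a slide deck.
// `deck` is expected to behave like the impress() API object.
function buildCommands(deck) {
  return {
    'next': () => deck.next(),
    'next page': () => deck.next(),
    'back': () => deck.prev(),
    'go back': () => deck.prev(),
  };
}

// annyang is null when the browser has no speech recognition support,
// so check for it before registering commands.
if (typeof annyang !== 'undefined' && annyang && typeof impress === 'function') {
  annyang.addCommands(buildCommands(impress()));
  annyang.start();
}
```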

You can try SpeechAPI, which is built with Flash and sphinx4 (http://cmusphinx.sourceforge.net) and lets you run recognition from JavaScript in the browser. You can find the demos and other material here:

http://speechapi.com/

You can also install your own speech recognition server to work with Flash, using the server from the Speech API SourceForge project:

http://sourceforge.net/projects/speechcloud/

Licensed under: CC-BY-SA with attribution