Question

Software such as Siri takes voice commands and responds to questions appropriately (roughly 98% of the time). I wanted to know: when we write software that takes an input stream of voice signal and responds to those questions,

do we need to convert the input into a human-readable language, such as English?

In nature we have many different languages, but when we speak we basically just make different noises; that's it. We created the so-called alphabet to denote those noise variations.

So, again, my question is: when we write speech-recognition algorithms, do we match those noise-variation signals directly against our database, or do we first convert the noise variations into English and then look up the answer in the database?


Solution

The "noise variation signals" you are referring to are called phonemes. How a speech-recognition system translates these phonemes into words depends on the type of system. Siri is not a grammar-based system, where you tell the recognizer what types of phrases to expect based on a set of rules. Since Siri translates speech in an open context, it probably uses some type of statistical modeling. A popular statistical model for speech recognition today is the Hidden Markov Model (HMM). While there is a database of sorts involved, it is not a simple lookup from groups of phonemes to words. There is a pretty good high-level description of the process, and of the issues with translation, here.
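To make the statistical idea concrete, here is a minimal sketch of the Viterbi algorithm, the standard way to decode the most likely state sequence from an HMM. The states and probabilities below are invented toy values (real recognizers use continuous acoustic features and vastly larger models, not two phonemes and three symbols):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for a sequence of observations."""
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    path = {s: [s] for s in states}

    for t in range(1, len(obs)):
        V.append({})
        new_path = {}
        for s in states:
            # Find the best previous state leading into s
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path

    # Pick the best final state and return its path
    prob, best = max((V[-1][s], s) for s in states)
    return path[best], prob


# Toy model: two hypothetical phoneme states emitting acoustic symbols
states = ("p1", "p2")
start_p = {"p1": 0.6, "p2": 0.4}
trans_p = {"p1": {"p1": 0.7, "p2": 0.3},
           "p2": {"p1": 0.4, "p2": 0.6}}
emit_p = {"p1": {"a": 0.9, "b": 0.1},
          "p2": {"a": 0.2, "b": 0.8}}

seq, prob = viterbi(["a", "b", "b"], states, start_p, trans_p, emit_p)
print(seq)  # ['p1', 'p2', 'p2']
```

The key point matching the answer above: the decoder does not look up phoneme groups in a table; it searches for the state sequence with the highest joint probability given the acoustic evidence.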

OTHER TIPS

Apple's Siri is based on natural-language understanding; I believe Nuance is behind the scenes (refer to this article).
Nuance is a leader in speech-recognition system development, and the accuracy of the Nuance Dragon engine is just amazing. The client I am working for consumes Nuance's NOD service for their IVR system.
I have also tried the Nuance Dragon SDK for Android.

In my experience, if you use Nuance you need not worry about noise variation and the like.
But when you go for an enterprise release of your application, Nuance might be costly.

If you are planning to use the power of voice to drive your application, the Google API is also a good choice.

APIs such as Sphinx and PocketSphinx can also help you with speech-application development. All of the above APIs take care of noise rejection, converting speech into text, and so on.

All you need to worry about is building your system to understand the semantic meaning of the recognized speech content. Apple presumably has a very good semantic interpreter. So give the Nuance SDK a try; it is available for Android, iOS, and Windows Phone, and in an HTTP client version.

I hope this helps.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow