Are Speech-to-Text libraries/engines only functional when the speaker is directly before the mic?

https://stackoverflow.com/questions/23140343

05-07-2023

Question

Another way of asking the question, I reckon, is, "Are mics on Android phones/phablets/tablets unidirectional or omnidirectional?"

I'm wondering if a Speech-to-Text app could pick up multiple speakers in a conversation, or is it functionally limited to somebody speaking right into the mic, as when IPhoniacs ask "Siri" questions, or when a youngster practices his rendition of the Gettysburg Address with device in hand?

Solution

> I'm wondering if a Speech-to-Text app could pick up multiple speakers in a conversation

Speech-to-text can in principle pick up multiple speakers, but the microphones on smartphones are quite limited and are specifically tuned to cancel surrounding noise and surrounding speech. The API gives you no control over that processing, so phones are really only good at recording a single speaker.

Apps like this have been announced:

http://www.gridspace.com/memo-mobile

but I seriously doubt they will be delivered.

> or is it functionally limited to somebody speaking right into the mic, as when IPhoniacs ask "Siri" questions, or when a youngster practices his rendition of the Gettysburg Address with device in hand?

It is possible to run speaker-identification software on the phone, so it could identify the owner and ignore other voices, if that is what you are looking for. It could also learn a few other speakers who are nearby.
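To give a rough idea of what speaker identification involves, here is a minimal sketch in Python. It assumes each utterance has already been turned into a fixed-length voice embedding by some speaker-embedding model, which is not shown. The vectors, the names `owner` and `colleague`, the function `identify_speaker`, and the 0.8 threshold are all made up for illustration and do not come from any particular library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(embedding, enrolled, threshold=0.8):
    """Return the best-matching enrolled speaker, or None if nobody
    reaches the threshold (that speech would then be ignored)."""
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical enrolled embeddings; real ones would have far more dimensions.
enrolled = {
    "owner": [0.9, 0.1, 0.2],
    "colleague": [0.1, 0.8, 0.3],
}

print(identify_speaker([0.88, 0.12, 0.19], enrolled))  # close to "owner"
print(identify_speaker([0.0, 0.1, 0.95], enrolled))    # matches nobody
```

An utterance that returns `None` would simply be dropped, which is how the phone could "ignore others". A suitable threshold depends on the embedding model and has to be tuned on real recordings.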

Licensed under: CC-BY-SA with attribution
