Question

Speech recognition on handheld devices is usually triggered by a press of a button. How do I go about triggering speech recognition without one? My Raspberry Pi based device intentionally has nothing users can interact with manually - there is only a microphone hanging out of the wall.

I am trying to implement a simple trigger command that initiates a sequence of actions. In short, I want to run a single .sh script whenever the device "hears" an audio trigger. It does not need to understand anything beyond the trigger itself - there is no meaning to decode from the trigger, such as a script name or parameters. A very simple function: "hear the trigger -> execute the .sh script".

I've explored different options:

  1. Continuously streaming audio to Google's speech recognition service - not a good idea - too much wasted traffic and too many wasted resources

  2. Having an offline speech recognition application continuously listen to the audio stream and "pick out" the trigger words - somewhat better, yet still a waste of resources, and these systems have to be trained on audio samples - which pretty much removes the ability to quickly assign custom names to devices

  3. Using some sort of pitch processing to react to a sequence of loud sounds - two hand claps or something similar - not too bad, but I suspect my hands will fall off before the thing is properly tested, or I will get killed by my family members, since I normally get to experiment with my toys at night when they are in bed.

  4. Whistle recognition - not much different from the previous option, but your palms don't get sore, and chances are I survive the testing if I learn to whistle quietly. I found an article by IBM on commanding a computer via whistle commands - the approach is much the same as with local speech recognition applications, except you teach it to recognize different whistle sequences. However, it did not explain how I could teach it to recognize just any whistle, regardless of its pitch.

I rather like the whistle idea - it seems to be the least resource-hungry of the options - how can I do this?
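One way to make whistle detection pitch-independent is to look for spectral purity rather than a specific frequency: a whistle shows up as a single narrow peak in the spectrum, whereas speech, claps, and background noise spread their energy across the band. Below is a minimal sketch of that idea, assuming NumPy and 16 kHz mono frames; the band limits and purity threshold are illustrative values, not from the original post.

```python
# Tone-independent whistle detection: accept any frame whose dominant
# frequency bin concentrates most of the band's energy, whatever the pitch.
import numpy as np

RATE = 16000  # samples per second (assumed mic capture rate)

def is_whistle(frame, rate=RATE, fmin=500.0, fmax=5000.0, purity=0.6):
    """Return True if the frame looks like a whistle of any pitch."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / rate)
    band = (freqs >= fmin) & (freqs <= fmax)
    total = spectrum[band].sum()
    if total <= 0:
        return False
    # Dominant frequency within the band:
    peak = freqs[band][np.argmax(spectrum[band])]
    # Fraction of band energy within +/-50 Hz of that peak; a whistle
    # concentrates nearly everything there, broadband sounds do not.
    near = band & (np.abs(freqs - peak) <= 50.0)
    return spectrum[near].sum() / total >= purity
```

In practice you would feed successive frames from the ALSA capture device (e.g. via `arecord` piped to stdin) and fire the script only after several consecutive frames pass, to avoid one-off false positives.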

Are there other vocal triggers that could be easily implemented given that I am limited to Raspberry Pi hardware?


Solution

Take a look at this Node.js project, which handles audio stream events from the microphone and then uses PocketSphinx offline voice recognition with a limited custom dictionary to recognize simple voice commands:

https://github.com/ybutb/yee-voice
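The same keyword-spotting approach can be sketched directly in Python with the `pocketsphinx` package, which exposes a keyphrase mode that listens for a single phrase instead of decoding full sentences. The keyphrase, threshold, and script path below are placeholders to adapt; the import is guarded so the trigger logic is reusable even where PocketSphinx is not installed.

```python
# Keyword spotting with PocketSphinx (pip install pocketsphinx), assuming a
# working ALSA microphone on the Pi. Keyphrase and script path are examples.
import subprocess

try:
    from pocketsphinx import LiveSpeech
except ImportError:  # keeps the module importable without pocketsphinx
    LiveSpeech = None

TRIGGER_SCRIPT = '/home/pi/trigger.sh'  # placeholder path

def trigger_command(script=TRIGGER_SCRIPT):
    """Build the argv that runs the trigger script through sh."""
    return ['sh', script]

def listen(keyphrase='hey device'):
    # lm=False switches the decoder into keyword-spotting mode: it scores
    # only the keyphrase against the audio. Lower kws_threshold values make
    # detection stricter; tune it against false positives on your mic.
    for _ in LiveSpeech(lm=False, keyphrase=keyphrase, kws_threshold=1e-20):
        subprocess.run(trigger_command(), check=False)

if __name__ == '__main__':
    listen()
```

This stays entirely offline and idles cheaply, which addresses the traffic and resource concerns with the streaming options above.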

OTHER TIPS

Mono is a framework you can install on the Pi that lets you compile and run C# applications, and I believe it supports System.Speech and System.Speech.Recognition. You can use these to easily write an app and simply specify which words you want it to listen for. Write it on your computer, then move the exe to the Pi and run it with the microphone attached. I built a similar application, though I used a socket server and sent commands that way. Setting up the commands is pretty simple.

    // Recognize only the single trigger word, using a constrained grammar.
    SpeechRecognitionEngine rec = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));
    rec.SetInputToDefaultAudioDevice();
    rec.SpeechRecognized += speech_recognized;

    var choices = new Choices();
    choices.Add("Trigger");
    var grammar = new Grammar(new GrammarBuilder(choices));
    rec.LoadGrammar(grammar);
    rec.MaxAlternates = 0;   // only the best match matters here
    rec.RecognizeAsync(RecognizeMode.Multiple);

    private void speech_recognized(object sender, SpeechRecognizedEventArgs e)
    {
        if (e.Result.Text == "Trigger")
        {
            // run your script
        }
    }
Licensed under: CC-BY-SA with attribution