System.Speech.Recognition fails to accurately recognize continuous input

https://stackoverflow.com/questions/21540413

06-10-2022

Question

I have a program that recognizes speech quite well with System.Speech, using SpeechRecognitionEngine. Although it is accurate, it seems to throw away some of the audio input it receives. If I say "one, two, three" with a pause between each word, it transcribes each word correctly. If I say them without pauses, it transcribes the first and sometimes the third word correctly; the second word is simply ignored.

Other people have had this problem, but I haven't been able to find their solutions (see, for example, the question "Microsoft Speech Recognition Speed").

If I could, I would rewind the audio position to an earlier point in the audio stream, but I haven't found a function in the API that lets me do this. Another approach I considered was to run multiple recognition engines, each of which would take just one word and be reused once it had finished handling that word, but that is a very complex and resource-hungry solution.

Any help on this problem would be appreciated.

I've cut it down to this piece of C# code:

public void Init()
{
    // Create an in-process speech recognizer for the en-US locale.
    var cultureInfo = new System.Globalization.CultureInfo("en-US");
    recognizer_ = new SpeechRecognitionEngine(cultureInfo);

    // Build the list of words to recognize (a Choices grammar, not dictation).
    var numbers = new Choices();
    numbers.Add(new string[] { "one", "two", "three" });

    // Create a GrammarBuilder object and append the Choices object.
    GrammarBuilder gb = new GrammarBuilder();
    gb.Append(numbers);
    var g = new Grammar(gb);
    recognizer_.LoadGrammar(g);

    // Add a handler for the speech recognized event.
    recognizer_.SpeechRecognized +=
        new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);
    recognizer_.SpeechDetected += recognizer_SpeechDetected;

    // Configure input to the speech recognizer.
    recognizer_.SetInputToDefaultAudioDevice();

    // Start asynchronous, continuous speech recognition.
    recognizer_.RecognizeAsync(RecognizeMode.Multiple);
}

void recognizer_SpeechDetected(object sender, SpeechDetectedEventArgs e)
{
    Console.WriteLine("\nspeech detected event audio position:\t\t" + e.AudioPosition);
    Console.WriteLine("speech detected current audio position:\t\t" + recognizer_.AudioPosition);
    Console.WriteLine("speech detected recognizer audio position:\t" + recognizer_.RecognizerAudioPosition);
}

// Handle the SpeechRecognized event.
void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    Console.WriteLine("speech recognized event audio position:\t\t" + e.Result.Audio.AudioPosition);
    Console.WriteLine("speech recognized event audio start time: " + e.Result.Audio.StartTime);
    Console.WriteLine(e.Result.Text);

    // do things
    // ...
}

Solution

Instead of

gb.Append(numbers);

which tells the recognizer to expect a single isolated number per utterance, try something like

gb.Append(new GrammarBuilder(numbers), 1, 5);

This lets the recognizer accept sequences of one to five numbers in a single utterance. Adjust the repetition counts to suit your needs.
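
For context, here is a minimal sketch of how the repeated grammar might fit into a complete program. It assumes a Windows machine with an en-US recognizer installed and a working default microphone; the class name NumberSequenceDemo and the console prompts are illustrative and not part of the original question. Because a single result can now contain several numbers, the handler reads e.Result.Words to get at each one rather than relying only on e.Result.Text.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

// Hypothetical demo class; requires a reference to System.Speech (Windows only).
class NumberSequenceDemo
{
    static void Main()
    {
        using (var recognizer = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            // The words the grammar should accept.
            var numbers = new Choices("one", "two", "three");

            // Accept between 1 and 5 consecutive numbers in one utterance.
            var gb = new GrammarBuilder();
            gb.Culture = recognizer.RecognizerInfo.Culture;
            gb.Append(new GrammarBuilder(numbers), 1, 5);
            recognizer.LoadGrammar(new Grammar(gb));

            recognizer.SpeechRecognized += (sender, e) =>
            {
                // One result may hold several numbers; list them individually.
                foreach (RecognizedWordUnit word in e.Result.Words)
                {
                    Console.WriteLine(word.Text);
                }
            };

            recognizer.SetInputToDefaultAudioDevice();
            recognizer.RecognizeAsync(RecognizeMode.Multiple);

            Console.WriteLine("Say up to five numbers, then press Enter to quit.");
            Console.ReadLine();
        }
    }
}
```

To check the grammar without a microphone, it may help to skip SetInputToDefaultAudioDevice and RecognizeAsync and call recognizer.EmulateRecognize("one two three") instead, which feeds the phrase to the loaded grammars as text.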

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow