Question

I need to develop a speech recognition application that converts recorded telephone audio to text. I am using the Microsoft Speech API 5.1.

I can't restrict the input to a fixed set of choices, so I am using the DictationGrammar class. I am not getting 100% accuracy.

Please help me achieve 100% accuracy. If there are any third-party tools that can do this, those are welcome too.

Development environment: Windows XP, .NET Framework 3.5, C#.

Here is my code:

    using System;
    using System.Speech.Recognition;

    class Program
    {
        static void Main(string[] args)
        {
            SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine();

            // Create and load a dictation grammar.
            recognizer.LoadGrammar(new DictationGrammar());
            recognizer.MaxAlternates = 5;

            // Add a handler for the speech recognized event.
            recognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);

            // Configure input to the speech recognizer.
            recognizer.SetInputToWaveFile("Record_210114090634.wav");

            // Start asynchronous, continuous speech recognition.
            recognizer.RecognizeAsync(RecognizeMode.Multiple);

            // Keep the process alive so the asynchronous recognizer can raise events.
            Console.ReadLine();
        }

        static void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            Console.WriteLine(e.Result.Text);
        }
    }

No correct solution

OTHER TIPS

The dictation recognition engine in SAPI 5.1 simply can't recognize telephony-quality (8 kHz, 8-bit) audio, no matter what you do. It requires 22 kHz, 16-bit audio for any sort of reasonable error rate.

Even with that, the SAPI 5.1 dictation engine also requires speaker-dependent training, even for unaccented American English.

Even the latest SAPI 5.4 SR engines require 22 kHz, 16-bit audio, although they do work better without training for unaccented voices. (Accented voices still require training.)
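Since everything above hinges on the sample rate and bit depth of the recording, it may help to check what the WAV file actually contains before handing it to the recognizer. The following is only a minimal sketch: WavFormat, Read, and MeetsQuotedMinimum are hypothetical names, not part of SAPI or System.Speech, and the 22000 Hz / 16-bit threshold is simply the figure quoted above, not a documented engine limit. It assumes a RIFF/WAVE stream with a standard "fmt " chunk.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of SAPI or System.Speech.
// Reads channel count, sample rate and bit depth from a RIFF/WAVE header.
public sealed class WavFormat
{
    public int Channels;
    public int SampleRate;
    public int BitsPerSample;

    public static WavFormat Read(Stream stream)
    {
        BinaryReader reader = new BinaryReader(stream);

        // RIFF header: "RIFF", overall size, "WAVE".
        string riff = Encoding.ASCII.GetString(reader.ReadBytes(4));
        reader.ReadInt32(); // overall size, not needed here
        string wave = Encoding.ASCII.GetString(reader.ReadBytes(4));
        if (riff != "RIFF" || wave != "WAVE")
            throw new InvalidDataException("Not a RIFF/WAVE stream.");

        // Walk the chunks until the "fmt " chunk turns up.
        while (true)
        {
            byte[] idBytes = reader.ReadBytes(4);
            if (idBytes.Length < 4)
                throw new InvalidDataException("No fmt chunk found.");
            string id = Encoding.ASCII.GetString(idBytes);
            int size = reader.ReadInt32();

            if (id == "fmt ")
            {
                WavFormat format = new WavFormat();
                reader.ReadInt16();                       // format tag (1 = PCM)
                format.Channels = reader.ReadInt16();
                format.SampleRate = reader.ReadInt32();
                reader.ReadInt32();                       // average bytes per second
                reader.ReadInt16();                       // block align
                format.BitsPerSample = reader.ReadInt16();
                return format;
            }

            // Skip any other chunk; chunks are padded to an even length.
            reader.ReadBytes(size + (size & 1));
        }
    }

    // Assumption: threshold taken from the figures quoted in the answer above.
    public bool MeetsQuotedMinimum()
    {
        return SampleRate >= 22000 && BitsPerSample >= 16;
    }
}
```

Calling WavFormat.Read(File.OpenRead("Record_210114090634.wav")) and then MeetsQuotedMinimum() on the result would show whether the recording falls below the quoted figures. Note that upsampling a low-rate recording after the fact changes the header values but does not restore the audio detail that was never captured.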

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow