Question

I am building a web app for recording voice messages and am looking for the best options for converting the voice messages to text. Does anyone have some suggestions on what to use to make the conversion? Would System.Speech work?

Solution

System.Speech is a client-focused API. Windows Vista and Windows 7 ship with the speech engines that System.Speech uses. You could use it for transcription, because the client speech engines provided by Microsoft include a dictation grammar.

The server speech engines provided by Microsoft do not include a dictation grammar, so you have to supply your own grammar, which makes them more difficult to use for transcription. The .NET namespace for server recognition is Microsoft.Speech, and the complete SDK for the 10.2 version is available at http://www.microsoft.com/downloads/en/details.aspx?FamilyID=1b1604d3-4f66-4241-9a21-90a294a5c9a4. The speech engine is a free download.
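
To illustrate what "supply your own grammar" means, here is a rough sketch of recognizing a fixed set of phrases with the server API. It assumes the Microsoft.Speech runtime and an en-US language pack are installed and that the project references the Microsoft.Speech assembly. The class name, method name, and phrase list are made up for illustration, and I have not tested this against the 10.2 SDK specifically.

```csharp
using System;
using System.Globalization;
using Microsoft.Speech.Recognition;

static class ServerCommandRecognizer
{
    // Hypothetical helper: returns the matched phrase, or null if nothing matched.
    public static string RecognizeCommand(string wavPath)
    {
        // Assumption: an en-US server recognizer (language pack) is installed.
        CultureInfo culture = new CultureInfo("en-US");
        using (SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine(culture))
        {
            // No dictation grammar on the server side, so list the phrases to listen for.
            Choices commands = new Choices("call me back", "delete message", "save message");
            GrammarBuilder builder = new GrammarBuilder();
            builder.Culture = culture; // grammar culture must match the recognizer
            builder.Append(commands);

            recognizer.LoadGrammar(new Grammar(builder));
            recognizer.SetInputToWaveFile(wavPath);

            RecognitionResult result = recognizer.Recognize();
            return result != null ? result.Text : null;
        }
    }
}
```

This only matches the listed phrases. It is not free-form transcription, which is why the client engines are the easier route for voice messages.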

To get started with .NET speech, there is a very good article that was published a few years ago at http://msdn.microsoft.com/en-us/magazine/cc163663.aspx. It is probably the best introductory article I’ve found so far. It is a little out of date, but still very helpful. (The AppendResultKeyValue method was dropped after the beta.)

Here is a quick sample: about the simplest .NET Windows Forms app using a dictation grammar that I could think of. It should work on Windows Vista or Windows 7. I created a form, dropped a button on it, and made the button big. Then I added a reference to System.Speech and the line:

using System.Speech.Recognition;

Then I added the following event handler to button1:

private void button1_Click(object sender, EventArgs e)
{
    // Dispose the engine when done; it holds native speech resources.
    using (SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine())
    {
        // The built-in dictation grammar allows free-form speech.
        recognizer.LoadGrammar(new DictationGrammar());
        try
        {
            button1.Text = "Speak Now";
            recognizer.SetInputToDefaultAudioDevice();
            // Blocks until a phrase is recognized; returns null if nothing was recognized.
            RecognitionResult result = recognizer.Recognize();
            button1.Text = (result != null) ? result.Text : "Nothing recognized";
        }
        catch (InvalidOperationException exception)
        {
            button1.Text = String.Format("Could not recognize input from default audio device. Is a microphone or sound card available?\r\n{0} - {1}.", exception.Source, exception.Message);
        }
    }
}
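
Since the question is about recorded voice messages rather than live microphone input, the same dictation grammar can be pointed at a file instead. The following is a minimal sketch, assuming the message has already been saved as a WAV file in a format the engine accepts. The class and method names are hypothetical, and accuracy on untrained voices or phone-quality audio may be poor.

```csharp
using System;
using System.Collections.Generic;
using System.Speech.Recognition;

static class VoiceMessageTranscriber
{
    // Hypothetical helper: returns the recognized text of a WAV recording.
    public static string Transcribe(string wavPath)
    {
        using (SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine())
        {
            recognizer.LoadGrammar(new DictationGrammar());
            // Read audio from the file instead of the default microphone.
            recognizer.SetInputToWaveFile(wavPath);

            List<string> phrases = new List<string>();
            RecognitionResult result;
            // Each call recognizes one phrase; null means nothing more was
            // recognized (end of the file or a silence timeout).
            while ((result = recognizer.Recognize()) != null)
            {
                phrases.Add(result.Text);
            }
            return String.Join(" ", phrases.ToArray());
        }
    }
}
```

In a web app this would have to run on a machine that has the client speech engines installed, which is worth checking before committing to System.Speech on a server.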

A little more information comparing the various flavors of speech engines and APIs shipped by Microsoft can be found in the question "What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?"

Licensed under: CC-BY-SA with attribution