Using System.Speech to convert mp3 file to text

https://stackoverflow.com/questions/17895933

04-06-2022
|

Question

I'm trying to use the speech recognition in .net to recognize the speech of a podcast in an mp3 file and get the result as string. All the examples I've seen are related to using microphone but I don't want to use the microphone and provide a sample mp3 file as my audio source. Can anyone point me to any resource or post an example.

EDIT -

I converted the audio file to wav file and tried this code on it. But it only extracts the first 68 words.

public class MyRecognizer {
    public string ReadAudio() {
        SpeechRecognitionEngine sre = new SpeechRecognitionEngine();
        Grammar gr = new DictationGrammar();
        sre.LoadGrammar(gr);
        sre.SetInputToWaveFile("C:\\Users\\Soham Dasgupta\\Downloads\\Podcasts\\Engadget_Podcast_353.wav");
        sre.BabbleTimeout = new TimeSpan(Int32.MaxValue);
        sre.InitialSilenceTimeout = new TimeSpan(Int32.MaxValue);
        sre.EndSilenceTimeout = new TimeSpan(100000000);
        sre.EndSilenceTimeoutAmbiguous = new TimeSpan(100000000);
        RecognitionResult result = sre.Recognize(new TimeSpan(Int32.MaxValue));
        return result.Text;
    }
}

Solution

Try reading it in a loop.

SpeechRecognitionEngine sre = new SpeechRecognitionEngine();
Grammar gr = new DictationGrammar();
sre.LoadGrammar(gr);
sre.SetInputToWaveFile("C:\\Users\\Soham Dasgupta\\Downloads\\Podcasts\\Engadget_Podcast_353.wav");
sre.BabbleTimeout = new TimeSpan(Int32.MaxValue);
sre.InitialSilenceTimeout = new TimeSpan(Int32.MaxValue);
sre.EndSilenceTimeout = new TimeSpan(100000000);
sre.EndSilenceTimeoutAmbiguous = new TimeSpan(100000000); 

StringBuilder sb = new StringBuilder();
while (true)
{
    try
    {
        var recText = sre.Recognize();
        if (recText == null)
        {               
            break;
        }

        sb.Append(recText.Text);
    }
    catch (Exception ex)
    {   
        //handle exception      
        //...

        break;
    }
}
return sb.ToString();

If you've a Windows Forms or WPF application, run this code in a seperate thread, otherwise it blocks the UI thread.

OTHER TIPS

I would look first at the method documented here: http://msdn.microsoft.com/en-us/library/system.speech.recognition.speechrecognitionengine.setinputtowavefile.aspx

You should be able to work it out from here I think.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow