No response to an HTTP GET request in WebAPI in .NET 4.5 while using SpeechSynthesis for converting text to speech

https://stackoverflow.com/questions/15036942

11-03-2022

Question

I'm trying to set up a simple web service using WebAPI. Here is the code I have:

public class SpeakController : ApiController
{
    //
    // api/speak

    public HttpResponseMessage Get(String textToConvert, String outputFile, string gender, string age = "Adult")
    {
        VoiceGender voiceGender = (VoiceGender)Enum.Parse(typeof(VoiceGender), gender);
        VoiceAge voiceAge = (VoiceAge)Enum.Parse(typeof(VoiceAge), age);

        using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
        {
            synthesizer.SelectVoiceByHints(voiceGender, voiceAge);
            synthesizer.SetOutputToWaveFile(outputFile, new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono));
            synthesizer.Speak(textToConvert);
        }

        return Request.CreateResponse(HttpStatusCode.OK, new Response { HttpStatusCode = (int)HttpStatusCode.OK, Message = "Payload Accepted." });
    }
}

The code is fairly straightforward and by no means production ready, but in my tests I noticed that the following occurs for every request to the controller:

The WAV file gets generated successfully.
During debugging, I can see control hit the return statement and exit the method.
However, my browser just keeps spinning and I never get a response back from the server.

I tried the same with Postman (a REST client for Chrome) and got the same result. Although I do want this to be a blocking call, for the sake of trying other things I changed synthesizer.Speak to synthesizer.SpeakAsync and encountered the same issue.

However, when I test the snippets separately as shown below, the code works as expected.

Testing WebAPI call with speech section commented out:

public class SpeakController : ApiController
{
    //
    // api/speak

    public HttpResponseMessage Get(String textToConvert, String outputFile, string gender, string age = "Adult")
    {
        VoiceGender voiceGender = (VoiceGender)Enum.Parse(typeof(VoiceGender), gender);
        VoiceAge voiceAge = (VoiceAge)Enum.Parse(typeof(VoiceAge), age);

        //using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
        //{
        //  synthesizer.SelectVoiceByHints(voiceGender, voiceAge);
        //  synthesizer.SetOutputToWaveFile(outputFile, new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono));
        //  synthesizer.Speak(textToConvert);
        //}

        return Request.CreateResponse(HttpStatusCode.OK, new Response { HttpStatusCode = (int)HttpStatusCode.OK, Message = "Payload Accepted." });
    }
}

Testing speech separately in a console application:

static string usageInfo = "Invalid or no input arguments!"
    + "\n\nUsage: initiatives \"text to speak\" c:\\path\\to\\generate.wav gender"
    + "\nGender:\n\tMale or \n\tFemale"
    + "\n";

static void Main(string[] args)
{
    if (args.Length != 3)
    {
        Console.WriteLine(usageInfo);
    }
    else
    {
        ConvertStringToSpeechWav(args[0], args[1], (VoiceGender)Enum.Parse(typeof(VoiceGender), args[2]));
    }

    Console.WriteLine("Press any key to continue...");
    Console.ReadLine();
}

static void ConvertStringToSpeechWav(String textToConvert, String pathToCreateWavFile, VoiceGender gender, VoiceAge age = VoiceAge.Adult)
{
    using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
    {
        synthesizer.SelectVoiceByHints(gender, age);
        synthesizer.SetOutputToWaveFile(pathToCreateWavFile, new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono));
        synthesizer.Speak(textToConvert);
    }
}

WebAPI and SpeechSynthesis do not seem to play well together. Any help in figuring this out would be greatly appreciated.

Thanks!

Solution

I have no idea why this happens, but running your SpeechSynthesizer on a separate thread seems to do the trick (an incompatible threading model, perhaps?). Here's how I've done it in the past.

Based on: Ultra Fast Text to Speech (WAV -> MP3) in ASP.NET MVC

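public dynamic Post(dynamic req)
{
    try
    {
        string phrase = req["phrase"].Value;

        var stream = new MemoryStream();

        // Run the synthesizer on its own thread instead of the request thread;
        // this is the part that appears to avoid the hanging response.
        var t = new System.Threading.Thread(() =>
            {
                using (var synth = new SpeechSynthesizer())
                {
                    synth.SetOutputToWaveStream(stream);
                    synth.Speak(phrase);
                    // Detach the synthesizer from the stream before disposing it.
                    synth.SetOutputToNull();
                }
            });

        // Start the thread and block until synthesis is done, so the call stays synchronous.
        t.Start();
        t.Join();

        // Rewind so the response is read from the start of the WAV data.
        stream.Position = 0;

        var resp = new HttpResponseMessage(HttpStatusCode.OK);
        resp.Content = new StreamContent(stream);

        // Return the audio as a downloadable attachment named phrase.wav.
        resp.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        resp.Content.Headers.ContentDisposition = new System.Net.Http.Headers.ContentDispositionHeaderValue("attachment");
        resp.Content.Headers.ContentDisposition.FileName = "phrase.wav";

        return resp;
    }
    catch
    {
        // Only covers failures on the request thread; an exception thrown
        // inside the worker thread above is not caught here.
        return new HttpResponseMessage(HttpStatusCode.InternalServerError);
    }
}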
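
One thing to watch with the thread-and-Join approach above: the try/catch around it does not see exceptions thrown inside the worker thread, and an unhandled exception on a manually created thread normally brings down the process. As a rough sketch (not part of the original answer), the pattern could be wrapped in a small helper that runs a delegate on a dedicated thread, waits for it, and rethrows any failure on the caller. The names DedicatedThread and Run are made up for this illustration, and the commented usage at the bottom assumes the Get method and variables from the question.

```csharp
using System;
using System.Runtime.ExceptionServices;
using System.Threading;

// Hypothetical helper (name chosen for this example only).
public static class DedicatedThread
{
    // Runs `work` on a newly created thread, blocks until it completes,
    // and rethrows any exception from `work` on the calling thread.
    public static T Run<T>(Func<T> work)
    {
        T result = default(T);
        ExceptionDispatchInfo error = null;

        var thread = new Thread(() =>
        {
            try
            {
                result = work();
            }
            catch (Exception ex)
            {
                // Capture instead of letting the exception escape the thread.
                error = ExceptionDispatchInfo.Capture(ex);
            }
        });

        thread.Start();
        thread.Join();

        // Rethrow on the caller, preserving the original stack trace.
        if (error != null)
        {
            error.Throw();
        }
        return result;
    }
}

// Possible use inside the question's Get method (assumes its variables):
//
// DedicatedThread.Run(() =>
// {
//     using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
//     {
//         synthesizer.SelectVoiceByHints(voiceGender, voiceAge);
//         synthesizer.SetOutputToWaveFile(outputFile, new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono));
//         synthesizer.Speak(textToConvert);
//     }
//     return true; // dummy value; the helper expects a result
// });
```

With this in place the call is still blocking, as the question wants, and a failure in the synthesizer reaches the controller's own error handling instead of being lost on the worker thread.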

Licensed under: CC-BY-SA with attribution
