Question

Can anyone help me? I am looking for an example of how to get information about the text being spoken by TTS through SAPI. (I am writing my application in C#, but that is not essential; SAPI is the same in C++, etc.) For example, suppose the user types this into a text box:

"This is a text"

tts.Speak("This is a text"); // this will "read" it..

OK, nice... but I also need information about the timing.

for example:

"Th" (first sound (phoneme) of "This") was "read" in 0.01ms..

"i" (first sound of "is") was "read" in 0.5ms..

"e" (second sound of "Text") was "read" in 1.02ms..

When I save the .wav file generated by SAPI, I need this timing information so that I can locate each sound in the .wav file for later processing.

Sorry for my English and for my poor description of the problem, but I think the problem is very simple and everyone will understand it. If not, I will try to describe it again :)


Solution

I have used C++ and SAPI 5.1 to synthesize speech and have a virtual character move its lips accordingly. Here is some code that works with visemes. According to the documentation at http://msdn.microsoft.com/en-us/library/ms720164(v=vs.85).aspx, phonemes work the same way, except that you replace SPEI_VISEME with SPEI_PHONEME.

DWORD WINAPI Character::sayMessage(LPVOID lpParam){
    HRESULT hres = E_FAIL;  //stays a failure code if no voice is available, so the final return is well defined
    try{
        ::CoInitialize(NULL);
        ThreadParam * param = (ThreadParam *)lpParam;
        wstring s = param->message;

        //first check the string for null
        if (s == L"") return false;

        //http://msdn.microsoft.com/en-us/library/ms720163(VS.85,classic).asp is my source for this
        //set up text to speech

        //get the voice associated with the character
        ISpVoice * pVoice;
        pVoice = param->sceneObject->characterVoice;

        if (pVoice != NULL){
            pVoice->Speak( NULL, SPF_PURGEBEFORESPEAK, 0 );

            SPEVENT event;
            ULONG ul;

            pVoice->SetInterest(SPFEI(SPEI_VISEME)|SPFEI(SPEI_END_INPUT_STREAM),SPFEI(SPEI_VISEME)|SPFEI(SPEI_END_INPUT_STREAM));
            pVoice->SetNotifyCallbackFunction(&eventFunction,0,0);
            pVoice->WaitForNotifyEvent(INFINITE);

            if (param->sceneObject->age == CHILD){
                s = L"<pitch middle=\"+10\">" + s + L"</pitch>";
            }

            hres = pVoice->Speak(s.c_str(),SPF_ASYNC,NULL);

            bool isDone = false;
            while(!isDone && pVoice != NULL && !FAILED(hres)){                  
                if(pVoice->GetEvents(1,&event, &ul) == S_OK){
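                    if(event.eEventId==SPEI_VISEME){
                        //get the viseme
                        int vis = LOWORD(event.lParam);  //handle it however you'd like after this
                        //timing: HIWORD(event.wParam) is the duration of this viseme in ms, and
                        //event.ullAudioStreamOffset is its byte offset in the output audio stream
                    }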
                    else if(event.eEventId== SPEI_END_INPUT_STREAM){
                        isDone = true;
                        s = L"";
                        return true;
                    }
                }                   
            }
        }
    }
    catch(...){
        return false;
    }       
    return !FAILED(hres);
}
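
To get from an event to a position in the saved .wav, the byte offset has to be converted into a time. The following is a minimal sketch, not part of the code above, assuming that the `ullAudioStreamOffset` field of `SPEVENT` is a byte offset into uncompressed PCM output (which is how I read the SAPI documentation) and that you know the output format. `PcmFormat`, `bytesPerSecond` and `offsetToMilliseconds` are hypothetical names introduced for illustration; the three fields correspond to `nSamplesPerSec`, `nChannels` and `wBitsPerSample` of the `WAVEFORMATEX` you configured for the output stream.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the PCM parameters needed for the conversion.
// Fill it from the WAVEFORMATEX of the stream you asked SAPI to write to.
struct PcmFormat {
    std::uint32_t samplesPerSec;  // e.g. 22050
    std::uint16_t channels;       // 1 = mono, 2 = stereo
    std::uint16_t bitsPerSample;  // e.g. 16
};

// Number of audio bytes that make up one second of uncompressed PCM.
inline std::uint64_t bytesPerSecond(const PcmFormat& f) {
    return static_cast<std::uint64_t>(f.samplesPerSec) * f.channels *
           (f.bitsPerSample / 8);
}

// Convert a byte offset in the audio stream (assumed to be what
// event.ullAudioStreamOffset holds) into milliseconds from the start.
inline double offsetToMilliseconds(std::uint64_t byteOffset, const PcmFormat& f) {
    const std::uint64_t bps = bytesPerSecond(f);
    assert(bps != 0 && "format must describe valid PCM audio");
    return 1000.0 * static_cast<double>(byteOffset) / static_cast<double>(bps);
}
```

Inside the event loop you would call something like `offsetToMilliseconds(event.ullAudioStreamOffset, fmt)` for each SPEI_VISEME or SPEI_PHONEME event and store the result together with the phoneme id. The format values shown in the comments are only examples; use whatever format your voice output was actually set to.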
Licensed under: CC-BY-SA with attribution