Teaching the Microsoft.Speech Engine the Pronunciation of a Number of Non-English Words

https://stackoverflow.com/questions/18828317

28-06-2022

Question

I am developing a C# application using Kinect that relies on voice commands. I have a list of Arabic words that the user can say to select different menu items.

I have been searching for the past few days with little success. Here is what I have found so far:

CMU Sphinx: http://www.ccse.kfupm.edu.sa/~elshafei/AASR.htm The first problem with this is that it is Java-based. I looked at KVM and the bridge approach, but I couldn't get far with either. I also couldn't set it up to work in Java, and there are no instructions on how to use the already prepared files.

I have also looked at using an SrgsDocument, as suggested in "Specifying a pronunciation of a word in Microsoft Speech API", but this seems too complicated for my purposes, and I am not even sure it is what I need.

I have also looked at "Microsoft Speech Recognition Custom Training". That person's problem was similar, but I cannot solve mine the same way.

I cannot use a commercial application such as Sakhr because I do not have the budget for it. Simply adding the words to a grammar will not work, because they do not follow the normal pronunciation rules of the English language.

Basically, what I'm looking for is some sort of tool that can connect a word written in English with a set of different pronunciations recorded from a microphone (that is, pretrained), which the speech engine can then reference at run time. Is this possible?

I am open to any options.

Thanks.

Solution

I think what you want to do is to specify a custom lexicon for your recognizer. That will allow you to "connect a word written in English with a set of different pronunciations" as you said.

The lexicon maps written words to pronunciations written in a phonetic alphabet. You can override the default lexicon (which will have English pronunciations for each word, if you're using an English recognizer) with your own lexicon, either by writing a new lexicon as an XML document, or by specifying individual pronunciations inline.
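As a rough illustration of the XML-document route, a lexicon file in the W3C Pronunciation Lexicon Specification (PLS) format might look like the sketch below. The word `kitab`, its phone string, and the choice of the `x-microsoft-ups` alphabet are assumptions made for the example, not taken from the question; the phone symbols should be checked against the phone table for the recognizer's language.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical lexicon: one Arabic menu word spelled in Latin letters. -->
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="x-microsoft-ups"
         xml:lang="en-US">
  <lexeme>
    <!-- The written form that appears in the grammar. -->
    <grapheme>kitab</grapheme>
    <!-- Illustrative phone sequence using English phones only. -->
    <phoneme>K IH T AA B</phoneme>
  </lexeme>
</lexicon>
```

A grammar would then point at this file, for instance through the SRGS `lexicon` element with a `uri` attribute. A `lexeme` may carry several `phoneme` entries if the same word should be accepted with more than one pronunciation.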

So you can define the pronunciation of the Arabic word as a sequence of phones (I think you'd have to use only phones that occur in English, otherwise the recognition might not work properly), then link it to the English written word (grapheme) in a lexicon or inline.
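The inline route can be sketched in C# with the `Microsoft.Speech.Recognition.SrgsGrammar` classes. This is a minimal sketch, not a tested implementation: the class name `ArabicMenuGrammar`, the rule id `menuItem`, the word `kitab`, and its phone string are all hypothetical, and the phones should be verified against the phone table for the installed recognizer language.

```csharp
using System.Globalization;
using Microsoft.Speech.Recognition;
using Microsoft.Speech.Recognition.SrgsGrammar;

// Hypothetical helper that builds a one-word grammar with an inline pronunciation.
static class ArabicMenuGrammar
{
    public static Grammar Build()
    {
        var doc = new SrgsDocument
        {
            // Assumes an en-US recognizer is installed.
            Culture = new CultureInfo("en-US"),
            // Interpret pronunciation strings as Universal Phone Set symbols.
            PhoneticAlphabet = SrgsPhoneticAlphabet.Ups
        };

        // The written word (grapheme) plus an illustrative phone sequence.
        var token = new SrgsToken("kitab")
        {
            Pronunciation = "K IH T AA B"
        };

        // A single root rule that matches only this token.
        var rule = new SrgsRule("menuItem", token);
        doc.Rules.Add(rule);
        doc.Root = rule;

        return new Grammar(doc);
    }
}
```

The resulting grammar would be loaded like any other, for example with `LoadGrammar` on the `SpeechRecognitionEngine` the Kinect application already uses. For a menu with several words, each word would get its own token inside a list of alternatives rather than a single-token rule.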

This page explains everything: About Lexicons and Phonetic Alphabets (Microsoft.Speech)

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow