Algorithm to find pronunciation rules

https://cs.stackexchange.com/questions/54741

03-11-2019

Question

Suppose that you have a large dictionary with spellings and pronunciations of foreign words, and you want to find a set of pronunciation rules. The rules should have the simplest form: a sequence of letters mapped to a sequence of sounds.

For example, French has the rules c [k], i [i] and ci [si].

The rules may be long: tion [sjɔ̃] (instead of [tjɔ̃]).
They may be rare: à [a] occurs in only a dozen words.
They may be both: aill [aj] (read as a-ill, not ai-ll [ɛl]).
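The rule form described above, a letter string mapped to a string of sounds, can be stored as a plain lookup table. The minimal sketch below assumes that representation plus one policy the question does not specify: when several rules match, take the longest one (so tion wins over t, and ci over c). The table RULES and the function apply_rules are hypothetical; sounds are tuples so that a nasal vowel like ɔ̃, which is two Unicode code points, stays a single symbol.

```python
# Hypothetical rule table: letter sequence -> tuple of sounds.
# Tuples keep multi-code-point sounds such as "ɔ̃" as one symbol.
RULES = {
    "a": ("a",),
    "c": ("k",),
    "i": ("i",),
    "ci": ("s", "i"),
    "n": ("n",),
    "o": ("o",),
    "t": ("t",),
    "tion": ("s", "j", "ɔ̃"),
}


def apply_rules(word, rules):
    """Read a word left to right, always taking the longest matching rule
    (an assumed tie-breaking policy, not part of the question)."""
    longest = max(len(key) for key in rules)
    sounds, i = [], 0
    while i < len(word):
        # Try the longest possible chunk first, then shorter ones.
        for length in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + length]
            if chunk in rules:
                sounds.extend(rules[chunk])
                i += length
                break
        else:  # no chunk starting at position i has a rule
            raise ValueError(f"no rule covers {word[i]!r} at position {i}")
    return tuple(sounds)


print(apply_rules("nation", RULES))
print(apply_rules("tic", RULES))  # ('t', 'i', 'k')
```

Greedy longest match is only one way to resolve overlaps such as aill versus ai; any extracted rule list would need some such policy to be usable.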

Fortunately, such rules are rare enough: for the most part, letters are pronounced by themselves, and this holds even for most of the letters involved in complicated rules. For example, cercle is pronounced [sɛʀkl] rather than a letter-by-letter [kɛʀklɛ], yet the algorithm should still notice that the sounds in the middle of the word correspond to their letters.
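One cheap way to act on this observation is sketched below, under assumptions: spell each word out with a hypothetical default sound per letter (which yields the naive [kɛʀklɛ] for cercle), align that naive reading with the real pronunciation by ordinary edit-distance (Levenshtein) alignment, and treat every maximal run of mismatches between matched anchors as a candidate rule. DEFAULT_SOUND and mismatch_runs are illustrative names, not an established method, and ties between equally cheap alignments are broken by the traceback order.

```python
# Hypothetical "default sound" of each letter: the naive, letter-by-letter
# reading that produces [kɛʀklɛ] for "cercle". Unlisted letters map to themselves.
DEFAULT_SOUND = {"c": "k", "e": "ɛ", "i": "i", "l": "l", "r": "ʀ"}


def mismatch_runs(word, sounds, default=DEFAULT_SOUND):
    """Align the naive reading of `word` with the real `sounds` by edit
    distance and return each maximal run of mismatches as (letters, sounds)."""
    naive = [default.get(ch, ch) for ch in word]
    n, m = len(naive), len(sounds)
    # dist[i][j]: edit distance between naive[:i] and sounds[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if naive[i - 1] == sounds[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + sub,  # match / substitute
                             dist[i - 1][j] + 1,        # silent letter
                             dist[i][j - 1] + 1)        # sound with no letter
    # Trace back one optimal alignment as (letter, sounds, matched) steps.
    steps, i, j = [], n, m
    while i > 0 or j > 0:
        diagonal = i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (
            0 if naive[i - 1] == sounds[j - 1] else 1)
        if diagonal:
            steps.append((word[i - 1], (sounds[j - 1],), naive[i - 1] == sounds[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            steps.append((word[i - 1], (), False))
            i -= 1
        else:
            steps.append(("", (sounds[j - 1],), False))
            j -= 1
    steps.reverse()
    # Matched anchors split the word; everything between them is a candidate.
    runs, letters, phones = [], "", ()
    for letter, sound, matched in steps:
        if matched:
            if letters or phones:
                runs.append((letters, phones))
            letters, phones = "", ()
        else:
            letters, phones = letters + letter, phones + sound
    if letters or phones:
        runs.append((letters, phones))
    return runs


print(mismatch_runs("cercle", ("s", "ɛ", "ʀ", "k", "l")))  # [('c', ('s',)), ('e', ())]
```

Counting these runs over the whole dictionary and keeping the frequent ones gives a first candidate list. The sketch reports c [s] for cercle rather than a context rule such as ce [sɛ]; recovering the context would take an extra step, for example grouping each candidate by its neighbouring letters.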

Is there a standard algorithm to extract candidates for such rules? I would like an explicit list of rules rather than a method that predicts pronunciation.
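One family of answers comes from letter-to-phoneme research: learn a many-to-many alignment between letter chunks and sound chunks with expectation-maximization (EM), then read the frequent chunk pairs off as explicit rules, which fits the request for a rule list rather than a predictor. The sketch below is a simplified, unofficial version of that idea, not any particular toolkit: forward-backward sums over every way of chunking each entry (1 to max_g letters against 0 to max_p sounds), the M-step turns expected counts into a joint probability table, and Viterbi shows the single best chunking. The names train, best_alignment and extract_rules, the limits max_g=4 and max_p=3, and the tiny SAMPLE dictionary are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical toy dictionary: (spelling, pronunciation as a tuple of sounds).
SAMPLE = [
    ("ci", ("s", "i")),
    ("lire", ("l", "i", "ʀ")),
    ("col", ("k", "ɔ", "l")),
    ("cercle", ("s", "ɛ", "ʀ", "k", "l")),
    ("nation", ("n", "a", "s", "j", "ɔ̃")),
    ("ration", ("ʀ", "a", "s", "j", "ɔ̃")),
]


def _edges(word, phones, i, j, max_g, max_p):
    """Chunk pairs ending at letter i and sound j: 1..max_g letters aligned
    to 0..max_p sounds (0 sounds = silent letters)."""
    for g in range(1, min(max_g, i) + 1):
        for p in range(min(max_p, j) + 1):
            yield i - g, j - p, (word[i - g:i], phones[j - p:j])


def _forward_backward(word, phones, table, unseen, max_g, max_p):
    n, m = len(word), len(phones)
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    beta = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = beta[n][m] = 1.0
    for i in range(1, n + 1):  # alpha: total weight of all prefix chunkings
        for j in range(m + 1):
            for i0, j0, pair in _edges(word, phones, i, j, max_g, max_p):
                alpha[i][j] += alpha[i0][j0] * table.get(pair, unseen)
    for i in range(n, 0, -1):  # beta: total weight of all suffix chunkings
        for j in range(m + 1):
            for i0, j0, pair in _edges(word, phones, i, j, max_g, max_p):
                beta[i0][j0] += table.get(pair, unseen) * beta[i][j]
    return alpha, beta


def train(entries, max_g=4, max_p=3, iterations=10):
    """Soft EM: E-step collects expected chunk-pair counts over all chunkings,
    M-step renormalizes them into a joint probability table."""
    table, unseen, counts = {}, 1.0, {}  # first pass weights every chunking equally
    for _ in range(iterations):
        counts = defaultdict(float)
        for word, phones in entries:
            alpha, beta = _forward_backward(word, phones, table, unseen, max_g, max_p)
            z = alpha[len(word)][len(phones)]
            if z == 0.0:
                continue  # cannot be chunked within max_g / max_p
            for i in range(1, len(word) + 1):
                for j in range(len(phones) + 1):
                    for i0, j0, pair in _edges(word, phones, i, j, max_g, max_p):
                        c = alpha[i0][j0] * table.get(pair, unseen) * beta[i][j] / z
                        if c > 0.0:
                            counts[pair] += c
        if not counts:
            raise ValueError("no entry could be aligned")
        total = sum(counts.values())
        table, unseen = {pair: c / total for pair, c in counts.items()}, 0.0
    return table, counts


def best_alignment(word, phones, table, max_g=4, max_p=3):
    """Viterbi: the most probable chunking of one entry under `table`."""
    best = {(0, 0): (1.0, None)}  # cell -> (score, back-pointer)
    for i in range(1, len(word) + 1):
        for j in range(len(phones) + 1):
            for i0, j0, pair in _edges(word, phones, i, j, max_g, max_p):
                if (i0, j0) in best:
                    score = best[(i0, j0)][0] * table.get(pair, 0.0)
                    if score > best.get((i, j), (0.0, None))[0]:
                        best[(i, j)] = (score, (i0, j0, pair))
    cell = (len(word), len(phones))
    if cell not in best:
        return None
    path = []
    while cell != (0, 0):
        i0, j0, pair = best[cell][1]
        path.append(pair)
        cell = (i0, j0)
    return path[::-1]


def extract_rules(counts, min_count=1.0):
    """Candidate rules ((letters, sounds), expected count), most frequent first."""
    kept = [(pair, c) for pair, c in counts.items() if c >= min_count]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    table, counts = train(SAMPLE)
    for word, phones in SAMPLE:
        print(word, best_alignment(word, phones, table))
    print(extract_rules(counts)[:10])
```

Setting max_g to at least 4 is what allows a chunk like tion [sjɔ̃] to surface at all; larger limits can also let EM explain short words as single chunks, so the output is best read as a ranked list of candidates to prune by hand, with the expected counts as evidence for rare rules like à [a].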

No correct solution

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange