Algorithm to find pronunciation rules
03-11-2019
Question
Suppose you have a large dictionary giving the spellings and pronunciations of foreign words, and you want to derive a set of pronunciation rules from it. The rules should have the simplest form: a sequence of letters maps to a sequence of sounds.
For example, in French: c [k], i [i], but ci [si].
The rules may be long: tion [sjɔ̃] (instead of [tjɔ̃]).
They may be rare: à [a] is used in only about a dozen words.
They may be both: aill [aj] (as a-ill, instead of ai-ll [ɛl]).
Fortunately, such rules are rare enough: for the most part, each letter is pronounced on its own, and this holds even for most letters in words that involve complicated rules. For example, cercle is [sɛʀkl] rather than the letter-by-letter [kɛʀklɛ]: the algorithm should notice that the sounds in the middle of the word, [ɛʀkl], still correspond.
Is there a standard algorithm for extracting candidates for such rules? I would like to have an explicit list of rules, rather than a method that predicts pronunciation.
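To make the idea concrete, here is a minimal sketch of what I have in mind, not a known standard algorithm. It assumes a table of default one-letter pronunciations (the hypothetical `DEFAULT_SOUNDS` below, which I made up for the example), reads each word letter by letter, and aligns that naive reading with the dictionary pronunciation using Python's `difflib.SequenceMatcher`. Every stretch where the two disagree becomes a candidate rule, and candidates are counted over the whole dictionary. Pronunciations are given as lists of phonemes so that symbols with combining marks, such as ɔ̃, stay in one piece.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical default one-letter pronunciations (an assumption for this sketch,
# not taken from a real lexicon).
DEFAULT_SOUNDS = {"c": "k", "e": "ɛ", "r": "ʀ", "l": "l", "i": "i"}


def candidate_rules(word, phonemes, default_sounds=DEFAULT_SOUNDS):
    """Yield (letters, sounds) pairs where the naive reading and the dictionary disagree."""
    # Naive reading: one default sound per letter (unknown letters stand for themselves).
    naive = [default_sounds.get(ch, ch) for ch in word]
    matcher = SequenceMatcher(None, naive, phonemes, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Because naive has exactly one sound per letter, indices into naive
            # are also indices into the spelling.
            yield word[i1:i2], tuple(phonemes[j1:j2])


def count_candidates(lexicon, default_sounds=DEFAULT_SOUNDS):
    """Count how often each candidate rule occurs across (word, phonemes) pairs."""
    counts = Counter()
    for word, phonemes in lexicon:
        counts.update(candidate_rules(word, phonemes, default_sounds))
    return counts


lexicon = [
    ("cercle", ["s", "ɛ", "ʀ", "k", "l"]),
    ("ici", ["i", "s", "i"]),
]
counts = count_candidates(lexicon)
for (letters, sounds), n in counts.most_common():
    print(repr(letters), "->", list(sounds), n)
```

On cercle this picks up the matching middle [ɛʀkl] and leaves two candidates: c pronounced [s], and a silent final e. What it does not do is recover the context: it reports c [s] rather than ci [si], so the candidates would still have to be extended with neighbouring letters and filtered by frequency.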
No correct solution