Question

Suppose that you have a large dictionary with the spellings and pronunciations of foreign words, and you want to find a set of pronunciation rules. The rules should have the simplest possible form: a sequence of letters maps to a sequence of sounds.

For example, in French: c → [k], i → [i], but ci → [si].

  1. The rules may be long: tion [sjɔ̃] (instead of [tjɔ̃]).

  2. They may be rare: à [a] occurs in only about a dozen words.

  3. They may be both long and rare: aill [aj] (split as a-ill, instead of ai-ll [ɛl]).
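
To make the rule format concrete, here is a minimal sketch in Python of how I imagine such a rule list being applied. It assumes longest-match-first, left-to-right application, which is my own assumption and not something the dictionary dictates. The names `RULES` and `pronounce` are hypothetical, and the table holds only the three example rules above.

```python
# Hypothetical rule table: letter sequence -> sound sequence (IPA).
# Only the three example rules from the question are included.
RULES = {
    "c": "k",
    "i": "i",
    "ci": "si",
}


def pronounce(word, rules):
    """Apply the rules left to right, always trying the longest rule first."""
    max_len = max(len(letters) for letters in rules)
    sounds = []
    pos = 0
    while pos < len(word):
        # Try the longest possible letter sequence at this position first.
        for size in range(min(max_len, len(word) - pos), 0, -1):
            chunk = word[pos:pos + size]
            if chunk in rules:
                sounds.append(rules[chunk])
                pos += size
                break
        else:
            raise KeyError(f"no rule matches {word[pos:]!r}")
    return "".join(sounds)


print(pronounce("ci", RULES))  # the long rule ci wins over c followed by i
print(pronounce("ic", RULES))  # only the single-letter rules apply
```

With this table, "ci" comes out as [si] and "ic" as [ik], so a long rule overrides the single-letter rules only where its letters occur together.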

Fortunately, such rules are rare enough: for the most part, each letter is pronounced on its own, and this holds even for most letters of the words that need complicated rules. For example, cercle is pronounced [sɛʀkl] rather than [kɛʀklɛ]: the algorithm should notice that the sounds in the middle of the word still correspond to their letters.
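
The following Python sketch illustrates what I mean by noticing the corresponding sounds; it is a naive idea of mine, not a standard algorithm. It reads the word letter by letter using a baseline table, aligns that naive reading with the real pronunciation using `difflib.SequenceMatcher` from the standard library, and reports the spans that disagree. `BASELINE` and `extract_candidates` are hypothetical names, and the sketch assumes that every baseline letter maps to exactly one sound symbol, so that positions in the naive reading coincide with letter positions.

```python
from difflib import SequenceMatcher

# Hypothetical baseline: one default sound per letter (an assumption for
# illustration, not taken from a real dictionary).
BASELINE = {"c": "k", "e": "ɛ", "r": "ʀ", "l": "l"}


def extract_candidates(spelling, actual, baseline):
    """Return (letters, sounds) pairs where the naive reading is wrong."""
    # Naive reading: exactly one sound symbol per letter, so an index into
    # `naive` is also an index into `spelling`.
    naive = "".join(baseline[letter] for letter in spelling)
    matcher = SequenceMatcher(None, naive, actual, autojunk=False)
    candidates = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Letters spelling[i1:i2] are really pronounced actual[j1:j2].
            candidates.append((spelling[i1:i2], actual[j1:j2]))
    return candidates


print(extract_candidates("cercle", "sɛʀkl", BASELINE))
```

For cercle this yields two candidates, c → [s] at the start and a silent e at the end, while the middle letters are matched to their usual sounds. The candidates carry no context yet (for instance, that c is [s] before e), which is exactly the part I do not know how to do properly.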

Is there a standard algorithm for extracting candidates for such rules? I would like to have an explicit list of rules, rather than a method that predicts pronunciation.


Licensed under: CC-BY-SA with attribution