Autocorrect algorithm - How to implement a decent “apostrophe & elision” feature?
05-11-2019
Question
Shorter version of the original question (as requested in the comments)
What algorithm handles a missing apostrophe the way the auto-correct feature of the Android virtual keyboard does?
(If it helps or makes sense, feel free to explain how such a feature detects two words typed without a space between them.)
I need this for French, where this happens much more often than in English.
Example
Jai -> J'ai ('I have' in French)
Could regex be part of the solution?
Here are some regexes that recognize the beginning of some (French) words, but they generate far too many false-positive candidates:
\b(c)(h?[aeiou])
\b(j)(h?[aeiou])
\b(n)(h?[aeiou])
\b(m)(h?[aeiou])
\b(t)(h?[aeiou])
\b(s)(h?[aeiou])
\b(l)(h?[aeiou])
\b(d)(h?[aeiou])
\b(qu)(h?[aeiou])
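For reference, the nine patterns above can be folded into a single expression. The following is a minimal Python sketch, not code from the keyboard app; the names ELISION_RE and elision_candidate are made up for illustration. It also shows the false-positive problem: any ordinary word that starts with one of these letters followed by a vowel (or "h" plus a vowel) gets an apostrophe too.

```python
import re

# One pattern for all elidable prefixes listed above: c, j, n, m, t, s, l, d, qu.
# The vowel class [aeiou] is kept exactly as in the original patterns.
ELISION_RE = re.compile(r"(qu|[cjnmtsld])(h?[aeiou]\w*)", re.IGNORECASE)


def elision_candidate(word):
    """Return the word with an apostrophe after the prefix, or None if no match."""
    match = ELISION_RE.fullmatch(word)
    if match is None:
        return None
    return match.group(1) + "'" + match.group(2)


print(elision_candidate("Jai"))      # J'ai
print(elision_candidate("quil"))     # qu'il
print(elision_candidate("chat"))     # c'hat  (false positive: "chat" is a real word)
print(elision_candidate("bonjour"))  # None
```

The pattern alone cannot tell "cest" (a typo) from "chat" (a valid word), so some extra knowledge about which words exist is probably needed on top of it.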
Note:
For the sake of completeness, I already have the two remaining simple cases covered in the dictionary XML file:
<word src="sil">s'il</word>
<word src="sils">s'ils</word>
Original question, kept for reference:
Context
Using the open-source AnySoftKeyboard keyboard provider app, I would like to add an "elision/apostrophe" feature that detects a missing apostrophe in a typed word and inserts it as part of the app's autocorrect feature.
A word about apostrophe & elision
"The apostrophe in French is used to replace a final vowel which is not pronounced because the next word also start with a vowel or silent “h”. The removing of a final silent vowel is called the ELISION.
In written French, elision takes place only with the following words : ce, je, ne, me, te, se, le, la, de, que, and si (only with “il” and “ils”)."
Source, where you can read more for some examples:
https://frenchforenglishhindispeakers.wordpress.com/2012/09/25/lapostrophe/
Example
He is my friend -> Ce est mon ami (incorrect)
'Ce' is followed by 'est' -> elision (deletion) of the 'e' of 'Ce'
-> C'est mon ami (correct)
Goal
Given the example above, I would like to implement a feature that would detect
Cest
and replace it by
C'est
That would allow the user to type faster by simply skipping the apostrophe.
This behavior is already implemented in the default Android keyboard.
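To make the goal concrete, here is one possible approach sketched in Python. It is only an assumption about how the candidates could be narrowed down, not a description of how the Android keyboard works. The idea is to insert the apostrophe only when the typed word is not a known word itself and the part after the elidable prefix is a known word starting with a vowel or "h" plus a vowel. LEXICON is a toy word set standing in for the real dictionary, and restore_apostrophe is a hypothetical helper name.

```python
# Elidable prefixes (from ce, je, ne, me, te, se, le/la, de, que); "qu" is tried first.
PREFIXES = ("qu", "c", "j", "n", "m", "t", "s", "l", "d")
VOWELS = "aeiou"

# Toy lexicon for illustration only; a real one would come from the dictionary.
LEXICON = {"ai", "est", "ami", "homme", "il", "chat", "la", "sur", "mon"}


def restore_apostrophe(word, lexicon=LEXICON):
    """Return word with an apostrophe inserted if that yields a known word."""
    lower = word.lower()
    if lower in lexicon:
        return word  # already a valid word ("chat", "la"): leave it alone
    for prefix in PREFIXES:
        if not lower.startswith(prefix):
            continue
        rest = lower[len(prefix):]
        if not rest:
            continue
        vowel_start = rest[0] in VOWELS
        h_vowel_start = rest[0] == "h" and len(rest) > 1 and rest[1] in VOWELS
        if (vowel_start or h_vowel_start) and rest in lexicon:
            cut = len(prefix)
            return word[:cut] + "'" + word[cut:]  # keeps the original casing
    return word


print(restore_apostrophe("Cest"))    # C'est
print(restore_apostrophe("Jai"))     # J'ai
print(restore_apostrophe("lhomme"))  # l'homme
print(restore_apostrophe("chat"))    # chat
```

A lexicon check like this would still leave ambiguous cases open, for example "la" versus "l'a", which are both valid and would need context from the surrounding words to decide.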
Questions
1 - What would be a proper algorithm to implement this "apostrophe & elision" feature?
2 - How is this feature implemented in Android keyboard(s)?
PS
I am not sure if this is the right place to ask such a question. If not, please let me know where I should ask it. Thanks.
No correct solution