JavaScript Regex + Unicode Diacritic Combining Characters

Question 1

Usually this is made by combining an 'é' (U+00E9) with a '\u0323' dot-below diacritic.

However, that isn't what you have here:

'ẹ́'

That's not U+00E9,U+0323 but U+1EB9,U+0301 - an ẹ combined with an acute diacritic.
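To check which sequence a string really contains, it can help to list its code points. A minimal sketch, assuming an ES2017+ runtime; the helper name codePoints is made up for illustration:

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(str) {
  // Spreading a string iterates by code point, not by UTF-16 code unit.
  return [...str].map(
    (ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

console.log(codePoints('\u1EB9\u0301').join(',')); // U+1EB9,U+0301
console.log(codePoints('\u00E9\u0323').join(',')); // U+00E9,U+0323
```

Both strings render as the same glyph, but the code point lists differ, which is why a plain comparison or regex fails.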

The usual solution would be to normalise each string (typically to Unicode Normalization Form C, NFC) before doing the comparison, so that canonically equivalent sequences end up as identical strings.
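As a rough illustration of comparing after normalisation, here is a sketch that assumes a runtime with the built-in String.prototype.normalize (ES2015 and later). The helper name sameText is hypothetical:

```javascript
// Hypothetical helper: compare two strings by canonical equivalence.
// Assumes String.prototype.normalize is available (ES2015+).
function sameText(a, b) {
  return a.normalize('NFC') === b.normalize('NFC');
}

const acuteThenDot = '\u00E9\u0323'; // é followed by combining dot below
const dotThenAcute = '\u1EB9\u0301'; // ẹ followed by combining acute

console.log(acuteThenDot === dotThenAcute);        // false
console.log(sameText(acuteThenDot, dotThenAcute)); // true
```

On engines without normalize, the same comparison would need a library that ships the Unicode data.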

"I don't just want to match e. I want to match all combinations"

Matching without diacriticals is typically done by normalising to Normalization Form D (NFD), which splits each accented letter into a base letter plus combining marks, and then removing all the combining diacritical characters.
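A minimal sketch of that approach, again assuming the built-in String.prototype.normalize (ES2015+). The helper name stripDiacritics is hypothetical, and the range U+0300 to U+036F covers only the Combining Diacritical Marks block, so marks from other blocks are left in place:

```javascript
// Hypothetical helper: decompose, then drop combining marks.
// Assumption: only the U+0300..U+036F block needs to be removed.
function stripDiacritics(str) {
  return str.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

console.log(stripDiacritics('\u1EB9\u0301')); // e
console.log(stripDiacritics('\u00E9\u0323')); // e
console.log(stripDiacritics('Crème brûlée')); // Creme brulee
```

Running both the search term and the text through the same function before matching makes a plain e match every accented combination of e.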

Unfortunately, normalisation was not available natively in JS until ES2015 added String.prototype.normalize, so on older engines you would have to drag in code to do it, which would have to include a large Unicode data table. One such effort is unorm. For picking out characters based on Unicode properties, such as being a combining diacritical, older engines likewise need a regexp library with support for the Unicode database, such as XRegExp Unicode Categories; since ES2018, native regexps accept \p{...} property escapes when the u flag is set.

Server-side languages (e.g. Python, .NET) typically have native support for Unicode normalisation, so if you can do the processing on the server, that would generally be easier.

Question 2

Normally the solution would be to use Unicode properties and/or scripts, but JavaScript did not support them natively before ES2018 added \p{...} property escapes under the u flag.

However, the XRegExp library adds this support. With it you can use:

\p{L}: to match any kind of letter from any language.

\p{M}: a character intended to be combined with another character (e.g. accents, umlauts, enclosing boxes, etc.).

So your character class would look like this:

[\p{L}\p{M}]+

That would match every letter in the Unicode table, together with any combining marks attached to it.
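For comparison, the same character class can be sketched with native property escapes, assuming an ES2018+ engine where \p{...} is accepted under the u flag. This is not the XRegExp API, and the helper name findWords is hypothetical:

```javascript
// Assumes an ES2018+ engine: \p{...} is only recognised with the u flag.
const wordPattern = /[\p{L}\p{M}]+/gu;

// Hypothetical helper: return every run of letters and combining marks.
function findWords(text) {
  return text.match(wordPattern) || [];
}

// Finds two runs: the ẹ́ sequence (U+1EB9 U+0301) and "abc".
// The digits and the spaces are skipped.
console.log(findWords('\u1EB9\u0301 abc 123'));
```

Including \p{M} in the class is what keeps a base letter and its combining marks together in one match.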

If you want to limit it, have a look at Unicode scripts and replace \p{L} with a script, which collects all the letters of a given writing system, e.g. \p{Latin} for all Latin letters or \p{Cyrillic} for all Cyrillic letters.
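A sketch of script-limited matching with native regexps, assuming an ES2018+ engine. Note that the native spelling is \p{Script=Latin} rather than the \p{Latin} form shown above for XRegExp. The helper name findInScript is hypothetical:

```javascript
// Hypothetical helper: find runs of letters from one script, plus combining marks.
// Assumes an ES2018+ engine; scriptName must be a valid Unicode script name.
function findInScript(text, scriptName) {
  const pattern = new RegExp(`[\\p{Script=${scriptName}}\\p{M}]+`, 'gu');
  return text.match(pattern) || [];
}

const sample = 'héllo привет';

console.log(findInScript(sample, 'Latin'));    // matches only "héllo"
console.log(findInScript(sample, 'Cyrillic')); // matches only "привет"
```

\p{M} stays in the class because combining marks are not assigned to the Latin or Cyrillic script themselves, so a script-only class would split decomposed letters.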