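A-Z still matches only ASCII letters. Use the Unicode property \p{L} (any Unicode letter) instead. Since \p{L} covers both upper- and lowercase letters, you no longer need the i modifier. Whether you can also drop the u modifier depends on the regex engine: some engines, such as PHP's PCRE with UTF-8 input, still need it for \p{L} to match whole characters. For example: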
\p{L}[\p{L}0-9 \-.',]*
There could be one more problem, though. In Unicode, a character with a diacritic can also be represented as a sequence of code points. For instance, á could be a single code point (U+00E1), or it could be an a (U+0061) followed by a combining acute accent (U+0301). \p{L} matches only stand-alone letters, not combining marks. To catch these cases too, you might want to add the Unicode property for combining marks, \p{M}, to the repeated character class:
\p{L}[\p{L}\p{M}0-9 \-.',]*
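To see the difference between the two patterns, here is a small sketch in Python. Python's built-in re module does not support \p{...}, so the sketch emulates the two character classes with unicodedata general categories. The helper names (is_letter, is_mark, matches) are made up for illustration, and the check is anchored to the whole string, which is an assumption the patterns above do not state.

```python
import unicodedata

# The literal characters from the character class: 0-9, space, -, ., ', and ,
EXTRA = set("0123456789 -.',")

def is_letter(ch):
    # Rough equivalent of \p{L}: general category starts with "L"
    return unicodedata.category(ch).startswith("L")

def is_mark(ch):
    # Rough equivalent of \p{M}: general category starts with "M"
    return unicodedata.category(ch).startswith("M")

def matches(text, allow_marks):
    # Emulates the pattern above, anchored to the whole string (assumption).
    # allow_marks=True corresponds to adding \p{M} to the character class.
    if not text or not is_letter(text[0]):
        return False
    return all(
        is_letter(ch) or ch in EXTRA or (allow_marks and is_mark(ch))
        for ch in text[1:]
    )

composed = "\u00e1"      # á as a single code point
decomposed = "a\u0301"   # a followed by a combining acute accent

print(matches(composed, allow_marks=False))
print(matches(decomposed, allow_marks=False))
print(matches(decomposed, allow_marks=True))
```

Only the decomposed form is sensitive to the change: without the mark class, the combining accent after the a is rejected.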