The solution is to use JclPCRE and statically link PCRE with it.
How to use TPerlRegex with Unicode
15-06-2023
Question
How can I use TPerlRegEx with the regular Delphi String type, avoiding any UTF-8 <-> UTF-16 string conversions? It seems Delphi XE5 doesn't come with a UTF-16 capable PCRE library:
http://qc.embarcadero.com/wc/qcmain.aspx?d=108941
As of version 8.30, PCRE supports UTF-16 through its 16-bit library.
Solution 2
Other tips
AFAIK the PCRE library embedded in Delphi is compiled with the UTF-8 API, not the UTF-16 API.
But once again, UTF-8 is as Unicode-ready as UTF-16! So the PCRE version embedded within Delphi XE5 is 100% Unicode ready... :)
Your link also states that the current implementation is dead slow because of a missing flag: PCRE_NO_UTF8_CHECK is not set in EMB's code.
You can try to use the library directly, as we did here, and bypass the slow TPerlRegEx class.
The slowdown does not come from using the UTF-8 version of the library: the UTF-8 version is as fast as the UTF-16 version. Nor is the UTF-16 to UTF-8 conversion slow by itself: it adds only a slight slowdown. The issue is the missing PCRE_NO_UTF8_CHECK flag...