Question

How can I use TPerlRegEx with the regular Delphi String type while avoiding any UTF-8/UTF-16 string conversions? It seems Delphi XE5 doesn't ship with a UTF-16-capable PCRE library?

http://qc.embarcadero.com/wc/qcmain.aspx?d=108941

As of version 8.30, PCRE also supports UTF-16 (through its separate 16-bit library), not only UTF-8.


Solution 2

The solution is to use JclPCRE and statically link PCRE with it.

OTHER TIPS

AFAIK, the PCRE library embedded in Delphi is compiled with the UTF-8 API, not the UTF-16 one.

But once again, UTF-8 is as Unicode-ready as UTF-16: both can encode every Unicode code point. So the PCRE version embedded in Delphi XE5 is fully Unicode-ready... :)
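To make the "UTF-8 is as Unicode-ready as UTF-16" point concrete, here is a minimal C sketch (not Delphi code, and not part of PCRE or TPerlRegEx) that converts UTF-16 code units, the layout of a Delphi `string`, into UTF-8 and back. The function names are hypothetical and the code assumes well-formed input; the round trip shows that nothing is lost, including a character outside the Basic Multilingual Plane that needs a surrogate pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8; returns the number of bytes written (1..4). */
static size_t put_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: well-formed UTF-16 -> UTF-8.
   out must hold at least 3 * n bytes. Returns bytes written. */
size_t utf16_to_utf8(const uint16_t *in, size_t n, unsigned char *out)
{
    size_t i = 0, w = 0;
    while (i < n) {
        uint32_t cp = in[i++];
        if (cp >= 0xD800 && cp <= 0xDBFF) {          /* high surrogate */
            assert(i < n && in[i] >= 0xDC00 && in[i] <= 0xDFFF);
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[i++] - 0xDC00);
        }
        w += put_utf8(cp, out + w);
    }
    return w;
}

/* Hypothetical helper: valid UTF-8 -> UTF-16 (no validation performed).
   out must hold at least n code units. Returns code units written. */
size_t utf8_to_utf16(const unsigned char *in, size_t n, uint16_t *out)
{
    size_t i = 0, w = 0;
    while (i < n) {
        unsigned char b = in[i];
        size_t len = b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
        uint32_t cp = len == 1 ? b : len == 2 ? (b & 0x1Fu)
                    : len == 3 ? (b & 0x0Fu) : (b & 0x07u);
        for (size_t k = 1; k < len; k++)
            cp = (cp << 6) | (in[i + k] & 0x3Fu);   /* append continuation bits */
        i += len;
        if (cp >= 0x10000) {                         /* split into a surrogate pair */
            cp -= 0x10000;
            out[w++] = (uint16_t)(0xD800 + (cp >> 10));
            out[w++] = (uint16_t)(0xDC00 + (cp & 0x3FF));
        } else {
            out[w++] = (uint16_t)cp;
        }
    }
    return w;
}
```

For example, the UTF-16 units `0x00E9 0x20AC 0xD83D 0xDE00` (é, €, 😀) become the 9 UTF-8 bytes `C3 A9 E2 82 AC F0 9F 98 80`, and converting those bytes back yields the original four units.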

Your link also states that the current implementation is dead slow because of wrong flags: PCRE_NO_UTF8_CHECK is missing from Embarcadero's code.

You can try to use the library directly, as we did here, and bypass the slow TPerlRegEx class.

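The slowdown does not come from the fact that the UTF-8 version of the library is used: the UTF-8 version is as fast as the UTF-16 version. Nor is the UTF-16 to UTF-8 conversion slow by itself; it only causes a slight slowdown. The real issue is the missing PCRE_NO_UTF8_CHECK flag: without it, PCRE validates the entire subject string on every pcre_exec() call, so finding all matches in a long string re-scans the whole string once per match...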
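The effect of the missing flag can be illustrated with a self-contained C simulation. This is not PCRE: `find_from()` is a hypothetical stand-in for one pcre_exec() call that, like PCRE without PCRE_NO_UTF8_CHECK, validates the whole subject before searching from an offset, and `utf8_valid()` is a simplified structural check (it does not reject overlong forms or encoded surrogates). A counter records how many bytes get validated.

```c
#include <assert.h>
#include <stddef.h>

static size_t bytes_validated; /* instrumentation counter for this demo */

/* Simplified UTF-8 check: lead/continuation byte structure only. */
static int utf8_valid(const unsigned char *s, size_t n)
{
    size_t i = 0;
    bytes_validated += n;              /* the whole subject is scanned */
    while (i < n) {
        unsigned char b = s[i];
        size_t len = b < 0x80 ? 1 : (b & 0xE0) == 0xC0 ? 2
                   : (b & 0xF0) == 0xE0 ? 3 : (b & 0xF8) == 0xF0 ? 4 : 0;
        if (len == 0 || i + len > n) return 0;
        for (size_t k = 1; k < len; k++)
            if ((s[i + k] & 0xC0) != 0x80) return 0;
        i += len;
    }
    return 1;
}

/* Hypothetical stand-in for one matcher call: find byte c at or after start.
   Validates the WHOLE subject first unless no_check is set.
   Returns the match offset, -1 for no match, -2 for invalid UTF-8. */
static long find_from(const unsigned char *subj, size_t n, size_t start,
                      unsigned char c, int no_check)
{
    assert(start <= n);
    if (!no_check && !utf8_valid(subj, n)) return -2;
    for (size_t i = start; i < n; i++)
        if (subj[i] == c) return (long)i;
    return -1;
}

/* Find all matches with repeated calls (like a "match again" loop).
   check_once = 0: every call validates (flag missing).
   check_once = 1: only the first call validates (flag set afterwards). */
size_t count_matches(const unsigned char *subj, size_t n,
                     unsigned char c, int check_once)
{
    size_t count = 0, start = 0;
    int first = 1;
    long pos;
    while ((pos = find_from(subj, n, start, c, check_once && !first)) >= 0) {
        count++;
        start = (size_t)pos + 1;       /* resume after the previous match */
        first = 0;
    }
    return count;
}
```

With a 1000-byte subject containing 100 matches, the per-call variant validates 101,000 bytes (100 successful calls plus the final failing one), while checking only on the first call validates 1,000. The cost of the missing flag therefore grows with the number of matches times the subject length, which is why it dominates any one-off UTF-16 to UTF-8 conversion.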

Licensed under: CC-BY-SA with attribution