Question

Although http://it2.php.net/manual/en/reference.pcre.pattern.modifiers.php does not mention it at all, PCRE doesn't seem to work correctly with UTF-8 strings prior to PHP 5.3.4, even with the 'u' modifier (which is supposed to enable UTF-8 support and which, according to that documentation, has been available since PHP 4.something).

preg_split("/\W+/u", $someUtf8String)

works as expected on PHP 5.3.4 and above, but on older versions it splits the string at characters such as ó ò ú í ì and the like, as if they were non-word characters.

See http://3v4l.org/ERDp5. If you have doubts (as I do) about whether the string is actually UTF-8 encoded, you can try http://3v4l.org/6XnOj and http://3v4l.org/mak33.

Either there was a bug that was fixed only in 5.3.4, or UTF-8 was not supported (in which case I wonder why the 'u' modifier is available at all).

The question is: is there a workaround for older PHP versions? I need \W to work correctly on a UTF-8 string on PHP 5.1.6.


Solution

How about mb_split?

mb_split("\W+", "histórica");

Note: mb_split takes the pattern without delimiters.
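A slightly fuller sketch of the same idea, in case it helps. It assumes the mbstring extension is loaded and that the script itself is saved as UTF-8. The sample string is made up for illustration. mb_split uses the encoding set by mb_regex_encoding, so it is set explicitly here rather than relying on the configured default, which may not be UTF-8 on an older install.

```php
<?php
// Assumption: mbstring is available and this file is saved as UTF-8.
// mb_split() reads its encoding from mb_regex_encoding(), so set it
// explicitly instead of trusting the server default.
mb_regex_encoding('UTF-8');

// Hypothetical sample input, not taken from the question.
$text = 'una página histórica';

// The pattern is passed bare: no /.../ delimiters and no 'u' modifier.
$words = mb_split('\W+', $text);

print_r($words);
```

Unlike preg_split, mb_split has no flag for dropping empty pieces, so if the input can start or end with non-word characters you may want to filter empty strings out of the result yourself.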

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow