Question

I am slicing a Unicode string containing diacritics with the mb_substr function, but it behaves as if I were using plain substr: it splits multibyte characters in half and displays a question mark in a diamond (the replacement character).

E.g.

echo mb_substr('ááááá', 0, 5); //Displays áá�

What might be wrong?


Solution

I get the same problem if I don't pass the encoding as the last parameter to mb_substr: on my server, at least, it defaults to ISO-8859-1, so each byte is counted as one character.


But if I set the encoding explicitly to UTF-8, it works as expected:

echo mb_substr('ááááá', 0, 5, 'UTF-8');

This displays correctly in the browser:

ááááá


See the documentation for mb_substr (quoting):

string mb_substr ( string $str , int $start [, int $length [, string $encoding ]] )

The encoding parameter is the character encoding. If it is omitted, the internal character encoding value will be used.
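
To see why the omitted parameter matters, here is a minimal sketch that slices the same bytes under both interpretations. The variable names are only illustrative, and the first line of output depends on how your server is configured.

```php
<?php
// "\xC3\xA1" is the two-byte UTF-8 encoding of á; repeat it five times (10 bytes).
$text = str_repeat("\xC3\xA1", 5);

// Shows which encoding mb_* functions fall back to when none is passed.
echo mb_internal_encoding(), "\n";

// Treated as a single-byte encoding, 5 "characters" means 5 bytes,
// which cuts the third á in half.
$bytewise = mb_substr($text, 0, 5, 'ISO-8859-1');
echo strlen($bytewise), "\n"; // 5

// Treated as UTF-8, 5 characters means all 10 bytes.
$charwise = mb_substr($text, 0, 5, 'UTF-8');
echo strlen($charwise), "\n"; // 10
```

If the first line printed is not UTF-8, then calling mb_substr without the fourth argument behaves like the first call and produces the broken output from the question.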

OTHER TIPS

I had the same problem, and the answer above helped me too. Besides changing the setting in php.ini or using ini_set(), you can also call mb_internal_encoding('UTF-8'); (replace UTF-8 with whatever encoding you need) to set the encoding for all multibyte functions for the rest of the script.
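
As a rough sketch of that approach, assuming the mbstring extension is loaded and the source string is UTF-8, you set the internal encoding once and can then omit the encoding argument:

```php
<?php
// Five á characters, written as raw UTF-8 bytes so the file's own encoding doesn't matter.
$text = str_repeat("\xC3\xA1", 5);

// Set the default for every later mb_* call in this script.
mb_internal_encoding('UTF-8');

// No fourth argument needed now: the slice is counted in characters.
$slice = mb_substr($text, 0, 3);
echo $slice, "\n";            // ááá
echo mb_strlen($slice), "\n"; // 3
echo strlen($slice), "\n";    // 6 (bytes)
```

Passing the encoding explicitly to each call still works and may be safer in library code, since it doesn't depend on global state.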

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow