Problem with diacritics and mb_substr
Question
I am slicing a Unicode string containing diacritics with the mb_substr function, but it behaves as if I were using plain substr: it splits multibyte characters in half, and the browser shows a question mark in a diamond (the replacement character).
E.g.
echo mb_substr('ááááá', 0, 5); //Displays áá�
What might be wrong?
Solution
I have the same problem if I don't specify the encoding as the last parameter to mb_substr: it defaults, at least on my server, to ISO-8859-1.
But if I set the encoding properly, to UTF-8, it works OK:
echo mb_substr('ááááá', 0, 5, 'UTF-8');
This gets the right display in the browser:
ááááá
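For illustration, here is the same slice taken with both encodings. This is a minimal sketch assuming PHP 7 or later with the mbstring extension loaded; the string is built with the \u{00E1} escape so it is UTF-8 no matter how the source file is saved, and ISO-8859-1 is passed explicitly to mimic the server default described above.

```php
<?php
// "\u{00E1}" (PHP 7+) is "á", stored as the two UTF-8 bytes C3 A1,
// so this string is 10 bytes long but only 5 characters long.
$s = str_repeat("\u{00E1}", 5);

// Decoded as ISO-8859-1, every byte counts as one character:
// 5 "characters" = 2 whole "á" plus the dangling lead byte C3.
$asLatin1 = mb_substr($s, 0, 5, 'ISO-8859-1');

// Decoded as UTF-8, each two-byte sequence is one character.
$asUtf8 = mb_substr($s, 0, 5, 'UTF-8');

printf("%d bytes, %d characters\n", strlen($s), mb_strlen($s, 'UTF-8'));
echo bin2hex($asLatin1), "\n"; // c3a1c3a1c3 (invalid UTF-8, rendered as áá�)
echo bin2hex($asUtf8), "\n";   // c3a1c3a1c3a1c3a1c3a1 (all five characters)
```

The ISO-8859-1 result ends in a lone C3 byte, which is exactly the half character the browser replaces with the diamond.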
See the manual page for mb_substr (quoting):
string mb_substr ( string $str , int $start [, int $length [, string $encoding ]] )
The encoding parameter is the character encoding. If it is omitted, the internal character encoding value will be used.
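Because the fallback is the internal encoding, the result of a call without the fourth argument depends on how the server is configured. A small sketch, assuming PHP 7.1 or later with mbstring: print the current default, and wrap mb_substr in a hypothetical helper, utf8_substr (not a PHP function), that always passes 'UTF-8' explicitly.

```php
<?php
// mb_internal_encoding() with no argument returns the current default,
// i.e. what mb_substr() uses when $encoding is omitted.
echo 'Internal encoding: ', mb_internal_encoding(), "\n";

// Hypothetical helper (not part of PHP): always pass the encoding,
// so the result no longer depends on server configuration.
// A null $length means "up to the end of the string".
function utf8_substr(string $str, int $start, ?int $length = null): string
{
    return mb_substr($str, $start, $length, 'UTF-8');
}

echo utf8_substr("\u{00E1}\u{00E1}\u{00E1}", 0, 2), "\n"; // áá
```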
OTHER TIPS
I had the same problem, and the answer above helped me too. Besides changing the setting in php.ini or with ini_set(), you can also call mb_internal_encoding('utf-8'); (replace utf-8 with the encoding of your choice) to set the default encoding used by the multibyte functions for the rest of the script.
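A minimal sketch of that tip, assuming PHP 7 or later with mbstring: call mb_internal_encoding('UTF-8') once, early in the script, and later mb_substr calls can omit the encoding argument. The setting only lasts for the current request, so it has to run on every request; putting it in a shared bootstrap file is just one possible placement.

```php
<?php
// Set the mbstring default once. Returns true on success; for an unknown
// encoding PHP 7 returns false, while PHP 8 throws a ValueError.
$ok = mb_internal_encoding('UTF-8');

$s = str_repeat("\u{00E1}", 5);    // "ááááá" as UTF-8 bytes
$slice = mb_substr($s, 0, 5);      // no 4th argument: UTF-8 is used now

echo mb_internal_encoding(), "\n"; // UTF-8
echo $slice, "\n";                 // ááááá
```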