Question

I'm trying to sort a UTF-8 string alphabetically. The result contains unknown characters, and I don't know why. The same thing happens with both usort and sort.

setlocale(LC_COLLATE, 'ro_RO.UTF-8');

$word = 'ÎABAȚÂIEȘĂ';
$chars = str_split($word);

echo 'Word: ' . $word . "\n";

//sort($chars, SORT_LOCALE_STRING);

usort($chars, function($a, $b){
    echo 'Comparing: ' . $a . ' and ' . $b . "\n";
    return strcoll($a, $b);
});

echo 'Result: ' . implode($chars) . "\n";

Command line example: http://s18.postimg.org/avqfhetsp/test.gif


Solution

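The problem is not caused by the comparison or the sorting, but by str_split(). It splits the string into single bytes, so every multi-byte UTF-8 character (such as Î or Ț) is torn into fragments that are invalid on their own, and those fragments show up as unknown characters in the output. Split the string into characters instead: since PHP 7.4 there is mb_str_split(), and on older versions preg_split() with the u modifier does the same job, for example preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY).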
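
A minimal sketch of the fix applied to the code from the question. It assumes the mbstring extension is loaded and that the ro_RO.UTF-8 locale is installed on the system. The helper name utf8_chars() is hypothetical and only used for illustration; it prefers mb_str_split() (PHP 7.4+) and falls back to preg_split() with the u modifier.

```php
<?php
// Hypothetical helper: split a UTF-8 string into whole characters, not bytes.
function utf8_chars(string $s): array
{
    if (function_exists('mb_str_split')) {
        // Available since PHP 7.4
        return mb_str_split($s, 1, 'UTF-8');
    }
    // Fallback: the u modifier makes PCRE treat the subject as UTF-8,
    // so the empty pattern matches between characters rather than bytes.
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
}

$word  = 'ÎABAȚÂIEȘĂ';
$bytes = str_split($word);   // one element per byte: multi-byte characters are torn apart
$chars = utf8_chars($word);  // one element per character

// setlocale() returns false if the locale is not installed (assumption: it is);
// strcoll() then compares using whatever locale is currently active.
setlocale(LC_COLLATE, 'ro_RO.UTF-8');
usort($chars, 'strcoll');

echo 'Word: ' . $word . "\n";
echo 'Result: ' . implode('', $chars) . "\n";
```

The exact order of the result depends on the collation rules of the active locale, so it may differ between systems. What no longer happens is the corruption: every array element is a complete character, so the joined result is valid UTF-8 again.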

Licensed under: CC-BY-SA with attribution