Why does str_replace not correctly replace these extended ascii characters?
21-07-2021
Question
This is a UTF-8 encoded source file. I must be missing something obvious, but I've tried all the permutations I can think of.
<?php
$bad = array( chr(130), chr(145), chr(146), chr(147), chr(148), chr(150), chr(151), chr(173), chr(160) );
$good = array( chr( 44), chr( 39), chr( 39), chr( 34), chr( 34), chr( 45), chr( 45), chr( 45), chr( 32) );
print_r($bad);
print_r($good);
$str = <<<EOF
bad comma ‚
bad quote ‘
bad quote ’
bad quote “
bad quote ”
bad dash –
bad dash —
bad dash
bad space
EOF;
echo $str;
$clean = str_replace($bad, $good, $str);
echo "\n";
echo( $clean);
And when I open it in a browser and view source...
Array
(
[0] => ‚
[1] => ‘
[2] => ’
[3] => “
[4] => ”
[5] => –
[6] => —
[7] =>
[8] =>
)
Array
(
[0] => ,
[1] => '
[2] => '
[3] => "
[4] => "
[5] => -
[6] => -
[7] => -
[8] =>
)
bad comma ‚
bad quote ‘
bad quote ’
bad quote “
bad quote â€
bad dash –
bad dash —
bad dash Â
bad space
bad comma ‚
bad quote ‘
bad quote ’
bad quote “
bad quote â€
bad dash â€"
bad dash â€"
bad dash Â-
bad space
Solution
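ASCII, single-byte encodings such as Windows-1252, and UTF-8 are different things.
In your case the characters in the heredoc are multibyte UTF-8 sequences, because the source file is saved as UTF-8, while chr(145) and the other entries in $bad are single bytes (the Windows-1252 codes for those characters). str_replace compares bytes, so it never matches a whole character; it only replaces stray bytes such as 0x93 or 0x94 where they happen to occur inside the UTF-8 sequences for the dashes, which is why those come out as â€".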
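To make the byte mismatch concrete, here is a small sketch that dumps the bytes involved. It assumes PHP 7.0 or later (for the "\u{...}" escape); the variable names are only illustrative and are not part of the original script.

```php
<?php
// U+2018 (left single quote) written as a Unicode escape, so PHP emits its UTF-8 bytes.
$quote = "\u{2018}";
// U+2013 (en dash), one of the characters that gets corrupted in the question.
$dash = "\u{2013}";

echo bin2hex($quote), "\n";      // e28098: three bytes in UTF-8
echo bin2hex(chr(145)), "\n";    // 91: one byte, the Windows-1252 code for the same quote

// The single byte 0x91 never occurs inside the UTF-8 sequence, so nothing is replaced.
var_dump(strpos($quote, chr(145)) !== false);   // bool(false)

// chr(147) is 0x93, which is the LAST byte of the en dash (e2 80 93).
// Replacing it breaks the sequence instead of replacing the character.
$broken = str_replace(chr(147), '"', $dash);
echo bin2hex($broken), "\n";     // e28022: no longer valid UTF-8
```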
You could convert the search characters to UTF-8 first, or use mb_ereg_replace.
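One way to apply the first suggestion is to write the search strings as UTF-8 in the first place. The following is a minimal sketch, assuming PHP 7.0 or later and a sample input string made up for illustration. The code points are the Unicode equivalents of the Windows-1252 bytes used in the question.

```php
<?php
// Search strings as UTF-8, in the same order as the original $bad array.
$bad = [
    "\u{201A}", // single low-9 quotation mark (Windows-1252 byte 130)
    "\u{2018}", // left single quotation mark  (145)
    "\u{2019}", // right single quotation mark (146)
    "\u{201C}", // left double quotation mark  (147)
    "\u{201D}", // right double quotation mark (148)
    "\u{2013}", // en dash                     (150)
    "\u{2014}", // em dash                     (151)
    "\u{00AD}", // soft hyphen                 (173)
    "\u{00A0}", // no-break space              (160)
];
$good = [",", "'", "'", '"', '"', "-", "-", "-", " "];

// Hypothetical sample input, not taken from the question.
$str = "bad quote \u{201C}x\u{201D}, bad dash \u{2013}";

$clean = str_replace($bad, $good, $str);
echo $clean, "\n"; // bad quote "x", bad dash -
```

With whole UTF-8 sequences as search strings, plain str_replace is enough here, because a complete UTF-8 sequence cannot match in the middle of a different character. If the search bytes have to stay as chr() values, converting each one with mb_convert_encoding($byte, 'UTF-8', 'Windows-1252') should give the same strings, assuming the mbstring extension is available.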
Most PHP devs don't know much about character encoding, but it's one of the most important things when working in C/C++.
Licensed under: CC-BY-SA with attribution