Getting wrong encoding when trying to replace cyrillic symbols

https://stackoverflow.com/questions/20294845

06-08-2022
Question

I have a problem with my string. After the for loop, I get other symbols instead of my exact cyrillic letters. The goal is to replace the letters ąčęėįšųūž with a1, c2, e1, e2, i1, s2, u1, u2, z2 respectively. I have come up with this:

$ltSymbolsArray = array(
    'a1' => 'ą',
    'c2' => 'č',
    'e1' => 'ę',
    'e2' => 'ė',
    'i1' => 'į',
    's2' => 'š',
    'u1' => 'ų',
    'u2' => 'ū',
    'z2' => 'ž'
);
$string = 'ąsąžadcę';

for ($i = 0; $i < strlen($string); $i++) {
    foreach ($ltSymbolsArray as $key => $value) {
        if ($string[$i] == $value) {
            $string[$i] = $key;
        }
    }
}

It looks like a simple solution, but I can't handle the encoding. Encoding is a mystery for me so I would really appreciate any help on this problem.

Solution

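You can't simply iterate over a Unicode string by index and expect each iteration to yield a full character. In PHP, strlen() counts bytes and $string[$i] reads a single byte, while in UTF-8 each of these letters occupies two bytes, so the comparison against a whole letter never matches.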
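To make the byte/character mismatch concrete, here is a small sketch. It assumes the script file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only for illustration.

```php
<?php
// Assumption: this file is saved as UTF-8 and ext-mbstring is available.
$char = 'ą';

// strlen() counts bytes, not characters: 'ą' is two bytes in UTF-8.
$byteCount = strlen($char);

// mb_strlen() counts characters when told the encoding.
$charCount = mb_strlen($char, 'UTF-8');

// The two bytes that make up the letter, shown as hex.
$hex = bin2hex($char);

// Indexing with [0] returns only the first byte, so it can never
// equal the complete two-byte letter.
$firstByteMatches = ($char[0] === $char);

echo $byteCount, "\n";   // 2
echo $charCount, "\n";   // 1
echo $hex, "\n";         // c485
var_dump($firstByteMatches); // bool(false)
```

This is why the loop in the question never finds a match and leaves half-characters behind once a byte is overwritten.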

Use preg_split with the u (Unicode) pattern modifier to split your string into whole characters instead of bytes. Then use the resulting array to replace the characters in the original string.
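A minimal sketch of the preg_split approach, reusing the array and string from the question. It assumes the script file and the input are UTF-8 and that the letters are stored in precomposed form. The $replacements, $chars and $result names are introduced here for illustration, and the sketch builds a new string rather than modifying the original in place.

```php
<?php
// Map from the question: replacement => accented letter.
$ltSymbolsArray = array(
    'a1' => 'ą', 'c2' => 'č', 'e1' => 'ę',
    'e2' => 'ė', 'i1' => 'į', 's2' => 'š',
    'u1' => 'ų', 'u2' => 'ū', 'z2' => 'ž'
);
$string = 'ąsąžadcę';

// Flip the map to accented letter => replacement for direct lookup.
$replacements = array_flip($ltSymbolsArray);

// An empty pattern with the u modifier splits between every UTF-8
// character; PREG_SPLIT_NO_EMPTY drops the empty pieces at both ends.
$chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
if ($chars === false) {
    // preg_split() fails when the subject is not valid UTF-8.
    exit("Input is not valid UTF-8\n");
}

// Rebuild the string one whole character at a time.
$result = '';
foreach ($chars as $char) {
    $result .= isset($replacements[$char]) ? $replacements[$char] : $char;
}

echo $result, "\n"; // a1sa1z2adce1
```

Flipping the array avoids the inner foreach from the question, since each character can be looked up directly as an array key.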

You could also use one of the multibyte regex functions, such as mb_ereg_replace.
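The mb_ereg_replace alternative could look roughly like this. It is only a sketch: it assumes the mbstring extension is built with its regex support, that the data is UTF-8, and it uses each letter directly as the pattern, which is safe here only because none of them is a regex metacharacter. The $result variable is introduced for illustration.

```php
<?php
// Map from the question: replacement => accented letter.
$ltSymbolsArray = array(
    'a1' => 'ą', 'c2' => 'č', 'e1' => 'ę',
    'e2' => 'ė', 'i1' => 'į', 's2' => 'š',
    'u1' => 'ų', 'u2' => 'ū', 'z2' => 'ž'
);
$string = 'ąsąžadcę';

// Tell the multibyte regex functions that patterns and subjects are UTF-8.
mb_regex_encoding('UTF-8');

// Replace one letter at a time. The replacements are plain ASCII, so an
// earlier pass cannot produce text that a later pass would match again.
$result = $string;
foreach ($ltSymbolsArray as $replacement => $letter) {
    $result = mb_ereg_replace($letter, $replacement, $result);
}

echo $result, "\n"; // a1sa1z2adce1
```

Compared with the preg_split version, this makes one pass over the string per letter, which is fine for a short map like this one.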

Licensed under: CC-BY-SA with attribution
