Question

I apologize for the topic title, but that is how the problem looks.

I'm writing a parser for Twitter, and when the script encounters symbols such as 💗⚫️ in the text of a tweet, Yii throws errors like this:

SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xF0\x9F\x98\x8D\xF0\x9F...' for column 'code' at row 1.

I wrote the following code:

if (preg_match('/😍/si', $texts[$i])) {
 $texts[$i] = str_replace('😍', '', $texts[$i]); 
}

But it did not help, because each of these characters has a different Unicode code point (they are displayed only as squares)...

I wrote the following code too:

        if (preg_match('/xF0/si', $texts[$i])) {
            unset($texts[$i]);
        }

But that did not help either...

These symbols are: ✂ ✃ ✄ ✆ ✇ ✈ ✉ ✌ ✍ ✎ ✏ ✐ ✑ ✒ ✓ ✔ ✕ ✖ ✗ ✘ ✙ ✚ ✛ ✜ ✝ ✞ ✟ ✠ ✡ ✢ ✣ ✤ ✥ ✦ ✧ ✩ ✪ ✫ ✬ ✭ ✮ ✯ ✰ ✱ ✲ ✳ ✴ ✵ ✶ ✷ ✸ ✹ ✺ ✻ ✼ ✽ ✾ ✿ ❀ ❁ ❂ ❃ ❄ ❅ ❆ ❇ ❈ ❉ ❊ ❋ ❍ ❏ ❐ ❑ ❒ ❖ ❘ ❙ ❚ ❛ ❜ ❝ ❞ ❡ ❢ ❣ ❤ ❥ ❦ ❧ ❶ ❷ ❸ ❹ ❺ ❻ ❼ ❽ ❾ ❿ ➀ ➁ ➂ ➃ ➄ ➅ ➆ 7 ➇ ➈ ➉ ➊ ➋ ➌ ➍ ➎ ➏ ➐ ➑ ➒ ➓ ➔ ➘ ➙ ➚ ➛ ➜ ➝ ➞ ➟ ➠ ➡ ➢ ➣ ➤ ➥ ➦ ➧ ➨ ➩ ➪ ➫ ➬ ➭ ➮ ➯ ➱ ➲ ➳ ➴ ➵ ➶ ➷ ➸ ➹ ➺ ➻ ➼ ➽ and many, many others...

How can I remove all these symbols from the parsed text (without using utf8mb4)?

No correct solution

Other tips

You're oh so close. Combining your code with Marc B's comments, we have this:

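if (preg_match('/\xF0/si', $texts[$i])) {
  $texts[$i] = preg_replace('/\xF0/si', '', $texts[$i]);
}

Note that this removes only the 0xF0 lead byte and leaves the three continuation bytes behind, so the string is no longer valid UTF-8. Matching each complete 4-byte sequence avoids that:

// Removes every complete 4-byte UTF-8 sequence (code points U+10000 to U+10FFFF).
function replace4byte($string) {
    return preg_replace('%(?:
          \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )%xs', '', $string);
}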
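
For comparison, here is a minimal sketch of the same idea written with PCRE's u (UTF-8) modifier, which lets the pattern name code points instead of raw bytes. The helper name stripSupplementaryChars and the sample tweet are made up for illustration, and the sketch assumes PHP 7 or later (for the \u{...} string escapes and the ?? operator):

```php
<?php
// Hypothetical helper (not from the original answer): removes every code point
// above U+FFFF, i.e. everything that UTF-8 encodes in 4 bytes.
function stripSupplementaryChars(string $text): string
{
    // With the /u modifier PCRE treats pattern and subject as UTF-8,
    // so \x{10000}-\x{10FFFF} is a range of code points, not of bytes.
    $result = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $text);

    // preg_replace() returns null on error, for example when $text is not
    // valid UTF-8. Returning the input unchanged is an assumption of this sketch.
    return $result ?? $text;
}

// Sample input: two 4-byte emoji, one 3-byte dingbat, one 2-byte letter.
$tweet = "I love this \u{1F60D}\u{1F497} \u{2764} caf\u{E9}";
echo stripSupplementaryChars($tweet), PHP_EOL;
```

The two emoji (U+1F60D and U+1F497, four bytes each in UTF-8) are removed, while ❤ (U+2764, three bytes) and é are kept. Dingbats such as ✂ or ➽ are also three bytes long, so they should not be what triggers error 1366 on a utf8 column; the bytes in the error message (\xF0\x9F\x98\x8D) belong to a 4-byte emoji.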

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow