Question

I apologize for the odd topic title, but it describes the problem exactly.

I'm writing a parser for Twitter, and when the script encounters symbols such as 💗⚫️ in the text of a tweet, Yii generates an error like this:

SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xF0\x9F\x98\x8D\xF0\x9F...' for column 'code' at row 1.

I wrote the following code:

if (preg_match('/😍/si', $texts[$i])) {
 $texts[$i] = str_replace('😍', '', $texts[$i]); 
}

But that did not help, because each of these characters has a different Unicode code point (they are displayed only as squares)...

I wrote the following code too:

        if (preg_match('/xF0/si', $texts[$i])) {
            unset($texts[$i]);
        }

But that did not help either...

These symbols include: ✂ ✃ ✄ ✆ ✇ ✈ ✉ ✌ ✍ ✎ ✏ ✐ ✑ ✒ ✓ ✔ ✕ ✖ ✗ ✘ ✙ ✚ ✛ ✜ ✝ ✞ ✟ ✠ ✡ ✢ ✣ ✤ ✥ ✦ ✧ ✩ ✪ ✫ ✬ ✭ ✮ ✯ ✰ ✱ ✲ ✳ ✴ ✵ ✶ ✷ ✸ ✹ ✺ ✻ ✼ ✽ ✾ ✿ ❀ ❁ ❂ ❃ ❄ ❅ ❆ ❇ ❈ ❉ ❊ ❋ ❍ ❏ ❐ ❑ ❒ ❖ ❘ ❙ ❚ ❛ ❜ ❝ ❞ ❡ ❢ ❣ ❤ ❥ ❦ ❧ ❶ ❷ ❸ ❹ ❺ ❻ ❼ ❽ ❾ ❿ ➀ ➁ ➂ ➃ ➄ ➅ ➆ 7 ➇ ➈ ➉ ➊ ➋ ➌ ➍ ➎ ➏ ➐ ➑ ➒ ➓ ➔ ➘ ➙ ➚ ➛ ➜ ➝ ➞ ➟ ➠ ➡ ➢ ➣ ➤ ➥ ➦ ➧ ➨ ➩ ➪ ➫ ➬ ➭ ➮ ➯ ➱ ➲ ➳ ➴ ➵ ➶ ➷ ➸ ➹ ➺ ➻ ➼ ➽ and many, many others...


How can I remove all of these symbols from the parsed text (without using utf8mb4)?

No correct solution

Other tips

You're oh so close. Combining your code with Marc B's comments, we have this:

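// Strip every 4-byte UTF-8 sequence that starts with the lead byte 0xF0.
// Removing only the \xF0 byte would leave its three continuation bytes behind.
$texts[$i] = preg_replace('/\xF0[\x90-\xBF][\x80-\xBF]{2}/', '', $texts[$i]);

To cover all 4-byte sequences (lead bytes 0xF0 through 0xF4), use a function such as:

function replace4byte($string) {
    return preg_replace('%(?:
          \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )%xs', '', $string);
}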
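
As an alternative to matching raw bytes, the same filtering can be sketched at the code-point level with PCRE's `u` modifier. This is a minimal sketch, assuming PHP 7 or later and valid UTF-8 input; `stripSupplementary` is a hypothetical helper name, not part of the original answer. It removes every code point above U+FFFF, which is exactly the set that needs 4 bytes in UTF-8.

```php
<?php
// Hypothetical helper: removes every code point above U+FFFF,
// i.e. every character that takes 4 bytes in UTF-8.
function stripSupplementary(string $text): string
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8,
    // so the character class matches whole code points, not bytes.
    $clean = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $text);

    // preg_replace() returns null on failure (for example, invalid UTF-8).
    // Returning the input unchanged in that case is an assumption of this sketch.
    return $clean ?? $text;
}

// U+1F60D and U+1F497 are 4-byte emoji; U+2702 (scissors) is a 3-byte dingbat.
$tweet = "I love this\u{1F60D}\u{1F497} \u{2702}";
echo stripSupplementary($tweet), PHP_EOL; // prints "I love this ✂"
```

Note that dingbats such as ✂ (U+2702) sit in the Basic Multilingual Plane and encode to 3 bytes, so neither this helper nor `replace4byte` removes them; only the 4-byte characters, such as the emoji in the error message, are stripped.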
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow