Question

My script works great, but today, after checking the logs, I found some garbled "matrix" words. After analysing them, I understood that the problem has something to do with UTF-8: the files are parsed and the title is extracted, but instead of Russian words the result is unknown symbols (Сериалы ТУТ! СериÐ).

I use:

$cont = "dasdas<title>Сериалы ТУТ! Сериалы онлайн sda</title>";
preg_match("'<title[^>]*?>(.*)</title>'siU", $cont, $match);

//$match[1] = Сериалы ТУТ! СериРsda

When I add the pattern modifier /u, nothing changes: I get the same unknown matrix words. Please help.

Maybe there is something with PHP?


Solution

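It is not a PHP or a regex problem, but an HTML problem. The regex returns the right bytes; they are displayed as garbage because the page does not declare its encoding, so the UTF-8 bytes are decoded as a single-byte charset. To obtain a correct display, add <meta charset="UTF-8"/> to the head of your HTML page.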
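A minimal sketch of that fix, assuming the extracted title is echoed into a page generated by the same PHP script. The helper name render_page and the page title "Log viewer" are hypothetical, introduced only for illustration. The Content-Type header is an extra that is not part of the original suggestion; it declares the same charset at the HTTP level.

```php
<?php
// Hypothetical helper: wraps an extracted title in a minimal HTML page
// that declares UTF-8, so the browser decodes the bytes correctly.
function render_page(string $title): string
{
    // Escape with an explicit UTF-8 charset so Cyrillic text passes through intact.
    $safe = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
         . "<html>\n<head>\n"
         . "<meta charset=\"UTF-8\"/>\n"
         . "<title>Log viewer</title>\n"
         . "</head>\n<body>\n<p>{$safe}</p>\n</body>\n</html>\n";
}

$cont = "dasdas<title>Сериалы ТУТ! Сериалы онлайн sda</title>";
preg_match('~<title[^>]*>(.*?)</title>~si', $cont, $match);

// Assumption: declaring the charset in the HTTP header as well is optional,
// but it covers clients that read the header before the meta tag.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

echo render_page($match[1]);
```

This only helps if the script file and the parsed input are themselves UTF-8; if the logs are viewed in a terminal or text editor instead of a browser, the equivalent step is to set that viewer's encoding to UTF-8.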

As an aside: the U modifier is unnecessary here. It only inverts the greediness of the quantifiers, so a lazy quantifier does the same job:

preg_match('~<title[^>]*>(.*?)</title>~si', $cont, $match);
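
To check both claims (the two patterns capture the same text, and the capture is intact UTF-8), something like the following can be run from the command line. This is a sketch that assumes the script file itself is saved as UTF-8; the variable names $withU and $lazy are made up for the example.

```php
<?php
$cont = "dasdas<title>Сериалы ТУТ! Сериалы онлайн sda</title>";

// Original pattern: the U modifier inverts greediness, so (.*) is lazy here.
preg_match("'<title[^>]*?>(.*)</title>'siU", $cont, $withU);

// Rewritten pattern: no U modifier, explicit lazy quantifier instead.
preg_match('~<title[^>]*>(.*?)</title>~si', $cont, $lazy);

// Both patterns should capture the same bytes.
var_dump($withU[1] === $lazy[1]);

// An empty pattern with the u modifier matches only if the subject is valid UTF-8,
// so this confirms the regex did not damage the text.
var_dump(preg_match('//u', $lazy[1]) === 1);
```

If both checks report true, the extraction is fine and the garbling happens only when the result is displayed.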