Question

I'm extracting some text from a web page with file_get_contents. I have no influence on the text; the parts I'm talking about are already malformed in the source code of the page I fetched, and look something like this:

 /$%§&fdsgfkgfd � fdsfdsfs � � -->
 <h1>m�lll</h1>
 <h1>m�lll</h1>
 <h1>m�lll</h1>
 <h1>m�lll</h1>
 <h1>m�lll</h1>
 <h1>m�lll</h1>

or

 <<<!-- � födns

My PHP file is not meant to be an HTML file, so it's just a string I'm dealing with.

I searched the internet, but it's difficult to search for that icon.

I want to remove these characters because they are not necessary. How can I remove them?

PS: I'm not viewing this in a browser; I var_dump the text in a console.

Solution:

I use this function to first convert the string to UTF-8:

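function convToUtf8($str)
{
    // Check the string against the candidate encodings.
    if (mb_detect_encoding($str, "UTF-8, ISO-8859-1, GBK") != "UTF-8") {
        // Note: any input not detected as UTF-8 is treated as GBK here,
        // regardless of which encoding was actually detected.
        return iconv("gbk", "utf-8", $str);
    }

    // Already UTF-8: return unchanged.
    return $str;
}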

Solution

You can discard characters that are not supported by an encoding, with iconv():

$converted = iconv($input_encoding, $output_encoding . '//IGNORE', $original);

There are two drawbacks:

  1. You need to know the input encoding, and
  2. as you can read in a user comment in the manual, iconv() has a bug so that '//IGNORE' does not work with recent versions of the iconv library. The suggested workaround is (here for UTF-8):

    ini_set('mbstring.substitute_character', 'none'); 
    $text = mb_convert_encoding($text, 'UTF-8', 'UTF-8');
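
As a minimal sketch of that workaround, the snippet below wraps it in a function and restores the previous substitution setting afterwards. The name stripInvalidUtf8 and the sample string are assumptions for illustration, not part of the manual comment; it assumes the input is meant to be UTF-8 and a PHP version with the mbstring extension enabled.

```php
<?php
// Hypothetical helper (name chosen for this example): drops every byte
// sequence that is not valid UTF-8 from $text.
function stripInvalidUtf8(string $text): string
{
    // Remember the current setting so it can be restored afterwards.
    $previous = mb_substitute_character();

    // 'none' tells mbstring to drop invalid sequences instead of
    // replacing them with a substitute character such as '?'.
    mb_substitute_character('none');

    try {
        // Converting UTF-8 to UTF-8 forces mbstring to validate the input.
        return mb_convert_encoding($text, 'UTF-8', 'UTF-8');
    } finally {
        mb_substitute_character($previous);
    }
}

// "\xFC" is "ü" in ISO-8859-1, but it is not valid UTF-8 on its own.
$dirty = "<h1>m\xFClll</h1>";
var_dump(stripInvalidUtf8($dirty)); // string(13) "<h1>mlll</h1>"
```

Note that this removes the offending bytes rather than recovering the intended character, so "m\xFClll" becomes "mlll", not "mülll".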
    

However, it is much better to attempt to detect the input encoding and convert the input to the output encoding. This leads to:

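function recode ($input, $output_encoding)
{
  // Try to detect the input encoding; this yields false on failure.
  $input_encoding = mb_detect_encoding($input);

  if ($input_encoding === false)
  {
    // Detection failed: treat the input as if it were already in the
    // output encoding and drop any bytes that are invalid in it.
    $old_substitute = mb_substitute_character();
    mb_substitute_character('none'); 

    $converted = mb_convert_encoding($input, $output_encoding, $output_encoding);

    // Restore the previous substitution setting.
    mb_substitute_character($old_substitute);
  }
  else
  {
    // Detection succeeded: convert only if the encodings differ.
    $converted = ($output_encoding !== $input_encoding)
      ? iconv($input_encoding, $output_encoding, $input)
      : $input;
  }

  return $converted;
}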
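
Detection tends to be more predictable when it is given an explicit candidate list and strict mode. The following is only a sketch of that variant: the function name toUtf8 and the default candidate list are assumptions, and the right candidates depend on where the text comes from. Because ISO-8859-1 accepts any byte sequence, it acts as a catch-all when placed last in the list.

```php
<?php
// Hypothetical helper (name and default candidates are assumptions).
function toUtf8(string $input, array $candidates = ['UTF-8', 'ISO-8859-1']): string
{
    // Strict mode: only report an encoding in which $input is fully valid.
    $detected = mb_detect_encoding($input, $candidates, true);

    if ($detected === false) {
        // Nothing matched; return the input untouched and let the caller decide.
        return $input;
    }

    return ($detected === 'UTF-8')
        ? $input
        : mb_convert_encoding($input, 'UTF-8', $detected);
}

// "m\xFCll" is "müll" encoded as ISO-8859-1; it is not valid UTF-8.
echo bin2hex(toUtf8("m\xFCll")), "\n"; // 6dc3bc6c6c
```

Here the character is recovered ("ü" becomes the two-byte UTF-8 sequence c3 bc) instead of being discarded, which is why detection is preferable whenever the source encoding can be guessed reliably.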
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow