Question

I've tried a lot of different things and can't get the euro symbol to show. I'm using cURL to fetch and parse a page. The page declares ISO-8859-1 as its encoding:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

There's a euro symbol on the page and when I use

$curl_scraped_page = curl_exec($ch);

I just get a black diamond with a question mark inside.

I've seen a lot of questions and answers on this site that were related but they didn't really work.

EDIT : I've tried to use the following:

$curl_scraped_page = preg_replace('/charset=(.*)"/', 'charset="UTF-8"', $curl_scraped_page);

and

$curl_scraped_page = iconv('iso-8859-1', 'UTF-8', $curl_scraped_page);

and

$curl_scraped_page = utf8_encode(curl_exec($ch));

I guess another question is, to display the euro sign, do I need to use UTF-8 or ISO-8859-1?

EDIT2 : I've tried this:

echo "Encoding is $enc";
echo iconv($enc, "ISO-8859-1", $curl_scraped_page);

The result was:

Encoding is ISO-8859-1

but there were still no euro symbols. When I view the source of the page, it still shows the diamond question marks but when I click 'View' on the browser and change it to ISO-8859-1, the euro symbols appear. So is it a browser issue?


Solution 2

I send an ISO-8859-1 Content-Type header for my own page before running the cURL request:

header('Content-Type: text/html; charset=iso-8859-1');
$curl_scraped_page = curl_exec($ch);

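This means the bytes from the scraped page are passed through as they are, euro symbol included. When I echo the content I don't have to convert anything, because the browser now decodes my page with the same encoding as the original page. (Browsers treat the iso-8859-1 label as windows-1252, which is where the euro sign sits, at byte 0x80; strict ISO-8859-1 has no euro sign.)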
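To illustrate why the black diamond appears when the header does not match, here is a minimal sketch. It assumes the page stores the euro sign as the single byte 0x80 (its windows-1252 position); the helper name `isValidUtf8` and the sample string are made up for illustration. On its own that byte is not valid UTF-8, so a browser decoding the output as UTF-8 falls back to the replacement character.

```php
<?php
// Hypothetical helper: true if $bytes is a well-formed UTF-8 string.
// preg_match() with the /u modifier returns false on malformed UTF-8 input.
function isValidUtf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

// Assumed sample: "5 " followed by the euro sign as byte 0x80 (windows-1252).
$scraped = "5 \x80";

var_dump(isValidUtf8($scraped));          // the scraped bytes, checked as UTF-8
var_dump(isValidUtf8("5 \xE2\x82\xAC"));  // the same text encoded as UTF-8
```

If the first check fails for your scraped page, either declare the original encoding in your own Content-Type header (as above) or convert the bytes before echoing them.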

Other tips

Try setting this header for cURL:

$header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7"; 
curl_setopt($ch, CURLOPT_HTTPHEADER, $header); 

It's possible that cURL uses a "UTF-8" connection by default.

Edit:

What happens when you convert it to ISO-8859-1 with utf8_decode?

PHP: curl_setopt

Just apply htmlentities(curl_exec($ch)). This will not break on special characters.
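One caveat is worth a quick sketch: htmlentities() escapes the markup too, so it suits showing a scraped fragment as text rather than re-rendering the whole page, and it should be told the source encoding. The sample string below and the choice of 'cp1252' (on the assumption that the euro sign arrives as byte 0x80) are illustrative, not taken from the original page.

```php
<?php
// Assumed sample: a fragment with the euro sign as byte 0x80 (windows-1252).
$fragment = "<b>5 \x80</b>";

// Passing the source encoding lets htmlentities() map the byte to an entity.
// Note that the tags are escaped as well, so the browser shows them as text.
$escaped = htmlentities($fragment, ENT_QUOTES, 'cp1252');

echo $escaped, "\n";
```

Because the result is pure ASCII, it displays the same way whichever charset your own page declares.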

Just add the same meta Content-Type declaration to the web page on which you're echoing the retrieved page.

Web browsers don't use the meta tag to determine the charset unless there is no HTTP header present that declares the charset. It's the fallback, and most web servers specify the charset via an HTTP header, so meta tags are generally ignored in practice.

What I'm saying is that the page could actually be served in a different charset from the one its meta tag declares.

Check the HTTP headers. Then declare your own page to match, again via HTTP headers, not the meta tag. Or convert the string to your preferred encoding.
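Both options can be sketched roughly as follows. The helper names `charsetFromContentType` and `toUtf8` are hypothetical, and the code assumes that a page labelled iso-8859-1 really uses windows-1252 (CP1252), as browsers do. In a real script the Content-Type value would come from `curl_getinfo($ch, CURLINFO_CONTENT_TYPE)` after `curl_exec($ch)`; a hard-coded value stands in for it here.

```php
<?php
// Hypothetical helper: pull the charset out of a Content-Type value such as
// the one returned by curl_getinfo($ch, CURLINFO_CONTENT_TYPE).
function charsetFromContentType(?string $contentType): ?string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)/i', $contentType, $m)) {
        return strtolower($m[1]);
    }
    return null; // header missing or without a charset parameter
}

// Hypothetical helper: convert scraped bytes to UTF-8.
// Assumption: iso-8859-1 is treated as windows-1252 (CP1252), as browsers do,
// because that is where the euro sign lives (byte 0x80).
function toUtf8(string $bytes, ?string $charset): string
{
    $charset = $charset ?? 'iso-8859-1';
    if ($charset === 'utf-8') {
        return $bytes; // already in the target encoding
    }
    if ($charset === 'iso-8859-1' || $charset === 'windows-1252') {
        $charset = 'CP1252';
    }
    $converted = iconv($charset, 'UTF-8', $bytes);
    return $converted === false ? $bytes : $converted; // keep input on failure
}

// Usage sketch with a hard-coded header value instead of a live request.
$charset = charsetFromContentType('text/html; charset=ISO-8859-1');
$utf8    = toUtf8("5 \x80", $charset);
echo bin2hex($utf8), "\n";
```

If you convert to UTF-8 this way, send `header('Content-Type: text/html; charset=utf-8');` before echoing so that your own HTTP header matches the converted bytes.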

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow