Question

I am getting data from various sites through URLs. The URL parameters are URL-encoded with PHP's urlencode() function, but the underlying character encoding can still be either UTF-8 or Latin-1.

For example, the é character, when url-encoded from UTF-8 becomes %C3%A9 but when url-encoded from Latin-1, it becomes %E9.

When I receive the data, I call urldecode(), and then I need to know the character encoding so that I can apply utf8_encode() when necessary before inserting the values into a MySQL database.

Strangely, the following code doesn't work:

$x1 = 'Cl%C3%A9ment';
$x2 = 'Cl%E9ment';

echo mb_detect_encoding(urldecode($x1)).' / '.mb_detect_encoding(urldecode($x2));

It returns UTF-8 / UTF-8

Why is that, what am I doing wrong, and how can I determine the character encoding of those strings?

Thanks


Solution

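mb_detect_encoding() is rarely useful with its default second parameter: the default detection order typically does not include ISO-8859-1, so the function has no way to report Latin-1. Pass an explicit list of candidate encodings instead, with UTF-8 first, since every byte sequence is valid ISO-8859-1: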

<?php

$x1 = 'Cl%C3%A9ment';
$x2 = 'Cl%E9ment';

$encoding_list = array('utf-8', 'iso-8859-1');

var_dump(
    mb_detect_encoding(urldecode($x1), $encoding_list),
    mb_detect_encoding(urldecode($x2), $encoding_list)
);

... prints:

string(5) "UTF-8"
string(10) "ISO-8859-1"
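
Since the end goal is to store everything as UTF-8, here is a minimal sketch of an alternative that skips detection: check whether the decoded bytes are valid UTF-8 and convert from Latin-1 only when they are not. The helper name to_utf8() is hypothetical, and the code assumes that UTF-8 and ISO-8859-1 are the only two encodings you will ever receive.

```php
<?php

// Hypothetical helper (not part of the original answer): URL-decodes $raw
// and returns the result as UTF-8.
function to_utf8(string $raw): string
{
    $decoded = urldecode($raw);

    // Byte sequences that are already valid UTF-8 are kept as-is.
    if (mb_check_encoding($decoded, 'UTF-8')) {
        return $decoded;
    }

    // Assumption: anything that is not valid UTF-8 is ISO-8859-1 (Latin-1).
    return mb_convert_encoding($decoded, 'UTF-8', 'ISO-8859-1');
}

// Both inputs end up as the same UTF-8 string.
var_dump(to_utf8('Cl%C3%A9ment') === to_utf8('Cl%E9ment')); // bool(true)
```

The caveat is the same as with any detection: a Latin-1 string whose bytes happen to form valid UTF-8 would be left unconverted, although this is uncommon in ordinary text.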
Licensed under: CC-BY-SA with attribution