I'm facing a very strange encoding issue while processing some HTML source code.
I have the following line:
"requête présentée par..."
When an external library runs utf8_decode on it, I get:
"reque^te présente´e par..."
So the accents end up to the right of the characters they belong to. If I run utf8_encode on that result, I don't get the original "requête présentée par..." back; I still get "reque^te présente´e par..."
Even stranger: if I open the original HTML in Notepad++, the encoding is reported as UTF-8 without BOM (so far, so good), but I can actually select half of the character with the text selection (keyboard or mouse). Yes, half of it. It is as if the real content were "e^" but it was displayed as "ê". When I copy it into my IDE, it copies "ê" but pastes "e^".
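One way to test that guess would be to dump the code points of the suspicious text and see whether "ê" is stored as one code point or as a letter followed by a separate accent. The sketch below is only an illustration in Python, not the code of the library involved; the `describe` helper is a hypothetical name, and the sample strings assume the separate accent is a combining character such as U+0302, which is an assumption rather than something read from the real file.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) for every character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

# Assumed sample data: a single precomposed character versus
# a base letter followed by a combining accent.
single = "\u00ea"    # one code point
split = "e\u0302"    # two code points

for label, sample in (("single", single), ("split", split)):
    print(label, describe(sample))
```

If the real file printed two entries per accented letter, that would match the "half a character" selection behaviour in Notepad++.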
I have come up with a basic replacement function:
"e^" => "ê",
"e´" => "é",
...
plus some other French cases, and it works properly for now.
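For reference, the replacement approach amounts to roughly the following. This is a minimal sketch in Python, not the actual code; `REPLACEMENTS` and `fix_accents` are hypothetical names, and only the two pairs listed above are included.

```python
# Hypothetical lookup table: only the two pairs quoted in the post.
# Every further language would need its own entries added by hand.
REPLACEMENTS = {
    "e^": "ê",
    "e´": "é",
}

def fix_accents(text):
    """Replace each known letter+accent pair with the accented letter."""
    for broken, fixed in REPLACEMENTS.items():
        text = text.replace(broken, fixed)
    return text

print(fix_accents("reque^te présente´e par..."))
```

Any pair missing from the table passes through unchanged, which is exactly the weakness of this approach.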
But since the HTML comes in different languages, I'm pretty sure I won't be able to replace every character affected by this encoding issue.
Has anybody faced this issue before and (hopefully) found a more general solution?
Thanks in advance.