Accented characters: difference before and after compilation

https://stackoverflow.com/questions/17951371

04-06-2022

Question

I wrote a small application that reads the HTML code of a page and displays it to the user.

During development with NetBeans there were no problems at all, but when I run the .jar produced by the IDE after a "Clean and Build", accented characters are displayed incorrectly.

For example, the French word "renégocier" is displayed correctly when run from NetBeans. But with the .jar from the clean build, the word is displayed as "renÃ©gocier".

Any idea?

EDIT: this is how I read the HTML code:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

URL urlObject = null;
URLConnection con = null;
String inputLine;
String codeHTML = "";

urlObject = new URL(UrlToVerification);
con = urlObject.openConnection();
BufferedReader webData = new BufferedReader(new InputStreamReader(con.getInputStream()));

while ((inputLine = webData.readLine()) != null)
{
    codeHTML += inputLine; // Read the HTML code
}

SOLUTION:

Replace:

BufferedReader webData = new BufferedReader(new InputStreamReader(con.getInputStream()));

with:

BufferedReader webData = new BufferedReader(new InputStreamReader(urlObject.openStream(), "UTF-8"));
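
The garbled "renÃ©gocier" is the pattern you get when UTF-8 bytes are decoded with a single-byte charset such as ISO-8859-1. A minimal sketch that reproduces it, where the class and method names are hypothetical and exist only for illustration:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes the text as UTF-8, then decodes those bytes with the wrong
    // charset (ISO-8859-1), which is what a mismatched default can cause.
    public static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Same bytes, decoded with the charset they were written in.
    public static String decodeCorrectly(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00e9 is "é"; the escape keeps the source file encoding-independent.
        String word = "ren\u00e9gocier";
        System.out.println(misdecode(word));
        System.out.println(decodeCorrectly(word));
    }
}
```

In UTF-8, "é" is the two bytes C3 A9, which ISO-8859-1 reads as the two characters "Ã" and "©". That would be consistent with the IDE running the program with a UTF-8 default and the packaged .jar running with a different platform default, although this is an assumption about the setup and not something stated in the question.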

Solution

Your code uses the platform default character encoding when reading the URL content. Instead, pass an explicit character encoding to the InputStreamReader. This should be the encoding declared by the server's response, normally in the charset parameter of the "Content-Type" header. If the header does not include a character encoding, you need to pick an appropriate default.
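
One way to do this is sketched below, under some assumptions: the fallback is UTF-8, the charset is taken only from the "Content-Type" header (not from any meta tag in the HTML), and the names PageReader, charsetOf and readPage are hypothetical helpers, not part of any library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class PageReader {
    // Extracts the charset from a Content-Type value such as
    // "text/html; charset=ISO-8859-1". Falls back to UTF-8 (an assumed
    // default) when the header is missing or names an unknown charset.
    public static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length())
                                   .replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // malformed or unsupported charset name
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Reads the whole page, decoding it with the charset the server declared.
    public static String readPage(String address) throws IOException {
        URLConnection con = new URL(address).openConnection();
        Charset charset = charsetOf(con.getContentType());
        StringBuilder html = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(con.getInputStream(), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }
}
```

Calling PageReader.readPage(UrlToVerification) would then replace the reading loop from the question. Using a StringBuilder and try-with-resources is a design choice of this sketch, not something the original answer prescribes.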

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow