The classes FileReader and FileWriter are old utility classes that unfortunately use the current platform encoding (before Java 18, which made UTF-8 the default charset). On Windows that is almost certainly not UTF-8. XML, on the other hand, is generally in UTF-8, which can indeed represent all characters.
fileReader = new BufferedReader(new FileReader(filePath));
fileWriter = new BufferedWriter(new FileWriter("./out/UnescapedHtml.html"));
should be
fileReader = new BufferedReader(new InputStreamReader(
new FileInputStream(filePath), StandardCharsets.UTF_8));
fileWriter = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream("./out/UnescapedHtml.html"),
StandardCharsets.UTF_8));
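As a quick check that the explicit charset behaves as intended, here is a minimal, self-contained sketch that writes a string with non-ASCII characters to a temporary file and reads it back. The class name Utf8RoundTrip and the helpers write and read are made up for this illustration, not part of the original code.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8RoundTrip {

    // Hypothetical helper: writes text as UTF-8, independent of the platform encoding.
    static void write(Path file, String text) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(file.toFile()), StandardCharsets.UTF_8))) {
            writer.write(text);
        }
    }

    // Hypothetical helper: reads the whole file as UTF-8.
    static String read(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new FileInputStream(file.toFile()), StandardCharsets.UTF_8))) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("utf8-demo", ".html");
        // \u00e9 is e with acute accent, \u20ac is the euro sign.
        String text = "<p>\u00e9 \u20ac</p>";
        write(tmp, text);
        System.out.println(read(tmp).equals(text));
        Files.delete(tmp);
    }
}
```

Since Java 7, Files.newBufferedReader(path, StandardCharsets.UTF_8) and Files.newBufferedWriter(path, StandardCharsets.UTF_8) are a shorter way to get the same kind of reader and writer.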
Strictly speaking, one should first read the <?xml ...?> declaration and check whether it has an encoding attribute naming the charset; the default is UTF-8. That first read could be done with StandardCharsets.ISO_8859_1, since it accepts any byte sequence, whereas UTF-8 decoding stumbles over invalid multi-byte sequences.
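A rough sketch of that idea follows: decode the first bytes as ISO-8859-1, look for an encoding attribute in the XML declaration, and fall back to UTF-8. The class XmlEncodingSniffer, the method detect, the regular expression, and the 200-byte limit are all assumptions made for this illustration. It only covers ASCII-compatible encodings and does not handle UTF-16 or byte order marks properly; a real XML parser does this detection itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlEncodingSniffer {

    // Matches encoding="..." or encoding='...' inside the <?xml ...?> declaration.
    private static final Pattern ENCODING = Pattern.compile(
            "<\\?xml[^>]*?\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // head: the first bytes of the file (hypothetical parameter).
    static Charset detect(byte[] head) {
        // ISO-8859-1 maps every byte to exactly one char, so this decoding cannot fail.
        int length = Math.min(head.length, 200); // assumed upper bound for the declaration
        String prolog = new String(head, 0, length, StandardCharsets.ISO_8859_1);

        Matcher m = ENCODING.matcher(prolog);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: fall through to the default.
            }
        }
        return StandardCharsets.UTF_8; // XML default when nothing is declared
    }

    public static void main(String[] args) {
        byte[] head = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(detect(head));
    }
}
```

The detected Charset can then be passed to the InputStreamReader in place of the fixed StandardCharsets.UTF_8 when the file is opened for the actual read.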
Using StandardCharsets instead of the string "UTF-8" does away with
- an UnsupportedEncodingException to handle,
- a magic constant.
The charsets in StandardCharsets are guaranteed to be supported on every Java platform.
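To illustrate the difference, here is a small sketch using String.getBytes, which has both a charset-name overload and a Charset overload. The class CharsetOverloads and its two methods are invented for this comparison.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class CharsetOverloads {

    // Charset given by name: the compiler forces handling of a checked exception,
    // and a typo in the name only shows up at run time.
    static byte[] encodeByName(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Charset given as a constant: no checked exception, no magic string.
    static byte[] encodeByConstant(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00e9 (e with acute accent) takes two bytes in UTF-8.
        System.out.println(encodeByName("\u00e9").length);
        System.out.println(encodeByConstant("\u00e9").length);
    }
}
```

The same pair of overloads exists on InputStreamReader and OutputStreamWriter, which is why the corrected code above needs no try/catch for the encoding.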