How to stop HtmlTidy from converting umlauts (e.g. ü to &uuml;)
20-09-2019
Question
Our website runs user input through HtmlTidy to clean it. Apparently, in doing so it also causes pain for our international subscribers by converting umlauts. Is there an option that tells HtmlTidy not to do this?
I tried CharacterEncoding with all possible options, but nothing seems to work.
Solution
Simply provide an output encoding (input encoding is optional) in the configuration file:
input-encoding: win1252
output-encoding: latin1
For an overview of available encodings, look at the output-encoding documentation.
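If the pages are served as UTF-8 rather than Latin-1, a configuration along the following lines may be a closer fit. This is only a sketch: it assumes that both the submitted input and the desired output are UTF-8, which the question does not state.

```
input-encoding: utf8
output-encoding: utf8
```

Tidy writes characters as entities when the chosen output encoding cannot represent them, so the output encoding needs to cover every character your users are likely to type.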
EDIT: So you're using the .NET bindings. The settings are the same:
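// Load the HTML that should be cleaned.
Document d = new Document(new FileStream("in.html", FileMode.Open));
// Tell Tidy how the input is encoded.
d.InputCharacterEncoding = EncodingType.Utf8;
// Pick an output encoding that can represent umlauts directly.
d.OutputCharacterEncoding = EncodingType.Win1252;
d.CleanAndRepair();
d.Save("out.html");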
With the correct encodings set, you will get the correct result, without &uuml; and the like.
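To see why the output encoding matters, the following Python sketch mimics the general mechanism rather than HtmlTidy itself: a character that the target encoding cannot represent is replaced by a character reference, while an encoding that covers it writes the character as-is. The helper name `encode_with_references` is made up for this illustration, and Python's `xmlcharrefreplace` handler emits numeric references where Tidy would typically use named entities.

```python
def encode_with_references(text: str, encoding: str) -> bytes:
    """Encode text, replacing unencodable characters with numeric
    character references (an analogy for Tidy's entity output)."""
    return text.encode(encoding, errors="xmlcharrefreplace")


sample = "Müller"

# ASCII has no "ü", so it is replaced by a character reference.
ascii_out = encode_with_references(sample, "ascii")

# Latin-1 and UTF-8 can both represent "ü", so it is written directly.
latin1_out = encode_with_references(sample, "latin-1")
utf8_out = encode_with_references(sample, "utf-8")

print(ascii_out)   # b'M&#252;ller'
print(latin1_out)  # b'M\xfcller'
print(utf8_out)    # b'M\xc3\xbcller'
```

The same reasoning applies to the settings above: as long as the output encoding covers the umlauts, there is nothing that needs to be escaped.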
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow