Question

When I try to clean up HTML with Tidy.NET, it breaks HTML entities like e @ or s and makes the HTML unreadable. I have tried different settings, but none of them worked.

Does anybody know how to solve this problem? Maybe a hotfix exists?

Edit 1: I use this Tidy configuration:

Tidy doc = new Tidy();
doc.Options.DocType = DocType.User;
doc.Options.Xhtml = true;
doc.Options.WrapScriptlets = true;
doc.Options.LogicalEmphasis = true;
doc.Options.DropFontTags = true;
doc.Options.DropEmptyParas = true;
doc.Options.QuoteAmpersand = true;
doc.Options.TidyMark = false;
doc.Options.MakeClean = true;
doc.Options.IndentContent = true;
doc.Options.SmartIndent = true;
doc.Options.Spaces = 0;
doc.Options.WrapLen = 0;
doc.Options.CharEncoding = CharEncoding.UTF8;
doc.Options.RawOut = true;
doc.Options.EncloseText = false;

I then changed doc.Options.CharEncoding = CharEncoding.UTF8; to doc.Options.CharEncoding = CharEncoding.Raw; but the output did not change.


Solution 2

I found a solution!

In Lexer.cs, on line 371, the check for numeric entities should also accept hexadecimal digits. I changed it to:

if (numeric && ((c == 'x') || (c == 'a') || (c == 'b') || (c == 'c') || (c == 'd') || (c == 'e') || (c == 'f') || (c == 'A') || (c == 'B') || (c == 'C') || (c == 'D') || (c == 'E') || (c == 'F') || ((map & DIGIT) != 0)))

and the parser now works properly.
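The long chain of comparisons in that condition can also be expressed as a small helper. The following is only a sketch: the class and method names (EntityChars, IsHexDigit, IsNumericEntityChar) are hypothetical and are not part of Tidy.NET, and the digit test uses plain character ranges instead of the lexer's own map and DIGIT values.

```csharp
using System;

// Hypothetical helper, not part of Tidy.NET.
public static class EntityChars
{
    // True for 0-9, a-f and A-F.
    public static bool IsHexDigit(char c)
    {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'f')
            || (c >= 'A' && c <= 'F');
    }

    // Mirrors the patched condition: inside a numeric entity,
    // accept the lowercase 'x' prefix or any hexadecimal digit.
    // Like the patch above, it does not accept an uppercase 'X'.
    public static bool IsNumericEntityChar(char c)
    {
        return c == 'x' || IsHexDigit(c);
    }
}
```

With such a helper, the condition in the lexer could presumably be shortened to if (numeric && EntityChars.IsNumericEntityChar((char)c)), assuming c holds a character code at that point.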

OTHER TIPS

As a workaround, you could use the Replace method of the System.String class to fix the broken HTML.
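One possible reading of this tip, not necessarily what the answerer had in mind, is to hide numeric entities from Tidy before parsing and put them back afterwards. The sketch below rests on several assumptions: the class name EntityGuard and the placeholder string are made up, the placeholder must never occur in the real input, and Tidy is assumed to pass the placeholder through unchanged.

```csharp
using System;

// Hypothetical workaround helper, not part of Tidy.NET.
public static class EntityGuard
{
    // Made-up marker; choose one that cannot appear in your documents.
    private const string Marker = "__TIDY_NUM_ENTITY__";

    // Call before handing the HTML to Tidy: hides the "&#" prefix
    // of every numeric entity behind the marker.
    public static string Protect(string html)
    {
        return html.Replace("&#", Marker);
    }

    // Call on Tidy's output: restores the original "&#" prefixes.
    public static string Restore(string html)
    {
        return html.Replace(Marker, "&#");
    }
}
```

This only sidesteps the parsing problem. Patching the lexer as described in the accepted solution addresses the cause.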

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow