Question

We are using JTidy to clean up some HTML for SAX processing. We've had a lot of trouble with spacing issues, as shown in this example:

HTML

<i>stack<span
class="bold">overflow</span></i>

which outputs "stackoverflow"

But...

Post JTidy

<i>stack
<span
class="bold">overflow</span></i>

which outputs "stack overflow" (note the new space)

Does anyone have any advice on how to fix or handle this better? I've been through all the Tidy/JTidy settings and don't see anything that accounts for this issue.

Solution

It turns out this simple example doesn't really show the issue. The actual problem was that Tidy/JTidy applies a default line-wrapping setting, and when there were very long attribute values the wrapping inserted line breaks that caused the spacing problem above (and various others).

Everything was fixed with:

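 // jtidy is the org.w3c.tidy.Tidy instance used for the cleanup
 jtidy.setWraplen(0);          // a wrap length of 0 turns line wrapping off
 jtidy.setWrapAttVals(false);  // do not break lines inside attribute values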
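
To see why a wrapped line matters to a SAX consumer, here is a minimal sketch that uses only the JDK's built-in SAX parser (no JTidy involved). The class name `WrapDemo` and the helper `textOf` are made up for illustration. It feeds the two snippets from the question through the parser and collects the character data: a line break inside a tag is just attribute whitespace, but a line break between `stack` and `<span` is delivered as text, which a renderer then shows as a space.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class WrapDemo {
    // Hypothetical helper: returns all character data the SAX parser reports.
    static String textOf(String xml) {
        StringBuilder sb = new StringBuilder();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        sb.append(ch, start, length);
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample markup", e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Line break inside the tag only: not part of the text content.
        String original = "<i>stack<span\nclass=\"bold\">overflow</span></i>";
        // Line break before the tag, as in the wrapped output: becomes text.
        String wrapped = "<i>stack\n<span\nclass=\"bold\">overflow</span></i>";

        System.out.println(textOf(original));                   // prints "stackoverflow"
        // Show the newline as a space, the way a renderer would collapse it.
        System.out.println(textOf(wrapped).replace('\n', ' ')); // prints "stack overflow"
    }
}
```

This is why turning wrapping off helps: with no line breaks added between inline elements, no extra text reaches the handler.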

OTHER TIPS

What settings are you using? Executing JTidy from the command line using its default settings on the snippet you posted prints this:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org">
<title></title>
</head>
<body>
<i>stack<span class="bold">overflow</span></i>
</body>
</html>
Licensed under: CC-BY-SA with attribution