Question

I'm using CyberNeko to clean and process HTML documents.

I need to be able to process all the comments that occur in the original HTML documents.

I've configured the CyberNeko SAX parser to process comments like so:

parser.setProperty("http://xml.org/sax/properties/lexical-handler", consumer);

...using the same consumer as I am for DOM events.

I get a callback for each of the comments:

    @Override
    public void comment(char[] ch, int start, int length) throws SAXException {
        // Print the text of the comment that was just reported
        System.out.println("COMMENT::: " + new String(ch, start, length));
    }

The problem is that all the comments are processed first, out of context of the DOM: I get a callback for every comment before the callbacks for the document head, body, etc.

What I'd like is for the comment callbacks to occur in the order they occur in the DOM.

Edit: what I'm actually trying to do is pass through the conditional comments for IE in the original HTML, such as:

 <!--[if lte IE 6]><body class="news ie"><![endif]-->

At the moment they are all dropped; I need to include them in the cleaned HTML document.


Solution

There's probably a simple explanation that would be clear if you showed us more of your code.

But if it's a problem with CyberNeko, you could try a different parser such as TagSoup.
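
For comparison, here is a minimal sketch of what the interleaving should look like when one object handles both the content events and the lexical events. It deliberately uses the JDK's built-in SAX parser on well-formed input rather than CyberNeko, so it only shows the expected ordering and is not a fix for CyberNeko itself. The class name `CommentPassThrough`, the `Consumer` handler and the `roundTrip` helper are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

public class CommentPassThrough {

    /**
     * Hypothetical consumer: DefaultHandler2 implements both ContentHandler
     * and LexicalHandler, so one instance sees elements, text and comments.
     * Escaping of text and attribute values is omitted for brevity.
     */
    static class Consumer extends DefaultHandler2 {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            out.append('<').append(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                out.append(' ').append(atts.getQName(i))
                   .append("=\"").append(atts.getValue(i)).append('"');
            }
            out.append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            out.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }

        @Override
        public void comment(char[] ch, int start, int length) {
            // Re-emit the comment at the point where the parser reported it
            out.append("<!--").append(ch, start, length).append("-->");
        }
    }

    /** Parses the input and re-serializes it from the SAX events, comments included. */
    static String roundTrip(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            Consumer consumer = new Consumer();
            // Same property as in the question: register the lexical handler
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", consumer);
            // The same object is also passed as the content handler
            parser.parse(new InputSource(new StringReader(xml)), consumer);
            return consumer.out.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<html><head>"
                + "<!--[if lte IE 6]><body class=\"news ie\"><![endif]-->"
                + "<title>t</title></head>"
                + "<body><p>hi</p><!-- end --></body></html>";
        System.out.println(roundTrip(xml));
    }
}
```

If your CyberNeko setup delivers the comments before everything else while a plain SAX parser interleaves them as above, the difference is probably in how the parser or its filter chain is configured, which is why seeing more of your code would help.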

Licensed under: CC-BY-SA with attribution