Question

Right now I'm using a SAXParser with my own handler. It can parse all node values except the one that has type="html".

My characters() callback looks like this:

public void characters(char ch[], int start, int length) throws SAXException {
    if (content) {
        String tmp = new String(ch, start, length);
        System.out.println("Content : " + tmp);
        content = false;
    }
}

That particular node has the following format, and for it my output is always just a bunch of \n characters and nothing else.

   <content type="html">

    &lt;img alt="" src="http://cdn2.sbnation.com/entry_photo_images/8767829/stranger-bad-robot-screencap_large.png" /&gt;


     &lt;p&gt;Bad Robot, the production company founded by geek culture hitmaker J.J. Abrams (&lt;i&gt;Lost&lt;/i&gt;, &lt;i&gt;Fringe&lt;/i&gt;, &lt;i&gt;Star Trek: Into Darkness&lt;/i&gt;, &lt;i&gt;Alias&lt;/i&gt;,&amp;nbsp;etc.), has released a&amp;nbsp;&lt;a href="http://youtu.be/FWaAZCaQXdo" target="_blank"&gt;mysterious new trailer&lt;/a&gt; titled "Stranger." The creepy and inscrutable video spot, posted by the official Bad Robot Twitter account this afternoon, features a starry sky; a long-haired, rope-bound man wandering along a desolate monochromatic shore line; and your garden variety, horrifying stitched-mouth person coming into focus. "Men are erased and reborn," intones a narrator that sounds a little like Leonard Nimoy.&lt;/p&gt;
     &lt;p&gt;&lt;/p&gt;



    </content>

Solution

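You might be wrongly assuming that the characters() callback is invoked only once between the startElement() and endElement() callbacks. In fact, a SAX parser is free to split one run of text into several chunks and call characters() once per chunk. Splits often happen around entity references such as &lt;, which would explain why the only chunk you print contains nothing but the leading newlines.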
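To see the chunking for yourself, the following sketch records every characters() call instead of printing only the first one. The class and method names (ChunkDemo, ChunkRecorder, chunksOf) are made up for illustration, and the exact number of chunks depends on the parser implementation, so don't rely on any particular count.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ChunkDemo {

    // Hypothetical handler: records every chunk delivered to characters(), in order.
    static class ChunkRecorder extends DefaultHandler {
        final List<String> chunks = new ArrayList<>();

        @Override
        public void characters(char[] ch, int start, int length) {
            chunks.add(new String(ch, start, length));
        }
    }

    // Parses the given XML string and returns the raw characters() chunks.
    static List<String> chunksOf(String xml) throws Exception {
        ChunkRecorder recorder = new ChunkRecorder();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), recorder);
        return recorder.chunks;
    }

    public static void main(String[] args) throws Exception {
        List<String> chunks = chunksOf("<content type=\"html\">\n  &lt;p&gt;Hi&lt;/p&gt;\n</content>");
        System.out.println("characters() was called " + chunks.size() + " time(s):");
        for (String chunk : chunks) {
            System.out.println("[" + chunk + "]");
        }
        // Only the concatenation of all chunks is the element's full text.
        System.out.println("Joined: [" + String.join("", chunks) + "]");
    }
}
```

Whatever the chunk count, only the joined string is the complete text of the element.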

Because you use the content boolean member both to decide whether to print and set that same member to false inside characters(), your condition can only be true for the first chunk. Every later chunk is skipped until content is reset, and your code doesn't show where that happens.
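If you want to keep the flag idea, a minimal change is to set the flag in startElement(), only append in characters(), and clear the flag in endElement(). The sketch below assumes the element of interest is named content and that the factory is not namespace-aware (so qName carries the element name); ContentCollector, contentOf and getResult are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ContentCollector extends DefaultHandler {
    // true only while the parser is between <content ...> and </content>
    private boolean inContent = false;
    // accumulates every characters() chunk that belongs to <content>
    private final StringBuilder text = new StringBuilder();
    private String result;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("content".equals(qName)) {
            inContent = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Do NOT reset the flag here: more chunks of the same text may follow.
        if (inContent) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("content".equals(qName)) {
            // Only at the closing tag is the accumulated text complete.
            inContent = false;
            result = text.toString();
            System.out.println("Content : " + result);
        }
    }

    public String getResult() {
        return result;
    }

    // Convenience helper: parse an XML string and return the text of <content>.
    static String contentOf(String xml) throws Exception {
        ContentCollector collector = new ContentCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), collector);
        return collector.getResult();
    }

    public static void main(String[] args) throws Exception {
        contentOf("<entry><title>T</title><content type=\"html\">&lt;p&gt;Hello&lt;/p&gt;</content></entry>");
    }
}
```

Text from other elements such as title is ignored because the flag is only true inside content.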

Here's an example that works fine with your XML. It assumes non-mixed content, i.e. each element contains either text or child elements, but not both:

import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TestSaxParser {

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        String xml = 
            "<content type=\"html\">\n" +
            "\n" +
            "    &lt;img alt=\"\" src=\"http://cdn2.sbnation.com/entry_photo_images/8767829/stranger-bad-robot-screencap_large.png\" /&gt;\n" +
            "\n" +
            "\n" +
            "     &lt;p&gt;Bad Robot, the production company founded by geek culture hitmaker J.J. Abrams (&lt;i&gt;Lost&lt;/i&gt;, &lt;i&gt;Fringe&lt;/i&gt;, &lt;i&gt;Star Trek: Into Darkness&lt;/i&gt;, &lt;i&gt;Alias&lt;/i&gt;,&amp;nbsp;etc.), has released a&amp;nbsp;&lt;a href=\"http://youtu.be/FWaAZCaQXdo\" target=\"_blank\"&gt;mysterious new trailer&lt;/a&gt; titled \"Stranger.\" The creepy and inscrutable video spot, posted by the official Bad Robot Twitter account this afternoon, features a starry sky; a long-haired, rope-bound man wandering along a desolate monochromatic shore line; and your garden variety, horrifying stitched-mouth person coming into focus. \"Men are erased and reborn,\" intones a narrator that sounds a little like Leonard Nimoy.&lt;/p&gt;\n" +
            "     &lt;p&gt;&lt;/p&gt;\n" +
            "\n" +
            "\n" +
            "\n" +
            "    </content>";

        MySaxHandler handler = new MySaxHandler();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();        
        InputSource source = new InputSource(new StringReader(xml));
        parser.parse(source, handler);
    }

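    private static class MySaxHandler extends DefaultHandler {
        // Collects the text of the current element across all characters() calls.
        private StringBuilder content = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            // A new element begins: discard text left over from the previous one.
            content.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            // May be called several times per text node: append each chunk instead of printing it.
            content.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            // Only at the closing tag is the element's text complete.
            System.out.println(content.toString());
        }

    }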
}

Output:

    <img alt="" src="http://cdn2.sbnation.com/entry_photo_images/8767829/stranger-bad-robot-screencap_large.png" />


     <p>Bad Robot, the production company founded by geek culture hitmaker J.J. Abrams (<i>Lost</i>, <i>Fringe</i>, <i>Star Trek: Into Darkness</i>, <i>Alias</i>,&nbsp;etc.), has released a&nbsp;<a href="http://youtu.be/FWaAZCaQXdo" target="_blank">mysterious new trailer</a> titled "Stranger." The creepy and inscrutable video spot, posted by the official Bad Robot Twitter account this afternoon, features a starry sky; a long-haired, rope-bound man wandering along a desolate monochromatic shore line; and your garden variety, horrifying stitched-mouth person coming into focus. "Men are erased and reborn," intones a narrator that sounds a little like Leonard Nimoy.</p>
     <p></p>
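The handler above relies on the non-mixed-content assumption: since startElement() clears the single shared buffer, a parent whose text surrounds a child, e.g. <a>x<b>y</b>z</a>, would be printed as yz instead of xz. For nested or mixed content, one option (not part of the original answer; NestedTextHandler and collect are hypothetical names) is to keep one buffer per open element on a stack:

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NestedTextHandler extends DefaultHandler {
    // One buffer per currently open element; the innermost element is on top.
    private final Deque<StringBuilder> open = new ArrayDeque<>();
    // "qName=text" for each element, in the order the elements close.
    private final List<String> collected = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        open.push(new StringBuilder());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Each chunk belongs only to the innermost open element.
        if (!open.isEmpty()) {
            open.peek().append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The element is closed, so its own direct text is now complete.
        collected.add(qName + "=" + open.pop());
    }

    static List<String> collect(String xml) throws Exception {
        NestedTextHandler handler = new NestedTextHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.collected;
    }

    public static void main(String[] args) throws Exception {
        // prints b=y, then a=xz
        for (String entry : collect("<a>x<b>y</b>z</a>")) {
            System.out.println(entry);
        }
    }
}
```

Each element only sees its own direct text; if you also want the text of descendants, you could append a child's buffer to its parent's buffer in endElement().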

Other tips

You should accumulate the content in a StringBuffer (a StringBuilder works the same way in single-threaded code), as described in these topics:

SAX parsing and special characters

Unable to read special characters from xml using java

Licensed under: CC-BY-SA with attribution