SAX getting only the end of a content string

https://stackoverflow.com/questions/18833879

28-06-2022

Question

I need to read the contents of the <itunes:summary> tag, but my handler only receives the end of the tag's content (the last three words, for example). I don't know what to do, because other tags are handled as expected and return their full content.

I've seen that some tags are ignored by the parser, but I don't think that's happening here because, as I said, the handler does get the content, just only the end of it.

The source XML is hosted at http://djpaulonla.podomatic.com/archive/rss2.xml

Could someone please help? The code is the following:

public class PodOMaticCustomHandler extends CustomHandler {

public PodOMaticCustomHandler(int quantityToFetch, String startTagValue,
        String endTagValue) {
    super(quantityToFetch, startTagValue, endTagValue);
}

@Override
public void characters(char[] ch, int start, int length)
        throws SAXException {
    super.characters(ch, start, length);
    this.value = new String(ch, start, length);
}

@Override
public void endDocument() throws SAXException {
    super.endDocument();
    this.endDoc = true;
}

@Override
public void endElement(String uri, String localName, String qName)
        throws SAXException {
    super.endElement(uri, localName, qName);

    if (this.podcast != null) {
        if (qName.equalsIgnoreCase("title")) {
            podcast.setTitle(this.value);
        } else if (qName.equalsIgnoreCase("pubDate")) {
            podcast.setPubDate(this.value);
        } else if (qName.equalsIgnoreCase("description")) {
            podcast.setContent(this.value);
        } else if (qName.equalsIgnoreCase("guid")) {
            this.podcast.setLink(value);
        }
    }

}

@Override
public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {
    super.startElement(uri, localName, qName, attributes);

    if (this.startTagValue == null) {
        this.startTagValueFound = true;
    } else if (qName.equalsIgnoreCase("guid")
            && this.value.equalsIgnoreCase(this.startTagValue)) {
        this.startTagValueFound = true;
    }
    if (this.endTagValue != null) {
        if (qName.equalsIgnoreCase("guid")
                && this.value.equalsIgnoreCase(this.endTagValue)) {
            this.endDoc = true;
        }
    }
    if (!this.endDoc) {
        if (this.quantityToFetch != this.podcasts.size()) {
            if (this.startTagValueFound == true) {
                if (qName.equalsIgnoreCase("item")) {
                    this.podcast = new Podcast();
                } else if (qName.equalsIgnoreCase("enclosure")) {
                    this.podcast.setMedia(attributes.getValue("url"));
                    this.podcasts.add(podcast);
                }
            }
        } else {
            this.podcast = null;
        }
    } else {
        this.podcast = null;
    }
}
}

Solution

You can't rely on the characters method being called once with the entire element text. It may be called multiple times, each time with only part of the text.

Add a debug log statement to the characters method showing what you're setting value to, and you will see that value is set to the first part of the string and then overwritten by each later part, so only the last part survives.
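As a rough illustration of that debugging step, here is a minimal sketch of a handler that logs every characters call. The class name ChunkLoggingHandler and the helper chunksOf are made up for this example and are not part of the asker's code. How the text is split depends on the parser implementation, so the number of chunks you see may differ from one parser to another.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for this example: records and prints each chunk of text
// that the parser hands to characters().
public class ChunkLoggingHandler extends DefaultHandler {
    private final List<String> chunks = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        String chunk = new String(ch, start, length);
        chunks.add(chunk);
        // Each line printed here is one separate call made by the parser.
        System.out.println("characters() call #" + chunks.size() + ": [" + chunk + "]");
    }

    // Parses the given XML string and returns the chunks in the order received.
    public static List<String> chunksOf(String xml) {
        try {
            ChunkLoggingHandler handler = new ChunkLoggingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.chunks;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Calling ChunkLoggingHandler.chunksOf("<s>Tom &amp; Jerry mix</s>") prints one line per call. Whatever the split turns out to be, joining the chunks in order gives back the complete element text, whereas keeping only the most recent chunk does not.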

The answer is to buffer the text passed in from the characters calls in a CharArrayWriter or StringBuilder, and to read the accumulated text in endElement. Then you have to clear the buffer when the end of the element is found.
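The buffering approach could look roughly like the sketch below. It is a standalone handler rather than a patch to PodOMaticCustomHandler, since CustomHandler and Podcast are not shown in the question. The names BufferingHandler, getSummaries, and parse are assumptions made for this example, and it assumes the default, non-namespace-aware parser, so the qName arrives as "itunes:summary".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for this example: collects the full text of every
// <itunes:summary> element, however many characters() calls the parser makes.
public class BufferingHandler extends DefaultHandler {
    // Accumulates all chunks delivered for the current element.
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> summaries = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) {
        // Drop whitespace collected between elements so it does not leak
        // into the next element's text (assumes text-only leaf elements).
        buffer.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append instead of overwrite: this call may carry only part of the text.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only now is the element's text known to be complete.
        if (qName.equalsIgnoreCase("itunes:summary")) {
            summaries.add(buffer.toString().trim());
        }
        // Clear the buffer once the end of the element is found.
        buffer.setLength(0);
    }

    public List<String> getSummaries() {
        return summaries;
    }

    // Convenience method for the example: parses an XML string.
    public static List<String> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            BufferingHandler handler = new BufferingHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.getSummaries();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

In the handler from the question, the equivalent change would be to append in characters instead of assigning this.value, and to take buffer.toString() in endElement before calling setTitle, setPubDate, and so on. Note that startElement in that handler also reads this.value, so that logic would need to keep a copy of the last completed value.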

Here's what the Java tutorial on SAX has to say about the characters method:

Parsers are not required to return any particular number of characters at one time. A parser can return anything from a single character at a time up to several thousand and still be a standard-conforming implementation. So if your application needs to process the characters it sees, it is wise to have the characters() method accumulate the characters in a java.lang.StringBuffer and operate on them only when you are sure that all of them have been found.

Licensed under: CC-BY-SA with attribution
