XMLStreamException: ParseError at [row,col]:[5,3] Message: The element type "meta" must be terminated by the matching end-tag "</meta>"

https://stackoverflow.com/questions/23489444

java
parsing

16-07-2023
|

Question

I am trying to display a text from a news in any category of a newspaper. I use RSS of this newspaper. However when I run the code, sometimes I get the exception message in the above, sometimes it works correctly. Here is my RSS Parser code:

And the rss page that I use is:

RSSFeedParser parser = new RSSFeedParser("http://www.cumhuriyet.com.tr/rss/5");
Feed feed = parser.readFeed();

The RSS parser code:

package main;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.XMLEvent;


public class RSSFeedParser {
  static final String TITLE = "title";
  static final String DESCRIPTION = "description";
  static final String CHANNEL = "channel";
  static final String LANGUAGE = "language";
  static final String COPYRIGHT = "copyright";
  static final String LINK = "link";
  static final String AUTHOR = "author";
  static final String ITEM = "item";
  static final String PUB_DATE = "pubDate";
  static final String GUID = "guid";
  static final String IMG = "img";

  final URL url;

  public RSSFeedParser(String feedUrl) {
    try {
      this.url = new URL(feedUrl);
    } catch (MalformedURLException e) {
      throw new RuntimeException(e);
    }
  }

  public Feed readFeed() {
    Feed feed = null;
    try {
      boolean isFeedHeader = true;
      // Set header values intial to the empty string
      String description = "";
      String title = "";
      String link = "";
      String language = "";
      String copyright = "";
      String author = "";
      String pubDate = "";
      String guid = "";
      String img = "";
      // First create a new XMLInputFactory
      XMLInputFactory inputFactory = XMLInputFactory.newInstance();
      // Setup a new eventReader
      InputStream in = read();
      XMLEventReader eventReader = inputFactory.createXMLEventReader(in);
      // read the XML document
      while (eventReader.hasNext()) {
        XMLEvent event = eventReader.nextEvent();
        if (event.isStartElement()) {
          String localPart = event.asStartElement().getName()
              .getLocalPart();
          switch (localPart) {
          case ITEM:
            if (isFeedHeader) {
              isFeedHeader = false;
              feed = new Feed(title, link, description, language,
                  copyright, pubDate);
            }
            event = eventReader.nextEvent();
            break;
          case TITLE:
            title = getCharacterData(event, eventReader);
            break;
          case DESCRIPTION:
            description = getCharacterData(event, eventReader);
            break;
          case LINK:
            link = getCharacterData(event, eventReader);
            break;
          case GUID:
            guid = getCharacterData(event, eventReader);
            break;
          case LANGUAGE:
            language = getCharacterData(event, eventReader);
            break;
          case AUTHOR:
            author = getCharacterData(event, eventReader);
            break;
          case PUB_DATE:
            pubDate = getCharacterData(event, eventReader);
            break;
          case COPYRIGHT:
            copyright = getCharacterData(event, eventReader);
            break;
          }
        } else if (event.isEndElement()) {
          if (event.asEndElement().getName().getLocalPart() == (ITEM)) {
            FeedMessage message = new FeedMessage();

            message.setDescription(description);
            message.setPubDate(pubDate);
            message.setLink(link);
            message.setTitle(title);
            message.setImg(img);
            feed.getMessages().add(message);
            event = eventReader.nextEvent();
            continue;
          }
        }
      }
    } catch (XMLStreamException e) {
      throw new RuntimeException(e);
    }
    return feed;
  }

  private String getCharacterData(XMLEvent event, XMLEventReader eventReader)
      throws XMLStreamException {
    String result = "";
    event = eventReader.nextEvent();
    if (event instanceof Characters) {
      result = event.asCharacters().getData();
    }
    return result;
  }

  private InputStream read() {
    try {
      return url.openStream();
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }
}

Exception message is:

Exception in thread "main" java.lang.RuntimeException: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[5,3]
Message: The element type "meta" must be terminated by the matching end-tag "</meta>".
    at main.RSSFeedParser.readFeed(RSSFeedParser.java:112)
    at cumhuriyet.Dunya.cumDunya(Dunya.java:32)
    at automation.ServerInteraction.main(ServerInteraction.java:83)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[5,3]
Message: The element type "meta" must be terminated by the matching end-tag "</meta>".
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(Unknown Source)
    at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(Unknown Source)
    at main.RSSFeedParser.readFeed(RSSFeedParser.java:58)
    ... 2 more

Solution

That looks like it is a live document; i.e. one that changes fairly frequently. There is also no sign of a tag in it.

I can think of two explanations for what is happening:

Sometimes the document is being generated or created incorrectly.

Sometimes you are getting an HTML error page instead of the document you are expecting, and the XML parser can't cope with a tag in the HTML's .

To track this down, you are going to have to capture the precise input that is causing the parse to fail.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow