Question

I am getting an exception while trying to parse an XML document.

I went through many posts, like here and here, but my problem is still not solved. I checked that there are no spaces in the header either. I created the file with Notepad and chose UTF-8 as the encoding when saving.

My XML file looks like this

<?xml version="1.0" encoding="UTF-8"?>
<poem>
<title>Roses are Red</title>
<l>Roses are red</l>
</poem>

I am using Java to load and parse the file. My Java code is:

File xml = new File("d:\\uploads\\test.xml");
try {
    XMLReader xr = XMLReaderFactory.createXMLReader();
    MySAXApp handler = new MySAXApp();
    xr.setContentHandler(handler);
    xr.setErrorHandler(handler);
    // FileReader decodes the file into characters using the platform default charset
    FileReader r = new FileReader(xml);
    xr.parse(new InputSource(r));
}
catch (Exception e)
{
    log.info("Exception : " + e.getMessage());
}

My MySAXApp class is below

package utility;
import java.util.logging.Logger;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class MySAXApp extends DefaultHandler {

    public Logger log;
    public MySAXApp ()
    {

        super();
        log = Logger.getAnonymousLogger();
    }
    public void startDocument ()
    {
        log.info("Start document");
    }


    public void endDocument ()
    {
        log.info("End document");
    }
    public void startElement (String uri, String name,String qName, Attributes atts)
    {

      log.info("Start element: " + qName);

    }

    public void endElement (String uri, String name, String qName)
    {
         log.info("End element: " + qName);
    }
    public void characters (char ch[], int start, int length)
    {
        // build the escaped text first so it is logged as one record, not one record per character
        StringBuilder sb = new StringBuilder("values:    \"");
        for (int i = start; i < start + length; i++) {
            switch (ch[i]) {
            case '\\':
                sb.append("\\\\");
                break;
            case '"':
                sb.append("\\\"");
                break;
            case '\n':
                sb.append("\\n");
                break;
            case '\r':
                sb.append("\\r");
                break;
            case '\t':
                sb.append("\\t");
                break;
            default:
                sb.append(ch[i]);
                break;
            }
        }
        sb.append('"');
        log.info(sb.toString());
    }
}

Stack trace

org.xml.sax.SAXParseException: Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:195)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:174)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:388)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1411)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1038)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
    at utility.PerformOperation.startIndexing(PerformOperation.java:91)
    at utility.Upload.doPost(Upload.java:126)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:647)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

Solution

This is explained by @MikeSokolov in the first of your links. Quote:

Another thing that often happens is a UTF-8 BOM (byte order mark), which is allowed before the XML declaration, can be treated as whitespace if the document is handed as a stream of characters to an XML parser rather than as a stream of bytes.

FileReader reads the file as a character stream. To read it as a byte stream instead, so that the parser handles the BOM itself, use FileInputStream:

try (FileInputStream is = new FileInputStream(xml)) {
    // give the parser raw bytes so it can detect the encoding and skip the BOM
    xr.parse(new InputSource(is));
}
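A minimal, self-contained sketch of the byte-stream approach, under a few assumptions: the class name BomParseDemo, the temporary test file and the element-collecting handler are illustrative and not from the question. It writes a file that starts with the UTF-8 BOM, as the answer says Notepad's file does, and parses it through a FileInputStream. It obtains the XMLReader from SAXParserFactory because XMLReaderFactory is deprecated since Java 9; the input handling is the same.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class BomParseDemo {

    // Parses the file as a byte stream and returns the names of the elements it contains.
    static List<String> parseAsBytes(File xml)
            throws IOException, SAXException, ParserConfigurationException {
        List<String> elements = new ArrayList<>();
        XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        xr.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String name, String qName, Attributes atts) {
                elements.add(qName);
            }
        });
        // Raw bytes let the parser detect the encoding itself and skip the UTF-8 BOM.
        try (InputStream is = new FileInputStream(xml)) {
            xr.parse(new InputSource(is));
        }
        return elements;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical test file: the UTF-8 BOM (EF BB BF) followed by the poem from the question.
        File xml = File.createTempFile("poem", ".xml");
        xml.deleteOnExit();
        String body = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<poem><title>Roses are Red</title><l>Roses are red</l></poem>";
        try (FileOutputStream out = new FileOutputStream(xml)) {
            out.write(new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF});
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(parseAsBytes(xml)); // [poem, title, l]
    }
}
```

Running main prints [poem, title, l]: the BOM is accepted because the parser sees it as bytes rather than as a character.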

If you examine the file in a hex editor, you will see the UTF-8 BOM (EF BB BF) at the start; that is what causes the problem when you use FileReader.
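If no hex editor is at hand, the same check can be done in Java. This is a sketch (the class BomCheck and method hasUtf8Bom are hypothetical names; readNBytes needs Java 9+). It compares the first three bytes with EF BB BF, and also shows that decoding those bytes as UTF-8 does not remove the BOM: it turns into the character U+FEFF in front of the first '<', which is what a character-stream parser then sees.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BomCheck {

    // Returns true if the file starts with the UTF-8 byte order mark EF BB BF.
    static boolean hasUtf8Bom(Path file) throws IOException {
        byte[] head = new byte[3];
        try (InputStream in = Files.newInputStream(file)) {
            int n = in.readNBytes(head, 0, head.length); // Java 9+
            return n == 3
                    && (head[0] & 0xFF) == 0xEF
                    && (head[1] & 0xFF) == 0xBB
                    && (head[2] & 0xFF) == 0xBF;
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a file saved with a BOM: EF BB BF followed by the XML text.
        Path withBom = Files.createTempFile("bom", ".xml");
        Files.write(withBom, new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF});
        Files.write(withBom, "<poem/>".getBytes(StandardCharsets.UTF_8), StandardOpenOption.APPEND);

        Path withoutBom = Files.createTempFile("nobom", ".xml");
        Files.write(withoutBom, "<poem/>".getBytes(StandardCharsets.UTF_8));

        System.out.println(hasUtf8Bom(withBom));    // true
        System.out.println(hasUtf8Bom(withoutBom)); // false

        // Decoded as characters, the BOM does not disappear: it becomes U+FEFF before '<'.
        String text = new String(Files.readAllBytes(withBom), StandardCharsets.UTF_8);
        System.out.println(text.charAt(0) == 0xFEFF); // true
    }
}
```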

Other suggestions

You can use the following method to read the XML file into a String and then parse the String. The key is to specify the encoding when reading the file.

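import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import com.ibm.icu.text.CharsetDetector;   // ICU4J
import org.apache.commons.io.FileUtils;    // Apache Commons IO

public static String XMLtoString(File file) throws IOException {

    String encoding;

    // detect the encoding of the file (CharsetDetector needs a stream that supports mark/reset)
    try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
        CharsetDetector cd = new CharsetDetector().setText(in);
        encoding = cd.detect().getName();
    }

    // read the file with the detected encoding instead of the platform default
    String str = FileUtils.readFileToString(file, encoding);

    // Java's UTF-8 decoder keeps the BOM as the character U+FEFF, so remove it before parsing
    if (str.startsWith("\uFEFF")) {
        str = str.substring(1);
    }
    return str;
}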
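If you would rather avoid the third-party classes in that method (CharsetDetector appears to be ICU4J's and FileUtils is Apache Commons IO's), here is a JDK-only sketch. It assumes the file is UTF-8 instead of detecting the encoding, strips a leading BOM (Java's UTF-8 decoder keeps it as U+FEFF, which the parser can reject in the prolog), and then parses the String through a StringReader. The class StringParseDemo and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class StringParseDemo {

    // Reads the file as UTF-8 (assumed, not detected) and drops a leading BOM character.
    static String readUtf8WithoutBom(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    // Parses XML held in a String through a character stream and collects element names.
    static List<String> parseString(String xml) throws Exception {
        List<String> elements = new ArrayList<>();
        XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        xr.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String name, String qName, Attributes atts) {
                elements.add(qName);
            }
        });
        xr.parse(new InputSource(new StringReader(xml)));
        return elements;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical test file: BOM bytes followed by a small UTF-8 document.
        Path file = Files.createTempFile("poem", ".xml");
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        byte[] body = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><poem><l>Roses are red</l></poem>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] all = new byte[bom.length + body.length];
        System.arraycopy(bom, 0, all, 0, bom.length);
        System.arraycopy(body, 0, all, bom.length, body.length);
        Files.write(file, all);

        System.out.println(parseString(readUtf8WithoutBom(file))); // [poem, l]
    }
}
```

When a document is supplied as a character stream, the parser ignores the encoding declaration, so the String must already be decoded correctly; that is why the encoding matters at the reading step.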
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow