How should I strip invalid XML characters from a stream in J2ME? org.xml.sax.SAXParseException: Invalid character

StackOverflow https://stackoverflow.com/questions/844599

Question

This code is running on BlackBerry JDE v4.2.1. It's in a method that makes web API calls that return XML. Sometimes the XML returned is not well formed, and I need to strip out any invalid characters before parsing.

Currently, I get: org.xml.sax.SAXParseException: Invalid character '' encountered.

I'd like ideas for a fast way to attach an invalid-character stripper to the input stream, so that the data flows through the validator/stripper and straight into the parse call. In other words, I'm trying to avoid saving the content of the stream first.

Existing code:

handler is an override of DefaultHandler
url is a String containing the API URL

hconn = (HttpConnection) Connector.open(url,Connector.READ_WRITE,true);

...

try{
   XMLParser parser = new XMLParser();
   InputStream input = hconn.openInputStream();
   parser.parse(input, handler);
   input.close();
} catch (SAXException e) {
   Logger.getInstance().error("getViaHTTP() - SAXException - "+e.toString());
}

Solution

It's difficult to attach a stripper to the InputStream because streams are byte-oriented, while XML validity is defined in terms of characters. It makes more sense to do it on a Reader. You could write something like a StripReader that wraps another reader and drops the invalid characters. Below is a quick, untested proof of concept:

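import java.io.IOException;
import java.io.Reader;

public class StripReader extends Reader
{
    private Reader in;

    public StripReader(Reader in)
    {
        this.in = in;
    }

    public boolean markSupported()
    {
        return false;
    }

    public void mark(int readLimit) throws IOException
    {
        throw new IOException("Mark not supported");
    }

    public void reset() throws IOException
    {
        throw new IOException("Reset not supported");
    }

    // True for characters allowed by the XML 1.0 Char production.
    // Surrogates (0xD800-0xDFFF) are let through so that valid pairs
    // for characters above 0xFFFF survive; unpaired ones are not caught.
    private static boolean isValidXmlChar(int c)
    {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xFFFD);
    }

    // Returns the next valid character, skipping invalid ones, or -1 at end of stream.
    public int read() throws IOException
    {
        int next;
        do
        {
            next = in.read();
        } while(!(next == -1 || isValidXmlChar(next)));

        return next;
    }

    public void close() throws IOException
    {
        in.close();
    }

    // Fills cbuf one character at a time through read(), so the filter applies here too.
    public int read(char[] cbuf, int off, int len) throws IOException
    {
        int i, next = 0;
        for(i = 0; i < len; i++)
        {
            next = read();
            if(next == -1)
                break;
            cbuf[off + i] = (char)next;
        }
        if(i == 0 && next == -1)
            return -1;
        else
            return i;
    }

    public int read(char[] cbuf) throws IOException
    {
        return read(cbuf, 0, cbuf.length);
    }
}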

You would then construct an InputSource from the Reader and do the parse using that InputSource.
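As a rough sketch of that last step, the wiring could look something like the following. The names XmlCharFilterReader and StripAndParse are made up for illustration, and the filtering reader is just a compact stand-in for the StripReader above so that the snippet compiles on its own. It uses SAXParserFactory from standard Java for demonstration; the XMLParser class from the question is not something I can vouch for, so whether it accepts an InputSource is an assumption you'd need to check on the device.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for StripReader: drops characters XML 1.0 does not allow.
class XmlCharFilterReader extends Reader {
    private final Reader in;

    XmlCharFilterReader(Reader in) {
        this.in = in;
    }

    // XML 1.0 Char production; surrogates are passed through unchecked.
    static boolean isXmlChar(int c) {
        return c == 0x9 || c == 0xA || c == 0xD || (c >= 0x20 && c <= 0xFFFD);
    }

    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int kept;
        do {
            int n = in.read(cbuf, off, len);
            if (n == -1) {
                return -1;
            }
            // Compact the valid characters to the front of the filled region.
            kept = 0;
            for (int i = 0; i < n; i++) {
                char c = cbuf[off + i];
                if (isXmlChar(c)) {
                    cbuf[off + kept++] = c;
                }
            }
        } while (kept == 0); // everything was stripped, so read more
        return kept;
    }

    public void close() throws IOException {
        in.close();
    }
}

class StripAndParse {
    // Decodes the raw bytes, filters the characters, and hands them to SAX
    // without buffering the whole response.
    static void parse(InputStream raw, String encoding, DefaultHandler handler)
            throws Exception {
        Reader reader = new XmlCharFilterReader(new InputStreamReader(raw, encoding));
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(reader), handler);
        } finally {
            reader.close();
        }
    }
}
```

With the code from the question, this would be called roughly as StripAndParse.parse(hconn.openInputStream(), "UTF-8", handler), where "UTF-8" is a guess at what the API returns. Because the parser is given a Reader, it relies on the encoding you pass in rather than sniffing it from the XML declaration.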

OTHER TIPS

Use a FilterInputStream. Override FilterInputStream#read to filter the offending bytes.
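A minimal sketch of that idea, assuming the response is in an ASCII-compatible encoding such as UTF-8 (where bytes below 0x20 only ever represent control characters). The class name ControlByteFilterStream is hypothetical. It only catches the control bytes that XML 1.0 forbids; invalid characters such as U+FFFE can't be detected this way without decoding. I haven't checked that FilterInputStream is available on the BlackBerry JDE; if it isn't, the same two methods could go in a class that extends InputStream directly and holds the wrapped stream in a field.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical filter: drops control bytes that XML 1.0 does not allow.
class ControlByteFilterStream extends FilterInputStream {
    ControlByteFilterStream(InputStream in) {
        super(in);
    }

    // Below 0x20, XML 1.0 permits only tab, line feed and carriage return.
    private static boolean isAllowed(int b) {
        return b >= 0x20 || b == 0x09 || b == 0x0A || b == 0x0D;
    }

    public int read() throws IOException {
        int b;
        do {
            b = in.read();
        } while (b != -1 && !isAllowed(b));
        return b;
    }

    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int kept;
        do {
            int n = in.read(buf, off, len);
            if (n == -1) {
                return -1;
            }
            // Compact the allowed bytes to the front of the filled region.
            kept = 0;
            for (int i = 0; i < n; i++) {
                byte b = buf[off + i];
                if (isAllowed(b & 0xFF)) {
                    buf[off + kept++] = b;
                }
            }
        } while (kept == 0); // everything was stripped, so read more
        return kept;
    }
}
```

You would then pass new ControlByteFilterStream(hconn.openInputStream()) to the parser in place of the raw stream. Both read methods are overridden because a parser will typically use the bulk version, which FilterInputStream otherwise forwards unfiltered.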

Licensed under: CC-BY-SA with attribution