Question

I am using the Castor API to convert an object to XML.

I get the following exception:

Caused by: org.xml.sax.SAXException: The character '' is an invalid XML character.

I know the correct approach is to fix the source, but there are a lot of such invalid characters.

In another forum, someone suggested Base64-encoding the Java object contents before marshaling them and then decoding the output. That approach seems cumbersome and does not fit the problem well.

I need a way to skip these characters during marshaling, and the generated XML should contain the characters as they are.


Solution

    /**
     * This method ensures that the output String has only
     * valid XML Unicode characters as specified by the
     * XML 1.0 standard. For reference, please see
     * <a href="http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char">the
     * standard</a>. This method will return an empty
     * String if the input is null or empty.
     *
     * @param in The String whose invalid characters we want to remove.
     * @return The in String, stripped of invalid characters.
     */
    public String stripNonValidXMLCharacters(String in) {
        if (in == null || in.isEmpty()) return ""; // vacancy test.
        StringBuilder out = new StringBuilder(in.length()); // Used to hold the output.

        // Iterate by code point rather than by char: a Java char is 16 bits,
        // so characters above 0xFFFF arrive as surrogate pairs and a per-char
        // comparison against 0x10000-0x10FFFF could never be true.
        for (int i = 0; i < in.length(); ) {
            int current = in.codePointAt(i); // The current code point.
            i += Character.charCount(current); // Advance by 1 or 2 chars.
            if ((current == 0x9) ||
                (current == 0xA) ||
                (current == 0xD) ||
                ((current >= 0x20) && (current <= 0xD7FF)) ||
                ((current >= 0xE000) && (current <= 0xFFFD)) ||
                ((current >= 0x10000) && (current <= 0x10FFFF)))
                out.appendCodePoint(current);
        }
        return out.toString();
    }
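
As a usage sketch, the same XML 1.0 Char check can also be written as a precompiled regular expression and applied to each String value before the object is handed to the marshaller. The class and method names here (XmlCharFilter, strip) are hypothetical and not part of Castor; this is only one possible way to express the filter.

```java
import java.util.regex.Pattern;

final class XmlCharFilter {

    // Matches any code point outside the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    private static final Pattern INVALID_XML10 = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    /** Returns the input without characters that XML 1.0 forbids; "" for null or empty input. */
    static String strip(String in) {
        if (in == null || in.isEmpty()) {
            return "";
        }
        return INVALID_XML10.matcher(in).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical field value containing NUL, backspace and unit-separator characters.
        String dirty = "name\u0000 with\u0008 control\u001F chars";
        System.out.println(strip(dirty)); // prints "name with control chars"
    }
}
```

Note that this removes the offending characters, so the generated XML will not contain them; it only gets the marshaller past the exception.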

OTHER TIPS

If you want the generated XML to contain these characters "as it is", then the XML 1.1 specification might help: it permits most control characters (though not NUL) when they are written as character references. Castor can be configured to marshal into XML 1.1 with custom org.exolab.castor.xml.XMLSerializerFactory and org.exolab.castor.xml.Serializer implementations:

package com.foo.castor;
......

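import java.io.IOException;
import java.io.OutputStream;
import java.io.Writer;

import org.exolab.castor.xml.BaseXercesOutputFormat;
import org.exolab.castor.xml.Serializer;
import org.exolab.castor.xml.XMLSerializerFactory;
import org.xml.sax.DocumentHandler;

// JDK-internal copies of the Xerces serializer classes. They are not
// accessible by default on Java 9+; with a standalone Xerces jar the
// equivalent classes live in org.apache.xml.serialize.
import com.sun.org.apache.xml.internal.serialize.OutputFormat;
import com.sun.org.apache.xml.internal.serialize.XML11Serializer;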

@SuppressWarnings("deprecation")
public class CastorXml11SerializerFactory implements XMLSerializerFactory {

    private static class CastorXml11OutputFormat extends BaseXercesOutputFormat{

        public CastorXml11OutputFormat(){
            super._outputFormat = new OutputFormat();
        }
    }

    private static class CastorXml11Serializer implements Serializer {

        private XML11Serializer serializer = new XML11Serializer();

        @Override
        public void setOutputCharStream(Writer out) {
            serializer.setOutputCharStream(out);
        }

        @Override
        public DocumentHandler asDocumentHandler() throws IOException {
            return serializer.asDocumentHandler();
        }

        @Override
        public void setOutputFormat(org.exolab.castor.xml.OutputFormat format) {
            serializer.setOutputFormat((OutputFormat)format.getFormat());
        }

        @Override
        public void setOutputByteStream(OutputStream output) {
            serializer.setOutputByteStream(output);
        }

    }

    @Override
    public Serializer getSerializer() {
        return new CastorXml11Serializer();
    }

    @Override
    public org.exolab.castor.xml.OutputFormat getOutputFormat() {
        return new CastorXml11OutputFormat();
    }

}

Then register the factory globally in the castor.properties file:

org.exolab.castor.xml.serializer.factory=com.foo.castor.CastorXml11SerializerFactory
org.exolab.castor.xml.version=1.1

Alternatively, set these two properties through the setCastorProperties method of your particular CastorMarshaller.

Please be advised, however, that XML 1.1 is not accepted by browsers and not all XML parsers can parse XML 1.1 out of the box.
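
To check whether a given consumer copes with such output, a small probe like the following may help. It is a minimal sketch that assumes the consumer uses the JDK's bundled JAXP parser; the class and method names (Xml11ControlCharDemo, rootTextOrNull) are hypothetical, and other parsers may behave differently. It feeds the same element, containing a character reference to U+0001, to the parser once declared as XML 1.0 and once as XML 1.1.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class Xml11ControlCharDemo {

    /**
     * Parses the XML text and returns the root element's text content,
     * or null if the parser rejects the document.
     */
    static String rootTextOrNull(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (SAXException e) {
            return null; // the parser reported the document as invalid
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String body = "<value>a&#x1;b</value>";
        String asXml10 = "<?xml version=\"1.0\"?>" + body;
        String asXml11 = "<?xml version=\"1.1\"?>" + body;

        // The parser may also log the XML 1.0 failure to stderr.
        System.out.println("XML 1.0 accepted: " + (rootTextOrNull(asXml10) != null));
        System.out.println("XML 1.1 accepted: " + (rootTextOrNull(asXml11) != null));
    }
}
```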

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow