/**
* This method ensures that the output String has only
* valid XML unicode characters as specified by the
* XML 1.0 standard. For reference, please see
* <a href="http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char">the
* standard</a>. This method will return an empty
* String if the input is null or empty.
*
* @param in The String whose non-valid characters we want to remove.
* @return The in String, stripped of non-valid characters.
*/
public String stripNonValidXMLCharacters(String in) {
    if (in == null || in.isEmpty()) return ""; // vacancy test.
    StringBuilder out = new StringBuilder(in.length()); // Used to hold the output.
    // Iterate by code point rather than by char: a char is only 16 bits wide and can
    // never reach 0x10000, so supplementary characters must be read as surrogate pairs.
    for (int i = 0; i < in.length(); ) {
        int current = in.codePointAt(i); // An unpaired surrogate comes back as itself and is dropped below.
        i += Character.charCount(current); // Advance by one or two chars.
        if ((current == 0x9) ||
            (current == 0xA) ||
            (current == 0xD) ||
            ((current >= 0x20) && (current <= 0xD7FF)) ||
            ((current >= 0xE000) && (current <= 0xFFFD)) ||
            ((current >= 0x10000) && (current <= 0x10FFFF)))
            out.appendCodePoint(current);
    }
    return out.toString();
}
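The same filter can also be written as a single regular expression. The following is only a sketch, not part of the original answer: the class name XmlCharFilter and the method name strip are made up for illustration. It relies on java.util.regex.Pattern matching by code point, so valid surrogate pairs are kept and unpaired surrogates are removed.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative and do not come from Castor.
final class XmlCharFilter {

    // Negated character class: matches every code point that is NOT in the
    // XML 1.0 Char production (#x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF).
    private static final Pattern INVALID_XML_10 = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Returns the input without characters that XML 1.0 forbids;
    // null or empty input yields an empty String, as in the method above.
    static String strip(String in) {
        if (in == null || in.isEmpty()) {
            return "";
        }
        return INVALID_XML_10.matcher(in).replaceAll("");
    }

    public static void main(String[] args) {
        String dirty = "ok" + (char) 0x1 + "!"; // contains the control character U+0001
        System.out.println(strip(dirty));        // prints "ok!"
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the expression for every field that is cleaned before marshaling.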
Castor Marshaling :: Invalid XML Character
11-04-2022
Question
I am using the Castor API to convert an object to XML.
I get the following exception:
Caused by: org.xml.sax.SAXException: The character '' is an invalid XML character.
I know the correct approach is to fix the source, but there are a lot of such invalid characters.
In another forum, someone suggested encoding the Java object contents (Base64) before marshaling them and then decoding the output. That approach is cumbersome and does not really solve the problem.
I need a way to skip the validation of these characters during marshaling, so that the generated XML contains the characters as they are.
Solution
OTHER TIPS
If you want the generated XML to contain these characters as they are, then the XML 1.1 specification might help.
Castor can be configured to marshal into XML 1.1 with custom org.exolab.castor.xml.XMLSerializerFactory and org.exolab.castor.xml.Serializer implementations:
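package com.foo.castor;
......
import java.io.IOException;
import java.io.OutputStream;
import java.io.Writer;
import org.exolab.castor.xml.BaseXercesOutputFormat;
import org.exolab.castor.xml.Serializer;
import org.exolab.castor.xml.XMLSerializerFactory;
import org.xml.sax.DocumentHandler;
// JDK-internal copies of the Xerces classes; standalone Xerces ships them as org.apache.xml.serialize.*
import com.sun.org.apache.xml.internal.serialize.OutputFormat;
import com.sun.org.apache.xml.internal.serialize.XML11Serializer;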
@SuppressWarnings("deprecation")
public class CastorXml11SerializerFactory implements XMLSerializerFactory {

    // Castor-side wrapper around the Xerces OutputFormat.
    private static class CastorXml11OutputFormat extends BaseXercesOutputFormat {
        public CastorXml11OutputFormat() {
            super._outputFormat = new OutputFormat();
        }
    }

    // Adapts the Xerces XML11Serializer to Castor's Serializer interface by plain delegation.
    private static class CastorXml11Serializer implements Serializer {
        private XML11Serializer serializer = new XML11Serializer();

        @Override
        public void setOutputCharStream(Writer out) {
            serializer.setOutputCharStream(out);
        }

        @Override
        public DocumentHandler asDocumentHandler() throws IOException {
            return serializer.asDocumentHandler();
        }

        @Override
        public void setOutputFormat(org.exolab.castor.xml.OutputFormat format) {
            // Unwrap the Xerces OutputFormat held by the Castor wrapper.
            serializer.setOutputFormat((OutputFormat) format.getFormat());
        }

        @Override
        public void setOutputByteStream(OutputStream output) {
            serializer.setOutputByteStream(output);
        }
    }

    @Override
    public Serializer getSerializer() {
        return new CastorXml11Serializer();
    }

    @Override
    public org.exolab.castor.xml.OutputFormat getOutputFormat() {
        return new CastorXml11OutputFormat();
    }
}
Then register the factory globally in the castor.properties file:
org.exolab.castor.xml.serializer.factory=com.foo.castor.CastorXml11SerializerFactory
org.exolab.castor.xml.version=1.1
or set these two properties with the setCastorProperties method of your particular CastorMarshaller.
Please be advised, however, that XML 1.1 is not accepted by browsers and not all XML parsers can parse XML 1.1 out of the box.
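To see what XML 1.1 actually changes, here is a small sketch that parses the same document under both versions. It assumes the JAXP parser bundled with the JDK, which supports XML 1.1; the class and method names are made up for illustration. Note that XML 1.1 allows the C0 control characters only when written as character references such as &#1;, and it still forbids U+0000 entirely.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; the names are illustrative only.
final class Xml11ControlCharDemo {

    // Parses the document and returns the text content of its root element.
    static String parseText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)))
                .getDocumentElement()
                .getTextContent();
    }

    // Returns true if the parser accepts the document, false if it rejects it.
    static boolean parses(String xml) {
        try {
            parseText(xml);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The same content, declared once as XML 1.0 and once as XML 1.1.
        String xml10 = "<?xml version=\"1.0\"?><a>&#1;</a>";
        String xml11 = "<?xml version=\"1.1\"?><a>&#1;</a>";
        System.out.println("XML 1.0 parses: " + parses(xml10));
        System.out.println("XML 1.1 parses: " + parses(xml11));
    }
}
```

A consumer whose parser handles only XML 1.0 will reject the 1.1 document outright, which is the compatibility risk mentioned above.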