Question

I have an Object that is being marshalled to XML using JAXB. One element contains a String that includes quotes ("). The resulting XML has &quot; where the " existed.

Even though this is normally preferred, I need my output to match a legacy system. How do I force JAXB to NOT convert the HTML entities?

--

Thank you for the replies. However, I never see the handler escape() called. Can you take a look and see what I'm doing wrong? Thanks!

package org.dc.model;

import java.io.IOException;
import java.io.Writer;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;

import org.dc.generated.Shiporder;

import com.sun.xml.internal.bind.marshaller.CharacterEscapeHandler;

public class PleaseWork {
    public void prettyPlease() throws JAXBException {
        Shiporder shipOrder = new Shiporder();
        shipOrder.setOrderid("Order's ID");
        shipOrder.setOrderperson("The woman said, \"How ya doin & stuff?\"");

        JAXBContext context = JAXBContext.newInstance("org.dc.generated");
        Marshaller marshaller = context.createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        marshaller.setProperty(CharacterEscapeHandler.class.getName(),
                new CharacterEscapeHandler() {
                    @Override
                    public void escape(char[] ch, int start, int length,
                            boolean isAttVal, Writer out) throws IOException {
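                        // char[].toString() prints only the array identity, so build a String from the slice
                        out.write("Called escape for characters = " + new String(ch, start, length));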
                    }
                });
        marshaller.marshal(shipOrder, System.out);
    }

    public static void main(String[] args) throws Exception {
        new PleaseWork().prettyPlease();
    }
}

--

The output is this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<shiporder orderid="Order's ID">
    <orderperson>The woman said, &quot;How ya doin &amp; stuff?&quot;</orderperson>
</shiporder>

and as you can see, the callback is never displayed. (Once I get the callback being called, I'll worry about having it actually do what I want.)

--


Solution

Solution my teammate found:

PrintWriter printWriter = new PrintWriter(new FileWriter(xmlFile));
DataWriter dataWriter = new DataWriter(printWriter, "UTF-8", DumbEscapeHandler.theInstance);
marshaller.marshal(request, dataWriter);

Instead of passing the xmlFile to marshal(), pass the DataWriter which knows both the encoding and an appropriate escape handler, if any.

Note: Since DataWriter and DumbEscapeHandler are both within the com.sun.xml.internal.bind.marshaller package, you must bootstrap javac.

OTHER TIPS

I have just made my custom handler as a class like this:

import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

import com.sun.xml.bind.marshaller.CharacterEscapeHandler;

public class XmlCharacterHandler implements CharacterEscapeHandler {

    public void escape(char[] buf, int start, int len, boolean isAttValue,
            Writer out) throws IOException {
        StringWriter buffer = new StringWriter();

        for (int i = start; i < start + len; i++) {
            buffer.write(buf[i]);
        }

        String st = buffer.toString();

        if (!st.contains("CDATA")) {
            st = buffer.toString().replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("'", "&apos;")
                .replace("\"", "&quot;");

        }
        out.write(st);
        System.out.println(st);
    }

}

Then, in the method that does the marshalling, simply call:

marshaller.setProperty(CharacterEscapeHandler.class.getName(),
                new XmlCharacterHandler());

it works fine.
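
To see what this handler emits without wiring up a marshaller, the same logic can be exercised as a plain static method. This is only a sketch: the class name CdataAwareEscapeDemo and the helper escapeToString are made up for illustration, and the method deliberately does not implement CharacterEscapeHandler so that it compiles without the JAXB implementation on the classpath.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class CdataAwareEscapeDemo {

    // Same parameter list as CharacterEscapeHandler.escape, kept as a static
    // method (an assumption made so the sketch has no JAXB dependency).
    static void escape(char[] buf, int start, int len, boolean isAttValue, Writer out)
            throws IOException {
        String st = new String(buf, start, len);
        // Text mentioning CDATA is written untouched; everything else is escaped.
        if (!st.contains("CDATA")) {
            st = st.replace("&", "&amp;").replace("<", "&lt;")
                   .replace(">", "&gt;").replace("'", "&apos;")
                   .replace("\"", "&quot;");
        }
        out.write(st);
    }

    // Hypothetical helper: runs escape(...) over a whole String and returns the result.
    static String escapeToString(String text) {
        StringWriter sw = new StringWriter();
        char[] buf = text.toCharArray();
        try {
            escape(buf, 0, buf.length, false, sw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sw.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeToString("a < b & \"c\""));
        System.out.println(escapeToString("<![CDATA[<b>bold</b>]]>"));
    }
}
```

Note that the CDATA check is a plain substring test, so any text that merely contains the word CDATA is also written unescaped.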

I've been playing with your example a bit and debugging the JAXB code, and it seems to be something specific to the UTF-8 encoding being used. The escapeHandler property of MarshallerImpl seems to be set properly. However, it is not used in every context. Searching for calls of MarshallerImpl.createEscapeHandler(), I found:

public XmlOutput createWriter( OutputStream os, String encoding ) throws JAXBException {
    // UTF8XmlOutput does buffering on its own, and
    // otherwise createWriter(Writer) inserts a buffering,
    // so no point in doing a buffering here.

    if(encoding.equals("UTF-8")) {
        Encoded[] table = context.getUTF8NameTable();
        final UTF8XmlOutput out;
        if(isFormattedOutput())
            out = new IndentingUTF8XmlOutput(os,indent,table);
        else {
            if(c14nSupport)
                out = new C14nXmlOutput(os,table,context.c14nSupport);
            else
                out = new UTF8XmlOutput(os,table);
        }
        if(header!=null)
            out.setHeader(header);
        return out;
    }

    try {
        return createWriter(
            new OutputStreamWriter(os,getJavaEncoding(encoding)),
            encoding );
    } catch( UnsupportedEncodingException e ) {
        throw new MarshalException(
            Messages.UNSUPPORTED_ENCODING.format(encoding),
            e );
    }
}

Note that in your setup the top branch (encoding.equals("UTF-8")) is taken, and that branch never consults the escapeHandler. If you set the encoding to anything else, the bottom part of this method runs instead; it delegates to createWriter(Writer, String), which does use the escapeHandler, so the handler plays its role. So, adding...

    marshaller.setProperty(Marshaller.JAXB_ENCODING, "ASCII");

makes your custom CharacterEscapeHandler be called. I'm not really sure, but I would guess this is a kind of bug in JAXB.

@Elliot you can use this to make the marshaller actually call the CharacterEscapeHandler. It is weird, but it works if you set "Unicode" instead of "UTF-8". Add this just before or after you set the CharacterEscapeHandler property.

marshaller.setProperty(Marshaller.JAXB_ENCODING, "Unicode");

However, don't rely only on the console in your IDE, because what it shows depends on the workspace encoding. It is better to also check the output in a file, like this:

marshaller.marshal(shipOrder, new File("C:\\shipOrder.txt"));
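
As a rough illustration of why the console can mislead, the sketch below encodes a string as UTF-8 and then decodes the same bytes with two different charsets, the way a correctly and an incorrectly configured viewer would. The class and method names are made up for this example, and ISO-8859-1 is just one possible mismatched console encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ConsoleEncodingDemo {

    // Encodes the text as UTF-8, then decodes those bytes with the viewer's
    // charset, which is what a console or editor effectively does.
    static String viewAs(String text, Charset viewerCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, viewerCharset);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 & \"quotes\"";
        // Matching charset: the text round-trips unchanged.
        System.out.println(viewAs(text, StandardCharsets.UTF_8).equals(text));      // true
        // Mismatched charset: the non-ASCII character is shown as two wrong characters.
        System.out.println(viewAs(text, StandardCharsets.ISO_8859_1).equals(text)); // false
    }
}
```

The bytes are identical in both cases; only the interpretation differs, which is why inspecting the file with a known encoding is the more reliable check.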

I would say the easiest way is to supply your own CharacterEscapeHandler:

marshaller.setProperty("com.sun.xml.bind.characterEscapeHandler", new CharacterEscapeHandler() {
    @Override
    public void escape(char[] ch, int start, int length, boolean isAttVal,
                       Writer out) throws IOException {
        out.write(ch, start, length);
    }
});
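
A small stand-alone sketch of what that pass-through body does: escape(...) receives a buffer together with an offset and a length, so only that slice should be written. The class name PassThroughDemo and the helper run are invented for this example, and the method is a plain static one rather than a real CharacterEscapeHandler so it compiles without JAXB.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class PassThroughDemo {

    // Mirrors the escape(...) parameter list; writes the slice unchanged.
    static void escape(char[] ch, int start, int length, boolean isAttVal, Writer out)
            throws IOException {
        out.write(ch, start, length);
    }

    // Hypothetical helper: applies escape(...) to buf[start, start + length)
    // and returns whatever was written.
    static String run(char[] buf, int start, int length) {
        StringWriter sw = new StringWriter();
        try {
            escape(buf, start, length, false, sw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sw.toString();
    }

    public static void main(String[] args) {
        // The buffer holds extra characters before and after the text of interest.
        char[] buf = "xx\"How ya doin & stuff?\"yy".toCharArray();
        System.out.println(run(buf, 2, buf.length - 4)); // "How ya doin & stuff?"
    }
}
```

Keep in mind that with no escaping at all, a stray & or < in the data produces output that is no longer well-formed XML.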

I hit the same issue and fixed it using dom4j's XMLWriter. XMLWriter has the methods isEscapeText() and setEscapeText(), and escaping is on by default. If you don't want < converted to &lt;, call setEscapeText(false) before marshalling:

JAXBContext jaxbContext = JAXBContext.newInstance(your class);
Marshaller marshaller = jaxbContext.createMarshaller();

marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);

// Create a filter that will remove the xmlns attribute
NamespaceFilter outFilter = new NamespaceFilter(null, false);

// Do some formatting, this is obviously optional and may affect
// performance
OutputFormat format = new OutputFormat();
format.setIndent(true);
format.setNewlines(true);

// Create a new org.dom4j.io.XMLWriter that will serve as the
// ContentHandler for our filter.
XMLWriter writer = new XMLWriter(new FileOutputStream(file), format);
writer.setEscapeText(false); // <----------------- this line
// Attach the writer to the filter
outFilter.setContentHandler(writer);
// marshalling
marshaller.marshal(piaDto, outFilter);
marshaller.marshal(piaDto, System.out);

This change, writer.setEscapeText(false), fixed my issue. I hope it helps you too.

Seems like it is possible with Sun's JAXB implementation, although I've not done it myself.

I checked the XML specification. http://www.w3.org/TR/REC-xml/#sec-references says "well-formed documents need not declare any of the following entities: amp, lt, gt, apos, quot. " so it appears that the XML parser used by the legacy system is not conformant.

(I know that it does not solve your problem, but it is at least nice to be able to say which component is broken).
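
To make that point concrete, here is a minimal sketch using the XML parser that ships with the JDK: it parses the same element once with &quot; and once with a literal quote and compares the resulting text. The class name and the rootText helper are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class PredefinedEntityDemo {

    // Parses a small XML document and returns the text content of its root element.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String escaped = rootText("<orderperson>She said, &quot;Hi&quot;</orderperson>");
        String literal = rootText("<orderperson>She said, \"Hi\"</orderperson>");
        // A conformant parser resolves the predefined entity, so both forms match.
        System.out.println(escaped.equals(literal)); // true
    }
}
```

A consumer that treats the two documents differently is therefore looking at the raw characters rather than at the parsed content.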

This works for me after reading other posts:

javax.xml.bind.JAXBContext jc = javax.xml.bind.JAXBContext.newInstance(object);
marshaller = jc.createMarshaller();
marshaller.setProperty(javax.xml.bind.Marshaller.JAXB_FORMATTED_OUTPUT, true);
marshaller.setProperty(javax.xml.bind.Marshaller.JAXB_ENCODING, "UTF-8");
marshaller.setProperty(CharacterEscapeHandler.class.getName(), new CustomCharacterEscapeHandler());


public static class CustomCharacterEscapeHandler implements CharacterEscapeHandler {
        /**
         * Escape characters inside the buffer and send the output to the Writer.
         * (prevent <b> to be converted &lt;b&gt; but still ok for a<5.)
         */
        public void escape(char[] buf, int start, int len, boolean isAttValue, Writer out) throws IOException {
            if (buf != null){
                StringBuilder sb = new StringBuilder();
                for (int i = start; i < start + len; i++) {
                    char ch = buf[i];

                    //by adding these, it prevent the problem happened when unmarshalling
                    if (ch == '&') {
                        sb.append("&amp;");
                        continue;
                    }

                    if (ch == '"' && isAttValue) {
                        sb.append("&quot;");
                        continue;
                    }

                    if (ch == '\'' && isAttValue) {
                        sb.append("&apos;");
                        continue;
                    }


                    // otherwise print normally
                    sb.append(ch);
                }

                //Make corrections of unintended changes
                String st = sb.toString();

                st = st.replace("&amp;quot;", "&quot;")
                       .replace("&amp;lt;", "&lt;")
                       .replace("&amp;gt;", "&gt;")
                       .replace("&amp;apos;", "&apos;")
                       .replace("&amp;amp;", "&amp;");

                out.write(st);
            }
        }
    }
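
To check what this handler produces for a given input, the same rules can be run as a plain String-to-String function. This is a sketch under that simplification: SelectiveEscapeDemo and its escape(String, boolean) method are hypothetical names, and the real handler works on a char[] slice and a Writer instead.

```java
final class SelectiveEscapeDemo {

    // Escapes & always, quotes only inside attribute values, and leaves < and >
    // alone; then undoes the double escaping of entities already in the input.
    static String escape(String text, boolean isAttValue) {
        StringBuilder sb = new StringBuilder();
        for (char ch : text.toCharArray()) {
            if (ch == '&') {
                sb.append("&amp;");
            } else if (ch == '"' && isAttValue) {
                sb.append("&quot;");
            } else if (ch == '\'' && isAttValue) {
                sb.append("&apos;");
            } else {
                sb.append(ch);
            }
        }
        return sb.toString()
                .replace("&amp;quot;", "&quot;")
                .replace("&amp;lt;", "&lt;")
                .replace("&amp;gt;", "&gt;")
                .replace("&amp;apos;", "&apos;")
                .replace("&amp;amp;", "&amp;");
    }

    public static void main(String[] args) {
        System.out.println(escape("<b>R&D</b> &lt; 5", false)); // <b>R&amp;D</b> &lt; 5
        System.out.println(escape("say \"hi\"", true));         // say &quot;hi&quot;
    }
}
```

One consequence of the correction step is that input already containing an entity such as &amp; is passed through as that entity, so literal text that happens to look like an entity cannot be told apart from a real one.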

interesting but with strings you can try out

Marshaller marshaller = jaxbContext.createMarshaller();
StringWriter sw = new StringWriter();
marshaller.marshal(data, sw);
sw.toString();

at least for me this does not escape quotes

The simplest way, when using Sun's Marshaller implementation, is to provide your own implementation of CharacterEscapeHandler which does not escape anything.

Marshaller m = jcb.createMarshaller();
m.setProperty(
    "com.sun.xml.bind.marshaller.CharacterEscapeHandler",
    new NullCharacterEscapeHandler());

With

public class NullCharacterEscapeHandler implements CharacterEscapeHandler {

    public NullCharacterEscapeHandler() {
        super();
    }


    public void escape(char[] ch, int start, int length, boolean isAttVal, Writer writer) throws IOException {
        writer.write( ch, start, length );
    }
}

For some reason I have no time to find out, it worked for me when setting

marshaller.setProperty(Marshaller.JAXB_ENCODING, "utf-8");

As opposed to using "UTF-8" or "Unicode"

I suggest you try them, and as @Javatar said, check them dumping to file using:

marshaller.marshal(shipOrder, new File("<test_file_path>"));

and opening it with a decent text editor like Notepad++

I would advise against using CharacterEscapeHandler for the reasons mentioned above (it's an internal class). Instead you can use Woodstox and supply your own EscapingWriterFactory to a XMLStreamWriter. Something like:

XMLOutputFactory2 xmlOutputFactory = (XMLOutputFactory2)XMLOutputFactory.newFactory();
xmlOutputFactory.setProperty(XMLOutputFactory2.P_TEXT_ESCAPER, new EscapingWriterFactory() {

    @Override
    public Writer createEscapingWriterFor(Writer w, String enc) {
        return new EscapingWriter(w);
    }

    @Override
    public Writer createEscapingWriterFor(OutputStream out, String enc) throws UnsupportedEncodingException {
        return new EscapingWriter(new OutputStreamWriter(out, enc));
    }

});

marshaller.marshal(model, xmlOutputFactory.createXMLStreamWriter(out));

An example of how to write an EscapingWriter can be seen in CharacterEscapingTest.

After trying all the above solutions, I finally came to this conclusion: run your marshalling logic through a custom escape handler.

final StringWriter sw = new StringWriter();
    final Class classType = fixml.getClass();
    final JAXBContext jaxbContext = JAXBContext.newInstance(classType);
    final Marshaller marshaller = jaxbContext.createMarshaller();
    final JAXBElement<T> fixmsg = new JAXBElement<T>(new QName(namespaceURI, localPart), classType, fixml);
    marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
    marshaller.setProperty(CharacterEscapeHandler.class.getName(), new JaxbCharacterEscapeHandler());
    marshaller.marshal(fixmsg, sw);
    return sw.toString();

And the custom escape handler is as follows:

import java.io.IOException;
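import java.io.Writer;

// Assumed package (the standalone JAXB RI); the JDK-bundled copy lives in
// com.sun.xml.internal.bind.marshaller instead.
import com.sun.xml.bind.marshaller.CharacterEscapeHandler;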

public class JaxbCharacterEscapeHandler implements CharacterEscapeHandler {

    public void escape(char[] buf, int start, int len, boolean isAttValue,
                    Writer out) throws IOException {

            for (int i = start; i < start + len; i++) {
                    char ch = buf[i];
                    out.write(ch);
            }
    }
}
Licensed under: CC-BY-SA with attribution