Question

I'm writing an XML file, and the indentation is coming out slightly wrong:

<BusinessEvents>

<MailEvent>
          <to>Wellington</to>
          <weight>10.0</weight>
          <priority>air priority</priority>
          <volume>10.0</volume>
          <from>Christchurch</from>
          <day>Mon May 20 14:30:08 NZST 2013</day>
          <PPW>8.0</PPW>
          <PPV>2.5</PPV>
     </MailEvent>
<DiscontinueEvent>
          <to>Wellington</to>
          <priority>air priority</priority>
          <company>Kiwi Co</company>
          <from>Sydney</from>
     </DiscontinueEvent>
<RoutePriceUpdateEvent>
          <weightcost>3.0</weightcost>
          <to>Wellington</to>
          <duration>15.0</duration>
          <maxweight>40.0</maxweight>
          <maxvolume>20.0</maxvolume>
          <priority>air priority</priority>
          <company>Kiwi Co</company>
          <day>Mon May 20 14:30:08 NZST 2013</day>
          <frequency>3.0</frequency>
          <from>Wellington</from>
          <volumecost>2.0</volumecost>
     </RoutePriceUpdateEvent>
<CustomerPriceUpdateEvent>
          <weightcost>3.0</weightcost>
          <to>Wellington</to>
          <priority>air priority</priority>
          <from>Sydney</from>
          <volumecost>2.0</volumecost>
     </CustomerPriceUpdateEvent>
</BusinessEvents>

As you can see, the first-level child nodes are not indented at all, their children are indented twice, and the closing tags are indented only once.

I suspect it might have to do with adding the root node to the document through doc.appendChild(root), but when I do that I get an error:

"An attempt was made to insert a node where it is not permitted. "

Here is my parser:

DocumentBuilderFactory icFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder icBuilder;
        try {
            icBuilder = icFactory.newDocumentBuilder();
            String businessEventsFile = System.getProperty("user.dir") + "/testdata/businessevents/businessevents.xml";
            Document doc = icBuilder.parse (businessEventsFile);

            Element root = doc.getDocumentElement();

            Element element;

            if(event instanceof CustomerPriceUpdateEvent){
                element = doc.createElement("CustomerPriceUpdateEvent");
            }
            else if(event instanceof DiscontinueEvent){
                element = doc.createElement("DiscontinueEvent");
            }
            else if(event instanceof MailEvent){
                element = doc.createElement("MailEvent");
            }
            else if(event instanceof RoutePriceUpdateEvent){
                element = doc.createElement("RoutePriceUpdateEvent");
            }
            else{
                throw new Exception("business event isnt valid");
            }

            for(Map.Entry<String, String> field : event.getFields().entrySet()){
                Element newElement = doc.createElement(field.getKey());
                newElement.appendChild(doc.createTextNode(field.getValue()));
                element.appendChild(newElement);
            }

            root.appendChild(element);


            // output DOM XML to console
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
//            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "5");
            DOMSource source = new DOMSource(doc);
            StreamResult console = new StreamResult(businessEventsFile);
            transformer.transform(source, console);

Any insight would be appreciated.

Solution

I had the same problem a while ago. The cause turned out to be that the parsed document contained whitespace-only text nodes throughout.

For example, after parsing the document, you probably have a blank text node immediately before the <MailEvent> element, as a child of the <BusinessEvents> element. The Transformer keeps blank text nodes (which I assume is the correct behaviour).
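To see these blank text nodes for yourself, here is a minimal sketch that counts the whitespace-only text children of the root element. The class name WhitespaceNodeDemo, the method countBlankTextChildren and the sample strings are made up for illustration, and it assumes the default (non-validating) parser from DocumentBuilderFactory.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original answer.
public class WhitespaceNodeDemo {

    /** Counts the whitespace-only text nodes that are direct children of the root element. */
    public static int countBlankTextChildren(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            int blank = 0;
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.TEXT_NODE
                        && child.getNodeValue().trim().isEmpty()) {
                    blank++;
                }
            }
            return blank;
        } catch (Exception e) {
            throw new RuntimeException("Failed to parse the XML", e);
        }
    }

    public static void main(String[] args) {
        String pretty = "<BusinessEvents>\n  <MailEvent/>\n  <DiscontinueEvent/>\n</BusinessEvents>";
        String compact = "<BusinessEvents><MailEvent/><DiscontinueEvent/></BusinessEvents>";
        System.out.println(countBlankTextChildren(pretty));  // 3
        System.out.println(countBlankTextChildren(compact)); // 0
    }
}
```

The line breaks and indentation between the tags of the first string end up in the DOM as three separate text nodes, while the second string produces none.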

So if there is no whitespace at all between the tags in the XML text, the Transformer indents the tags correctly. You can check this with your own code by manually deleting all whitespace, including line breaks, from your input file and then formatting it. The output will probably be closer to what you expect.
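A quick way to try that experiment without editing a file is to feed a compact XML string through the same kind of Transformer. This is only a sketch: the class CompactInputDemo and the method reformat are hypothetical names, and the exact output depends on the Transformer implementation on your classpath (I am assuming the JDK's built-in one, which understands the indent-amount property).

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original answer.
public class CompactInputDemo {

    /** Parses the XML string and serializes it again with indenting switched on. */
    public static String reformat(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException("Failed to reformat the XML", e);
        }
    }

    public static void main(String[] args) {
        // No whitespace between the tags, so the parsed tree has no blank text nodes.
        String compact = "<BusinessEvents><MailEvent><to>Wellington</to></MailEvent></BusinessEvents>";
        System.out.println(reformat(compact));
    }
}
```

Running the same method on an input that already contains line breaks and indentation between the tags lets you compare the two results side by side.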

One way to solve this is to remove the redundant whitespace from the document after it has been parsed. Simply removing every blank text node makes the formatting look better, but it causes problems if some of those blank text nodes are actually needed.
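For reference, the "simply remove every blank text node" variant can be sketched with an XPath query. This is not the approach I ended up using, and the class StripAllBlankText and its methods are hypothetical names. It shows the drawback mentioned above: the single space inside <b> is deleted along with the indentation.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original answer.
public class StripAllBlankText {

    /** Parses an XML string into a DOM document. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException("Failed to parse the XML", e);
        }
    }

    /** Removes every whitespace-only text node and returns how many were removed. */
    public static int stripBlankTextNodes(Document doc) {
        try {
            // The XPath result is a snapshot, so removing nodes while iterating is safe.
            NodeList blanks = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//text()[normalize-space(.) = '']", doc, XPathConstants.NODESET);
            int count = blanks.getLength();
            for (int i = 0; i < count; i++) {
                Node blank = blanks.item(i);
                blank.getParentNode().removeChild(blank);
            }
            return count;
        } catch (Exception e) {
            throw new RuntimeException("Failed to strip blank text nodes", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a>\n  <b> </b>\n</a>");
        System.out.println(stripBlankTextNodes(doc)); // 3
        Node b = doc.getDocumentElement().getFirstChild();
        // The space that was the only content of <b> is gone as well.
        System.out.println(b.hasChildNodes()); // false
    }
}
```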

So what I did to clean up the document before formatting was to remove all text nodes containing only whitespace, except where the text node was the only child (no siblings).

The method cleanEmptyTextNodes(Node parentNode) below walks the subtree recursively and removes the blank text nodes from every node that also has at least one element, CDATA, or comment child.

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

public class FormatXml {

    public static void main(String[] args) throws ParserConfigurationException,
            FileNotFoundException, SAXException, IOException,
            TransformerException {
        DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
                .newInstance();
        DocumentBuilder documentBuilder = docBuilderFactory
                .newDocumentBuilder();
        // close the input stream once the document has been parsed
        try (FileInputStream input = new FileInputStream("data.xml")) {
            Document document = documentBuilder.parse(input);
            System.out.println(format(document, 4));
        }
    }

    public static String format(Node node, int indent)
            throws TransformerException {
        cleanEmptyTextNodes(node);
        StreamResult result = new StreamResult(new StringWriter());
        getTransformer(indent).transform(new DOMSource(node), result);
        return result.getWriter().toString();
    }

    private static Transformer getTransformer(int indent) {
        Transformer transformer;
        try {
            transformer = TransformerFactory.newInstance().newTransformer();
        } catch (Exception e) {
            throw new RuntimeException("Failed to create the Transformer", e);
        }
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(
                "{http://xml.apache.org/xslt}indent-amount",
                Integer.toString(indent));
        return transformer;
    }

    /**
     * Removes text nodes that contain only whitespace. A whitespace-only
     * text node is removed only if its parent has at least one child of any
     * of the following types:
     * <ul>
     * <li>ELEMENT</li>
     * <li>CDATA</li>
     * <li>COMMENT</li>
     * </ul>
     * In that case all whitespace-only text-node children of the parent are
     * removed.
     *
     * The purpose is to make the format() method (which uses a Transformer
     * for formatting) more consistent regarding indentation and line breaks.
     */
    private static void cleanEmptyTextNodes(Node parentNode) {
        boolean removeEmptyTextNodes = false;
        Node childNode = parentNode.getFirstChild();
        while (childNode != null) {
            removeEmptyTextNodes |= checkNodeTypes(childNode);
            childNode = childNode.getNextSibling();
        }

        if (removeEmptyTextNodes) {
            removeEmptyTextNodes(parentNode);
        }
    }

    private static void removeEmptyTextNodes(Node parentNode) {
        Node childNode = parentNode.getFirstChild();
        while (childNode != null) {
            // grab the "nextSibling" before the child node is removed
            Node nextChild = childNode.getNextSibling();

            short nodeType = childNode.getNodeType();
            if (nodeType == Node.TEXT_NODE) {
                boolean containsOnlyWhitespace = childNode.getNodeValue()
                        .trim().isEmpty();
                if (containsOnlyWhitespace) {
                    parentNode.removeChild(childNode);
                }
            }
            childNode = nextChild;
        }
    }

    private static boolean checkNodeTypes(Node childNode) {
        short nodeType = childNode.getNodeType();

        if (nodeType == Node.ELEMENT_NODE) {
            cleanEmptyTextNodes(childNode); // recurse into subtree
        }

        // true if this child means blank text siblings should be removed
        return nodeType == Node.ELEMENT_NODE
                || nodeType == Node.CDATA_SECTION_NODE
                || nodeType == Node.COMMENT_NODE;
    }

}

The resulting formatted output with your input:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<BusinessEvents>
    <MailEvent>
        <to>Wellington</to>
        <weight>10.0</weight>
        <priority>air priority</priority>
        <volume>10.0</volume>
        <from>Christchurch</from>
        <day>Mon May 20 14:30:08 NZST 2013</day>
        <PPW>8.0</PPW>
        <PPV>2.5</PPV>
    </MailEvent>
    <DiscontinueEvent>
        <to>Wellington</to>
        <priority>air priority</priority>
        <company>Kiwi Co</company>
        <from>Sydney</from>
    </DiscontinueEvent>
    <RoutePriceUpdateEvent>
        <weightcost>3.0</weightcost>
        <to>Wellington</to>
        <duration>15.0</duration>
        <maxweight>40.0</maxweight>
        <maxvolume>20.0</maxvolume>
        <priority>air priority</priority>
        <company>Kiwi Co</company>
        <day>Mon May 20 14:30:08 NZST 2013</day>
        <frequency>3.0</frequency>
        <from>Wellington</from>
        <volumecost>2.0</volumecost>
    </RoutePriceUpdateEvent>
    <CustomerPriceUpdateEvent>
        <weightcost>3.0</weightcost>
        <to>Wellington</to>
        <priority>air priority</priority>
        <from>Sydney</from>
        <volumecost>2.0</volumecost>
    </CustomerPriceUpdateEvent>
</BusinessEvents>
Licensed under: CC-BY-SA with attribution