Domanda

this question refers to an algorithm we have not been able to find it. The problem is the following:

We have an XML (to be specific an ODT File containing a content.xml) containing a depth-aligned structure:

<root xmlns:text="someuri">
<header>
...
</header>
<body>
    <text:span dns:att01="value">
        some text
    </text:span>
    <text:span dns:att02="value">
        more text
        <text:span dns:att03="value">
            even nested structures
        </text:span>
    </text:span>
</body>
</root>

Please note, that this is a simplified example containing only the necessary details. As you can see, this looks like a 'normal' xml structure with a root containing some text and span nodes. For our application we need to do some processing. As all the span nodes contain other nodes, forming a tree like structure, the target format needs to be converted to have the text nodes breadth-aligned. This is the desired format:

<root xmlns:text="someuri">
<header>
...
</header>
<body>
    <text:marker-begin text:name="01" />
        some text
    <text:marker-end text:name="01" />
    <text:marker text:name="01" />

    <text:marker-begin text:name="02" />
        more text
        <text:marker-begin text:name="03" />
            even nested structures
        <text:marker-end text:name="03" />
        <text:marker text:name="03" />
    <text:marker-end text:name="02" />
    <text:marker text:name="02" />

</body>
</root>

Don't let the indentation irritate you, that all text nodes might have a direct parent except the body node. The marker is used to trigger a certain feature from a third party software. The desired text notes are now surounded by empty elements implicating a marking mechanism. Now, after some verbose preparation, the question itself:

How would you transform structure one into structure two using the default DOM mechanisms available via java. Is this even possible? Would you rather suggest a SAX approach to collect the start and ending elements of a span node? Does an algorithm already exists for this problem? XLST is not possible due to a side-processing chain, that must be done during the process.

È stato utile?

Soluzione

We found a solution by using a dirty trick:

we have a breadth-first implementation of a traverser (using the TreeWalker doesn't make any sense here) delegate the desired operation to a processing function:

// local field
Queue queue;

void traverse()
{
    queue = new LinkedListed();
    queue.add(documentRoot);

    queue.add(root);
    while (!queue.isEmpty()) 
    {
        current     = queue.poll();
        children    = current.getChildNodes();

        // the delegate
        process(current);

        for (int i = 0; i < children.getLength(); i++) 
        {
            child = children.item(i);
            switch(child.getNodeType())
            {
            case Node.ELEMENT_NODE:
            case Node.TEXT_NODE:
                queue.add(child);
                break;
            }
        } // end iteration over childnodes
    }
}

and here is the processing function:

void process(Node node)
{
            String name                     = node.getNodeName();
        Map<String, String> attributes      = XMLUtil.extractAttributes(node);

        // this is basically the current node, but we need to save it as
        // extra reference to copy all child elements from it, to further process
        // the document tree
        Node target = null;
        Node next   = null;
        Node parent = node.getParentNode();

        if(name.equals("text:" + TARGET_ELEMENT)) {
            // deep copy
            target = node.cloneNode(true);

            // create the three relevant bookmark nodes
            Node bkMrkStart = document.createElement("bookmark-begin");
            Node bkMrkEnd   = document.createElement("bookmark-end");
            Node bkMrkRsd   = document.createElement("bookmark");

            // insert bookmark start
            node.getParentNode().insertBefore(bkMrkStart, node);

            // get next sibling or null, if last elment
            next = node.getNextSibling();

            // insert ending bookmark and 'residue'
            parent.insertBefore(bkMrkRsd, next);
            parent.insertBefore(bkMrkEnd, bkMrkRsd);

            // create new its tag element
            AuxiliaryElement nextAux = createAuxiliary(attributes);


            // apply generated id to created bookmarks
            XMLUtil.setAttribute(
                    "text:span", 
                    "id-[" + nextAux.getId().getString() + "]", 
                    bkMrkStart, bkMrkEnd, bkMrkRsd);


            NodeList children = target.getChildNodes();

            int index = 0;
            do
            {
                Node child = children.item(index).cloneNode(true);
                // it seems necessary to extra save this element
                            // for further processing
                queue.add(child);

                parent.insertBefore(child, bkMrkEnd);
            } while(++index < children.getLength());

            // delete old element
            parent.removeChild(node);

            // clear target
            target = null;
        }
}

It looks like #removeChild or #insertBefore is not being reflected by the traversal. It might be, that this is due to our own implementation of a breadth first traverser. However using this approach as described above gives the desired result.

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top