Java DOM Parser : XMI - Inline Tag?

https://stackoverflow.com/questions/11715367

23-06-2021

Question

I am trying to write a simple parser for some XMI files (generated from a UML diagram), but I run into a problem when I try to extract the target xmi.idref from the snippet below. I want to retrieve the elements connected to a given activity, and I have already retrieved all the incoming/outgoing edges successfully:

<UML2:ActivityEdge xmi.id = 'I6bf577d1m1387a6c0ea1mm7dcb' visibility = 'public' isSpecification = 'false'>
    <UML2:ActivityEdge.target>
        <UML2:CallAction xmi.idref = 'I6bf577d1m1387a6c0ea1mm7dda'/>
    </UML2:ActivityEdge.target>

My problem is that when I try to extract the UML2:CallAction, my program does not detect it as an element node but as a text node, which, by the way, is empty. Here is a sample of what I do:

Element edge = searchById(doc,"UML2:ActivityEdge",id);
Element group = (Element) edge.getElementsByTagName("UML2:ActivityEdge.target").item(0);
Node target = group.getChildNodes().item(0);
Element targetRef = (Element) target;
Element t = searchById(doc,targetRef.getNodeName(),targetRef.getAttribute("xmi.idref"));
nameList.add(t.getAttribute("name"));

The searchById method is working (I use it in various parts of my code) but if you think it might be the problem, I'll post it. Note that I use getChildNodes rather than getElementsByTagName because the target of this edge might not always be an activity (an XOR join/Merge Node for example). The exact error is :

com.sun.org.apache.xerces.internal.dom.DeferredTextImpl cannot be cast to org.w3c.dom.Element

This happens when I try to cast 'target' to Element. I guess it comes from the fact that it is an "inline" tag, but I have no idea how to handle it, being a beginner at parsing...

Thanks for your help,

Herve

Edit : I tried by replacing the getChildNodes by getElementsByTagName and it seems to work... However, if someone could correct the above code or at least explain why it won't work properly, that would be awesome.

Solution

In short, you're making a bad assumption that getChildNodes() returns only XML Elements; it doesn't, it returns other kinds of nodes as well, including text nodes that represent the whitespace and newlines between the elements you're interested in.
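To make this visible, here is a minimal sketch that prints the name of every direct child of the UML2:ActivityEdge.target element from the question. The class name ChildNodeDump and the helpers childNodeNames and parse are made up for illustration, and the XML is a trimmed, closed copy of the snippet above. It assumes the default, non-namespace-aware DocumentBuilderFactory, so the undeclared UML2: prefix is treated as part of the tag name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildNodeDump {

    // Hypothetical helper: collects the node name of every direct child,
    // whatever its node type (element, text, comment, ...).
    static List<String> childNodeNames(Node parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getNodeName());
        }
        return names;
    }

    // Hypothetical helper: parses an XML string with the default factory settings.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml =
              "<UML2:ActivityEdge xmi.id='I6bf577d1m1387a6c0ea1mm7dcb'>\n"
            + "    <UML2:ActivityEdge.target>\n"
            + "        <UML2:CallAction xmi.idref='I6bf577d1m1387a6c0ea1mm7dda'/>\n"
            + "    </UML2:ActivityEdge.target>\n"
            + "</UML2:ActivityEdge>";

        Document doc = parse(xml);
        Node group = doc.getElementsByTagName("UML2:ActivityEdge.target").item(0);

        // The whitespace before and after UML2:CallAction shows up as text nodes.
        System.out.println(childNodeNames(group));
        // prints [#text, UML2:CallAction, #text]
    }
}
```

The node at index 0 is the whitespace text node, which is why item(0) cannot be cast to Element.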

If you want to call getChildNodes() and process all the nodes, then you need to loop over all the returned nodes, and look at each one to determine what sort of node it is, and process it accordingly. If you don't want to do this, then something like getElementsByTagName() is the alternative.
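The loop described above could look roughly like the following sketch. It keeps getChildNodes(), so it works whatever the target's tag name is, and skips everything that is not an element. The class name FirstChildElementDemo and the helpers firstChildElement and parse are hypothetical, the XML is a trimmed, closed copy of the question's snippet, and the parser is assumed to be the default, non-namespace-aware one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FirstChildElementDemo {

    // Hypothetical helper: returns the first direct child that is an element,
    // or null if the parent has no element children.
    static Element firstChildElement(Node parent) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Check the node type before casting; text nodes are skipped.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) child;
            }
        }
        return null;
    }

    // Hypothetical helper: parses an XML string with the default factory settings.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml =
              "<UML2:ActivityEdge xmi.id='I6bf577d1m1387a6c0ea1mm7dcb'>\n"
            + "    <UML2:ActivityEdge.target>\n"
            + "        <UML2:CallAction xmi.idref='I6bf577d1m1387a6c0ea1mm7dda'/>\n"
            + "    </UML2:ActivityEdge.target>\n"
            + "</UML2:ActivityEdge>";

        Document doc = parse(xml);
        Element group = (Element) doc.getElementsByTagName("UML2:ActivityEdge.target").item(0);

        Element targetRef = firstChildElement(group);
        System.out.println(targetRef.getNodeName() + " -> " + targetRef.getAttribute("xmi.idref"));
        // prints UML2:CallAction -> I6bf577d1m1387a6c0ea1mm7dda
    }
}
```

In the question's code, the result of a helper like this would replace group.getChildNodes().item(0), with a null check in case the target element has no element children.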

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow