Question

This is a question about SAX. I want to process the child tags in an XML file only if they are under a matching parent tag. For example:

<version>
    <parent tag-1>
       <tag 1>
       <tag 2>
     </parent tag-1 >
     <parent tag-2>
       <tag 1>
       <tag 2>
     </parent tag-2>
</version>

In the example above, I want to match the parent tag first (i.e. parent tag-1 or parent tag-2, based on user input) and only then process the child tags under it. Can this be done with a SAX parser, keeping in mind that SAX gives limited control compared with DOM and that I am a novice in both SAX and Java? If so, could you please point me to the corresponding method? TIA

Solution

Sure, this can be done easily by remembering the parent tag.

In general, when parsing XML, people use a stack to keep track of the ancestry of the current tag. Your case can be handled with the following code:

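import java.util.Stack;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;

// These members go inside your DefaultHandler subclass.
// Tag and ParentTag are placeholder classes you define yourself (ParentTag extends Tag).
Stack<Tag> tagStack = new Stack<Tag>();

@Override
public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {
     // qName is used because localName is empty unless the parser is namespace-aware
     if (qName.equalsIgnoreCase("parent")) {
          tagStack.push(new ParentTag());
     } else if (qName.equalsIgnoreCase("tag")) {
          // check isEmpty() first: peek() on an empty Stack throws EmptyStackException
          if (!tagStack.isEmpty() && tagStack.peek() instanceof ParentTag) {
               // do your things here only when the parent tag is "parent"
          }
     }
}

@Override
public void endElement(String uri, String localName, String qName)
        throws SAXException {
     if (qName.equalsIgnoreCase("parent")) {
          tagStack.pop();
     }
}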

Or you can simply remember which tag you are in by updating a tagName variable:

String tagName = null;

@Override
public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {
     if (qName.equalsIgnoreCase("parent")) {
          tagName = "parent";
     } else if (qName.equalsIgnoreCase("tag")) {
          if ("parent".equals(tagName)) {
               // do your things here only when the parent tag is "parent"
          }
     }
}

@Override
public void endElement(String uri, String localName, String qName)
        throws SAXException {
     // reset only when the parent itself closes; resetting on every end tag
     // would make the second and later children look parentless
     if (qName.equalsIgnoreCase("parent")) {
          tagName = null;
     }
}

But I prefer the stack approach, because it can keep track of all your ancestor tags.
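
To tie the stack idea back to the question (the parent name comes from user input), here is a minimal self-contained sketch. It is only an illustration under assumptions: the class ParentAwareHandler, its helper childrenOf, and element names such as parent2 are made up for this example, and the handler simply collects the text of each direct child of the wanted parent.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Collects the text of every element whose direct parent is the wanted element.
class ParentAwareHandler extends DefaultHandler {
    private final String wantedParent;                      // e.g. taken from user input
    private final Deque<String> path = new ArrayDeque<>();  // open elements, innermost first
    private final List<String> values = new ArrayList<>();
    private StringBuilder text;                             // non-null only inside a matching child

    ParentAwareHandler(String wantedParent) {
        this.wantedParent = wantedParent;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // peek() returns null on an empty deque, so the root element is handled safely
        if (wantedParent.equals(path.peek())) {
            text = new StringBuilder();
        }
        path.push(qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.pop();
        // after the pop, the top of the deque is the parent of the element that just closed
        if (text != null && wantedParent.equals(path.peek())) {
            values.add(text.toString().trim());
            text = null;
        }
    }

    // Hypothetical convenience method: parses the XML string and returns the collected values.
    static List<String> childrenOf(String xml, String wantedParent) {
        try {
            ParentAwareHandler handler = new ParentAwareHandler(wantedParent);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.values;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

Calling ParentAwareHandler.childrenOf(xml, userInput) would then return only the values found under the parent the user picked; children of every other parent are skipped.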

OTHER TIPS

If you're looking at doing this for performance reasons, note that SAX is going to spool through the entire document anyway.

However, from a code niceness perspective, you could have the SAX parser not return the non-matching children by wiring it up with an XMLFilter. You'd probably still have to write the logic yourself (something like that provided in Wing C. Chen's post), but instead of putting it in your application logic you could abstract it out into a filter implementation.

This would let you reuse the filtering logic more easily, and it would probably make your application code cleaner and easier to follow.
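
As a rough sketch of what such a filter could look like, the example below extends org.xml.sax.helpers.XMLFilterImpl and forwards only the events for elements inside the wanted parent. The names SubtreeFilter, FilterDemo and visibleElements are hypothetical, and the application handler here just records element names to show what gets through.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Forwards only the descendants of the wanted element; all other elements and text are dropped.
class SubtreeFilter extends XMLFilterImpl {
    private final String wanted;
    private int depthInside = 0;   // 0 = outside the wanted element, 1 = directly inside it

    SubtreeFilter(XMLReader parent, String wanted) {
        super(parent);
        this.wanted = wanted;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (depthInside > 0) {
            depthInside++;
            super.startElement(uri, localName, qName, atts);
        } else if (qName.equals(wanted)) {
            depthInside = 1;       // the wanted element's own tag is not forwarded
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (depthInside > 1) {
            super.endElement(uri, localName, qName);
        }
        if (depthInside > 0) {
            depthInside--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depthInside > 1) {     // only text that belongs to a forwarded element
            super.characters(ch, start, length);
        }
    }
}

class FilterDemo {
    // Hypothetical helper: returns the names of the elements the application handler gets to see.
    static List<String> visibleElements(String xml, String wanted) {
        List<String> seen = new ArrayList<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            SubtreeFilter filter = new SubtreeFilter(reader, wanted);
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName,
                        Attributes atts) {
                    seen.add(qName);   // application logic sees only the filtered events
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return seen;
    }
}
```

The application handler contains no parent-matching logic at all; swapping the wanted parent only changes the argument passed to the filter.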

The solution proposed by @Wing C. Chen is more than decent, but in your case, I wouldn't use a stack.

A use case for a stack when parsing XML

A common use case for a stack with XML is, for example, verifying that XML tags are balanced when using your own lexer (i.e. a hand-made XML parser with error tolerance).

A concrete example of it would be building the outline of an XML document for the Eclipse IDE.

When to use SAX, pull parsers and the like

  • Memory efficiency when parsing a huge XML file

  • You don't need to navigate back and forth in the document.

However, using SAX to parse complex documents can become tedious, especially if you want to apply operations to nodes based on some conditions.

When to use DOM-like APIs

  • You want easy access to the nodes

  • You want to navigate back and forth in the document at any time

  • Speed is not the main requirement vs development time/readability/maintenance

My recommendation

If you don't have a huge XML file, use a DOM-like API and select the nodes with XPath. I prefer dom4j personally, but I don't mind other APIs such as JDOM or even Xpp3, which has XPath support.
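
To illustrate the XPath route without pulling in a third-party library, here is a small sketch that uses the DOM and XPath APIs bundled with the JDK rather than dom4j. The class XPathSelect, the method childTexts, and the /version/<parent>/* layout are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSelect {
    // Hypothetical helper: returns the text of every child element of the named parent.
    static List<String> childTexts(String xml, String parentName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // parentName is spliced into the expression, so it must be a trusted, plain element name
            String expression = "/version/" + parentName + "/*";
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }
}
```

The whole "match the parent, then process its children" requirement collapses into one expression here, at the cost of loading the full document into memory.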

The SAX parser will call a method in your implementation every time it hits a tag. If you want different behavior depending on the parent, you have to save it to a variable.

If you want to jump to particular tags then you would need to use a DOM parser. This will read the entire document into memory and then provide various ways of accessing particular nodes of the tree, such as requesting a tag by name then asking for the children of that tag.
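
The "request a tag by name, then ask for its children" idea could look roughly like the following sketch, which uses the JDK's built-in DOM parser. The class DomChildren and the method childNames are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomChildren {
    // Hypothetical helper: returns the names of the child elements of every element called parentName.
    static List<String> childNames(String xml, String parentName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> names = new ArrayList<>();
            NodeList parents = doc.getElementsByTagName(parentName);
            for (int i = 0; i < parents.getLength(); i++) {
                NodeList kids = parents.item(i).getChildNodes();
                for (int j = 0; j < kids.getLength(); j++) {
                    // child nodes also include whitespace text nodes, so keep elements only
                    if (kids.item(j).getNodeType() == Node.ELEMENT_NODE) {
                        names.add(kids.item(j).getNodeName());
                    }
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```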

So if you are not restricted to SAX then I would recommend DOM. I think the main reason for using SAX over DOM is that DOM requires more memory since the entire document is loaded at once.

Licensed under: CC-BY-SA with attribution