XML parsing and usage

https://stackoverflow.com/questions/1813952

06-07-2019

Question

I'm building a conforming and validating XML parser in C++ and trying to make it light-weight for use on Pocket PC.

Early on I decided to give my parser SAX-style "events" that report elements, processing instructions, and so on.

These events are consumed by a derived class that builds the DOM tree of the XML document.
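To make the design concrete, here is a minimal sketch of such an event interface and a derived class that builds a tree from it. All names (XmlEventHandler, DomBuilder, DomNode) are hypothetical and not taken from any real library, and the node type is deliberately simplified: it does not preserve the order of text relative to child elements.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical SAX-style callback interface; the parser would call these
// as it recognizes each construct.
class XmlEventHandler {
public:
    virtual ~XmlEventHandler() = default;
    virtual void startElement(const std::string& name) = 0;
    virtual void endElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
    virtual void processingInstruction(const std::string& target,
                                       const std::string& data) = 0;
};

// Simplified DOM node: an element with accumulated text and child elements.
struct DomNode {
    std::string name;
    std::string text;
    std::vector<std::unique_ptr<DomNode>> children;
};

// Derived class that turns the event stream into a tree.
class DomBuilder : public XmlEventHandler {
public:
    DomBuilder() { stack_.push_back(&root_); }
    // The stack points into this object, so copying is disabled.
    DomBuilder(const DomBuilder&) = delete;
    DomBuilder& operator=(const DomBuilder&) = delete;

    void startElement(const std::string& name) override {
        auto node = std::make_unique<DomNode>();
        node->name = name;
        DomNode* raw = node.get();
        stack_.back()->children.push_back(std::move(node));
        stack_.push_back(raw);
    }
    void endElement(const std::string& name) override {
        // A conforming parser would already have rejected mismatched tags.
        assert(stack_.size() > 1 && stack_.back()->name == name);
        (void)name;
        stack_.pop_back();
    }
    void characters(const std::string& text) override {
        stack_.back()->text += text;
    }
    void processingInstruction(const std::string&,
                               const std::string&) override {
        // Ignored in this sketch.
    }

    const DomNode& document() const { return root_; }

private:
    DomNode root_;                 // synthetic document node
    std::vector<DomNode*> stack_;  // currently open elements, innermost last
};
```

The parser only ever sees the XmlEventHandler interface, so an application that does not want a tree can supply its own handler instead of DomBuilder.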

My doubts arise mainly when handling entities (which, if defined, can contain elements, PIs, and comments) and their resolution.

For example, I could create an XMLEntityRef class that refers to an XMLEntity defined in an XMLDocType object, as the .NET System.Xml parser does.

As far as I know, for most purposes an application needs to know an element, its contents, its attributes, and their values: only strings. It doesn't care whether the element content is made up of CDATA sections, entity references, and/or plain text. The same applies to attribute values.

So my question is: what is the benefit of passing each XML object to the application as it appears and letting it (or a helper class) build, for example, the resulting attribute value by concatenating text and resolved entity references?
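The helper mentioned above could look roughly like the following sketch. ContentPiece, PieceKind, EntityTable, and flatten are hypothetical names, and the entity table is assumed to hold replacement text that has already been fully expanded.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical fine-grained pieces a parser might report for an element's
// content or an attribute value.
enum class PieceKind { Text, CData, EntityRef };

struct ContentPiece {
    PieceKind kind;
    std::string value;  // literal text, CDATA body, or entity name
};

// Assumed entity table: entity name -> already expanded replacement text.
using EntityTable = std::map<std::string, std::string>;

// Concatenates the pieces into the single string most applications want.
// Unknown entity names are kept as "&name;" so the caller can spot them.
std::string flatten(const std::vector<ContentPiece>& pieces,
                    const EntityTable& entities) {
    std::string out;
    for (const ContentPiece& p : pieces) {
        if (p.kind == PieceKind::EntityRef) {
            auto it = entities.find(p.value);
            if (it != entities.end()) {
                out += it->second;
            } else {
                out += "&" + p.value + ";";
            }
        } else {
            out += p.value;  // Text and CDATA contribute their characters as-is
        }
    }
    return out;
}
```

With this shape, the parser can keep reporting every piece while applications that only want strings call flatten once per value.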

If I were taking a poll, I'd ask: does your application need to know about CDATA sections and where they are located in the XML file, or do you prefer to keep things simple and get the full content of an element as a string without worrying about how it was built?

Best regards, Mauro H. Leggieri

Solution

I'm building a conforming and validating XML parser in C++ and trying to make it light-weight

There is no such thing as a light-weight conforming (never mind validating) parser. To be a conforming parser you have to understand all the stuff that can go in a DTD external subset, which is gnarly work indeed. It is a shame that the XML specification ended up weighed down with all the SGML DTD crud, but we are stuck with it now.

does your application need to know about cdata tags and where they are located in the xml file

Normally no. DOM Level 3 LS does require that CDATA sections be kept as CDATASection nodes in the DOM by default, but almost no application cares.

(If the question is about my application then yes, because my application is a templating system that keeps CDATA sections where they were. But still.)

My doubts appears when trying to handle mainly entities

God yes. Entity references are a total disaster. Making a DOM implementation support them in a way which is compliant with DOM Level 3 Core/LS is very very complicated. Avoid if at all possible.

OTHER TIPS

Generally, XML is not lightweight. You are better off with JSON.

When building a parser, I do not think you should presume anything about how applications will consume the XML. Instead, provide the most granular level of data for each XML node to give maximum flexibility. While this may require more work on the part of consuming applications, they will be able to accomplish whatever they need to. Good luck.
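One way to soften the extra work for consumers is a thin adapter layered on top of the granular events. The following is only a sketch with hypothetical names (GranularHandler, CoalescingHandler), and it assumes leaf elements only; nested elements would need a stack of buffers.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical granular interface: one callback per kind of content.
class GranularHandler {
public:
    virtual ~GranularHandler() = default;
    virtual void text(const std::string& s) = 0;
    virtual void cdata(const std::string& s) = 0;
    virtual void endElement(const std::string& name) = 0;
};

// Adapter for applications that only want the merged content per element.
class CoalescingHandler : public GranularHandler {
public:
    using Callback = std::function<void(const std::string& name,
                                        const std::string& content)>;

    explicit CoalescingHandler(Callback cb) : cb_(std::move(cb)) {}

    void text(const std::string& s) override { buffer_ += s; }
    void cdata(const std::string& s) override { buffer_ += s; }
    void endElement(const std::string& name) override {
        cb_(name, buffer_);  // deliver the merged string once per element
        buffer_.clear();
    }

private:
    Callback cb_;
    std::string buffer_;  // content collected since the last endElement
};
```

Applications that do care where CDATA sections sit can implement GranularHandler directly, while everyone else plugs in the adapter.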

Licensed under: CC-BY-SA with attribution
