Question

Reading StackOverflow and listening to the podcasts by Joel Spolsky and Jeff Atwood, I have started to believe that many developers hate XML, or at least try to avoid it as much as possible for storing or exchanging data.

On the other hand, I enjoy using XML a lot for several reasons:

  • XML serialization is implemented in most modern languages and is extremely easy to use,
  • Although slower than binary serialization, XML serialization is very useful when the same data must be used from several programming languages, or when it is intended to be read and understood by a human, even if only for debugging (JSON, for example, is more difficult to understand),
  • XML supports Unicode, and when used properly, there are no problems with different encodings, characters, etc.,
  • There are plenty of tools which make it easy to work with XML data. XSLT is an example, making it easy to present and transform data. XPath is another, making it easy to search for data,
  • XML can be stored in some SQL servers, which enables scenarios where data that is too complicated to be stored easily in SQL tables must be saved and manipulated; JSON or binary data, for example, cannot be manipulated through SQL directly (except by manipulating strings, which is crazy in most situations),
  • XML does not require any applications to be installed. If I want my app to use a database, I must install a database server first. If I want my app to use XML, I don't have to install anything,
  • XML is much more explicit and extensible than, for example, Windows Registry or INI files,
  • In most cases, there are no CR-LF problems, thanks to the level of abstraction provided by XML.

So, taking into account all the benefits of using XML, why do so many developers hate using it? IMHO, the only problem with it is that:

  • XML is too verbose and requires much more space than most other forms of data, especially when it comes to Base64 encoding.

Of course, there are many scenarios where XML doesn't fit at all. Storing the questions and answers of SO in an XML file on the server side would be absolutely wrong. Likewise, for storing an AVI video or a bunch of JPG images, XML is the worst thing to use.

But what about other scenarios? What are the weaknesses of XML?


To the people who considered that this question is not a real question:

Unlike questions that were not closed, such as "Significant new inventions in computing since 1980", my question is very clear: it invites people to explain what weaknesses they experience when using XML and why they dislike it. It does not invite discussion of, for example, whether XML is good or bad. Nor does it require extended discussion; the answers received so far are short and precise and provide the information I wanted.

But it is a wiki, since there cannot be a unique good answer to this question.

According to SO, "not a real question" is a question where "It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, or rhetorical and cannot be reasonably answered in its current form."

  • What is being asked here: I think the question itself is very clear, and the several paragraphs of text above make it even clearer,
  • This question is ambiguous, vague, incomplete: again, there is nothing ambiguous, vague, or incomplete,
  • or rhetorical: it is not; the answer to my question is not obvious,
  • and cannot be reasonably answered: several people already gave great answers to the question, showing that the question can be answered reasonably.

It also seems quite obvious how to rate the answers and determine the accepted answer. If an answer gives good reasons for what's wrong with XML, there is a good chance that it will be voted up, then accepted.

Solution

Some weaknesses:

  • It is somewhat difficult to associate xml files and external resources, which is why the new Office document formats use a zip envelope that includes a skeleton xml file and resource files bundled together. The other option of using base64 encoding is very verbose and doesn't allow good random access, which brings one to the next point:
  • Random access is difficult. Neither of the two traditional modes of reading an xml file (constructing a DOM or forward-only SAX-style reading) allows for truly random access.
  • Concurrent write access to different parts of the file is difficult, which is why its use in Windows executable manifests is error prone.
  • What encoding does an xml file use? Strictly speaking you guess the encoding first, then read the file and verify the encoding was right.
  • It is difficult to version portions of a file. Therefore if you want to provide granular versioning, you need to split your data. This is not just a file format issue, but also due to the fact that tools generally provide per-file semantics - version control tools, sync tools like DropBox, etc.
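
The random access point can be illustrated with a minimal Python sketch. It uses the standard library's `xml.etree.ElementTree.iterparse` as a stand-in for SAX-style forward-only reading; the `<items>`/`<item>` document and the helper name `nth_item_forward_only` are made up for the illustration. To reach the last element, every element before it has to be scanned first:

```python
import io
import xml.etree.ElementTree as ET


def nth_item_forward_only(xml_bytes, n):
    """Return (text, items_scanned) for the n-th <item> (0-based).

    The document is read front to back; there is no way to jump
    straight to the n-th element without parsing what precedes it.
    """
    scanned = 0
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag != "item":
            continue
        if scanned == n:
            return elem.text, scanned + 1
        scanned += 1
        elem.clear()  # drop content we no longer need to keep memory flat
    raise IndexError(n)


# Hypothetical test document with 1000 <item> elements.
doc = b"<items>" + b"".join(b"<item>%d</item>" % i for i in range(1000)) + b"</items>"

text, scanned = nth_item_forward_only(doc, 999)
print(text, scanned)
```

Building a DOM instead gives indexed access afterwards, but only after the whole file has been parsed into memory, so the cost is paid up front rather than avoided.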

OTHER TIPS

<xml>
    <noise>
        The
    </noise>
    <adjective>
        main
    </adjective>
    <noun>
        weakness
    </noun>
    <noise>
        of
    </noise>
    <subject>
        XML
    </subject>
    <noise>
        ,
    </noise>
    <whocares>
        in my opinion
    </whocares>
    <noise>
        ,
    </noise>
    <wildgeneralisation>
        is its verbosity
    </wildgeneralisation>
    <noise>
        .
    </noise>
</xml>

I'm not the right person to ask, as I am a big fan of xml myself. However, I can tell you one of the main complaints that I have heard:

It is hard to work with. Here, hard means that you need to know an API and that you will need to write a relatively large amount of code to parse your xml. While I wouldn't say that it's really all that hard, I can only agree that a format made to describe objects can be accessed more easily from a language that supports dynamically created objects.
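
As a rough illustration of that complaint, here is a minimal Python sketch that reads the same record from XML and from JSON using only the standard library. The record, its field names, and the two helper functions are assumptions made for the example:

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record in both formats.
XML_TEXT = "<product><sku>BL394D</sku><quantity>4</quantity><price>450.00</price></product>"
JSON_TEXT = '{"sku": "BL394D", "quantity": 4, "price": 450.00}'


def product_from_xml(text):
    """Walk the tree by hand; every value arrives as a string and must be converted."""
    root = ET.fromstring(text)
    return {
        "sku": root.findtext("sku"),
        "quantity": int(root.findtext("quantity")),
        "price": float(root.findtext("price")),
    }


def product_from_json(text):
    """The parser maps straight onto the language's own dicts, ints and floats."""
    return json.loads(text)


print(product_from_xml(XML_TEXT) == product_from_json(JSON_TEXT))
```

The XML version is not difficult, but the field names and types have to be spelled out in code (or in a schema plus a binding tool), whereas the JSON version needs no per-field code at all.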

I think in general the reaction is simply because XML is overused.

However, if there is one thing I hate about XML, with a passion, it is namespaces. The lost productivity around namespace problems is horrific.
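
A typical example of the kind of namespace problem meant here, sketched in Python with the standard `xml.etree.ElementTree` module (the tiny feed document is made up; the namespace URI is the real Atom one). As soon as a default namespace is declared, the obvious lookup silently finds nothing:

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
FEED = f'<feed xmlns="{ATOM_NS}"><title>Example</title></feed>'

root = ET.fromstring(FEED)

# The "obvious" lookup returns None: the element's real name includes the namespace.
naive = root.find("title")

# Either spell out the fully qualified name ...
qualified = root.find(f"{{{ATOM_NS}}}title")

# ... or supply a prefix mapping of your own (the prefix "a" is arbitrary).
prefixed = root.find("a:title", {"a": ATOM_NS})

print(naive, qualified.text, prefixed.text)
```

Nothing raises an error in the naive case, which is a large part of why this costs so much time.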

XML descends from SGML, the great-granddaddy of markup languages. The purpose of SGML and by extension XML is to annotate text. XML does this well and has a wide range of tools that increase its facility for a variety of applications.

The problem, as I see it, is that XML is frequently used, not to annotate text, but to represent structured data, which is a subtle but important difference. In practical terms, structured data needs to be concise for a variety of reasons. Performance is an obvious one, especially when bandwidth is limited. This is probably one of the main reasons why JSON is so popular for web applications. Concise data structure representation on the wire means better scalability.

Unfortunately, JSON is not very readable without extra whitespace padding, which is almost always omitted. On the other hand, if you have ever tried editing a large XML file using a command-line editor, it can be very awkward as well.

Personally, I find that YAML strikes a nice balance between the two extremes. Compare the following (copied from yaml.org with minor changes).

YAML:

invoice:
  number: 34843
  date: 2001-01-23
  billto: &id001
    given: Chris
    family: Dumars
    address:
      lines: |
        458 Walkman Dr.
        Suite #292
      city: Royal Oak
      state: MI
      postal: 48046
  shipto: *id001
  product:
  - sku: BL394D
    quantity: 4
    description: Basketball
    price: 450.00
  - sku: BL4438H
    quantity: 1
    description: Super Hoop
    price: 2392.00
  tax : 251.42
  total: 4443.52
  comments: >
    Late afternoon is best.
    Backup contact is Nancy
    Billsmer @ 338-4338.

XML:

<invoice>
   <number>34843</number>
   <date>2001-01-23</date>
   <billto id="id001">
      <given>Chris</given>
      <family>Dumars</family>
      <address>
        <lines>
          458 Walkman Dr.
          Suite #292
        </lines>
        <city>Royal Oak</city>
        <state>MI</state>
        <postal>48046</postal>
      </address>
   </billto>
   <shipto xref="id001" />
   <products>
      <product>
        <sku>BL394D</sku>
        <quantity>4</quantity>
        <description>Basketball</description>
        <price>450.00</price>
      </product>
      <product>
        <sku>BL4438H</sku>
        <quantity>1</quantity>
        <description>Super Hoop</description>
        <price>2392.00</price>
      </product>
   </products>
   <tax>251.42</tax>
   <total>4443.52</total>
   <comments>
    Late afternoon is best. Backup contact is Nancy Billsmer @ 338-4338
   </comments>
</invoice>

They both represent the same data, but the YAML is over 30% smaller and arguably more readable. Which would you prefer to have to modify with a text editor? There are many libraries available to parse and emit YAML (e.g. SnakeYAML for Java developers).
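
If you want to check the size difference for your own data, a minimal Python sketch is enough. The two fragments below are a cut-down version of one product entry from the invoice, and `size_saving` is a helper name invented for this example; the number it prints applies to this fragment only, not to the full documents above:

```python
YAML_TEXT = """\
product:
- sku: BL394D
  quantity: 4
  description: Basketball
  price: 450.00
"""

XML_TEXT = """\
<products>
  <product>
    <sku>BL394D</sku>
    <quantity>4</quantity>
    <description>Basketball</description>
    <price>450.00</price>
  </product>
</products>
"""


def size_saving(smaller, larger):
    """Fraction by which `smaller` is shorter than `larger`, measured in UTF-8 bytes."""
    a = len(smaller.encode("utf-8"))
    b = len(larger.encode("utf-8"))
    return (b - a) / b


saving = size_saving(YAML_TEXT, XML_TEXT)
print(f"YAML fragment is {saving:.0%} smaller than the XML fragment")
```

Note that the result depends heavily on indentation and element name length, and that compression on the wire narrows the gap considerably.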

As with everything, the right tool for the right job is the best rule to follow.

My favorite nasty problem is with XML serialization formats that use attributes - like XAML.

This works:

<ListBox ItemsSource="{Binding Items}" SelectedItem="{Binding CurrentSelection}"/>

This doesn't:

<ListBox SelectedItem="{Binding CurrentSelection}" ItemsSource="{Binding Items}"/>

XAML deserialization assigns property values as they're read from the XML stream. So in the second example, when the SelectedItem property is assigned, the control's ItemsSource hasn't been set yet, and the SelectedItem property is being assigned to an item that the control doesn't yet know exists.

If you're using Visual Studio to create your XAML files, everything will be cool, because Visual Studio maintains the ordering of attributes. But modify your XAML in some XML tool that believes the XML recommendation when it says that the ordering of attributes is not significant, and boy are you in a world of hurt.
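
The effect is easy to reproduce with a toy model. The Python sketch below is not WPF's actual implementation; `ToyListBox` and `apply_in_order` are hypothetical names, and the only assumption carried over from the description above is that properties are assigned one by one in attribute order and that a selection is only accepted if the item is already known:

```python
class ToyListBox:
    """Toy stand-in for a list control; not the real WPF ListBox."""

    def __init__(self):
        self._items = []
        self._selected = None

    @property
    def items_source(self):
        return self._items

    @items_source.setter
    def items_source(self, items):
        self._items = list(items)

    @property
    def selected_item(self):
        return self._selected

    @selected_item.setter
    def selected_item(self, item):
        # A selection is only accepted if the item is already in the list.
        self._selected = item if item in self._items else None


def apply_in_order(control, attributes):
    """Assign each (name, value) pair as it is 'read', like a streaming deserializer."""
    for name, value in attributes:
        setattr(control, name, value)
    return control


works = apply_in_order(ToyListBox(), [("items_source", ["a", "b"]), ("selected_item", "b")])
broken = apply_in_order(ToyListBox(), [("selected_item", "b"), ("items_source", ["a", "b"])])

print(works.selected_item, broken.selected_item)
```

Both attribute lists carry exactly the same information, which is all the XML recommendation promises to preserve; only the order differs.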

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow