Question

For me usable means that:

  • it is used in the real world
  • it has tool support (at least a simple editor)
  • it has human readable syntax (no angle brackets please)

Also I want it to be as close to XML as possible, i.e. there must be support for attributes as well as for properties. So, no YAML please. Currently, only one matching language comes to my mind - JSON. Do you know any other alternatives?


Solution

YAML is (since version 1.2, essentially) a superset of JSON, so it doesn't make sense to reject YAML and then consider JSON instead. YAML does everything JSON does, but it gives you much more too (like references via anchors and aliases).

I can't think of anything XML can do that YAML can't, except to validate a document with a DTD, which in my experience has never been worth the overhead. But YAML is so much faster and easier to type and read than XML.

As for attributes or properties, if you think about it, they don't truly "add" anything... it's just a notational shortcut to write something as an attribute of the node instead of putting it in its own child node. But if you like that convenience, you can often emulate it with YAML's inline lists/hashes. E.g.:

<!-- XML -->
<Director name="Spielberg">
    <Movies>
        <Movie title="Jaws" year="1975"/>
        <Movie title="E.T." year="1982"/>
    </Movies>
</Director>


# YAML
Director:
    name: Spielberg
    Movies:
      - Movie: {title: Jaws, year: 1975}
      - Movie: {title: E.T., year: 1982}
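To make the "attributes are just a notational shortcut" point concrete, here is a small Python sketch (standard library only) that folds the XML above into nested dicts shaped like the YAML. The folding rule (attributes become ordinary keys; an attribute-less wrapper such as Movies becomes a list) is an assumption made for this sketch, not a standard XML-to-data mapping.

```python
import xml.etree.ElementTree as ET

XML_DOC = """<Director name="Spielberg">
    <Movies>
        <Movie title="Jaws" year="1975"/>
        <Movie title="E.T." year="1982"/>
    </Movies>
</Director>"""


def to_plain(elem):
    """Fold an element's attributes and children into one plain dict.

    Assumed convention for this sketch: attributes become keys, and a child
    that has children but no attributes of its own (like <Movies>) becomes a
    list of {tag: value} entries, mirroring the YAML version.
    """
    node = dict(elem.attrib)  # attributes are now just ordinary keys
    for child in elem:
        if len(child) and not child.attrib:
            node[child.tag] = [{g.tag: to_plain(g)} for g in child]
        else:
            # NOTE: repeated child tags without a wrapper would overwrite here
            node[child.tag] = to_plain(child)
    return node


root = ET.fromstring(XML_DOC)
data = {root.tag: to_plain(root)}
print(data)
```

XML attribute values are always strings, so year comes out as "1975" rather than a number; a fuller mapping would also need rules for text content and repeated tags, which this sketch ignores.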

For me, the luxury of not having to write each node tag twice, combined with freedom from all the angle-bracket litter, makes YAML my preferred choice. I also actually like the lack of formal tag attributes, as they always seemed to me like a gray area of XML that needlessly introduced two sets of syntax (both when writing and traversing) for essentially the same concept. YAML does away with that confusion altogether.

OTHER TIPS

JSON is a very good alternative, and there are tools for it in multiple languages. It's also really easy to use in web clients, since its syntax is based on JavaScript's object and array literals.

I have found S-expressions to be a great way to represent structured data. It's a very simple format that is easy to generate and parse. It doesn't support attributes, but like YAML and JSON, it doesn't need to. Attributes are simply a way for XML to limit verbosity. Simpler, cleaner formats just don't need them.
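To back up "easy to generate and parse", here is a minimal S-expression reader and writer in Python. It is a sketch under simplifying assumptions: strings have no escape sequences, only integers are recognized as numbers, and the writer quotes every string, so bare symbols come back as quoted strings. The "attributes" from the XML example simply become child lists such as (name "Spielberg").

```python
import re

# A token is "(", ")", a double-quoted string (no escape handling in this
# sketch), or a bare atom such as a symbol or an integer.
TOKEN_RE = re.compile(r'\s*(\(|\)|"[^"]*"|[^\s()"]+)')


def tokenize(text):
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise ValueError(f"cannot tokenize at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse(tokens):
    """Consume one expression from the front of tokens; lists become Python lists."""
    tok = tokens.pop(0)
    if tok == "(":
        items = []
        while True:
            if not tokens:
                raise ValueError("missing ')'")
            if tokens[0] == ")":
                tokens.pop(0)
                return items
            items.append(parse(tokens))
    if tok == ")":
        raise ValueError("unexpected ')'")
    if tok.startswith('"'):
        return tok[1:-1]          # string literal without its quotes
    try:
        return int(tok)           # integer atom
    except ValueError:
        return tok                # any other atom is kept as a symbol string


def read(text):
    tokens = tokenize(text)
    tree = parse(tokens)
    if tokens:
        raise ValueError("trailing input after the first expression")
    return tree


def dump(value):
    """Write nested lists back out; every str is quoted (assumes no embedded quotes)."""
    if isinstance(value, list):
        return "(" + " ".join(dump(v) for v in value) + ")"
    if isinstance(value, int):
        return str(value)
    return '"' + value + '"'


doc = """(director (name "Spielberg")
  (movies (movie (title "Jaws") (year 1975))
          (movie (title "E.T.") (year 1982))))"""
tree = read(doc)
print(tree)
```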

TL;DR

Prolog wasn't mentioned here, but it is the best format I know of for representing data. Prolog programs essentially describe databases, with complex relationships between entities. Prolog is dead simple to parse; in this domain its only rival is probably S-expressions.

Full version

Programmers often "forget" what XML actually consists of, usually referring to a very small subset of it. XML is a very complex format with at least these parts: the DTD schema language, the XSD schema language, the XSLT transformation language, the RNG (RELAX NG) schema language, and the XPath (plus XQuery) languages; they are all part and parcel of the XML ecosystem. Plus, there are some apocrypha like E4X. Each of them has its own versions, quite a bit of overlap, incompatibilities, etc. Very few XML parsers in the wild implement all of them. Not to mention the multiple quirks and bugs of the popular parsers, some leading to notable security issues like https://en.wikipedia.org/wiki/XML_external_entity_attack .

Therefore, looking for something as close to XML as possible is not a very good idea. You probably don't want to deal with the likes of XML at all.

YAML is probably the second-worst option. It's not as big as XML, but it was also designed in an attempt to cover all bases... more than ten times each... in different and unique ways nobody could ever conceive of. I have yet to hear about a properly working YAML parser. Ruby, the language that uses YAML a lot, famously got burned by it. All YAML parsers I've seen to date are copies of libyaml, which is itself a hand-written parser (not generated from a formal description), with code that is very difficult to verify for correctness (functions that span hundreds of lines with convoluted control flow). As was already mentioned, it contains all of JSON... on top of a handful of Unicode encoding techniques... inside the same document, and probably a bunch of other stuff you don't want to hear about.

JSON, on the other hand, is completely unlike the other two. You can probably write a JSON parser while waiting for the JSON parser artifact to download from your Maven Nexus. It can do very little, but at least you know what it's capable of. No surprises (except some discrepancies related to character escaping in strings and the encoding of doubles). No covert exploits. You cannot write comments in it. Multiline strings look bad. Whatever you mean by the distinction between properties and attributes, you can implement with more nested dictionaries.
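To put the "write a parser while you wait" claim to the test, here is a minimal recursive-descent JSON parser in Python. It is a sketch, not a conforming implementation: it is lenient about number syntax, does not combine \uXXXX surrogate pairs (one of the escaping corner cases mentioned above), and does no error recovery. The standard json module is used only to cross-check the result.

```python
import json  # used only to cross-check the sketch

WS = " \t\n\r"
ESCAPES = {'"': '"', "\\": "\\", "/": "/", "b": "\b",
           "f": "\f", "n": "\n", "r": "\r", "t": "\t"}


class Parser:
    """Recursive descent over a string: one method per kind of value."""

    def __init__(self, text):
        self.s, self.i = text, 0

    def peek(self):
        # Skip insignificant whitespace, then look at the next character.
        while self.i < len(self.s) and self.s[self.i] in WS:
            self.i += 1
        return self.s[self.i] if self.i < len(self.s) else ""

    def expect(self, ch):
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r} at position {self.i}")
        self.i += 1

    def value(self):
        ch = self.peek()
        if ch == "{":
            return self.container("}", is_obj=True)
        if ch == "[":
            return self.container("]", is_obj=False)
        if ch == '"':
            return self.string()
        for word, val in (("true", True), ("false", False), ("null", None)):
            if self.s.startswith(word, self.i):
                self.i += len(word)
                return val
        return self.number()

    def container(self, close, is_obj):
        self.i += 1  # step over the "{" or "[" that value() just peeked at
        out = {} if is_obj else []
        if self.peek() == close:
            self.i += 1
            return out
        while True:
            if is_obj:
                key = self.string()
                self.expect(":")
                out[key] = self.value()
            else:
                out.append(self.value())
            if self.peek() != ",":
                self.expect(close)
                return out
            self.i += 1  # step over ","

    def string(self):
        self.expect('"')
        chars = []
        while self.i < len(self.s):
            ch = self.s[self.i]
            self.i += 1
            if ch == '"':
                return "".join(chars)
            if ch != "\\":
                chars.append(ch)
            elif self.s[self.i] == "u":
                # \uXXXX escape; surrogate pairs are NOT combined in this sketch
                chars.append(chr(int(self.s[self.i + 1:self.i + 5], 16)))
                self.i += 5
            else:
                chars.append(ESCAPES[self.s[self.i]])  # KeyError on a bad escape
                self.i += 1
        raise ValueError("unterminated string")

    def number(self):
        # Lenient: grabs any run of number-like characters Python can convert.
        start = self.i
        while self.i < len(self.s) and self.s[self.i] in "+-0123456789.eE":
            self.i += 1
        text = self.s[start:self.i]
        if not text:
            raise ValueError(f"unexpected input at position {start}")
        return float(text) if any(c in text for c in ".eE") else int(text)


def parse(text):
    p = Parser(text)
    result = p.value()
    if p.peek():
        raise ValueError(f"trailing data at position {p.i}")
    return result


doc = '{"name": "Frank Martin", "age": 32, "tags": ["a\\n\\u00e9", -1.5e2, true, null]}'
print(parse(doc) == json.loads(doc))
```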

Suppose, though, that you wanted to right what XML got wrong... well, then the popular stuff like YAML or JSON won't do it. Somehow fashion and rational thinking parted ways in programming some time in the mid-seventies. So you'll have to go back to where it all began with McCarthy, Hoare, Codd and Kowalski, figure out what it is you are trying to represent, and then see what the best representation technique is for whatever that is :)

Jeff wrote about this here and here. That should help you get started.

I would recommend JSON, but since you already mentioned it, maybe you should take a look at Google Protocol Buffers.

Edit: Protocol Buffers are designed to be used programmatically (there are bindings for C++, Java, Python, ...), so they may not be suited for your purpose.

Your demands are a bit contradictory. You want something close to XML, but you reject what is probably the closest equivalent without angle brackets (YAML).

As much as I dislike it, why not just use XML? You shouldn't ever have to actually read XML (aside from debugging, I suppose), and there is an absurd number of tools for it.

Pretty much anything that isn't XML isn't going to be as widely used, thus there will be less tool support.

JSON is probably about equivalent in that respect, but it's pretty much equally unreadable. Again, though, you shouldn't ever have to actually read it: load it into whatever language you are using, and it should be transformed into native arrays/dicts/variables/whatever.

Oh, I do find JSON far nicer to parse than XML: I've used it in Javascript, and the simplejson Python module - about one command and it's nicely transformed into a native Python dict, or a Javascript object (thus the name!)
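For reference, this is what "about one command" looks like with Python's standard-library json module (which grew out of simplejson and has the same loads/dumps API). The sample document is made up for illustration.

```python
import json

text = '{"name": "Frank Martin", "age": 32, "movies": ["Jaws", "E.T."]}'

person = json.loads(text)                   # JSON text -> native dict in one call
print(person["name"], person["age"] + 1)    # Frank Martin 33
print(type(person).__name__, type(person["movies"]).__name__)  # dict list

back = json.dumps(person, indent=2)         # and one call back to JSON text
```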

There is AXON, which aims to cover the best of XML and JSON. A few examples explain it.

AXON can be considered a shorter form of XML data.

XML

<person>
   <name>Frank Martin</name>
   <age>32</age>
</person>

AXON

person{
  name{"Frank Martin"}
  age{32}}

or

person
  name:
    "Frank Martin"
  age:
    32

XML

<person name="Frank Martin" age="32" />

AXON

person{name:"Frank Martin" age:32}

or

person
  name: "Frank Martin"
  age: 32

AXON also contains a form of JSON.

JSON

{"name": "Frank Martin", "age": 32, "birth": "1965-12-24"}

AXON

{name:"Frank Martin" age:32 birth:1965-12-24}

AXON can represent a combination of XML-like and JSON-like data.

AXON

table {
  fields {
    ("id" "int") ("val1" "double") ("val2" "int") ("val3" "double")
  }
  rows {
    (1 3.2 123 -3.4)
    (2 3.5 303 2.4)
    (3 2.3 235 -1.2)
  }
}

or

table
  fields
    ("id" "int")
    ("val1" "double")
    ("val2" "int") 
    ("val3" "double")
  rows
    (1 3.2 123 -3.4)
    (2 3.5 303 2.4)
    (3 2.3 235 -1.2)

A Python library, pyaxon, is now available.

I think ClearSilver is a very good alternative. They even have a comparison page here and a list of projects that use it.

For storing code-like data, LES (Loyc Expression Syntax) is a budding alternative. I've noticed a lot of people use XML for code-like constructs, such as build systems which support conditionals, command invocations, sometimes even loops. These sorts of things look natural in LES:

// LES code has no built-in meaning. This just shows what it looks like.
[DelayedWrite]
Output(
    if version > 4.0 {
        $ProjectDir/Src/Foo;
    } else {
        $ProjectDir/Foo;
    }
);

It doesn't have great tool support yet, though; currently the only LES library is for C#, and only one app is known to use LES: LLLPG. It supports "attributes", but they are like C# attributes or Java annotations, not XML attributes.

In theory you could use LES for data or markup, but there are no standards for how to do that:

body {
    '''Click here to use the World's '''
    a href="http://google.com" {
        strong "most popular"; " search engine!"
    };
};

point = (2, -3);
tasteMap = { "lemon" -> sour; "sugar" -> sweet; "grape" -> yummy };

If you're allergic to angle brackets, then JSON, HDF (ClearSilver), and OGDL are the only ones I know offhand.

After a bit of googling, I also found a list of alternatives here:
http://web.archive.org/web/20060325012720/www.pault.com/xmlalternatives.html

YAML is an extremely full-featured and generally human-readable format, but its Achilles' heel is complexity, as demonstrated by the Rails YAML vulnerabilities of early 2013. Because of YAML's ubiquity as a config language in Ruby, Tom Preston-Werner of GitHub fame stepped up to create a sane alternative dubbed TOML. It gained massive traction immediately and has great tool support. I highly recommend that anyone looking at YAML check it out:

https://github.com/mojombo/toml

AFAIK, JSON and YAML are essentially equivalent in data-structure terms. YAML just has fewer brackets, quotes and so on. So I don't see how you can reject one and keep the other.

Also, I don't see how XML's angle brackets are less "human readable" than JSON's square brackets, curly brackets and quotes.

There are truly plenty of alternatives to XML, but the main problem with many of them seems to be that libraries might not be available for every language of choice and the libraries are relatively laborious to implement.

Parsing a tree structure might not be that pleasant compared to parsing key-value pairs, e.g. hash tables. If every key and every value in a hash table is a string, then it's relatively easy to implement hashtable2string() and string2hashtable().
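Here is one way such a pair of functions could look in Python. The names follow the answer, but the format is a hypothetical length-prefixed encoding loosely inspired by netstrings, not ProgFTE itself: each key and value is written as "<length>:<text>", so no escaping is needed even for ":" or newlines.

```python
def hashtable2string(table):
    """Encode a str -> str dict as "<len>:<text>" fields, key then value.

    Hypothetical length-prefixed format for illustration, NOT ProgFTE.
    Because each field carries its own length, keys and values may contain
    any character, including ":" and newlines, without escaping.
    """
    parts = []
    for key, value in table.items():
        for field in (key, value):
            parts.append(f"{len(field)}:{field}")
    return "".join(parts)


def string2hashtable(text):
    """Inverse of hashtable2string(): read length-prefixed fields in pairs."""
    fields, pos = [], 0
    while pos < len(text):
        colon = text.index(":", pos)       # ValueError if the prefix is missing
        length = int(text[pos:colon])
        start = colon + 1
        if start + length > len(text):
            raise ValueError("truncated field")
        fields.append(text[start:start + length])
        pos = start + length
    if len(fields) % 2:
        raise ValueError("odd number of fields: a key has no value")
    return dict(zip(fields[0::2], fields[1::2]))


table = {"name": "Frank: Martin", "note": "two\nlines", "": "empty key"}
encoded = hashtable2string(table)
print(encoded)
```

Lengths here count Python characters; when exchanging data with PHP, whose strlen() counts bytes, both sides would need to agree on, say, UTF-8 byte lengths instead.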

I've been using hash-table serialization in AJAX between PHP and JavaScript, and the format I've developed is called ProgFTE (Programmer Friendly Text Exchange). It is described at:

http://martin.softf1.com/g/n//a2/doc/progfte/index.html

One can find a Ruby version of the ProgFTE implementation as part of the Kibuvits Ruby Library: http://rubyforge.org/projects/kibuvits/

Licensed under: CC-BY-SA with attribution