Question

My application needs to store large amounts of XML-like hierarchical information with the following requirements:

  1. Fast to read
  2. Minimal memory consumption
  3. Typed data instead of merely text

Any suggestions for a binary format that fulfills these goals?


Solution

You don't specify whether XML is a format requirement; you only say the data needs to be hierarchical like XML.

Without more detail on the kind of data, it's hard to give you much advice, so here's a short list.

  • B-trees: there are a number of libraries supporting B-tree storage formats in multiple languages. They have fast lookups and are hierarchical in nature.
  • Protocol Buffers from Google: compact storage optimized for sending over the wire. Not necessarily optimized as a storage format, but they are typed and will probably do pretty well as one.
  • Zipped text formats: compact and, depending on the format chosen, typed and hierarchical in nature.
    • YAML (supports some complex typing, hierarchical, human readable)
    • JSON (less typing support, fast parsing, hierarchical, human readable)
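A minimal sketch of the "zipped text format" idea, assuming Python and gzip-compressed JSON. The record below is a hypothetical example; nested objects and arrays play the role of XML elements, and numbers and booleans keep their types through the round trip.

```python
import gzip
import json

# Hypothetical hierarchical record; nested dicts/lists stand in for XML elements.
record = {
    "name": "root",
    "children": [
        {"name": "sensor", "reading": 21.5, "active": True, "samples": [1, 2, 3]},
        {"name": "sensor", "reading": 19.0, "active": False, "samples": []},
    ],
}

def pack(obj) -> bytes:
    """Serialize to compact JSON text, then gzip-compress it."""
    text = json.dumps(obj, separators=(",", ":"))
    return gzip.compress(text.encode("utf-8"))

def unpack(data: bytes):
    """Decompress and parse back into Python objects."""
    return json.loads(gzip.decompress(data).decode("utf-8"))

restored = unpack(pack(record))
print(restored == record, type(restored["children"][0]["reading"]).__name__)
```

Compression pays off most on large, repetitive documents; for random access into a huge file you would still need an index or one of the other options in the list.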

OTHER TIPS

Do other applications need to read the stored data, or just yours? Does it need to be a "standard" format?

Fast Infoset meets requirements (1) and (2), although because it's just a binary representation of the XML information model, it's just as untyped as XML. Might be good enough for your purposes, though, in the absence of anything else.
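Python's standard library has no Fast Infoset codec, so this sketch uses ordinary XML text to show what "untyped" means in the XML information model that Fast Infoset encodes: element content comes back as character data, and the application has to apply its own (here hypothetical) schema to recover numbers.

```python
import xml.etree.ElementTree as ET

# Parse a small document; element content is character data (str).
doc = ET.fromstring("<reading><value>21.5</value><count>3</count></reading>")
raw_value = doc.findtext("value")
raw_count = doc.findtext("count")

# Hypothetical application schema: the caller decides which text is a float or an int.
typed = {"value": float(raw_value), "count": int(raw_count)}
print(type(raw_value).__name__, typed)
```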

There's too little detail in your requirements to give good suggestions. For example, are you free to pick your storage medium? Will it be a file system, a database, or something else?

What does "minimum memory consumption" mean? Are you running on a constrained platform? Must you share resources with other applications? Is a 1GB footprint small enough if your computer has 4GB of memory? Will your data sit in memory or only the parts you are working on?
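If the answer to the last question is "only the parts you are working on", a streaming parser avoids building the whole tree in memory. A minimal sketch with Python's `xml.etree.ElementTree.iterparse`, assuming a hypothetical flat `<items><item value="..."/>...</items>` layout:

```python
import io
import xml.etree.ElementTree as ET

def sum_values(source) -> int:
    """Stream <item value="..."/> elements without keeping the whole tree."""
    total = 0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the root element's "start"
    for event, elem in context:
        if event == "end" and elem.tag == "item":
            total += int(elem.get("value"))
            root.clear()  # detach processed children so they can be garbage-collected
    return total

# Hypothetical input: 1000 items with values 0..999.
data = b"<items>" + b"".join(b'<item value="%d"/>' % i for i in range(1000)) + b"</items>"
print(sum_values(io.BytesIO(data)))
```

Because processed elements are detached from the root as the parser goes, memory use stays roughly flat as the document grows instead of scaling with its total size.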

If the platform was Java, I'd start with its standard serialization and then investigate custom serialization if I wasn't happy with the performance.

You could also read the XML into an object graph and store it as Google Protocol Buffers, which are designed to be very efficient.
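Part of what makes Protocol Buffers compact is the base-128 "varint" encoding of integers in its wire format. Real message classes are generated from a `.proto` file by `protoc`, so this is only a hand-written sketch of that one encoding rule, not the protobuf library's API; field number 1 and value 150 are arbitrary.

```python
def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set when more bytes follow."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_varint_field(field_number: int, value: int) -> bytes:
    """Key is (field_number << 3) | wire_type; wire type 0 means varint."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)

def decode_varint(data: bytes, pos: int = 0):
    """Return (value, next_position) for the varint starting at pos."""
    result, shift = 0, 0
    while True:
        b = data[pos]
        result |= (b & 0x7F) << shift
        pos += 1
        if not b & 0x80:
            return result, pos
        shift += 7

print(encode_varint_field(1, 150).hex())
```

Small values take a single byte, which is why typed integer fields usually end up much smaller than their XML text equivalents.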

If the format is negotiable, I'd suggest JSON rather than XML; JSON is typically faster to parse and write than standard XML.
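A small Python sketch of the typing difference, using a hypothetical record: unlike XML text content, JSON numbers, booleans, arrays and `null` come back as typed values without a separate schema.

```python
import json

text = '{"id": 7, "price": 9.5, "active": true, "tags": ["a", "b"], "parent": null}'
obj = json.loads(text)

# JSON's scalar and container types map directly onto Python types on load.
types = {key: type(value).__name__ for key, value in obj.items()}
print(types)

# Compact output: no whitespace after the separators.
compact = json.dumps(obj, separators=(",", ":"))
print(compact)
```

JSON still has no native date or binary type, so those need a convention of your own (for example ISO 8601 strings).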

More about JSON:

http://www.25hoursaday.com/weblog/PermaLink.aspx?guid=060ca7c3-b03f-41aa-937b-c8cba5b7f986
http://www.25hoursaday.com/weblog/PermaLink.aspx?guid=39842a17-781a-45c8-ade5-58286909226b

Wikipedia's explanation of the issue: http://en.wikipedia.org/wiki/Binary_XML

Supposedly, the recommended organisation's format and its Java and .NET SDKs can be downloaded from: http://www.agiledelta.com/product_efx.html

XML is pure text, but it can be used to represent serialized objects. Let's presume your serializer is serializing your objects into XML.

You should not try to convert your objects into binary streams yourself, because you would have to tackle endianness (http://en.wikipedia.org/wiki/Endian) and data-representation issues. If you insist, you would need to use XDR (http://en.wikipedia.org/wiki/External_Data_Representation), because it is architecture-neutral.
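To make the endianness point concrete, here is a minimal, hand-rolled sketch of XDR-style encoding with Python's `struct` module; it is not a full XDR library. XDR fixes big-endian byte order and 4-byte alignment, so the bytes are identical on every architecture. The `encode_reading`/`decode_reading` record layout is hypothetical, and strings are encoded as UTF-8 here, whereas XDR's `string` type is defined over ASCII.

```python
import struct

def xdr_int(n: int) -> bytes:
    # XDR int: 32-bit two's complement, big-endian.
    return struct.pack(">i", n)

def xdr_double(x: float) -> bytes:
    # XDR double: IEEE 754 64-bit, big-endian.
    return struct.pack(">d", x)

def xdr_string(s: str) -> bytes:
    # XDR string: 4-byte unsigned length, the bytes, then zero padding to a multiple of 4.
    data = s.encode("utf-8")
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad

def encode_reading(name: str, count: int, value: float) -> bytes:
    # Hypothetical record: string, int, double in that order.
    return xdr_string(name) + xdr_int(count) + xdr_double(value)

def decode_reading(buf: bytes):
    (length,) = struct.unpack_from(">I", buf, 0)
    pos = 4
    name = buf[pos:pos + length].decode("utf-8")
    pos += length + (4 - length % 4) % 4  # skip the padding
    count, value = struct.unpack_from(">id", buf, pos)  # ">" means no native alignment
    return name, count, value

print(decode_reading(encode_reading("abc", 3, 21.5)))
```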

Otherwise, serialize your objects to XML using standard serializers and then convert the XML to binary/compact XML, since libraries and SDKs are available for that. To read the data back, decompact the binary XML and deserialize it.
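A minimal sketch of that pipeline in Python, assuming a hypothetical `Reading` class. `zlib` is only a stand-in for a real binary-XML codec such as Fast Infoset, which the standard library does not provide; it shrinks the bytes but leaves the XML model (and its lack of types) unchanged.

```python
import zlib
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Reading:  # hypothetical application object
    sensor: str
    value: float
    count: int

def to_xml(r: Reading) -> bytes:
    elem = ET.Element("reading", sensor=r.sensor)
    ET.SubElement(elem, "value").text = repr(r.value)
    ET.SubElement(elem, "count").text = str(r.count)
    return ET.tostring(elem, encoding="utf-8")

def from_xml(data: bytes) -> Reading:
    elem = ET.fromstring(data)
    # Types are restored by the deserializer, not by the XML itself.
    return Reading(elem.get("sensor"), float(elem.findtext("value")), int(elem.findtext("count")))

def compact(xml_bytes: bytes) -> bytes:
    # Stand-in for a binary-XML encoder.
    return zlib.compress(xml_bytes, 9)

def expand(blob: bytes) -> bytes:
    # Stand-in for the matching decoder.
    return zlib.decompress(blob)

r = Reading("s1", 21.5, 3)
print(from_xml(expand(compact(to_xml(r)))))
```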

Licensed under: CC-BY-SA with attribution