Question

I have a huge number of files structured like this:

// NAME = XXXX
// MARKER = YYYY
// SOURCE = ZZZZ
# Real data follows, one item per line
Item1
Item2
Item3

I'm trying to move away from this legacy format because it's a bother to parse (sometimes there are 3 // header lines, sometimes 4). Given that the item lists are not long (400 items at most), I was thinking about a suitable replacement. The one absolute requirement is that it stays text-based.

I had in mind two ideas:

  1. Use JSON
  2. Keep the data as is but condense the header on one single line

The main goal, however, is to avoid custom parsing as much as possible (the main pain point with these files) and to rely on built-in parsers (Python ones, in my specific case). Assuming I go for option 1, is JSON actually an appropriate choice?
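
For reference, this is roughly what I have in mind for option 1; the key names are just an illustration of how the header lines could map to JSON fields:

import json

# Sketch of what one converted file could contain (key names are illustrative).
record = {
    "name": "XXXX",
    "marker": "YYYY",
    "source": "ZZZZ",
    "items": ["Item1", "Item2", "Item3"],
}

print(json.dumps(record, indent=2))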


Solution

Here are some reasons in favor of JSON:

  • It is human readable
  • It is not as verbose as XML
  • Python has a built-in library (json) for reading and writing JSON; see the sketch after this list
  • JSON is not Python-specific (unlike configparser's INI files): other languages can parse JSON as well
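
As a minimal sketch of the conversion, assuming the standard-library json module and the header layout shown in the question (the function and file names here are just placeholders):

import json

def convert(legacy_path, json_path):
    # Parse a legacy file: "// KEY = VALUE" header lines, an optional "#"
    # comment line, then one item per line.
    header = {}
    items = []
    with open(legacy_path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comment lines
            if line.startswith("//"):
                key, _, value = line[2:].partition("=")
                header[key.strip().lower()] = value.strip()
            else:
                items.append(line)
    with open(json_path, "w") as f:
        json.dump({**header, "items": items}, f, indent=2)

# Reading a converted file back then needs no custom parsing at all:
#     with open("data.json") as f:
#         data = json.load(f)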

One possible problem with JSON:

  • The whole JSON document must be read into memory before it can be parsed. This may be a problem if the JSON is huge (not really a concern here, with at most ~400 items per file).

OTHER TIPS

In an enterprise environment, my preference would be XML, due to its mature validation options, data-type specifications, and support for non-Unicode encodings.

I understand JSON also has a draft schema specification (JSON Schema), which would be worth using if you go down the JSON route.
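
As a rough sketch of that, using the third-party jsonschema package (an assumption; it is not part of the standard library) against the illustrative record layout from the question:

import json
import jsonschema  # third-party: pip install jsonschema

# Hypothetical schema matching the illustrative record layout above.
SCHEMA = {
    "type": "object",
    "required": ["name", "marker", "source", "items"],
    "properties": {
        "name": {"type": "string"},
        "marker": {"type": "string"},
        "source": {"type": "string"},
        "items": {"type": "array", "items": {"type": "string"}, "maxItems": 400},
    },
}

with open("data.json") as f:
    data = json.load(f)

jsonschema.validate(instance=data, schema=SCHEMA)  # raises ValidationError if the file is malformed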

Licensed under: CC-BY-SA with attribution