Question

I have a list of ugly-looking JSON objects in a text file, one per line. I would like to pretty-print them and send the results to a file.

My attempt to use the command-line python version of json.tool:

parallel python -mjson.tool < jsonList

However, something seems to go wrong: python's json.tool treats the incoming line as a file name to open rather than as JSON, and thus throws:

IOError: [Errno 2] No such file or directory: {line contents, which contain single quotes, spaces, double quotes}

How can I compel this to treat each line-separated object as a single argument to the module? Opening the file directly in python and processing it serially is an inefficient solution because the file is enormous. Attempting to do so pegs the CPU.

Solution

GNU Parallel will by default put the input on the command line as arguments. So what you are effectively running is:

python -mjson.tool \[\"cheese\",\ \{\"cake\":\[\"coke\",\ null,\ 160,\ 2\]\}\]

But what you want is:

echo \[\"cheese\",\ \{\"cake\":\[\"coke\",\ null,\ 160,\ 2\]\}\] | python -mjson.tool

GNU Parallel can do that with --pipe -N1:

parallel -N1 --pipe python -mjson.tool < jsonList
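
In effect, --pipe -N1 chops stdin into records of one line each and starts a fresh python -mjson.tool for every record, with that record on its stdin. A rough sketch of the per-record work (assuming each line of jsonList holds exactly one valid JSON value):

import json
import sys

# Roughly what each json.tool invocation does with the one-line record
# it receives on stdin from --pipe -N1: parse it, then pretty-print it.
record = sys.stdin.read()
parsed = json.loads(record)
sys.stdout.write(json.dumps(parsed, indent=4, sort_keys=True) + "\n")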

10-second installation:

wget -O - pi.dk/3 | bash

Watch the intro video for a quick introduction: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial (man parallel_tutorial). Your command line will love you for it.

Other tips

Well, the json module already has something similar to what you have in mind.

>>> import json
>>>
>>> my_json = '["cheese", {"cake":["coke", null, 160, 2]}]'
>>> parsed = json.loads(my_json)
>>> print json.dumps(parsed, indent=4, sort_keys=True)
[
    "cheese", 
    {
        "cake": [
            "coke", 
            null, 
            160, 
            2
        ]
    }
]

And you can just read my_json from a text file by opening it in 'r' mode.
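
For the original problem, a serial sketch under the assumption that the file holds one JSON object per line (the file names jsonList and prettyOut are placeholders):

import json

# Serial sketch: pretty-print each line of jsonList into prettyOut.
# Assumes one JSON object per line; file names are placeholders.
with open("jsonList", "r") as infile, open("prettyOut", "w") as outfile:
    for line in infile:
        line = line.strip()
        if not line:
            continue
        parsed = json.loads(line)
        outfile.write(json.dumps(parsed, indent=4, sort_keys=True) + "\n")

This is the single-core version the asker found too slow for a huge file, but it shows the same json.loads/json.dumps round trip applied line by line.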

Two problems with my approach, which I eventually solved:

The default parallelization will spawn a new Python VM for each input, which is... slow. So slow.

The default json.tool does the naive thing, but somehow gets confused by the number of incoming arguments.

I wrote this:

import sys
import json

# Pretty-print every JSON object passed as a command-line argument.
for i in sys.argv[1:]:
    o = json.loads(i)
    json.dump(o, sys.stdout, indent=4, separators=(',', ': '))
    sys.stdout.write('\n')  # keep consecutive objects separated in the output

Then called it like this:

parallel -n 500 python fastProcess.py < filein > prettyfileout

I'm not quite sure of the optimal value of n, but the script is 4-5x faster in wall clock time than the naive implementation due to the ability to use multiple cores.
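
A possible variant (my sketch, not part of the original answer) reads the records from stdin instead of argv, so it can be combined with the accepted answer's --pipe and a larger block size, sidestepping command-line quoting entirely; the script name fastProcessStdin.py and the block size are illustrative:

import json
import sys

# Sketch: each GNU Parallel worker receives a block of newline-delimited
# JSON objects on stdin and pretty-prints them one after another.
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    o = json.loads(line)
    sys.stdout.write(json.dumps(o, indent=4, separators=(',', ': ')) + "\n")

It could then be invoked with something like:

parallel --pipe -N500 python fastProcessStdin.py < filein > prettyfileout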

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow