Adding brackets and commas to multiple JSON objects

Question

You have a file that either contains too many newlines (in the JSON values themselves) or too few (no newlines between the tweets at all).

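You can still repair this with some creative re-stitching. The following generator function accumulates the stripped lines in a buffer and uses json.JSONDecoder.raw_decode() to pull each complete object off the front of it: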

import json

def read_objects(filename):
    """Yield each JSON object found in filename, in order."""
    decoder = json.JSONDecoder()
    buffer = ''

    with open(filename, 'r') as inputfile:
        for line in inputfile:
            # Stripping drops the stray newlines inside JSON values
            buffer += line.strip()
            while buffer:
                try:
                    obj, index = decoder.raw_decode(buffer)
                except ValueError:
                    # Assume we don't have a complete object yet; read more
                    break
                yield obj
                # raw_decode() does not skip leading whitespace
                buffer = buffer[index:].lstrip()

This should be able to read all your JSON objects in sequence:

for filename in all_files:
    for data in read_objects(filename):
        # Each key needs its own membership test
        if 'text' in data and 'coordinates' in data:
            f.writerow([data['id'], data['geo']['coordinates']])
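
To see why the re-stitching works, here is a minimal sketch of what raw_decode() does with a string that holds several objects back to back. The helper name split_concatenated and the sample tweets are made up for illustration; they are not part of your data.

```python
import json

decoder = json.JSONDecoder()

def split_concatenated(text):
    """Return the list of JSON values found back to back in text."""
    objects = []
    text = text.strip()
    while text:
        # raw_decode() returns the parsed value and the index where it ended
        obj, index = decoder.raw_decode(text)
        objects.append(obj)
        # raw_decode() does not skip leading whitespace, so strip it ourselves
        text = text[index:].lstrip()
    return objects

# Hypothetical sample: two tweets with no separator between them
sample = '{"id": 1, "text": "first"}{"id": 2, "text": "second"}'
print(split_concatenated(sample))
```

Unlike json.loads(), which rejects anything after the first value, raw_decode() tolerates trailing data and tells you where it starts, which is what makes it possible to walk through the file one object at a time.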

It is otherwise fine to write multiple JSON documents to one file, as long as the entries are clearly separated somehow. For example, if you write each entry without embedded newlines and put a newline between entries, you can later read them back one line at a time and process them sequentially without this much hassle.
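
A minimal sketch of that one-entry-per-line approach could look like the following. The function names, the sample records and the temporary file path are assumptions made for the example. It relies on json.dumps() producing a single line when no indent is given, because newlines inside string values are escaped as \n.

```python
import json
import os
import tempfile

def write_json_lines(path, records):
    """Write one JSON document per line."""
    with open(path, 'w') as out:
        for record in records:
            # Without indent=, json.dumps() output contains no raw newlines
            out.write(json.dumps(record) + '\n')

def read_json_lines(path):
    """Yield one parsed JSON document per non-empty line."""
    with open(path) as inp:
        for line in inp:
            if line.strip():
                yield json.loads(line)

# Hypothetical records; the first one has a newline inside a value
records = [
    {"id": 1, "text": "line one\nline two"},
    {"id": 2, "text": "plain"},
]
path = os.path.join(tempfile.mkdtemp(), 'tweets.jsonl')
write_json_lines(path, records)
restored = list(read_json_lines(path))
print(restored == records)
```

With files written this way, reading needs nothing more than a plain loop over the lines and json.loads().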