Question

I've created a very simple piece of code to read in tweets in JSON format in text files, determine if they contain an id and coordinates and if so, write these attributes to a csv file. This is the code:

f = csv.writer(open('GeotaggedTweets/ListOfTweets.csv', 'wb+'))
all_files = glob.glob('SampleTweets/*.txt')
for filename in all_files:
    with open(filename, 'r') as file:
        data = simplejson.load(file)
        if 'text' and 'coordinates' in data:
            f.writerow([data['id'], data['geo']['coordinates']])

I've been having some difficulties but with the help of the excellent JSON Lint website have realised my mistake. I have multiple JSON objects and from what I read these need to be separated by commas and have square brackets added to the start and end of the file.

How can I achieve this? I've seen some examples online where each individual line is read and it's added to the first and last line, but as I load the whole file I'm not entirely sure how to do this.


Solution

You have a file that either contains too many newlines (in the JSON values themselves) or too few (no newlines between the tweets at all).

You can still repair this by using some creative re-stitching. The following generator function should do it:

import json

def read_objects(filename):
    decoder = json.JSONDecoder()

    with open(filename, 'r') as inputfile:
        buffer = ''
        # Iterating over the file ends cleanly at EOF; calling next() inside
        # a generator would raise RuntimeError on Python 3.7+ instead.
        for line in inputfile:
            buffer += line.strip()
            while buffer:
                try:
                    obj, index = decoder.raw_decode(buffer)
                except ValueError:
                    # Assume we don't have a complete object yet; read more
                    break
                yield obj
                # raw_decode() does not skip leading whitespace, so drop it
                buffer = buffer[index:].lstrip()

This should be able to read all your JSON objects in sequence:

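for filename in all_files:
    for data in read_objects(filename):
        # `'text' and 'coordinates' in data` only tests 'coordinates',
        # and 'geo' may be missing or null, so check each piece explicitly
        geo = data.get('geo')
        if 'id' in data and geo and 'coordinates' in geo:
            f.writerow([data['id'], geo['coordinates']])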
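For reference, the generator relies on `json.JSONDecoder.raw_decode()`, which parses one JSON document from the start of a string and also reports where it stopped. The snippet below is a small illustration with made-up sample data (the `text` value is hypothetical, not taken from your files):

```python
import json

decoder = json.JSONDecoder()

# Two objects glued together with no separator (hypothetical sample data)
text = '{"id": 1}{"id": 2}'

# raw_decode() returns the decoded object and the index where parsing ended
first, index = decoder.raw_decode(text)

# Whatever follows that index is the start of the next object
second, _ = decoder.raw_decode(text[index:])

# An incomplete object raises ValueError, which is the signal to read more input
try:
    decoder.raw_decode('{"id":')
    incomplete = False
except ValueError:
    incomplete = True
```

Note that `raw_decode()` does not skip leading whitespace, so any whitespace between objects has to be stripped before the next call.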

It is otherwise fine to write multiple JSON documents to one file, as long as the entries are clearly separated. For example, if you write each entry without embedded newlines (the default output of json.dumps()) and put a newline between entries, you can later read them back one line at a time and process them sequentially without this much hassle.
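A minimal sketch of that newline-delimited layout, assuming you control how the file is written. The helper names `write_json_lines` and `read_json_lines` are made up for this example:

```python
import json

def write_json_lines(path, objects):
    """Write one compact JSON document per line (hypothetical helper)."""
    with open(path, 'w') as out:
        for obj in objects:
            # json.dumps() without indent= emits no raw newlines;
            # newlines inside string values are escaped as \n
            out.write(json.dumps(obj) + '\n')

def read_json_lines(path):
    """Yield one decoded object per non-empty line (hypothetical helper)."""
    with open(path, 'r') as inp:
        for line in inp:
            line = line.strip()
            if line:
                yield json.loads(line)
```

With files written this way, the loop over tweets becomes `for data in read_json_lines(filename):` and no re-stitching is needed.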

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow