How to identify multiple keys and their corresponding values in each line of a file, e.g. "status":"OK"

StackOverflow https://stackoverflow.com/questions/21901049

Question

I'm trying to write a script that extracts specific items from each line of a file so that they can be inserted into an SQL database. I have multiple lines like the following in a text file, "addresses.txt":

{"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f508f-e7c8-32b8-e044-0003ba298018","municipalityCode":"0766","municipalityName":"Hedensted","streetCode":"0072","streetName":"Værnegården","streetBuildingIdentifier":"13","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"","postCodeIdentifier":"8000","districtName":"Århus","presentationString":"Værnegården 13, 8000 Århus","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(553564 6179299)","x":553564,"y":6179299}]}

For example, I want to remove

"type":"addressAccessType","addressAccessId":"0a3f508f-e7c8-32b8-e044-0003ba298018"

and end up with a column list and a value list that can be written to a file, file_output.txt, like this:

INSERT INTO ADDRESSES (%s) VALUES (%s)

This is what I have so far:

# Template for each INSERT statement written to output_data.txt
address_line = """INSERT INTO ADDRESSES (%s) VALUES (%s)"""

# Reads every line from the file addresses.txt
messy_string = file("addresses.txt").readlines()

cols = messy_string[0].split(",")  #Defines each word in the first line separated by , as a column name
colstr = ','.join(cols) # formatted string that will plug in nicely
output_data = file("output_data.txt", 'w') # Creates the output file: output_data.txt
for r in messy_string[0:]: # loop through everything after first line
    #r = r.replace(':',',')
    #temp_replace = r.translate(None,'"{}[]()')
    #address_list = temp_replace.split(",")
    #address_list = [x.encode('utf-8') for x in address_list]
    vals = r.split(",") # split at ,
    valstr = ','.join(vals) # join with commas for sql
    output_data.write(address_line % (colstr, valstr))  # write to file

output_data.close()

I've included some of my commented-out attempts in case they help. I also noticed that whenever I use #address_list = temp_replace.split(","), all of my UTF-8 characters get garbled, and I don't know why or how to correct this.

UPDATE: After looking at the example in "How can I convert JSON to CSV?", I came up with this code to fix my problem:

# Reads every line from the file coordinates.txt
messy_string = file("coordinates.txt").readlines()

# Reads with the json module
x = json.loads(messy_string)

x = json.loads(x)
f = csv.writer(open('test.csv', 'wb+'))

for x in x:
    f.writerow([x['status'],
            x['message'], 
            x['data']['type'], 
            x['data']['addressAccessId'],
            x['data']['municipalityCode'],
            x['data']['municipalityName'],
            x['data']['streetCode'],
            x['data']['streetName'],
            x['data']['streetBuildingIdentifier'],
            x['data']['mailDeliverySublocationIdentifier'],
            x['data']['districtSubDivisionIdentifier'],
            x['data']['postCodeIdentifier'],
            x['data']['districtName'],
            x['data']['presentationString'],
            x['data']['addressSpecificCount'],
            x['data']['validCoordinates'],
            x['data']['geometryWkt'],
            x['data']['x'],
            x['data']['y']])

However, this does not fix my problem. Now I get the following error:

Traceback (most recent call last):
  File "test2.py", line 10, in <module>
    x = json.loads(messy_string)
  File "C:\Python27\lib\json\__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 365, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: expected string or buffer

Can anyone help? Thanks in advance.


Solution

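Each line looks like valid JSON to me. The TypeError in your update comes from passing the list returned by readlines() to json.loads, which expects a single string. You can simply parse the JSON one line at a time and select the keys you'd like to keep (like you would with a dictionary):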

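import io
import json

# io.open decodes the file as UTF-8 and works on both Python 2 and 3
with io.open("addresses.txt", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            # json.loads expects one string, so parse each line separately
            parsed = json.loads(line)
        except ValueError:
            # the json module raises ValueError (or a subclass) on malformed input
            raise ValueError('Could not parse line: %r' % line)
        column_names = list(parsed.keys())
        column_values = list(parsed.values())
        print(parsed)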
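
Note that "data" in each line is a list of objects, so x['data']['type'] from the update would fail even once the line is parsed; the fields live in x['data'][0]. Building on that, here is a minimal sketch of how the column list and value list for the INSERT statement might be produced. The helper names flatten_record and build_insert, the DROP_KEYS set, and the shortened sample line are illustrative assumptions, not part of the original answer. It also assumes every line has at least one entry in "data", and it relies on Python 3.7+ keeping dictionary keys in insertion order.

```python
import json

# Keys the question wants to leave out; adjust as needed (assumption).
DROP_KEYS = {"type", "addressAccessId"}


def flatten_record(line, drop_keys=DROP_KEYS):
    """Parse one JSON line and return (columns, values) for its first address."""
    parsed = json.loads(line)
    # Start from the top-level scalar fields ("status", "message").
    row = {k: v for k, v in parsed.items() if k != "data"}
    # "data" is a list of objects, so index into it before reading keys.
    row.update(parsed["data"][0])
    columns = [k for k in row if k not in drop_keys]
    values = [row[k] for k in columns]
    return columns, values


def build_insert(columns, table="ADDRESSES"):
    """Build a parameterized INSERT statement with one "?" per column."""
    placeholders = ", ".join("?" for _ in columns)
    return "INSERT INTO %s (%s) VALUES (%s)" % (
        table, ", ".join(columns), placeholders)


# Shortened, hypothetical version of one line from addresses.txt.
sample = (
    '{"status":"OK","message":"OK","data":[{"type":"addressAccessType",'
    '"streetName":"Værnegården","postCodeIdentifier":"8000","x":553564}]}'
)
columns, values = flatten_record(sample)
statement = build_insert(columns)
```

The statement uses placeholders instead of pasting the values into the SQL text, so quoting and non-ASCII characters such as "æ" are left to the database driver. With the standard sqlite3 module this would be run as cursor.execute(statement, values); other drivers may expect %s rather than ? as the placeholder.
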
Licensed under: CC-BY-SA with attribution