Question

I'm trying to load some geographic data with Python's simplejson.

string = file("prCounties.txt","r").read().decode('utf-8')  
d = simplejson.loads(string)    

The text file contains an ñ (n with a tilde): the word should be Añasco, but instead it's u"A\xf1asco", which simplejson fails to parse. The source is a GeoJSON file from GitHub:

{"type": "FeatureCollection", "properties": {"kind": "state", "state": "PR"}, "features": [[{"geometry": {"type": "MultiPolygon", "coordinates": [[[[-67.122, 18.3239], [-67.0508, 18.3075], [-67.0398, 18.291], [-67.0837, 18.2527], [-67.122, 18.2417], [-67.1603, 18.2746], [-67.1877, 18.2691], [-67.2261, 18.2965], [-67.1822, 18.3129], [-67.1275, 18.3184]]]]}, "type": "Feature", "properties": {"kind": "county", "name": u"A\xf1asco", "state": "PR"}}]]}

Python gives me the error simplejson.decoder.JSONDecodeError: Expecting object


This is the script I used to download the data from GitHub and generate prCounties.txt. The variable counties is a list of strings that identify the relevant GeoJSON files.

It's clear this is not the proper way to save this data:

countyGeo = [ ]

for x in counties:      
    d = simplejson.loads(urllib.urlopen("https://raw.github.com/johan/world.geo.json/master/countries/USA/PR/%s" % (x)).read())         
    countyGeo += [ d["features"][0]]
    d["features"][0]=countyGeo  
file("prCounties.txt", "w").write(str(d))

EDIT: In the last line, I replaced str with simplejson.dumps, and I think it encodes properly now:

file("prCounties.txt", "w").write(simplejson.dumps(d))


Solution

There are two problems here. First:

string = file("prCounties.txt","r").read().decode('utf-8')

Why are you decoding it? JSON text is Unicode, and UTF-8 is its standard encoding. simplejson.loads accepts a UTF-8 byte string directly and decodes it for you, so decoding it yourself first gains nothing. simplejson can also take a unicode string, which makes it a little easier to use, but there's no reason to convert to one when what you already have is UTF-8.
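A small sketch of that point, written for Python 3 with the standard-library json module (whose loads/dumps API matches simplejson's): parsing the raw UTF-8 bytes gives exactly the same result as decoding to text first. Passing bytes to json.loads assumes Python 3.6 or later.

```python
import json

text = '{"name": "Añasco", "state": "PR"}'

# Parse the same document once as text and once as raw UTF-8 bytes.
from_text = json.loads(text)
from_bytes = json.loads(text.encode("utf-8"))

# from_text == from_bytes == {"name": "Añasco", "state": "PR"}
```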

More importantly, where did your data come from? If prCounties.txt has that u"Añasco" in it, it's not JSON. You can't encode something to one standard and decode to a completely different standard just because they look similar.

If, for example, you did open('prCounties.txt', 'w').write(repr(my_dict)), you have to read it back with a Python repr parser (possibly ast.literal_eval, or maybe you have to write something yourself).
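As an illustration, ast.literal_eval can read back a Python-literal line like the one in the question's file. This is a Python 3 sketch; the sample string is a trimmed copy of the question's properties object, with the u"..." prefix and \xf1 escape left in place (both are Python syntax, not JSON).

```python
import ast

# Python-repr style text: u"..." prefix and \xf1 escape are Python syntax, not JSON.
repr_text = '{"kind": "county", "name": u"A\\xf1asco", "state": "PR"}'

# literal_eval parses Python literals only; it never executes arbitrary code.
props = ast.literal_eval(repr_text)
# props["name"] == "Añasco"
```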

Or, alternatively, if you want to parse the data as JSON, write it as JSON in the first place.
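A minimal Python 3 sketch of writing the data as JSON and reading it back, using the standard json module and a temporary directory (the sample feature is a trimmed version of the question's data). With the default ensure_ascii=True, the ñ is written as the JSON escape \u00f1, which any JSON parser understands.

```python
import json
import os
import tempfile

data = {"type": "Feature",
        "properties": {"kind": "county", "name": "Añasco", "state": "PR"}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "prCounties.txt")

    # Write real JSON with json.dump, not str() or repr() of the dict.
    with open(path, "w", encoding="utf-8") as out:
        json.dump(data, out)

    with open(path, encoding="utf-8") as src:
        raw = src.read()

# raw contains "A\u00f1asco" as a JSON escape; parsing it restores the dict.
loaded = json.loads(raw)
```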


According to your comment, the data was read from https://raw.github.com/johan/world.geo.json/master/countries/USA/PR/Añasco.geo.json

The raw contents of that URL are:

{"type":"FeatureCollection","properties":{"kind":"state","state":"PR"},"features":[
{"type":"Feature","properties":{"kind":"county","name":"Añasco","state":"PR"},"geometry":{"type":"MultiPolygon","coordinates":[[[[-67.1220,18.3239],[-67.0508,18.3075],[-67.0398,18.2910],[-67.0837,18.2527],[-67.1220,18.2417],[-67.1603,18.2746],[-67.1877,18.2691],[-67.2261,18.2965],[-67.1822,18.3129],[-67.1275,18.3184]]]]}}
]}

You'll notice that there is no "name": u"Añasco" (or "name": u"A\xf1asco", or anything similar) there. You can read this file with a plain read(), with no need to decode it from UTF-8 or anything else, pass the result straight to simplejson.loads, and it works just fine:

$ curl -O https://raw.github.com/johan/world.geo.json/master/countries/USA/PR/Añasco.geo.json
$ cp Añasco.geo.json prCounties.txt
$ python
>>> import simplejson
>>> string = file("prCounties.txt","r").read()
>>> d = simplejson.loads(string)
>>> print d
{u'type': u'FeatureCollection', u'properties': {u'kind': u'state', u'state': u'PR'}, u'features': [{u'geometry': {u'type': u'MultiPolygon', u'coordinates': [[[[-67.122, 18.3239], [-67.0508, 18.3075], [-67.0398, 18.291], [-67.0837, 18.2527], [-67.122, 18.2417], [-67.1603, 18.2746], [-67.1877, 18.2691], [-67.2261, 18.2965], [-67.1822, 18.3129], [-67.1275, 18.3184]]]]}, u'type': u'Feature', u'properties': {u'kind': u'county', u'name': u'A\xf1asco', u'state': u'PR'}}]}

See, no errors at all.

Somewhere, you've done something to this data to turn it into something else which is not JSON. My guess is that, on top of doing a bunch of unnecessary extra decode and encode calls, you've also done a simplejson.loads, then tried to re-simplejson.loads the repr of the dict you got back. Or maybe you've JSON-encoded a dict full of already-encoded JSON strings. Whatever you've done, that code, not the code you're showing us, is where the error is.
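The suspected round trip is easy to reproduce. This Python 3 sketch uses the standard json module: it parses valid JSON, turns the result back into text with str() as the question's script does, and tries to parse that again. Python 3's str() output looks a little different from Python 2's (no u prefixes), but it is still Python syntax rather than JSON, so the second parse fails.

```python
import json

# Parse valid JSON, as the download step in the question does.
county = json.loads('{"name": "A\\u00f1asco", "state": "PR"}')

# str() (like repr()) produces Python literal syntax, not JSON.
python_text = str(county)

try:
    json.loads(python_text)
    failed = False
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    failed = True
```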

And the easiest fix is probably to generate prCounties.txt properly in the first place. It's just 70-odd downloads of a few lines apiece, and it should take maybe 2 lines of bash or 4 lines of Python to do it…
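Here is one way that regeneration could look, as a Python 3 sketch with the standard json module rather than a tested downloader. The base URL comes from the question; the helper names (download_collection, merge_feature_collections, save_counties) are hypothetical, the state-level properties are copied from the raw file, and the list of file names is assumed to be supplied by you, like the question's counties list. Unlike the question's loop, which assigns a list into features[0] and so produces the nested [[...]] seen in the question's file, this adds every county's features to one flat list and writes it with json.dump instead of str().

```python
import json
import urllib.parse
import urllib.request

# Base URL from the question; GitHub may redirect it elsewhere (urlopen follows redirects).
BASE_URL = "https://raw.github.com/johan/world.geo.json/master/countries/USA/PR/"


def download_collection(filename):
    """Hypothetical helper: fetch one county file and parse it as JSON."""
    # Percent-encode non-ASCII names such as "Añasco.geo.json" for the URL.
    url = BASE_URL + urllib.parse.quote(filename)
    with urllib.request.urlopen(url) as response:
        # json.loads accepts the raw UTF-8 bytes directly.
        return json.loads(response.read())


def merge_feature_collections(collections):
    """Hypothetical helper: combine several FeatureCollections into one."""
    merged = {
        "type": "FeatureCollection",
        "properties": {"kind": "state", "state": "PR"},
        "features": [],
    }
    for collection in collections:
        # extend, not append or item assignment: keep "features" a flat
        # list of Feature objects rather than a list containing a list.
        merged["features"].extend(collection["features"])
    return merged


def save_counties(filenames, path="prCounties.txt"):
    """Hypothetical helper: download every county and write valid JSON."""
    merged = merge_feature_collections(download_collection(f) for f in filenames)
    with open(path, "w", encoding="utf-8") as out:
        json.dump(merged, out)
```

Calling save_counties(counties) with the file names from the question would then produce a prCounties.txt that simplejson.loads (or json.loads) can read without any extra decoding.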

Other answers

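Your input file is not valid JSON. There is a u before the "A\xf1asco" string, which is Python syntax, not JSON syntax. Without it, the entry should read:

"name":"Añasco",

(JSON has no \x escapes; if you want an escape, the JSON spelling is "A\u00f1asco".)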

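This works (here Python itself turns the \xf1 inside the u'' literal into ñ before the JSON parser ever sees the string):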

>>> import json
>>> json.loads(u'{"name":"A\xf1asco"}')
{u'name': u'A\xf1asco'}
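To see the difference when the backslash sequences are literally in the text, as they would be in a file, here is a Python 3 sketch with the standard json module: the \x form is rejected, while the JSON \u00f1 escape parses to Añasco.

```python
import json

# "\x" is not a JSON escape, so text literally containing A\xf1asco fails to parse.
try:
    json.loads('{"name": "A\\xf1asco"}')
    x_escape_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    x_escape_ok = False

# The JSON spelling of the same character is the \u escape.
parsed = json.loads('{"name": "A\\u00f1asco"}')
# parsed == {"name": "Añasco"}
```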

You have to remove the "u" prefixes from your prCounties.txt file (as others have already said). Then you can use this code, which turns the file contents into a string that simplejson.loads() can read:

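import simplejson
# Python 2 only: turn literal escape sequences such as \xf1 into the bytes they stand for
string = file("prCounties.txt", "r").read().decode("string-escape")
# byte 0xF1 is "ñ" in Latin-1, so decode as Latin-1 to get a unicode string
string = unicode(string, "latin-1")
d = simplejson.loads(string)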
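The string-escape codec and the unicode() builtin exist only in Python 2. A rough Python 3 analogue of the same idea is sketched below, assuming (as in the sample) that the file is ASCII and that its only backslash escapes are Python-style \xNN ones. The unicode_escape codec rewrites every backslash escape in the text, so this is a narrow workaround for data like this sample, not a general JSON repair.

```python
import json

# Sample of the question's data after the u prefixes have been removed:
# it still contains the Python-style escape \xf1, which JSON does not accept.
raw = b'{"kind": "county", "name": "A\\xf1asco", "state": "PR"}'

# unicode_escape turns \xf1 into U+00F1 ("ñ"), much like string-escape + latin-1 in Python 2.
# Caution: it also rewrites any other backslash escape present in the text.
text = raw.decode("unicode_escape")

props = json.loads(text)
# props["name"] == "Añasco"
```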
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow