سؤال

I'm using the Alchemy API in app engine so I'm using the simplejson library to parse responses. The problem is that the responses have entries that have the sme name

 {
    "status": "OK",
    "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
    "url": "",
    "language": "english",
    "entities": [
        {
            "type": "Person",
            "relevance": "0.33",
            "count": "1",
            "text": "Michael Jordan",
            "disambiguated": {
                "name": "Michael Jordan",
                "subType": "Athlete",
                "subType": "AwardWinner",
                "subType": "BasketballPlayer",
                "subType": "HallOfFameInductee",
                "subType": "OlympicAthlete",
                "subType": "SportsLeagueAwardWinner",
                "subType": "FilmActor",
                "subType": "TVActor",
                "dbpedia": "http://dbpedia.org/resource/Michael_Jordan",
                "freebase": "http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000000029161",
                "umbel": "http://umbel.org/umbel/ne/wikipedia/Michael_Jordan",
                "opencyc": "http://sw.opencyc.org/concept/Mx4rvViVq5wpEbGdrcN5Y29ycA",
                "yago": "http://mpii.de/yago/resource/Michael_Jordan"
            }
        }
    ]
}

So the problem is that the "subType" is repeated so the dict that a loads returns is just "TVActor" rather than a list. Is there anyway to go around this?

هل كانت مفيدة؟

المحلول

The rfc 4627 that defines application/json says:

An object is an unordered collection of zero or more name/value pairs

And:

The names within an object SHOULD be unique.

It means that AlchemyAPI should not return multiple "subType" names inside the same object and claim that it is a JSON.

You could try to request the same in XML format (outputMode=xml) to avoid ambiguity in the results or to convert duplicate keys values into lists:

import simplejson as json
from collections import defaultdict

def multidict(ordered_pairs):
    """Convert duplicate keys values to lists."""
    # read all values into lists
    d = defaultdict(list)
    for k, v in ordered_pairs:
        d[k].append(v)

    # unpack lists that have only 1 item
    for k, v in d.items():
        if len(v) == 1:
            d[k] = v[0]
    return dict(d)

print json.JSONDecoder(object_pairs_hook=multidict).decode(text)

Example

text = """{
  "type": "Person",
  "subType": "Athlete",
  "subType": "AwardWinner"
}"""

Output

{u'subType': [u'Athlete', u'AwardWinner'], u'type': u'Person'}

نصائح أخرى

The rfc 4627 for application/json media type recommends unique keys but it doesn't forbid them explicitly:

The names within an object SHOULD be unique.

From rfc 2119:

SHOULD This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.

This is a known problam.

You can solve this by modify the duplicate key, or save him into array. You can use this code if you want.

import json

def parse_object_pairs(pairs):
    """
    This function get list of tuple's
    and check if have duplicate keys.
    if have then return the pairs list itself.
    but if haven't return dict that contain pairs.

    >>> parse_object_pairs([("color": "red"), ("size": 3)])
    {"color": "red", "size": 3}

    >>> parse_object_pairs([("color": "red"), ("size": 3), ("color": "blue")])
    [("color": "red"), ("size": 3), ("color": "blue")]

    :param pairs: list of tuples.
    :return dict or list that contain pairs.
    """
    dict_without_duplicate = dict()
    for k, v in pairs:
        if k in dict_without_duplicate:
            return pairs
        else:
            dict_without_duplicate[k] = v

    return dict_without_duplicate

decoder = json.JSONDecoder(object_pairs_hook=parse_object_pairs)

str_json_can_be_with_duplicate_keys = '{"color": "red", "size": 3, "color": "red"}'

data_after_decode = decoder.decode(str_json_can_be_with_duplicate_keys)
مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top