Getting Page Content via JSON

https://stackoverflow.com/questions/20753363

21-09-2022

Question

Link: http://creepypasta.wikia.com/api.php?%20action=query&prop=revisions&titles=Main_Page&rvprop=content&indexpageids=1&format=jsonfm

From the JSON at the link above, I want to get the value of the "*" key. I am using Python and already have the request set up. The trouble is that the page content is nested under the page ID, so I need to grab the page ID before I can reach the content, and I am not sure how to do that.

Solution

That page isn't actually JSON; it is an HTML representation of the JSON. To request the raw JSON, remove the 'fm' from the end of the URL, so that format=jsonfm becomes format=json.
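As a rough illustration of the difference between the two formats, the query string can be assembled from its parts so that the format parameter is explicit. This is only a sketch in Python 3 (urllib.parse), and the helper name build_query_url is hypothetical. The stray %20 before "action" in the original link is dropped here on the assumption that it is a typo.

```python
from urllib.parse import urlencode

# Endpoint and parameters copied from the link in the question.
API_ENDPOINT = "http://creepypasta.wikia.com/api.php"

def build_query_url(fmt="json"):
    """Return the API URL for the Main_Page revisions query (hypothetical helper)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": "Main_Page",
        "rvprop": "content",
        "indexpageids": 1,
        # "json" is the machine-readable output; "jsonfm" is the HTML rendering.
        "format": fmt,
    }
    return API_ENDPOINT + "?" + urlencode(params)

url = build_query_url()
```

Calling build_query_url("jsonfm") gives the browser-friendly variant again, which can be handy for inspecting the structure by eye.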

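In this code, I load the JSON into a dictionary using the urllib2 and json packages (urllib2 is Python 2 only; in Python 3 its role is taken by urllib.request), and then access the "*" item.

import json
import urllib2  # Python 2 only

url = "http://creepypasta.wikia.com/api.php?%20action=query&prop=revisions&titles=Main_Page&rvprop=content&indexpageids=1&format=json"
j = json.load(urllib2.urlopen(url))
# '22491' is the page ID under which this response stores the page
value = j['query']['pages']['22491']['revisions'][0]['*']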
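For readers on Python 3, where urllib2 no longer exists, the same fetch might look roughly like the sketch below. The function name fetch_json is hypothetical, and the snippet assumes the server returns a JSON body that json.load can decode directly.

```python
import json
from urllib.request import urlopen

def fetch_json(url):
    """Open the URL and decode the response body as JSON (hypothetical helper)."""
    with urlopen(url) as response:
        # json.load reads the bytes from the response and parses them.
        return json.load(response)

# Usage, with the URL from the answer:
# j = fetch_json(url)
# value = j['query']['pages']['22491']['revisions'][0]['*']
```

The usage lines are left commented out because they require network access and a live endpoint.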

If you do not know the page ID in advance, consider the recursive search method found here (replicated below), which walks the nested dictionaries until it finds the given key:

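def _finditem(obj, key):
    """Return the first value stored under key in a nested dict, or None."""
    if key in obj:
        return obj[key]
    # Key not at this level: search each nested dictionary in turn.
    for k, v in obj.items():
        if isinstance(v, dict):
            item = _finditem(v, key)
            if item is not None:
                return item
    return None

value = _finditem(j, 'revisions')[0]['*']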
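Another option, sketched here under assumptions, relies on the indexpageids=1 parameter already present in the URL: the MediaWiki API then includes a pageids list under "query", so the page ID can be read from the response instead of being searched for. The function name first_page_content and the sample dictionary are hypothetical and only mimic the shape of the real response.

```python
def first_page_content(response):
    """Return the '*' value of the first page's first listed revision."""
    query = response["query"]
    if "pageids" in query:
        # Present when the request was made with indexpageids=1.
        page_id = query["pageids"][0]
    else:
        # Fallback: take whichever page ID key comes first.
        page_id = next(iter(query["pages"]))
    return query["pages"][page_id]["revisions"][0]["*"]

# Hypothetical, trimmed-down response used for illustration only.
sample = {
    "query": {
        "pageids": ["22491"],
        "pages": {"22491": {"revisions": [{"*": "page text"}]}},
    }
}
content = first_page_content(sample)
```

With a real response, first_page_content(j) would take the place of the hard-coded '22491' lookup. If several titles are requested at once, looping over the pageids list would be the natural extension.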

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow