Question

We are trying to retrieve ALL the posts, with their associated comments and images, made to our group in the last year. I've tried using the Graph API to do this, but pagination means I have to fetch the data, copy the "next" link, and run the request again. Unfortunately, that is a LOT of work, since there are over 2 million posts in the group.

Does ANYONE know of a way to do this without spending a few days clicking? Also consider that the group has 4000+ members and is growing every day, with about 1000 posts a DAY on average at the moment.

For the curious, the PLAN is to cull the herd... I am HOPELESS at programming and have recently started learning Python...

Solution

I did it like this. The code below fetches only the first page of the feed, so you'll probably have to iterate through the remaining pages until data comes back empty. Note that this is written for Python 2.x.

from facepy import GraphAPI
import json

group_id = "YOUR_GROUP_ID"
access_token = "YOUR_ACCESS_TOKEN"

graph = GraphAPI(access_token)

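# facepy usage docs: https://facepy.readthedocs.org/en/latest/usage/graph-api.html
# page=False returns only the first page of the feed.
data = graph.get(group_id + "/feed", page=False, retry=3, limit=800)

# Save the raw response as pretty-printed JSON.
with open('content.json', 'w') as outfile:
    json.dump(data, outfile, indent=4)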
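
The "iterate until data is empty" step can be sketched as a small loop over the "next" links mentioned in the question. This is only a sketch under assumptions: `iter_pages` and `fetch` are hypothetical names and not part of facepy, and each response is assumed to be a dict with a `data` list and an optional `paging.next` URL. `fetch` stands for whatever function you use to download a URL and decode its JSON.

```python
def iter_pages(first_page, fetch):
    """Yield each page of results, following "next" links until data runs out.

    first_page: dict already returned by the API (assumed shape:
                {"data": [...], "paging": {"next": "<url>"}}).
    fetch:      hypothetical callable that takes a URL and returns the
                decoded JSON dict for that page.
    """
    page = first_page
    while page and page.get("data"):
        yield page
        next_url = page.get("paging", {}).get("next")
        if not next_url:
            break  # no further pages advertised
        page = fetch(next_url)
```

If the requests library is installed, `fetch` could be something like `lambda url: requests.get(url).json()`, and `first_page` would be the `data` dict returned by the call above.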

Other tips

I've just found and used @dfdfdf's solution, which is great! You can generalize it to download multiple pages of a feed, rather than just the first one, like so:

from facepy import GraphAPI
import json

group_id = "YOUR_GROUP_ID"
access_token = "YOUR_ACCESS_TOKEN"

graph = GraphAPI(access_token)
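# page=True makes facepy return a generator that yields one page at a time.
pages = graph.get(group_id + "/feed", page=True, retry=3, limit=1000)
for i, p in enumerate(pages):
    # The parenthesized form works under both Python 2 and Python 3.
    print('Downloading page %d' % i)
    with open('content%i.json' % i, 'w') as outfile:
        json.dump(p, outfile, indent=4)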
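
Since the question only wants posts from the last year, the downloaded pages can be filtered by date afterwards. The following is a rough sketch, not part of either answer: `posts_since` is a hypothetical helper, and it assumes each post has a `created_time` string shaped like "2013-05-01T12:00:00+0000". Check a real response before relying on that field name and format.

```python
from datetime import datetime

def posts_since(pages, cutoff):
    """Return posts from the given pages created at or after `cutoff`.

    Assumes each post carries a "created_time" string such as
    "2013-05-01T12:00:00+0000"; only the first 19 characters are parsed and
    the offset is ignored, so the time is treated as UTC.
    """
    kept = []
    for page in pages:
        for post in page.get("data", []):
            stamp = post.get("created_time")
            if not stamp:
                continue  # skip posts without a timestamp
            created = datetime.strptime(stamp[:19], "%Y-%m-%dT%H:%M:%S")
            if created >= cutoff:
                kept.append(post)
    return kept
```

Here `pages` would be the list of dicts obtained by loading each `content%i.json` file with `json.load`, and `cutoff` a `datetime` one year before today.
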
License: CC-BY-SA with attribution
Not affiliated with StackOverflow