Question

We are trying to retrieve ALL the posts, with associated comments and images, made to our group in the last year. I've tried using the Graph API to do this, but pagination means I have to fetch the data, copy the "next" link, and run the query again. Unfortunately, that is a LOT of work, since there are over 2 million posts in the group.

Does ANYONE know of a way to do this without spending a few days clicking? Also consider that the group has 4000+ members and is growing every day, with, on average, about 1000 posts a DAY at the moment.

For the curious, the PLAN is to cull the herd... I am HOPELESS at programming and have recently started learning Python...

Solution

I did it like this; you'll probably have to keep iterating through the pages until the data comes back empty. Note that this is the Python 2.x version.

from facepy import GraphAPI
import json

group_id = "YOUR_GROUP_ID"
access_token = "YOUR_ACCESS_TOKEN"

graph = GraphAPI(access_token)

# Fetch a single batch of posts from the group feed.
# https://facepy.readthedocs.org/en/latest/usage/graph-api.html
data = graph.get(group_id + "/feed", page=False, retry=3, limit=800)

# Dump the raw response to a JSON file.
with open('content.json', 'w') as outfile:
    json.dump(data, outfile, indent=4)
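
Since the question also asks for comments and images, you can try asking the Graph API to expand those fields in the same request. The sketch below is an assumption rather than a tested recipe: it relies on facepy forwarding extra keyword arguments as query-string parameters, and the exact field names depend on the Graph API version you are using.

from facepy import GraphAPI
import json

group_id = "YOUR_GROUP_ID"
access_token = "YOUR_ACCESS_TOKEN"

graph = GraphAPI(access_token)

# ASSUMPTION: facepy passes extra keyword arguments (like 'fields')
# through as query-string parameters, and these field names are valid
# for your Graph API version -- check the Graph API Explorer if the
# request is rejected.
data = graph.get(
    group_id + "/feed",
    page=False,
    retry=3,
    limit=100,
    fields="message,created_time,from,comments{message,from,created_time},attachments"
)

with open('content_with_comments.json', 'w') as outfile:
    json.dump(data, outfile, indent=4)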

OTHER TIPS

I've just found and used @dfdfdf's solution, which is great! You can generalize it to download multiple pages of a feed, rather than just the first one, like so:

from facepy import GraphAPI
import json

group_id = "YOUR_GROUP_ID"
access_token = "YOUR_ACCESS_TOKEN"

graph = GraphAPI(access_token)

# With page=True, facepy returns a generator that follows the "next"
# links for you, yielding one page of results at a time.
pages = graph.get(group_id + "/feed", page=True, retry=3, limit=1000)

# Write each page to its own numbered JSON file.
for i, page in enumerate(pages):
    print 'Downloading page', i
    with open('content%i.json' % i, 'w') as outfile:
        json.dump(page, outfile, indent=4)
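
Once all the pages have been downloaded, you may want to merge them back into a single file. This is a minimal sketch, assuming the content0.json, content1.json, ... files produced above, and that each page stores its posts under a top-level "data" key (which is how the Graph API structures feed responses):

import glob
import json

all_posts = []

# Collect the posts from every downloaded page; the file order does not
# matter here, since we are simply accumulating the "data" lists.
for filename in glob.glob('content[0-9]*.json'):
    with open(filename) as infile:
        page = json.load(infile)
        all_posts.extend(page.get('data', []))

with open('all_posts.json', 'w') as outfile:
    json.dump(all_posts, outfile, indent=4)
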
Licensed under: CC-BY-SA with attribution