Question

I'm trying to collect data from a frequently updated blog, so I use a simple while loop with urllib2.urlopen("http://example.com") to refresh the page every 5 minutes and collect the data I want.

But I noticed I'm not getting the most recent content this way: the page differs from what I see in a browser such as Firefox. After comparing the source Firefox receives with the source my Python script receives, I found that it's WP Super Cache that is preventing me from getting the most recent result.

I still get the same cached page even if I spoof the headers in my Python code. So I wonder, is there a way to bypass WP Super Cache? And why doesn't Firefox get the cached copy at all?


Solution

Have you tried changing the URL with some harmless data? Something like this:

import time
import urllib2

urllib2.urlopen("http://example.com?time=%s" % int(time.time()))

It will actually request http://example.com?time=1283872559. Most caching systems will bypass the cache when the request carries a query string or anything else they don't expect.
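Putting the two pieces together, a minimal sketch of the polling loop might look like the following. It assumes Python 3 (urllib.request in place of the original urllib2), and the helper name cache_busted, the example.com URL, and the 5-minute interval are just the values from this thread, not anything WP Super Cache prescribes:

import time
import urllib.parse

def cache_busted(url, ts=None):
    # Append a throwaway time=... parameter so the cache treats each
    # request as a distinct URL. Use "&" if a query string already exists.
    ts = int(time.time()) if ts is None else ts
    sep = "&" if urllib.parse.urlparse(url).query else "?"
    return "%s%stime=%d" % (url, sep, ts)

# The polling loop itself (network call commented out for illustration):
# import urllib.request
# while True:
#     html = urllib.request.urlopen(cache_busted("http://example.com")).read()
#     ...  # extract the data you want from html
#     time.sleep(300)  # wait 5 minutes before refreshing

Because the timestamp changes on every iteration, each fetch hits a URL the cache has never stored, so the origin server generates a fresh page.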

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow