Problem

I'm trying to collect data from a frequently updated blog, so I use a while loop that calls urllib2.urlopen("http://example.com") every 5 minutes to refresh the page and collect the data I want.

But I noticed that I'm not getting the most recent content this way; it differs from what I see in a browser such as Firefox. After comparing the page source in Firefox with the same page fetched from Python, I found that WP Super Cache is preventing me from getting the most recent result.

I still get the same cached page even if I spoof the headers in my Python code. Is there a way to bypass WP Super Cache? And why doesn't Firefox get the cached page at all?

Solution

Have you tried appending some harmless data to the URL? Something like this:

import time
import urllib2

# Append the current Unix timestamp so each request uses a unique URL
urllib2.urlopen("http://example.com?time=%s" % int(time.time()))

This will actually request a URL such as http://example.com?time=1283872559. Most caching systems bypass the cache when the URL has a query string or is otherwise something they don't expect.
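
If the page URL might already carry a query string, plain string formatting with "?time=" would produce a malformed URL. Here is a minimal sketch of a helper that appends the timestamp safely. It assumes Python 3, where urllib2 became urllib.request; the names add_cache_buster and fetch_fresh are made up for illustration, and whether the blog's cache actually skips such URLs depends on how it is configured.

```python
import time
import urllib.request
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_buster(url, now=None):
    """Return url with a time=<unix timestamp> query parameter appended.

    Hypothetical helper: existing query parameters are preserved.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    stamp = int(time.time()) if now is None else int(now)
    query.append(("time", str(stamp)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def fetch_fresh(url, timeout=30):
    """Fetch url through a unique, timestamped address (hypothetical helper)."""
    with urllib.request.urlopen(add_cache_buster(url), timeout=timeout) as resp:
        return resp.read()
```

Inside the polling loop, fetch_fresh("http://example.com/") would then replace the bare urlopen call, so every 5-minute refresh requests a URL the cache has not seen before.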

License: CC-BY-SA with attribution
Not affiliated with StackOverflow