Question

I'm trying to skip RSS feeds that have not been modified, using feedparser and ETags. I'm following the guidelines in the documentation: http://pythonhosted.org/feedparser/http-etag.html

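import feedparser

# First request: fetch the feed and remember its ETag.
d = feedparser.parse('http://www.wired.com/wiredscience/feed/')
# Second request: send the ETag back, expecting a 304 if nothing changed.
d2 = feedparser.parse('http://www.wired.com/wiredscience/feed/', etag=d.etag)

print(d2.status)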

This outputs:

200

Shouldn't this script return a 304? My understanding is that the ETag changes whenever the RSS feed is updated, so if the ETag I send still matches the server's, I should get a 304.

Why am I not getting the expected result?

Solution

Apparently this server is configured to check the 'If-Modified-Since' header. You need to pass the last-modified time as well:

>>> import feedparser
>>> d = feedparser.parse('http://www.wired.com/wiredscience/feed/')
>>> feedparser.parse('http://www.wired.com/wiredscience/feed/',
...                  etag=d.etag, modified=d.modified).status
304
>>> feedparser.parse('http://www.wired.com/wiredscience/feed/',
...                  etag=d.etag).status
200
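
If you poll feeds repeatedly, it may be convenient to wrap this in a small helper that always sends back whichever validators the previous response carried. The following is only a sketch: the names conditional_args and fetch_if_changed, and the injectable parse parameter, are hypothetical and not part of feedparser. It assumes the previous result exposes etag and modified as attributes only when the server sent them, which is why it uses getattr with a default.

```python
def conditional_args(previous):
    """Build etag/modified keyword arguments from an earlier parse result."""
    # Either validator can be missing if the server did not send it,
    # so fall back to None instead of raising AttributeError.
    return {
        "etag": getattr(previous, "etag", None),
        "modified": getattr(previous, "modified", None),
    }


def fetch_if_changed(url, previous=None, parse=None):
    """Return a fresh result, or None when the server answers 304."""
    if parse is None:
        # Imported lazily so the helper can be exercised with a stand-in parser.
        import feedparser
        parse = feedparser.parse
    kwargs = conditional_args(previous) if previous is not None else {}
    result = parse(url, **kwargs)
    # status is only present when the feed was fetched over HTTP.
    if getattr(result, "status", None) == 304:
        return None
    return result
```

You would call fetch_if_changed(url) once, keep the result, and pass it as previous on the next poll; a None return means the feed has not been modified. Whether a given server honours the ETag, the modified time, or only both together depends on the server.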
Licensed under: CC-BY-SA with attribution