Question

I was in the middle of writing a Python parser script for RSS feeds. I'm using feedparser; however, I'm stuck on parsing feeds from FeedBurner. Who needs FeedBurner nowadays? Anyway...

For example, I couldn't find a way to parse these:

http://feeds.wired.com/wired/index

http://feeds2.feedburner.com/ziffdavis/pcmag

When I pass those URLs to feedparser, they don't seem to work. I tried appending ?fmt=xml or ?format=xml to the URLs, but I still didn't get XML back.

Do I need to use an HTML parser such as BeautifulSoup to parse FeedBurner feeds? Preferably, is there a public Python parser or aggregator script that already handles this?

Any tip or help will be greatly appreciated.


Solution

It's possible you have a version issue or are using the API incorrectly; it would help to see your error message. For example, the following works with Python 2.7 and feedparser 5.0.1:

>>> import feedparser
>>> url = 'http://feeds2.feedburner.com/ziffdavis/pcmag'
>>> d = feedparser.parse(url)
>>> d.feed.title
u'PCMag.com: New Product Reviews'
>>> d.feed.link
u'http://www.pcmag.com'
>>> d.feed.subtitle
u"First Look At New Products From PCMag.com including Lab Tests, Ratings, Editor's and User's Reviews."
>>> len(d['entries'])
30
>>> d['entries'][0]['title']
u'Canon Color imageClass MF9280cdn'

And with the other URL:

>>> url = 'http://feeds.wired.com/wired/index'
>>> d = feedparser.parse(url)
>>> d.feed.title
u'Wired Top Stories'
>>> d.feed.link
u'http://www.wired.com/rss/index.xml'
>>> d.feed.subtitle
u'Top Stories<img src="http://www.wired.com/rss_views/index.gif" />'
>>> len(d['entries'])
30
>>> d['entries'][0]['title']
u'Heart of Dorkness: LARPing Goes Haywire in <em>Wild Hunt</em>'
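If feedparser hands back an empty result, a quick first check is whether the URL is returning a feed at all rather than, say, an HTML page (which is what the ?fmt=xml attempts in the question were trying to work around). feedparser reports this itself: when a document is not well-formed it sets `d.bozo` to a true value and stores the error in `d.bozo_exception`. For a check that needs only the standard library, here is a minimal sketch; `feed_format` is a hypothetical helper name, and the root elements it recognizes (`rss` for RSS 2.0, `feed` for Atom, `RDF` for RSS 1.0) are the common cases, not an exhaustive list.

```python
import xml.etree.ElementTree as ET

def feed_format(xml_text):
    """Guess the feed format from the root element (hypothetical helper)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None  # not well-formed XML, e.g. an HTML error page
    # ElementTree writes namespaced tags as '{namespace}localname'
    local_name = root.tag.rsplit('}', 1)[-1]
    if local_name == 'rss':
        return 'rss'
    if local_name == 'feed':
        return 'atom'
    if local_name == 'RDF':
        return 'rdf'
    return None  # well-formed XML, but not a feed format this sketch knows

print(feed_format('<rss version="2.0"><channel/></rss>'))
```

`ET.fromstring` also accepts bytes, so the raw body of an HTTP response can be passed in directly.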

Other tips

I know this question is very old, but for anyone who lands here while searching for how to parse FeedBurner RSS feeds, here is a simple piece of code I use to get the latest entry from the Cracked.com FeedBurner feed. I have tested it on a few other sites and it works fine.

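from urllib.request import urlopen  # Python 3; Python 2 used urllib.urlopen
from xml.dom import minidom

def GetRSS(RSSurl):
    # Download the feed and parse it into a DOM
    with urlopen(RSSurl) as url_info:
        xmldoc = minidom.parse(url_info)
    # getElementsByTagName returns a list; the first <item> is usually the newest entry
    items = xmldoc.getElementsByTagName('item')
    if items:
        latest = items[0]
        url = latest.getElementsByTagName('link')[0].firstChild.data
        title = latest.getElementsByTagName('title')[0].firstChild.data
        print(url)
        print(title)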

Just call GetRSS with the address of the FeedBurner feed as RSSurl. As you can probably see, if you want any other elements, you can just add an extra getElementsByTagName line with whatever tag you would like to get.
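Instead of adding one getElementsByTagName line per element, the same idea can loop over a list of tag names. This is a minimal sketch, not the answer's own code: `latest_entry_fields` is a hypothetical helper, it parses from a string (via `minidom.parseString`) so it can be tried without a network connection, and it assumes an RSS 2.0 layout where entries are `<item>` elements.

```python
from xml.dom import minidom

def latest_entry_fields(xml_text, tags=('title', 'link')):
    """Return {tag: text} for the first <item> of an RSS document (hypothetical helper)."""
    xmldoc = minidom.parseString(xml_text)
    items = xmldoc.getElementsByTagName('item')
    if not items:
        return {}
    fields = {}
    for tag in tags:
        # Search only inside the first <item>, not the whole channel
        nodes = items[0].getElementsByTagName(tag)
        # Skip tags that are missing or empty instead of raising
        if nodes and nodes[0].firstChild is not None:
            fields[tag] = nodes[0].firstChild.data
    return fields

sample = ('<rss version="2.0"><channel><title>Feed</title>'
          '<item><title>Newest</title><link>http://example.com/1</link></item>'
          '</channel></rss>')
print(latest_entry_fields(sample, ('title', 'link', 'pubDate')))
```

Searching from the `<item>` node rather than the whole document matters: the channel has its own `<title>` and `<link>`, and those come first in document order.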

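Edit: also, to my knowledge, this will work with pretty much any RSS 2.0 feed. Atom feeds name their entries `entry` instead of `item` and store the link in an `href` attribute, so they need a small change.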
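For feeds in the Atom format, the minidom approach needs two adjustments: entries are `<entry>` elements, and `<link>` is empty with the URL in its `href` attribute. A rough sketch handling both formats follows; `latest_title_and_link` is a hypothetical helper, and taking the first usable `<link>` is a simplification, since Atom entries can carry several `<link>` elements with different `rel` values.

```python
from xml.dom import minidom

def latest_title_and_link(xml_text):
    """Return (title, link) of the first entry in an RSS 2.0 or Atom feed (hypothetical helper)."""
    xmldoc = minidom.parseString(xml_text)
    # RSS 2.0 wraps entries in <item>; Atom uses <entry>
    entries = xmldoc.getElementsByTagName('item') or xmldoc.getElementsByTagName('entry')
    if not entries:
        return None
    entry = entries[0]
    title_nodes = entry.getElementsByTagName('title')
    title = None
    if title_nodes and title_nodes[0].firstChild is not None:
        title = title_nodes[0].firstChild.data
    link = None
    for node in entry.getElementsByTagName('link'):
        # RSS puts the URL in the element text; Atom puts it in the href attribute
        if node.hasAttribute('href'):
            link = node.getAttribute('href')
        elif node.firstChild is not None:
            link = node.firstChild.data
        if link:
            break
    return title, link

atom = ('<feed xmlns="http://www.w3.org/2005/Atom"><title>Feed</title>'
        '<entry><title>Atom entry</title><link href="http://example.com/a"/></entry></feed>')
print(latest_title_and_link(atom))
```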

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow