Why does feedparser for Python not see all elements in the feed?

https://stackoverflow.com/questions/23260644

08-07-2023

Question

I use the following code:

import feedparser as fp

if __name__ == '__main__':
    url = 'http://www.careerbuilder.de/RTQ/rss20.aspx?rssid=RSS_PD&num=25&geoip=false&ddcompany=false&ddtitle=false&cat=JN038'
    d = fp.parse(url)
    # Print every parsed entry, followed by a divider line
    for entry in d.entries:
        print(entry)
        print('----------------------')

As a result I get:

{'guidislink': 0, 'published': u'Wed, 23 Apr 2014 04:00:00 Z', 'published_parsed': time.struct_time(tm_year=2014, tm_mon=4, tm_mday=23, tm_hour=4, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=113, tm_isdst=0), 'title': u'Bankkaufmann (m/w)'}
----------------------
{'guidislink': 0, 'published': u'Wed, 23 Apr 2014 04:00:00 Z', 'published_parsed': time.struct_time(tm_year=2014, tm_mon=4, tm_mday=23, tm_hour=4, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=113, tm_isdst=0), 'title': u'Anlagenbuchhalter (m/w)'}
----------------------
{'guidislink': 0, 'published': u'Wed, 23 Apr 2014 04:00:00 Z', 'published_parsed': time.struct_time(tm_year=2014, tm_mon=4, tm_mday=23, tm_hour=4, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=113, tm_isdst=0), 'title': u'Bankkaufleute (m/w)'}
----------------------

It looks like the entries in the feed do not have "summary" and "link" elements. As confirmation of that, I get an error message if I try to use entry.summary or entry.description. This is strange to me, since I do see link and description elements in the XML for the feed when I open it in my browser.

Does anybody know what I am doing wrong?

Solution

From the revision history of feedparser:

Universal Feed Parser 3.0b18 was released on February 17, 2004.

always map description to summary_detail (Andrei)

use libxml2 (if available)

And from the feedparser documentation:

Some RSS feeds use guid when they mean link. guid can also be used as an opaque identifier that has nothing to do with links. If an RSS feed uses guid as the entry link and no link is present, Universal Feed Parser detects this and makes the guid available in d.entries[i].link.
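
The fallback described in that passage can be sketched with the standard library alone. This only illustrates the rule as quoted and is not feedparser's actual implementation. The function name item_link and the sample XML are made up for the example, and treating isPermaLink="false" as "not a link" is an assumption based on the RSS 2.0 convention.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed: one item with a link, one with only a guid,
# and one whose guid is explicitly marked as not being a permalink.
SAMPLE_ITEMS = """\
<rss version="2.0"><channel>
  <item><title>A</title><link>http://example.com/a</link><guid>http://example.com/guid-a</guid></item>
  <item><title>B</title><guid>http://example.com/b</guid></item>
  <item><title>C</title><guid isPermaLink="false">opaque-id-123</guid></item>
</channel></rss>
"""


def item_link(item):
    """Return the item's link, falling back to guid when it acts as a permalink."""
    link = item.findtext("link")
    if link:
        return link.strip()
    guid = item.find("guid")
    # Assumption: a guid counts as a link unless isPermaLink is set to "false".
    if guid is not None and guid.text and guid.get("isPermaLink", "true") == "true":
        return guid.text.strip()
    return None


root = ET.fromstring(SAMPLE_ITEMS)
links = [item_link(item) for item in root.iter("item")]
print(links)
```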

Maybe that's why I can access entry.link and entry.description without any error, even though printing entry.keys() shows no 'description' key, only 'summary':

['summary_detail', 'published_parsed', 'links', 'title', 'summary', 'guidislink', 'title_detail', 'link', 'published', 'id']
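
Since a field such as description may be readable without appearing in keys(), and other fields may be missing entirely, it can be safer to look values up defensively than to assume an attribute exists. A minimal sketch, assuming entries behave like dictionaries (feedparser entries are dict subclasses); the helper name entry_fields and the placeholder strings are hypothetical:

```python
def entry_fields(entry):
    """Collect title, link and summary from a feed entry without raising on missing keys."""
    return {
        "title": entry.get("title", "(no title)"),
        "link": entry.get("link", "(no link)"),
        "summary": entry.get("summary", "(no summary)"),
    }


# Works on any mapping, so it can be tried without fetching a feed.
# This sample mimics an entry that has neither a link nor a summary.
sample = {"title": "Bankkaufmann (m/w)", "guidislink": 0}
fields = entry_fields(sample)
print(fields)
```

With a parsed feed, the same helper could be applied to each item in d.entries.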

Licensed under: CC-BY-SA with attribution
