Question

My code got stuck on this function call:

feedparser.parse("http://...")

This worked before. The URL can't even be opened in a browser. How would you handle this case? Is there a timeout option? I'd like to continue as if nothing had happened, only printing a message or logging the issue.


Solution

You can specify a timeout globally using socket.setdefaulttimeout().

The timeout limits how long an individual socket operation may last; feedparser.parse() may perform many socket operations, so the total time spent on DNS resolution, establishing the TCP connection, and sending/receiving data can be much longer. See "Read timeout using either urllib2 or any other http library" for details.
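
For example, a minimal sketch of the global-timeout approach (the URL and the 10-second value are placeholders, and note that setdefaulttimeout() affects every socket created afterwards in the process):

import socket

import feedparser

# Any blocking socket operation that takes longer than 10 seconds will now time out
socket.setdefaulttimeout(10)

feed = feedparser.parse("http://example.com/rss")
if feed.bozo:
    # feedparser reports fetch/parse problems via the bozo flag instead of raising
    print("Problem fetching the feed: %s" % feed.bozo_exception)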

Other Tips

Use the Python requests library for network I/O and feedparser for parsing only:

import logging
from io import BytesIO

import feedparser
import requests

logger = logging.getLogger(__name__)


def fetch_feed(rss_feed):
    # Fetch the feed over HTTP with the requests library and a 20-second timeout
    try:
        resp = requests.get(rss_feed, timeout=20.0)
    except requests.ReadTimeout:
        logger.warning("Timeout when reading RSS %s", rss_feed)
        return None

    # Wrap the response body in an in-memory stream for Universal Feed Parser
    content = BytesIO(resp.content)

    # Parse the feed content
    return feedparser.parse(content)
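
A quick usage sketch, assuming the wrapper above is named fetch_feed as in the snippet (the feed URL is a placeholder):

feed = fetch_feed("http://example.com/rss")
if feed is not None:
    for entry in feed.entries:
        print(entry.get("title", "(no title)"))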

According to the author's recommendation[1], you should use the requests library to make the HTTP request and then pass the result to feedparser for parsing.

[1] https://github.com/kurtmckee/feedparser/pull/80

Licensed under: CC-BY-SA with attribution