My code got stuck on this function call:

feedparser.parse("http://...")

This worked before. The URL doesn't even open in a browser now. How can I handle this case? Is there a timeout option? I'd like to continue as if nothing had happened (just printing a message or logging the issue).


Solution

You can specify a timeout globally using socket.setdefaulttimeout().

The timeout only limits how long an individual socket operation may last; feedparser.parse() may perform many socket operations, so the total time spent on DNS resolution, establishing the TCP connection, and sending/receiving data can be much longer. See Read timeout using either urllib2 or any other http library.
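As a minimal sketch of this approach (the feed URL below is a hypothetical placeholder), you set the process-wide default before calling feedparser.parse(); every socket created afterwards inherits the limit, but remember each individual operation gets the full timeout, so the overall call can still take several times longer:

```python
import socket

# Process-wide default timeout (seconds) for all newly created sockets.
# feedparser.parse() fetches URLs via urllib under the hood, so its
# socket operations inherit this limit.
socket.setdefaulttimeout(10.0)
applied = socket.getdefaulttimeout()  # 10.0

# feed = feedparser.parse("http://example.com/feed")  # hypothetical URL

# Restore the default (block forever) if other code relies on it.
socket.setdefaulttimeout(None)
```

Because the setting is global, it affects every other socket in the process for as long as it is in place, which is why the requests-based approach below is usually preferable.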

Additional tips

Use the Python requests library for the network I/O, and feedparser for parsing only:


import logging
from io import BytesIO

import feedparser
import requests

logger = logging.getLogger(__name__)

# Do the request using the requests library with an explicit timeout
try:
    resp = requests.get(rss_feed, timeout=20.0)
except requests.Timeout:  # covers both connect and read timeouts
    logger.warning("Timeout when reading RSS %s", rss_feed)
    return

# Wrap the body in an in-memory stream for Universal Feed Parser
content = BytesIO(resp.content)

# Parse the content
feed = feedparser.parse(content)

According to the author's recommendation[1], you should use the requests library to perform the HTTP request and pass the result to feedparser.

[1] https://github.com/kurtmckee/feedparser/pull/80

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow