Detecting timeout erros in Python's urllib2 urlopen

https://stackoverflow.com/questions/3515087

29-09-2019
|

Question

I'm still relatively new to Python, so if this is an obvious question, I apologize.

My question is in regard to the urllib2 library, and it's urlopen function. Currently I'm using this to load a large amount of pages from another server (they are all on the same remote host) but the script is killed every now and then by a timeout error (I assume this is from the large requests).

Is there a way to keep the script running after a timeout? I'd like to be able to fetch all of the pages, so I want a script that will keep trying until it gets a page, and then moves on.

On a side note, would keeping the connection open to the server help?

Solution

Next time the error occurs, take note of the error message. The last line will tell you the type of exception. For example, it might be a urllib2.HTTPError. Once you know the type of exception raised, you can catch it in a try...except block. For example:

import urllib2
import time

for url in urls:
    while True:
        try:
            sock=urllib2.urlopen(url)
        except (urllib2.HTTPError, urllib2.URLError) as err:
            # You may want to count how many times you reach here and
            # do something smarter if you fail too many times.
            # If a site is down, pestering it every 10 seconds may not
            # be very fruitful or polite.
            time.sleep(10)
        else:              
            # Success  
            contents=sock.read()
            # process contents
            break                # break out of the while loop

OTHER TIPS

The missing manual of urllib2 might help you

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow