Question

The urllib2 documentation says that the timeout parameter was added in Python 2.6. Unfortunately, my code base runs on Python 2.5 and 2.4 platforms.

Is there an alternative way to simulate the timeout? All I want is to let the code talk to the remote server for a fixed amount of time.

Perhaps there is an alternative built-in library? (I don't want to install a third-party one, like pycurl.)


Solution

You can set a global timeout for all socket operations (including HTTP requests) by using:

socket.setdefaulttimeout()

Like this:

import urllib2
import socket
socket.setdefaulttimeout(30)
f = urllib2.urlopen('http://www.python.org/')

In this case, your urllib2 request will time out after 30 seconds and raise a socket exception (socket.timeout). socket.setdefaulttimeout() was added in Python 2.3, so it is available on both 2.4 and 2.5.
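Because the default timeout is process-wide, it also affects every other socket created afterwards. A minimal sketch of one way to limit that, assuming you are happy to wrap the call in a helper: save the previous default, set the new one, and restore it when the call returns. The name call_with_default_timeout is made up for this example and is not part of any library.

```python
import socket

def call_with_default_timeout(seconds, func, *args, **kwargs):
    """Run func with a temporary global socket timeout, then restore it.

    Hypothetical helper for illustration only.
    """
    previous = socket.getdefaulttimeout()   # None means "no timeout"
    socket.setdefaulttimeout(seconds)
    try:
        return func(*args, **kwargs)
    finally:
        # Restore the old default even if func raised.
        socket.setdefaulttimeout(previous)
```

You would then call it as call_with_default_timeout(30, urllib2.urlopen, 'http://www.python.org/'). Sockets created while the temporary default is in effect keep that timeout, so later reads on the returned response should still be covered. Note that this is not thread-safe: other threads creating sockets during the call pick up the same default.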

OTHER TIPS

With considerable irritation, you can override the httplib.HTTPConnection class that the urllib2.HTTPHandler uses.

import httplib
import socket
import urllib2

def urlopen_with_timeout(url, data=None, timeout=None):

  # Create these two helper classes fresh each time, since
  # timeout needs to be in the closure.
  class TimeoutHTTPConnection(httplib.HTTPConnection):
    def connect(self):
      """Connect to the host and port specified in __init__."""
      msg = "getaddrinfo returns an empty list"
      for res in socket.getaddrinfo(self.host, self.port, 0,
                      socket.SOCK_STREAM): 
        af, socktype, proto, canonname, sa = res
        try:
          self.sock = socket.socket(af, socktype, proto)
          if timeout is not None:
            self.sock.settimeout(timeout)
          if self.debuglevel > 0:
            print "connect: (%s, %s)" % (self.host, self.port)
          self.sock.connect(sa)
        except socket.error, msg:
          if self.debuglevel > 0:
            print 'connect fail:', (self.host, self.port)
          if self.sock:
            self.sock.close()
          self.sock = None
          continue
        break
      if not self.sock:
        raise socket.error, msg

  class TimeoutHTTPHandler(urllib2.HTTPHandler):
    http_request = urllib2.AbstractHTTPHandler.do_request_
    def http_open(self, req):
      return self.do_open(TimeoutHTTPConnection, req)

  opener = urllib2.build_opener(TimeoutHTTPHandler)
  # Return the response object so the caller can read from it.
  return opener.open(url, data)
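The override above relies on settimeout() applying to a single socket rather than to the whole process. A small sketch of that behaviour, assuming a loopback listener that accepts the connection at the TCP level but never sends anything; the function name recv_times_out is made up for this example.

```python
import socket

def recv_times_out(timeout):
    """Return True if reading from a silent local peer hits the timeout.

    Hypothetical demonstration helper, not part of the answer's code.
    """
    # Listening socket on an ephemeral loopback port; it never sends data.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        client.settimeout(timeout)          # affects this socket only
        client.connect(server.getsockname())
        try:
            client.recv(1)                  # blocks until the timeout expires
        except socket.timeout:
            return True
        return False
    finally:
        client.close()
        server.close()
```

The global default from socket.getdefaulttimeout() is left untouched here, which is the main advantage of the per-connection approach over socket.setdefaulttimeout().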

I think your best choice is to patch (or deploy a local version of) your urllib2 with the change from the 2.6 maintenance branch.

The file should be at /usr/lib/python2.4/urllib2.py (on Linux with Python 2.4).

I use httplib from the standard library. It has a dead simple API, but it only handles HTTP, as you might guess. IIUC, urllib uses httplib to implement the HTTP stuff.

You must set the timeout in two places. (Note that the timeout argument to urlopen only exists in Python 2.6 and later.)

import urllib2
import socket

socket.setdefaulttimeout(30)
f = urllib2.urlopen('http://www.python.org/', timeout=30)

Well, the way the timeout is handled in 2.4 and in 2.6 is the same. If you open the urllib2.py file in 2.6, you will see that it takes an extra timeout argument and handles it using the socket.setdefaulttimeout() method mentioned in the first answer.

So you really don't need to update your urllib2.py in that case.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow