Question

I have a long-running process that crashes about every three days because, during an HTTP connection, after the connection is made but before any data is received, httplib throws BadStatusLine. I've tried wrapping my call in try/except, but the exception still produces a stack trace and stops the process anyway.

# supporting code included for clarity
from httplib import BadStatusLine, HTTPException
import urllib2
import eventlet
sem = eventlet.semaphore.Semaphore(SIMULTENEOUS)

#problem code, running in one of many qthreads downloading various pages.
try:
    sem.acquire()
    eventlet.sleep(HIT_DELAY)
    lphtml = urllib2.urlopen(list_page_url).read()
    sem.release()
except (urllib2.URLError, urllib2.HTTPError, HTTPException, BadStatusLine) as e:
    sem.release()
    pipe.log.error("Could not download product list page %s\n%s" % (str(e), list_page_url))
    continue

I'm using a semaphore because I don't want my code hitting the site more than once per second, but I don't want to get rid of eventlet for reasons elsewhere in the code.

Eventually the call to urllib2.urlopen throws BadStatusLine, but the exception is not caught and the semaphore is never released. This is the stack trace produced:

Traceback (most recent call last):
  File "/usr/local/lib/python2.6/dist-packages/eventlet-0.9.16-py2.6.egg/eventlet/greenpool.py", line 80, in _spawn_n_impl
    func(*args, **kwargs)
  File "/home/myself/secret_filename.py", line 52, in poll_feed_hourly
    lphtml = urllib2.urlopen(list_page_url).read()
  File "/usr/lib/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 1170, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.6/urllib2.py", line 1143, in do_open
    r = h.getresponse()
  File "/usr/lib/python2.6/httplib.py", line 990, in getresponse
    response.begin()
  File "/usr/lib/python2.6/httplib.py", line 391, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.6/httplib.py", line 355, in _read_status
    raise BadStatusLine(line)
BadStatusLine

Could it be my bizarre use of qthreads that causes BadStatusLine to never reach the except block? Is there some place I could insert a timeout to cause the except block to eventually be reached?


Solution

If the only problem is releasing the semaphore, why don't you use the try/finally syntax?

sem.acquire()  # acquire before try, so finally never releases a semaphore that was not acquired
try:
    eventlet.sleep(HIT_DELAY)
    lphtml = urllib2.urlopen(list_page_url).read()
finally:
    sem.release()
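
Note that try/finally on its own guarantees the release but still lets the exception propagate, so the worker would stop anyway. Here is a minimal sketch of combining except and finally so that the error is logged and the semaphore is always released. It rests on a few assumptions that are not in the question: threading.Semaphore stands in for eventlet's semaphore so the snippet runs without eventlet, fetch is a hypothetical callable in place of urllib2.urlopen(url).read(), and download_page, failing_fetch and the logger name are made up for illustration.

```python
import logging
import threading

try:
    from httplib import BadStatusLine, HTTPException  # Python 2
except ImportError:
    from http.client import BadStatusLine, HTTPException  # Python 3

log = logging.getLogger("downloader")  # hypothetical logger name

# Assumption: stand-in for eventlet.semaphore.Semaphore(SIMULTENEOUS).
sem = threading.Semaphore(1)


def download_page(fetch, url):
    """Return the page body, or None if the download failed.

    `fetch` is a hypothetical callable standing in for
    `lambda u: urllib2.urlopen(u).read()`.
    """
    sem.acquire()  # outside try: finally must not release an unheld semaphore
    try:
        return fetch(url)
    except (HTTPException, IOError) as e:
        # BadStatusLine is an HTTPException; urllib2.URLError derives from IOError.
        log.error("Could not download %s: %r", url, e)
        return None
    finally:
        sem.release()  # runs on success, on a handled error, and on an unhandled one


def failing_fetch(url):
    """Simulate a server that closes the connection before sending a status line."""
    raise BadStatusLine("")
```

Calling download_page(failing_fetch, some_url) returns None instead of raising, and the semaphore can be acquired again afterwards. If the except clause still does not match in the real program, the exception class being raised is probably not the one that was imported.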

OTHER TIPS

Try using

from eventlet.green.httplib import HTTPException

instead of

from httplib import BadStatusLine, HTTPException

Note: httplib.BadStatusLine is a subclass of httplib.HTTPException (http://docs.python.org/2/library/httplib.html#httplib.BadStatusLine), so BadStatusLine will be caught too.
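
The subclass relationship can be checked directly. The sketch below uses the standard library only; the import fallback is an addition so that it runs on both Python 2 and Python 3, and caught_by_base is a hypothetical helper name. It does not show eventlet itself. An except clause matches on the class object, so if the exception is raised from a separate green copy of httplib, as this answer assumes, the names in the except clause would need to come from that same module.

```python
try:
    from httplib import BadStatusLine, HTTPException  # Python 2
except ImportError:
    from http.client import BadStatusLine, HTTPException  # Python 3


def caught_by_base():
    """Return True if `except HTTPException` catches a BadStatusLine."""
    try:
        raise BadStatusLine("")
    except HTTPException:
        return True
    return False


print(issubclass(BadStatusLine, HTTPException))  # True
print(caught_by_base())  # True
```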

Licensed under: CC-BY-SA with attribution