Question
I have a long-running process that crashes about every three days because, during an HTTP connection, after the connection is made but before any data is received, httplib throws BadStatusLine. I've tried wrapping my call in a try/except, but the exception still produces a stack trace and stops the process anyway.
#supporting code included for clarity
from httplib import BadStatusLine, HTTPException
import eventlet
sem = eventlet.semaphore.Semaphore(SIMULTENEOUS)
#problem code, running in one of many qthreads downloading various pages.
try:
    sem.acquire()
    eventlet.sleep(HIT_DELAY)
    lphtml = urllib2.urlopen(list_page_url).read()
    sem.release()
except (urllib2.URLError, urllib2.HTTPError, HTTPException, BadStatusLine) as e:
    sem.release()
    pipe.log.error("Could not download product list page %s\n%s" % (list_page_url, e))
    continue
I'm using a semaphore because I don't want my code hitting the site more than once per second (but I don't want to get rid of eventlet, for reasons elsewhere in the code).
Eventually the call to urllib2.urlopen throws BadStatusLine, but it isn't caught and the semaphore is never released. This is the stack trace produced:
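The throttling idea described above (a semaphore plus a sleep inside the guarded section) can be sketched with the standard library. This is a minimal sketch under assumptions: `threading` threads and `threading.Semaphore` stand in for eventlet green threads and `eventlet.semaphore.Semaphore`, the delay is shortened to 0.05 s, and the `urlopen()` call is replaced by recording a timestamp. `SIMULTANEOUS` plays the role of the question's `SIMULTENEOUS`.

```python
import threading
import time

HIT_DELAY = 0.05   # shortened stand-in for the question's HIT_DELAY (about 1 s there)
SIMULTANEOUS = 1   # one holder at a time, so the sleeps are serialized

sem = threading.Semaphore(SIMULTANEOUS)
hit_times = []


def worker(url):
    # 'with sem' acquires on entry and releases on exit, even if the body raises.
    with sem:
        time.sleep(HIT_DELAY)
        hit_times.append(time.monotonic())  # stand-in for urllib2.urlopen(url)


threads = [threading.Thread(target=worker, args=("page%d" % i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each holder sleeps HIT_DELAY after the previous holder released, so the
# recorded "hits" are at least HIT_DELAY apart.
gaps = [b - a for a, b in zip(hit_times, hit_times[1:])]
print(len(hit_times), min(gaps))
```

With a count above 1, several holders can sleep at the same time, so hits can land closer together than HIT_DELAY; the once-per-second spacing only holds when the semaphore admits a single holder.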
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/dist-packages/eventlet-0.9.16-py2.6.egg/eventlet/greenpool.py", line 80, in _spawn_n_impl
    func(*args, **kwargs)
  File "/home/myself/secret_filename.py", line 52, in poll_feed_hourly
    lphtml = urllib2.urlopen(list_page_url).read()
  File "/usr/lib/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 1170, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.6/urllib2.py", line 1143, in do_open
    r = h.getresponse()
  File "/usr/lib/python2.6/httplib.py", line 990, in getresponse
    response.begin()
  File "/usr/lib/python2.6/httplib.py", line 391, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.6/httplib.py", line 355, in _read_status
    raise BadStatusLine(line)
BadStatusLine
Could my bizarre use of qthreads be what keeps BadStatusLine from ever reaching the except clause? Is there some place I could insert a timeout so that the except block is eventually reached?
Solution
If the only problem is releasing the semaphore, why not use the try/finally syntax?
sem.acquire()
try:
    eventlet.sleep(HIT_DELAY)
    lphtml = urllib2.urlopen(list_page_url).read()
finally:
    # runs whether or not urlopen() raised, so the semaphore is always released
    sem.release()
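A small, self-contained demonstration of why try/finally fixes the leak: even when the guarded call raises an exception that nothing catches inside the function, the `finally` clause still releases the semaphore. This is a sketch under assumptions: `threading.Semaphore` stands in for eventlet's semaphore, and `FakeBadStatusLine` and `fetch_page()` are hypothetical stand-ins for `BadStatusLine` and the `urlopen()` call.

```python
import threading

# threading.Semaphore stands in for eventlet.semaphore.Semaphore here;
# both provide acquire() and release().
sem = threading.Semaphore(1)


class FakeBadStatusLine(Exception):
    """Hypothetical stand-in for the BadStatusLine the except clause missed."""


def fetch_page(url):
    # Hypothetical download step that always fails, like urlopen() in the question.
    raise FakeBadStatusLine(url)


def guarded_fetch(url):
    sem.acquire()
    try:
        return fetch_page(url)
    finally:
        # Runs whether fetch_page() returns or raises, so the slot is always freed.
        sem.release()


try:
    guarded_fetch("http://example.com/list")
except FakeBadStatusLine:
    pass  # the error still propagates; finally only guarantees the release

# If the semaphore had leaked, this non-blocking acquire would return False.
released_ok = sem.acquire(blocking=False)
print(released_ok)  # True
```

Note that `finally` does not swallow the exception; it still propagates to the caller after the release, so the caller decides whether to log it and move on. `threading.Semaphore` also supports `with sem:`, which gives the same acquire, try, finally-release behavior in one statement.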
OTHER TIPS
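Try using
from eventlet.green.httplib import HTTPException
instead of
from httplib import BadStatusLine, HTTPException
Presumably urllib2 in your code comes from eventlet.green, which builds its own copy of httplib; the exceptions raised by that copy are different classes from the ones in the standard httplib module, so an except clause naming the standard classes never matches them, even though the traceback points at /usr/lib/python2.6/httplib.py.
Note: httplib.BadStatusLine is a subclass of httplib.HTTPException (http://docs.python.org/2/library/httplib.html#httplib.BadStatusLine), so BadStatusLine will be caught too.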
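The tip above hinges on exception class identity: executing the same module source twice produces two unrelated sets of classes, and `except` matches by class, not by name. A minimal sketch in Python 3, where httplib is named `http.client`. It loads a second copy of the module with `importlib`, which only approximates how eventlet builds its green modules (an assumption, not eventlet's actual code); `green_copy` and `call_and_catch` are hypothetical names.

```python
import http.client
import importlib.util

# Load a second, independent copy of http.client without touching sys.modules.
spec = importlib.util.find_spec("http.client")
green_copy = importlib.util.module_from_spec(spec)
spec.loader.exec_module(green_copy)

# Same source file, but the exception classes are different objects.
print(green_copy.BadStatusLine is http.client.BadStatusLine)  # False


def call_and_catch():
    try:
        raise green_copy.BadStatusLine("")
    except http.client.HTTPException:
        # Never reached: the copy's BadStatusLine does not inherit from this class.
        return "caught by the stdlib class"
    except green_copy.HTTPException:
        return "caught only by the copy's class"


print(call_and_catch())  # caught only by the copy's class
```

This is why importing `HTTPException` from the same module that actually raises the error (the green one) makes the except clause match again.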