Question
I have a function which opens a page via urllib2 to extract some data. It runs fine 80% of the time, but the other 20% of the time I'll receive an IncompleteRead exception.
Traceback (most recent call last):
File "test.py", line 380, in <module>
main()
File "test.py", line 109, in main
soups.append(BeautifulSoup(out_queue.get().read()))
File "c:\python27\lib\socket.py", line 351, in read
data = self._sock.recv(rbufsize)
File "c:\python27\lib\httplib.py", line 541, in read
return self._read_chunked(amt)
File "c:\python27\lib\httplib.py", line 601, in _read_chunked
value.append(self._safe_read(chunk_left))
File "c:\python27\lib\httplib.py", line 649, in _safe_read
raise IncompleteRead(''.join(s), amt)
httplib.IncompleteRead: IncompleteRead(958 bytes read, 678 more expected)
I'm opening the page with a basic
response = urllib2.urlopen('the_url')
and then later in the program converting it to a BeautifulSoup object.
Is there a way to catch the problem when the initial request is made with urllib2? Some way to verify that the data is "whole" before I try to do something with it?
Solution
You could read it all at once:
import httplib
import urllib2

try:
    response_data = urllib2.urlopen('the_url').read()
except httplib.IncompleteRead:
    pass  # handle the error
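If swallowing the exception isn't enough, note that httplib.IncompleteRead exposes the bytes that did arrive through its partial attribute, so you can retry the request and fall back to the truncated body on the last attempt. Here is a minimal sketch along those lines; the fetch helper and the retry count are my own illustration, not part of the original code:

import httplib
import urllib2

def fetch(url, retries=3):
    # Hypothetical helper: retry on IncompleteRead, and on the final
    # attempt fall back to whatever bytes made it across the wire.
    for attempt in range(retries):
        try:
            return urllib2.urlopen(url).read()
        except httplib.IncompleteRead as e:
            if attempt == retries - 1:
                return e.partial  # data received before the stream broke
    return ''  # unreachable for retries >= 1, kept for a consistent return type

BeautifulSoup will still parse a truncated document, so something like soups.append(BeautifulSoup(fetch('the_url'))) should keep the rest of the program running even on the bad 20% of requests.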