Question

I am trying to fetch a page using urllib2.urlopen (actually, I am using mechanize, but this is the method that mechanize calls). When I fetch the page over HTTPS, I get an incomplete response: the page is truncated. However, if I access the non-HTTPS version of the page, I get the complete page.

I am on Arch Linux (3.5.4-1-ARCH x86_64), running openssl 1.0.1c. The problem also occurs on another Arch Linux machine I own, but not when I use Python 3 (3.3.0).

This problem seems to be related to the question "urllib2 not retrieving entire HTTP response".

I tested it on the only online Python interpreter that would let me use urllib2 (Py I/O) and it worked as expected. Here is the code:

import urllib2

u = urllib2.urlopen('https://wa151.avayalive.com/WAAdminPanel/login.aspx?ReturnUrl=%2fWAAdminPanel%2fprivate%2fHome.aspx')

print u.read()[-100:]

The last lines should contain the usual </body></html>.

When I try urllib.urlretrieve on my machines, I get:

ContentTooShortError: retrieval incomplete: got only 11365 out of 13805 bytes
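
As far as I understand it, this error means that fewer bytes arrived than the server announced in its Content-Length header. Below is a minimal sketch of that comparison, not urllib's actual code; missing_bytes is a hypothetical helper name, and the byte counts are simply the ones from the error message above.

```python
def missing_bytes(content_length_header, body):
    """Return how many bytes are missing from body, or 0 if none.

    content_length_header is the raw Content-Length value (a string),
    or None when the server did not send one.
    """
    if content_length_header is None:
        return 0  # nothing to compare against
    expected = int(content_length_header)
    return max(expected - len(body), 0)


# Simulate the numbers from the error message: 11365 of 13805 bytes.
body = b"x" * 11365
print(missing_bytes("13805", body))  # prints 2440
```

With urllib2, the header value would presumably come from u.info().getheader('Content-Length') and the body from u.read(), so a nonzero result would confirm the truncation without going through urlretrieve.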

I cannot test urlretrieve on the online interpreter because it will not let users write to temporary files. Later in the evening, I will try fetching the URL from my machine, but from a different location.


Solution

I'm getting the same error, using Python 2.7, on a different Linux system:

>>> import urllib
>>> urllib.urlretrieve('https://wa151.avayalive.com/WAAdminPanel/login.aspx?ReturnUrl=%2fWAAdminPanel%2fprivate%2fHome.aspx')
---------------------------------------------------------------------------
ContentTooShortError                      Traceback (most recent call last)
...
ContentTooShortError: retrieval incomplete: got only 11365 out of 13805 bytes

However, the same operation can be done (and actually works for me) using requests:

>>> import requests
>>> r = requests.get('https://wa151.avayalive.com/WAAdminPanel/login.aspx?ReturnUrl=%2fWAAdminPanel%2fprivate%2fHome.aspx')
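>>> # Write the raw bytes: in Python 2, writing r.text (unicode) can fail on non-ASCII characters
>>> with open(somefilepath, 'wb') as f:
...     f.write(r.content)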

Is that working for you?

Licensed under: CC-BY-SA with attribution