Question

Does urllib2 in Python 2.6.1 support proxies for HTTPS?

I've found the following at http://www.voidspace.org.uk/python/articles/urllib2.shtml:

NOTE

Currently urllib2 does not support fetching of https locations through a proxy. This can be a problem.

I'm trying to automate logging in to a website and downloading a document. I have a valid username/password.

import urllib2

proxy_info = {
    'host': "axxx",  # commented out the real data
    'port': "1234"   # commented out the real data
}

proxy_handler = urllib2.ProxyHandler(
    {"http": "http://%(host)s:%(port)s" % proxy_info})
opener = urllib2.build_opener(proxy_handler,
                              urllib2.HTTPHandler(debuglevel=1),
                              urllib2.HTTPCookieProcessor())
urllib2.install_opener(opener)

headers = {}  # the real request headers were omitted from this snippet
fullurl = 'https://correct.url.to.login.page.com/user=a&pswd=b'  # example
req1 = urllib2.Request(url=fullurl, headers=headers)
response = urllib2.urlopen(req1)

I've had this working for similar pages that don't use HTTPS, but here I suspect the request doesn't get through the proxy: it just hangs, the same way it did when I didn't specify a proxy at all. I need to go out through the proxy.
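One way to see whether a request is actually reaching the proxy is to turn on debug output for HTTPS as well. The HTTPHandler(debuglevel=1) in the snippet above only affects requests that urllib2 dispatches to HTTPHandler; anything handled by HTTPSHandler prints nothing unless that handler gets its own debug level. A minimal sketch; the try/except import is only there so it also runs on Python 3, where the same classes live in urllib.request.

```python
try:
    import urllib2 as request  # Python 2.x
except ImportError:
    import urllib.request as request  # Python 3: same handler classes

# Give both handlers a debug level so that requests dispatched to either
# one print the raw request/response exchange (send:/reply:/header: lines).
http_debug = request.HTTPHandler(debuglevel=1)
https_debug = request.HTTPSHandler(debuglevel=1)
opener = request.build_opener(http_debug, https_debug,
                              request.HTTPCookieProcessor())
```

If the debug output shows the request going straight to the target host rather than to the proxy, the proxy mapping is not being applied to that scheme.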

I need to authenticate, but not with basic authentication. Will urllib2 handle authentication when going to an HTTPS site (I supply the username/password to the site in the URL)?
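On the authentication part: when the credentials travel as query parameters, urllib2 does nothing special with them. It simply sends the URL as part of the request, and HTTPCookieProcessor keeps whatever session cookie the site sets in reply. A minimal sketch of building such a URL with urlencode so that characters like & or spaces in a password are escaped; the helper name build_login_url and the field names user/pswd (taken from the example URL) are illustrative assumptions, and the real login form may use different names or expect a POST instead.

```python
try:
    from urllib import urlencode  # Python 2 (urllib2 era)
except ImportError:
    from urllib.parse import urlencode  # Python 3 location


def build_login_url(base_url, username, password):
    """Append credentials as an escaped query string (hypothetical helper)."""
    # A list of pairs keeps the parameter order stable in the output.
    query = urlencode([("user", username), ("pswd", password)])
    return base_url + "?" + query


print(build_login_url("https://login.example.com/login", "a", "p&ss word"))
# prints https://login.example.com/login?user=a&pswd=p%26ss+word
```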

EDIT: No luck. I tested with

proxies = {
    "http": "http://%(host)s:%(port)s" % proxy_info,
    "https": "https://%(host)s:%(port)s" % proxy_info
}

proxy_handler = urllib2.ProxyHandler(proxies)

and I get this error:

urllib2.URLError: urlopen error [Errno 8] _ssl.c:480: EOF occurred in violation of protocol


Solution

I'm not sure the Michael Foord article you quote has been updated for Python 2.6.1, so why not give it a try? Instead of telling ProxyHandler that the proxy is only good for http, as you're doing now, register it for https too (of course, you should format the proxy URL into a variable just once before you call ProxyHandler, and reuse that variable in the dict). That may or may not work, but if you don't even try, it's sure not to work!-)
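A minimal sketch of that suggestion, formatting the proxy URL once and registering it for both schemes. One assumption worth stating: the proxy URL keeps the http:// scheme even under the "https" key, because a typical forwarding proxy listens for plain HTTP and HTTPS traffic is tunneled through it. Giving the proxy an https:// scheme, as in the EDIT above, makes urllib2 try to speak SSL to the proxy itself, which would plausibly explain the "EOF occurred in violation of protocol" error. The host and port are placeholders, and the try/except import only exists so the sketch also runs on Python 3.

```python
try:
    import urllib2 as request  # Python 2.x
except ImportError:
    import urllib.request as request  # Python 3: same handler classes

proxy_info = {"host": "proxy.example.com", "port": 1234}  # placeholder values

# Format the proxy URL once and reuse it for both schemes. The proxy is
# reached over plain HTTP; HTTPS requests are tunneled through it with
# CONNECT (urllib2 only does this from Python 2.6.3 on).
proxy_url = "http://%(host)s:%(port)s" % proxy_info
proxies = {"http": proxy_url, "https": proxy_url}

proxy_handler = request.ProxyHandler(proxies)
opener = request.build_opener(proxy_handler, request.HTTPCookieProcessor())
request.install_opener(opener)  # later request.urlopen() calls use this opener
```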

OTHER TIPS

Fixed in Python 2.6.3 and several other branches:

In case anyone else has this issue in the future, I'd like to point out that urllib2 does support HTTPS proxying now. Make sure the proxy supports it too, or you risk running into a bug that puts the Python library into an infinite loop (this happened to me).
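The fix works by asking the proxy to open a CONNECT tunnel to the target host, so a proxy that refuses CONNECT will not work with it. A minimal sketch, assuming you want to probe that outside urllib2 with the lower-level httplib module: the tunnel method is the private _set_tunnel in 2.6.3 and set_tunnel from 2.7 on (and in Python 3's http.client), hence the getattr fallback. The helper name and host names are hypothetical.

```python
try:
    import httplib as client  # Python 2
except ImportError:
    import http.client as client  # Python 3


def make_tunneled_connection(proxy_host, proxy_port, target_host,
                             target_port=443):
    """Prepare an HTTPS connection tunneled through an HTTP proxy.

    Hypothetical helper: no network traffic happens here; the CONNECT
    request is only sent when the connection is first used.
    """
    conn = client.HTTPSConnection(proxy_host, proxy_port)
    # 2.6.3 only has the private name; 2.7+ and Python 3 have set_tunnel.
    set_tunnel = getattr(conn, "set_tunnel", None) or conn._set_tunnel
    set_tunnel(target_host, target_port)
    return conn


# Usage (needs a real proxy that accepts CONNECT):
# conn = make_tunneled_connection("proxy.example.com", 1234, "login.example.com")
# conn.request("GET", "/")
# print(conn.getresponse().status)
```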

For further information, see the unit test in the Python source that tests HTTPS proxying support: http://svn.python.org/view/python/branches/release26-maint/Lib/test/test_urllib2.py?r1=74203&r2=74202&pathrev=74203

Licensed under: CC-BY-SA with attribution