Question

From Python, I would like to retrieve content from a web site via HTTPS with basic authentication. I need the content on disk. I am on an intranet and trust the HTTPS server. The platform is Python 2.6.2 on Windows.

I have been experimenting with urllib2, but have not succeeded so far.

I have a solution running, calling wget via os.system():

wget_cmd = r'\path\to\wget.exe -q -e "https_proxy = http://fqdn.to.proxy:port" --no-check-certificate --http-user="username" --http-password="password" -O path\to\output https://fqdn.to.site/content'

I would like to get rid of the os.system() call. Is that possible in pure Python?


Solution

HTTPS through a proxy did not work with urllib2 for a long time. It will be fixed in the next release of Python 2.6 (v2.6.3).

In the meantime you can reimplement the correct support yourself; that is what we did for Mercurial: http://hg.intevation.org/mercurial/crew/rev/59acb9c7d90f

OTHER TIPS

Try this (note that you will also have to fill in your server's realm):

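import urllib2

# Basic-auth handler; the realm must match the one the server announces.
authinfo = urllib2.HTTPBasicAuthHandler()
authinfo.add_password(realm='Fill In Realm Here',
                      uri='https://fqdn.to.site/content',
                      user='username',
                      passwd='password')
# Send HTTPS requests through the proxy.
proxy_support = urllib2.ProxyHandler({"https": "http://fqdn.to.proxy:port"})
opener = urllib2.build_opener(proxy_support, authinfo)
fp = opener.open("https://fqdn.to.site/content")
data = fp.read()
fp.close()
# Write the response body to disk in binary mode and close the file.
with open(r"path\to\output", "wb") as out:
    out.write(data)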
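
If the server's realm is not known in advance, one possible variation is to register the credentials with HTTPPasswordMgrWithDefaultRealm, where a realm of None acts as a catch-all. The following is only a minimal sketch under the same placeholder host, proxy, and credentials as above. The try/except import is an addition so that the snippet also runs on Python 3, where urllib2 became urllib.request. It does not work around the proxy limitation described in the accepted answer.

```python
try:
    import urllib2 as urlreq          # Python 2, the platform in the question
except ImportError:
    import urllib.request as urlreq   # Python 3 name for the same classes

URL = "https://fqdn.to.site/content"  # placeholder from the question

# A realm of None matches any realm the server announces for this URL.
password_mgr = urlreq.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, URL, "username", "password")

# Build the opener exactly as before, but with the default-realm manager.
authinfo = urlreq.HTTPBasicAuthHandler(password_mgr)
proxy_support = urlreq.ProxyHandler({"https": "http://fqdn.to.proxy:port"})
opener = urlreq.build_opener(proxy_support, authinfo)
```

The opener is then used as in the snippet above: call opener.open(URL), read the response, and write it to the output file.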

You could try this too: http://code.google.com/p/python-httpclient/

(It also supports the verification of the server certificate.)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow