Question

Somehow I can't download files through a proxy server, and I don't know what I have done wrong. I just get a timeout. Any advice?

import urllib.request

urllib.request.ProxyHandler({"http" : "myproxy:123"})
urllib.request.urlretrieve("http://myfile", "file.file")

Solution

You need to use your proxy object, not just instantiate it (you created an object but didn't assign it to a variable, so you can't use it). Try this pattern:

import urllib.request

# create the proxy handler and assign it to a variable
proxy = urllib.request.ProxyHandler({'http': '127.0.0.1'})
# construct a new opener using your proxy settings
opener = urllib.request.build_opener(proxy)
# install the opener at module level so urlretrieve() uses it
urllib.request.install_opener(opener)
# make a request
urllib.request.urlretrieve('http://www.google.com')
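
If you would rather not change the module-level opener for the whole program, you can call the opener directly. The following is only a sketch: the helper names build_proxy_opener and download_via_proxy, the 30 second timeout, and the choice to send https traffic through the same proxy are my own assumptions, not part of urllib.

```python
import shutil
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that sends http and https requests through proxy_url."""
    # Assumption: the same proxy serves both schemes.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def download_via_proxy(url, dest, proxy_url, timeout=30):
    """Fetch url through the proxy and write the response body to dest."""
    opener = build_proxy_opener(proxy_url)
    # opener.open() uses only this opener; the global default stays untouched.
    with opener.open(url, timeout=timeout) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Example call (placeholder names from the question, so left commented out):
# download_via_proxy("http://myfile", "file.file", "http://myproxy:123")
```

Passing an explicit timeout also means a wrong proxy address fails after a bounded wait rather than hanging for the system default.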

Or, if you do not need to rely on the standard library, use requests (this code is from the official documentation):

import requests

proxies = {"http": "http://10.10.1.10:3128",
           "https": "http://10.10.1.10:1080"}

requests.get("http://example.org", proxies=proxies)

OTHER TIPS

urllib reads proxy settings from the system environment.

According to the code snippet from urllib\request.py below, you just need to set the http_proxy and https_proxy environment variables.

In the meantime, it is also documented here: https://www.cmi.ac.in/~madhavan/courses/prog2-2015/docs/python-3.4.2-docs-html/howto/urllib2.html#proxies

    # Proxy handling
    def getproxies_environment():
        """Return a dictionary of scheme -> proxy server URL mappings.

        Scan the environment for variables named <scheme>_proxy;
        this seems to be the standard convention.  If you need a
        different way, you can pass a proxies dictionary to the
        [Fancy]URLopener constructor.

        """
        proxies = {}
        # in order to prefer lowercase variables, process environment in
        # two passes: first matches any, second pass matches lowercase only
        for name, value in os.environ.items():
            name = name.lower()
            if value and name[-6:] == '_proxy':
                proxies[name[:-6]] = value
        # CVE-2016-1000110 - If we are running as CGI script, forget HTTP_PROXY
        # (non-all-lowercase) as it may be set from the web server by a "Proxy:"
        # header from the client
        # If "proxy" is lowercase, it will still be used thanks to the next block
        if 'REQUEST_METHOD' in os.environ:
            proxies.pop('http', None)
        for name, value in os.environ.items():
            if name[-6:] == '_proxy':
                name = name.lower()
                if value:
                    proxies[name[:-6]] = value
                else:
                    proxies.pop(name[:-6], None)
        return proxies
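
To check that urllib actually sees your settings, you can set the variables from inside Python and ask urllib what it found. This is a minimal sketch; the proxy address http://127.0.0.1:3128 is a made-up placeholder, so substitute your own.

```python
import os
import urllib.request

# Hypothetical proxy address; replace it with your own.
os.environ["http_proxy"] = "http://127.0.0.1:3128"
os.environ["https_proxy"] = "http://127.0.0.1:3128"

# Ask urllib which proxies it detects in the environment.
detected = urllib.request.getproxies_environment()
print(detected.get("http"), detected.get("https"))
```

As far as I can tell, the default opener is built once and then reused, so set the variables before the first urlopen() or urlretrieve() call in the process (or export them in the shell before starting Python).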

If you have to use a SOCKS5 proxy, here's the solution:

import socks
import socket
import urllib.request


proxy_ip = "127.0.0.1"
proxy_port = 1080
socks.set_default_proxy(socks.PROXY_TYPE_SOCKS5, proxy_ip, proxy_port)
socket.socket = socks.socksocket

url = 'https://example.com/foo/bar.jpg'
urllib.request.urlretrieve(url, 'bar.png')
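
Note that assigning to socket.socket affects every connection made by the process afterwards. If that is too broad, one option is to apply the patch only for the duration of the download and restore the original class afterwards. The context manager below is a sketch of that idea; the name patched_socket is my own, and it is not thread-safe because the patch is still global while it is active.

```python
import contextlib
import socket


@contextlib.contextmanager
def patched_socket(socket_class):
    """Temporarily replace socket.socket with socket_class, then restore it."""
    original = socket.socket
    socket.socket = socket_class
    try:
        yield
    finally:
        # Always restore the original class, even if the download fails.
        socket.socket = original


# Usage with PySocks (third-party, so left commented out here):
# import socks
# import urllib.request
# socks.set_default_proxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 1080)
# with patched_socket(socks.socksocket):
#     urllib.request.urlretrieve("https://example.com/foo/bar.jpg", "bar.png")
```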

More Info:

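Patching socket.socket as shown above works very well. ProxyHandler, on the other hand, fails with SOCKS proxies for me. As far as I can tell, the standard library has no SOCKS implementation of its own, even though the documentation and getproxies() output quoted below made me expect it to work.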

proxy = urllib.request.ProxyHandler({'socks': 'socks://127.0.0.1:1080'})
opener = urllib.request.build_opener(proxy)
urllib.request.install_opener(opener)
urllib.request.urlretrieve(url, 'bar.png')

class urllib.request.ProxyHandler(proxies=None)

Cause requests to go through a proxy. If proxies is given, it must be a dictionary mapping protocol names to URLs of proxies. The default is to read the list of proxies from the environment variables <protocol>_proxy. If no proxy environment variables are set, then in a Windows environment proxy settings are obtained from the registry’s Internet Settings section, and in a macOS environment proxy information is retrieved from the System Configuration Framework.

When a SOCKS5 proxy is globally set on my Windows OS, I get this:

>>> urllib.request.getproxies()
{'socks': 'socks://127.0.0.1:1080'}

But it still fails.
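
A possible explanation, based on how CPython's ProxyHandler behaves when I inspect it (this is an implementation detail and may differ between versions): the handler registers one hook per key in the mapping, named after that key. A 'socks' key therefore yields a hook for URLs whose scheme is socks, and nothing that would intercept http or https requests. The sketch below only inspects the handler and makes no network calls.

```python
import urllib.request

# Same mapping that getproxies() reported above.
handler = urllib.request.ProxyHandler({"socks": "socks://127.0.0.1:1080"})

# Collect the per-scheme "<scheme>_open" hooks, skipping the generic proxy_open method.
hooks = sorted(
    name for name in dir(handler)
    if name.endswith("_open") and name != "proxy_open"
)
print(hooks)
```

If the list contains no http_open or https_open entry, requests for those schemes are not routed through the SOCKS proxy by this handler, which would match the failure described above.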

Licensed under: CC-BY-SA with attribution