Question

I think I've discovered a problem with how the Requests library handles redirects when using HTTPS through a proxy. As far as I can tell, it only occurs when the server redirects the Requests client to another HTTPS resource.

I can assure you that the proxy I'm using supports HTTPS and the CONNECT method because I can use it with a browser just fine. I'm using version 2.1.0 of the Requests library which is using 1.7.1 of the urllib3 library.

I watched the transactions in Wireshark and I can see the first transaction for https://www.paypal.com/, but I don't see anything for https://www.paypal.com/home. I keep getting timeouts when debugging any deeper in the stack with my debugger, so I don't know where to go from here. I'm definitely not seeing the request for /home as a result of the redirect, so it must be erroring out in the code before it gets sent to the proxy.

I want to know if this truly is a bug or if I am doing something wrong. It is really easy to reproduce so long as you have access to a proxy that you can send traffic through. See the code below:

import requests

proxiesDict = {
    'http': "http://127.0.0.1:8080",
    'https': "http://127.0.0.1:8080"
}

# This fails with "requests.exceptions.ProxyError: Cannot connect to proxy. Socket error: [Errno 111] Connection refused." when it tries to follow the redirect to /home
r = requests.get("https://www.paypal.com/", proxies=proxiesDict)
# This succeeds.
r = requests.get("https://www.paypal.com/home", proxies=proxiesDict)

This also happens when using urllib3 directly. It is probably mainly a bug in urllib3, which Requests uses under the hood, but I'm using the higher level requests library. See below:

import urllib3

proxy = urllib3.proxy_from_url('http://127.0.0.1:8080/')

# This fails with the same error as above.
res = proxy.urlopen('GET', 'https://www.paypal.com/')
# This succeeds.
res = proxy.urlopen('GET', 'https://www.paypal.com/home')

Here is the traceback when using Requests:

Traceback (most recent call last):
  File "tests/downloader_tests.py", line 22, in test_proxy_https_request
    r = requests.get("https://www.paypal.com/", proxies=proxiesDict)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 382, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 505, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 167, in resolve_redirects
    allow_redirects=False,
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 485, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 375, in send
    raise ProxyError(e)
requests.exceptions.ProxyError: Cannot connect to proxy. Socket error: [Errno 111] Connection refused.

Update:

The problem only seems to happen with a 302 (Found) redirect, not with the normal 301 (Moved Permanently) redirects. Also, I noticed that with the Chrome browser, PayPal doesn't return a redirect. I do see the redirect when using Requests, even though I'm borrowing Chrome's User-Agent for this experiment. I'm looking for more URLs that return a 302 in order to get more data points.

I need this to work for all URLs, or at least to understand why I'm seeing this behavior.
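To collect those data points, one option is to turn off automatic redirect following and look at the status code and Location header of the first response only. The following is a rough sketch of that idea, not a fix for the underlying problem. It assumes Python 3 (for `urllib.parse`); the names `describe_redirect`, `probe`, and `REDIRECT_CODES`, and the 10-second timeout, are made up for illustration.

```python
from urllib.parse import urljoin

# Status codes that tell a client to go fetch another URL.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def describe_redirect(url, status_code, headers):
    """Return (status_code, absolute_target) for a redirect response, else None.

    `headers` is any mapping of header names to values. A relative
    Location such as "/home" is resolved against the requested URL.
    """
    if status_code not in REDIRECT_CODES:
        return None
    location = headers.get("Location") or headers.get("location")
    if location is None:
        return None
    return status_code, urljoin(url, location)


def probe(url, proxies):
    """Fetch `url` through the proxy without following redirects."""
    # Imported here so describe_redirect() can be used without Requests installed.
    import requests

    response = requests.get(url, proxies=proxies, allow_redirects=False, timeout=10)
    return describe_redirect(url, response.status_code, response.headers)
```

Calling `probe("https://www.paypal.com/", proxiesDict)` for each candidate URL would show which ones answer with a 302 and where they point, without triggering the failing second request.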

Solution

This is a bug in urllib3. We're tracking it as urllib3 issue #295.

Licensed under: CC-BY-SA with attribution