Question

I am trying to retrieve a webpage which needs to be accessed from behind a proxy and additionally needs HTTP Authentication:

$ wget -d --user=atwood --ask-password http://example.com/admin/admin.php

This works fine, and I will paste the HTTP headers (request and response) below.

Retrieving the same page with python-requests returns a 404 error.

Here is the Python code, which was preceded by the terrific method that user Inactivist posted for debugging the requests library:

import requests
from pprint import pformat

url = 'http://example.com/admin/admin.php'
proxy_config = {
    'http': '1.2.3.4',
    'https': '1.2.3.4',
    'ftp': '1.2.3.4'
}
head = {
    'User-Agent': 'Wget/1.13.4 (linux-gnu)',
    'Connection': 'Close',
    'Proxy-Connection': 'Keep-Alive'
}

response = requests.get(url, auth=('atwood', 'hunter2'), proxies=proxy_config, headers=head)

print("Status code: %s" % (response.status_code, ))
print("URL: %s" % (response.url, ))
print(pformat(response.text))

Here are the wget HTTP headers (request and response), which do in fact return the requested page properly:

$ export http_proxy=http://1.2.3.4:3128
$ wget -d --user=atwood --ask-password  http://example.com/admin/admin.php
Setting --user (user) to atwood
Setting --ask-password (askpassword) to 1
Password for user `atwood': 
DEBUG output created by Wget 1.13.4 on linux-gnu.

URI encoding = `UTF-8'
URI encoding = `UTF-8'
--2014-01-07 11:15:59--  http://example.com/admin/admin.php
Host `example.com' has not issued a general basic challenge.
Connecting to 1.2.3.4:3128... connected.
Created socket 3.
Releasing 0x000000000159bf20 (new refcount 0).
Deleting unused 0x000000000159bf20.

---request begin---
GET http://example.com/admin/admin.php HTTP/1.1
User-Agent: Wget/1.13.4 (linux-gnu)
Accept: */*
Host: example.com
Connection: Close
Proxy-Connection: Keep-Alive

---request end---
Proxy request sent, awaiting response... 
---response begin---
HTTP/1.0 401 Unauthorized
Date: Tue, 07 Jan 2014 09:16:00 GMT
Server: Apache/2.2.21 (Linux/SUSE)
X-Powered-By: PHP/5.3.8
WWW-Authenticate: Basic realm="CONTACT-ADMIN"
Content-Length: 43
Content-Type: text/html
X-Cache: MISS from proxyServer
X-Cache-Lookup: MISS from proxyServer:3128
Via: 1.0 proxyServer (squid/3.1.19)
Connection: keep-alive

---response end---
401 Unauthorized
Registered socket 3 for persistent reuse.
Skipping 43 bytes of body: [Login incorrect, please try again: |||BAD|
] done.
Inserted `example.com' into basic_authed_hosts
Reusing existing connection to 1.2.3.4:3128.
Reusing fd 3.

---request begin---
GET http://example.com/admin/admin.php HTTP/1.1
User-Agent: Wget/1.13.4 (linux-gnu)
Accept: */*
Host: example.com
Connection: Close
Proxy-Connection: Keep-Alive
Authorization: Basic NjY2Njp0cmlwczEyMw==

---request end---
Proxy request sent, awaiting response...
---response begin---
HTTP/1.0 200 OK
Date: Tue, 07 Jan 2014 09:16:00 GMT
Server: Apache/2.2.21 (Linux/SUSE)
X-Powered-By: PHP/5.3.8
Cache-Control: no-cache, must-revalidate
Pragma: no-cache
Content-Type: text/html; charset=utf-8
X-Cache: MISS from proxyServer
X-Cache-Lookup: MISS from proxyServer:3128
Via: 1.0 proxyServer (squid/3.1.19)
Connection: close

---response end---
200 OK
URI content encoding = `utf-8'
Length: unspecified [text/html]
Saving to: `admin.php'

    [ <=>                            ] 14,096      --.-K/s   in 0.1s

2014-01-07 11:16:00 (92.8 KB/s) - `admin.php' saved [14096]

You might notice that I have anonymized the URL that I am fetching. I have triple-checked that the URL returning 404 in Python is the same URL that works in wget.


Solution

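It looks like your proxy settings in Python are not the same as the ones wget uses. wget goes through http://1.2.3.4:3128 (taken from http_proxy), while proxy_config names only the host 1.2.3.4 with no port. requests then falls back to a default port, presumably not 3128, so the request most likely reaches a different service, and that service answers with the 404. Put the full proxy URL, scheme and port included, in proxy_config, for example 'http://1.2.3.4:3128'.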
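
A minimal sketch of that fix, assuming the proxy from the wget session (http://1.2.3.4:3128) is the one to use. The helper names build_proxies and fetch and the constant WGET_PROXY are made up for illustration and are not part of requests. The port check is an optional safeguard that would have caught the original mistake early.

```python
from urllib.parse import urlsplit

# Assumption: same proxy that wget picked up from http_proxy.
WGET_PROXY = "http://1.2.3.4:3128"


def build_proxies(proxy_url):
    """Return a requests-style proxies mapping, insisting on scheme and port."""
    parts = urlsplit(proxy_url)
    if not parts.scheme or parts.port is None:
        raise ValueError("proxy URL needs a scheme and a port: %r" % (proxy_url,))
    return {"http": proxy_url, "https": proxy_url}


def fetch(url, user, password, proxy_url=WGET_PROXY):
    """GET url through the proxy with HTTP Basic authentication."""
    # Imported here so build_proxies can be used without requests installed.
    import requests

    return requests.get(
        url,
        auth=(user, password),
        proxies=build_proxies(proxy_url),
        timeout=10,
    )


if __name__ == "__main__":
    response = fetch("http://example.com/admin/admin.php", "atwood", "hunter2")
    print("Status code: %s" % (response.status_code,))
    print("URL: %s" % (response.url,))
```

With the original value '1.2.3.4', build_proxies raises ValueError instead of letting the request go out on an unintended port.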

Licensed under: CC-BY-SA with attribution