urllib.urlopen works, but urllib2.urlopen doesn't

https://stackoverflow.com/questions/201515

03-07-2019

Question

I have a simple website I'm testing. It's running on localhost and I can access it in my web browser. The index page is simply the word "running". urllib.urlopen reads the page successfully, but urllib2.urlopen does not. Here's a script that demonstrates the problem (this is the actual script, not a simplification of a different test script):

import urllib, urllib2
print urllib.urlopen("http://127.0.0.1").read()  # prints "running"
print urllib2.urlopen("http://127.0.0.1").read() # throws an exception

Here's the stack trace:

Traceback (most recent call last):
  File "urltest.py", line 5, in <module>
    print urllib2.urlopen("http://127.0.0.1").read()
  File "C:\Python25\lib\urllib2.py", line 121, in urlopen
    return _opener.open(url, data)
  File "C:\Python25\lib\urllib2.py", line 380, in open
    response = meth(req, response)
  File "C:\Python25\lib\urllib2.py", line 491, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python25\lib\urllib2.py", line 412, in error
    result = self._call_chain(*args)
  File "C:\Python25\lib\urllib2.py", line 353, in _call_chain
    result = func(*args)
  File "C:\Python25\lib\urllib2.py", line 575, in http_error_302
    return self.parent.open(new)
  File "C:\Python25\lib\urllib2.py", line 380, in open
    response = meth(req, response)
  File "C:\Python25\lib\urllib2.py", line 491, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python25\lib\urllib2.py", line 418, in error
    return self._call_chain(*args)
  File "C:\Python25\lib\urllib2.py", line 353, in _call_chain
    result = func(*args)
  File "C:\Python25\lib\urllib2.py", line 499, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 504: Gateway Timeout

Any ideas? I may end up needing some of the more advanced features of urllib2, so I don't want to just fall back to using urllib. I'd also like to understand this problem.

Solution

Sounds like you have proxy settings defined that urllib2 is picking up. When it tries to proxy "127.0.0.1/", the proxy gives up and returns a 504 error.
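One way to check whether a proxy is being picked up is to ask the standard library what it detects. The sketch below uses Python 3, where urllib2 was merged into urllib.request; the helper name describe_proxy_use is made up for illustration and is not part of any library.

```python
import urllib.request


def describe_proxy_use(host):
    """Return the proxies the standard library detects and whether
    `host` would be exempt from them (hypothetical helper name)."""
    # Proxies come from environment variables such as http_proxy or,
    # on some platforms, from the system configuration.
    proxies = urllib.request.getproxies()
    # proxy_bypass() reports whether the host is excluded from proxying,
    # for example through the no_proxy environment variable.
    bypass = bool(urllib.request.proxy_bypass(host))
    return proxies, bypass
```

If describe_proxy_use("127.0.0.1") returns a non-empty dict together with False, requests to the local site would likely be sent through the proxy, which would fit the 504 seen above.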

From "Obscure python urllib2 proxy gotcha":

proxy_support = urllib2.ProxyHandler({})
opener = urllib2.build_opener(proxy_support)
print opener.open("http://127.0.0.1").read()

# Optional - makes this opener default for urlopen etc.
urllib2.install_opener(opener)
print urllib2.urlopen("http://127.0.0.1").read()
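For readers on Python 3, the same idea can be sketched with urllib.request, which replaced urllib2. The tiny local server here (RunningHandler) is only a hypothetical stand-in for the asker's site so the example is self-contained; the function names are assumptions as well.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RunningHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the local site: always answers 'running'."""

    def do_GET(self):
        body = b"running"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def fetch_without_proxy(url):
    """Open `url` with an opener that ignores any configured proxy."""
    # An empty dict means "no proxies at all"; passing nothing would
    # make ProxyHandler fall back to the detected system settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        return response.read()


def demo():
    """Serve one page on a free local port and fetch it, bypassing proxies."""
    server = HTTPServer(("127.0.0.1", 0), RunningHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_without_proxy("http://127.0.0.1:%d/" % port)
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() should return the page body as bytes. As with the urllib2 version, urllib.request.install_opener(opener) would make such an opener the default for urllib.request.urlopen.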

Other tips

Does calling urllib2.urlopen first, followed by urllib.urlopen, give the same results? I'm just wondering whether the first call keeps the HTTP server busy and causes the timeout.

I don't know what's going on, but you may find this helpful in figuring it out:

>>> import urllib2
>>> urllib2.urlopen('http://mit.edu').read()[:10]
'<!DOCTYPE '
>>> urllib2._opener.handlers[1].set_http_debuglevel(100)
>>> urllib2.urlopen('http://mit.edu').read()[:10]
connect: (mit.edu, 80)
send: 'GET / HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: mit.edu\r\nConnection: close\r\nUser-Agent: Python-urllib/2.5\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Tue, 14 Oct 2008 15:52:03 GMT
header: Server: MIT Web Server Apache/1.3.26 Mark/1.5 (Unix) mod_ssl/2.8.9 OpenSSL/0.9.7c
header: Last-Modified: Tue, 14 Oct 2008 04:02:15 GMT
header: ETag: "71d3f96-2895-48f419c7"
header: Accept-Ranges: bytes
header: Content-Length: 10389
header: Connection: close
header: Content-Type: text/html
'<!DOCTYPE '
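The session above reaches into the private urllib2._opener attribute. A sketch of a similar trace without private attributes, assuming Python 3's urllib.request (the helper name build_debug_opener is made up for illustration):

```python
import urllib.request


def build_debug_opener(debuglevel=1):
    """Return an opener whose HTTP handler prints the request it sends
    and the reply headers it receives (hypothetical helper name)."""
    # HTTPHandler accepts a debuglevel; passing an instance to
    # build_opener replaces the default HTTP handler with this one.
    http_handler = urllib.request.HTTPHandler(debuglevel=debuglevel)
    return urllib.request.build_opener(http_handler)
```

Something like build_debug_opener().open("http://127.0.0.1/").read() would then print the send and reply lines for the server under test, which should also show whether the request is going to a proxy.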

urllib.urlopen() sends the following request to the server:

GET / HTTP/1.0
Host: 127.0.0.1
User-Agent: Python-urllib/1.17

while urllib2.urlopen() sends this:

GET / HTTP/1.1
Accept-Encoding: identity
Host: 127.0.0.1
Connection: close
User-Agent: Python-urllib/2.5

So your server either doesn't understand HTTP/1.1 or doesn't understand the additional header fields.
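To test that theory, the two requests can be replayed byte for byte over a plain socket and the status lines compared. This is a rough Python 3 sketch; status_line, compare_requests, and the stand-in server EchoOkHandler are hypothetical names, and the header values are copied from the requests quoted above.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# The two requests quoted above, as raw bytes.
URLLIB_STYLE = (
    b"GET / HTTP/1.0\r\n"
    b"Host: 127.0.0.1\r\n"
    b"User-Agent: Python-urllib/1.17\r\n"
    b"\r\n"
)
URLLIB2_STYLE = (
    b"GET / HTTP/1.1\r\n"
    b"Accept-Encoding: identity\r\n"
    b"Host: 127.0.0.1\r\n"
    b"Connection: close\r\n"
    b"User-Agent: Python-urllib/2.5\r\n"
    b"\r\n"
)


def status_line(host, port, request):
    """Send a raw request and return the first line of the reply."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request)
        reply = b""
        while b"\r\n" not in reply:
            chunk = sock.recv(4096)
            if not chunk:
                break
            reply += chunk
    return reply.split(b"\r\n", 1)[0].decode("latin-1")


class EchoOkHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server that answers every GET with 200."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def compare_requests():
    """Replay both requests against the stand-in server."""
    server = HTTPServer(("127.0.0.1", 0), EchoOkHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return (status_line("127.0.0.1", port, URLLIB_STYLE),
                status_line("127.0.0.1", port, URLLIB2_STYLE))
    finally:
        server.shutdown()
        server.server_close()
```

Pointing status_line at the real server and port instead of the stand-in would show whether it treats the two requests differently. Because the socket connects directly, no proxy is involved, so this also helps separate the HTTP/1.1 theory from the proxy explanation.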

Licensed under: CC-BY-SA with attribution
