Question

import http.client
import csv

def http_get(url, path, headers):
    try:
        conn = http.client.HTTPConnection(url)
        print ('Connecting to ' + url)
        conn.request(url, path, headers=headers)
        resp = conn.getresponse()
        if resp.status<=400:
            body = resp.read()
            print ('Reading Source...')
    except Exception as e:
        raise Exception('Connection Error: %s' % e)
        pass
    finally:
        conn.close()
        print ('Connection Closed')

    if resp.status >= 400:
        print (url)
        raise ValueError('Response Error: %s, %s, URL: %s' % (resp.status, resp.reason,url))
    return body


with open('domains.csv','r') as csvfile:
    urls = [row[0] for row in csv.reader(csvfile)]

L = ['Version 0.7','Version 1.2','Version 1.5','Version 2.0','Version 2.1','Version 2.3','Version 2.5','Version 2.6','Version 2.7','Version 2.8','Version 2.9','Version 2.9','Version 3.0','Version 3.1','Version 3.2','Version 3.3','Version 3.4','Version 3.5.1','Version 3.5.2']
PATH = '/'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
HEADERS = {'User-Agent': user_agent}

for url in urls:        
    HOST = url

    print ('Testing WordPress Installation on ' + url)
    http_get(HOST,PATH,HEADERS)

I've been looking at this for a week or two now. I've found similar errors reported elsewhere, but I don't understand why the script works for some sites in the CSV file and not for others. I checked the server and saw that it was dropping ICMP packets by default, so I changed that; now traceroute and ping both get 100% of packets through, as opposed to 100% lost before. I figured it was related, since all the sites on that host had the same issue. However, my script still throws the exception:

mud@alex-BBVM:~/Desktop/scripts$ python3 httpTest.py
Testing WordPress Installation on XXXXX.ie
Connecting to exsite.ie
Reading Source...
Connection Closed
Testing WordPress Installation on AAAAAA.com
Connecting to AAAAA.com
Reading Source...
Connection Closed
Testing WordPress Installation on YYYYY.ie
Connecting to YYYYY.ie
Reading Source...
Connection Closed
Testing WordPress Installation on CCCCC.ie
Connecting to CCCCCC.ie
Reading Source...
Connection Closed
Testing WordPress Installation on DDDDDDD.ie
Connecting to DDDDDDD.ie
Connection Closed
Traceback (most recent call last):
  File "httpTest.py", line 9, in http_get
    resp = conn.getresponse()
  File "/usr/lib/python3.2/http/client.py", line 1049, in getresponse
    response.begin()
  File "/usr/lib/python3.2/http/client.py", line 346, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.2/http/client.py", line 328, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: <html>


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "httpTest.py", line 38, in <module>
    http_get(HOST,PATH,HEADERS)
  File "httpTest.py", line 14, in http_get
    raise Exception('Connection Error: %s' % e)
Exception: Connection Error: <html>

I've obviously replaced the URLs with placeholders, as they are client addresses and I'd rather not post them here.

Anyway, any insights or help would be appreciated.

I've read the documentation for http.client and its relevant exceptions, but I can't seem to extract a solution from what I gleaned from it.

Thanks!


Solution

First off, I suggest that you always read everything from the HTTPResponse object before calling conn.close(). Even 404 responses contain a document.
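A minimal sketch of that advice, using hypothetical helper names (`fetch`, `check`) and plain `http.client`. The body is read for every status code and only then is the connection closed; the status check happens afterwards. Note that `HTTPConnection.request()` takes the HTTP method as its first argument, so this sketch passes `"GET"` where the question's code passes the host name (`conn.request(url, path, ...)`). The `timeout` parameter is an added assumption.

```python
import http.client


def fetch(host, path="/", headers=None, timeout=10):
    """GET `path` from `host` and return (status, reason, body).

    The body is read in full for every status code (2xx, 4xx, 5xx alike),
    and only then is the connection closed.
    """
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        # request()'s first argument is the HTTP method, e.g. "GET".
        conn.request("GET", path, headers=headers or {})
        resp = conn.getresponse()
        body = resp.read()  # drain the response, even for 404s and 500s
        return resp.status, resp.reason, body
    finally:
        conn.close()


def check(host, path="/", headers=None):
    """Hypothetical caller: decide on the status only after reading the body."""
    status, reason, body = fetch(host, path, headers)
    if status >= 400:
        raise ValueError("Response Error: %s, %s, URL: %s" % (status, reason, host))
    return body
```

Usage would look like `check("DDDDDDD.ie", "/", HEADERS)`, with the placeholder replaced by a real host.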

I'm rather confused by your tracebacks; as far as I can see, the http.client.BadStatusLine should have been hidden by your except Exception.

Typically, an except Exception clause isn't a good idea: unless you re-raise the same exception (you are not), you may be masking underlying problems. In any case, it's the first thing that should go when the code isn't working as expected.
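The difference between wrapping an exception and re-raising it can be sketched with two hypothetical functions, `wrap` and `reraise`, where a `ValueError` stands in for the real network error. This may also explain the double traceback in the question: in Python 3, raising a new exception inside an `except` block implicitly chains the original to it as `__context__`, and the default traceback output prints both, joined by "During handling of the above exception, another exception occurred".

```python
def wrap():
    try:
        raise ValueError("original problem")
    except Exception as e:
        # A *new* exception: the original survives only as __context__,
        # and the new message keeps just str(e).
        raise Exception("Connection Error: %s" % e)


def reraise():
    try:
        raise ValueError("original problem")
    except Exception:
        # Clean up or log here, then re-raise the *same* exception object,
        # keeping its type and traceback intact.
        raise


for fn in (wrap, reraise):
    try:
        fn()
    except Exception as exc:
        print(fn.__name__, type(exc).__name__, repr(exc.__context__))
```

Only `reraise` lets a caller catch the specific exception type (here `ValueError`, in the question `http.client.BadStatusLine`).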

Additionally, the output you've provided doesn't seem to match the code you've provided.

Specifically, according to the traceback:

Connection Closed
Traceback (most recent call last):
  File "httpTest.py", line 9, in http_get
    resp = conn.getresponse()

The code for this has a print ('Connecting to ' + url) immediately before it:

print ('Connecting to ' + url)
conn.request(url, path, headers=headers)
resp = conn.getresponse()

But the line preceding the traceback in the output is Connection Closed.
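That ordering is what a `finally` clause produces: it runs after the `except` clause raises but before the exception leaves the function, so "Connection Closed" is printed before the traceback appears. A minimal sketch, with a hypothetical `http_get_like` function, an `events` list in place of `print`, and a simulated `OSError` standing in for `getresponse()` failing:

```python
events = []


def http_get_like():
    try:
        events.append("Connecting")
        raise OSError("simulated failure")  # stands in for getresponse() failing
    except Exception as e:
        raise Exception("Connection Error: %s" % e)
    finally:
        # Runs after the except clause raises, but before the exception
        # propagates out of this function.
        events.append("Connection Closed")


try:
    http_get_like()
except Exception as exc:
    # By the time the caller (or the interpreter's traceback printer)
    # sees the exception, the finally block has already run.
    events.append("caller saw: %s" % exc)

print(events)
```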


Update

Setting aside the confusing execution order of the try / finally:

http.client.BadStatusLine is raised when the first line of the response (the status line) is not something like HTTP/1.1 200 OK. In this particular case, it is <html> instead.

Either the server is returning a document without an HTTP header, or it's unexpected behaviour from the code.
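The first case can be reproduced without a network connection. This sketch feeds raw bytes to `http.client.HTTPResponse` through a hypothetical `FakeSocket` class. That is an assumption for illustration only: `HTTPResponse` is normally created by `HTTPConnection.getresponse()`, and the sketch relies on it reading through the object returned by `makefile()`.

```python
import http.client
import io


class FakeSocket:
    """Hypothetical stand-in for a socket; HTTPResponse only calls makefile()."""

    def __init__(self, data):
        self._data = data

    def makefile(self, mode, *args, **kwargs):
        return io.BytesIO(self._data)


def parse_response(raw):
    """Parse raw bytes as an HTTP response and return (status, body)."""
    resp = http.client.HTTPResponse(FakeSocket(raw))
    resp.begin()  # reads and validates the status line, then the headers
    return resp.status, resp.read()


# A well-formed response: status line, headers, blank line, body.
print(parse_response(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi"))

# A bare document with no status line or headers.
try:
    parse_response(b"<html>\r\n<body>hello</body>\r\n</html>\r\n")
except http.client.BadStatusLine as e:
    print("BadStatusLine:", repr(e.line))
```

The offending line is stored in `e.line`, including its line terminator, which would account for the blank lines after `BadStatusLine: <html>` in the question's traceback.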

I repeat what I've already said: always read everything from the HTTPResponse object.

A packet capture will confirm what's going over the wire with this server.
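As a lighter-weight check than a full packet capture, the first line the server sends back can be read over a plain socket. A minimal sketch, assuming the site is served over plain HTTP on port 80. The helper name `first_response_line` is hypothetical, and the request is written by hand, so it bypasses `http.client` entirely.

```python
import socket


def first_response_line(host, port=80, path="/", timeout=10):
    """Send a minimal GET by hand and return the first line of the reply.

    A well-behaved server should answer with a status line such as
    b"HTTP/1.1 200 OK"; a line starting with b"<html>" would mean the
    server (or something in between) is sending a document without
    HTTP headers.
    """
    request = (
        "GET %s HTTP/1.1\r\n"
        "Host: %s\r\n"
        "Connection: close\r\n"
        "\r\n" % (path, host)
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        with sock.makefile("rb") as f:
            return f.readline()
```

Usage would be `print(first_response_line("DDDDDDD.ie"))`, with the placeholder replaced by the real host.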

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow