Question

I'm trying to check whether a few URLs come back as OK before I manipulate them further. I have a list of URLs in self.myList and run each one through an httplib HTTPConnection to get the response, but I get a load of errors from httplib in cmd.

The code works: I've tested it with the snippet below, and it correctly comes back and sets the value in a wx.TextCtrl:

#for line in self.myList:
            conn = httplib.HTTPConnection("www.google.com")
            conn.request("HEAD", "/")
            r1 = conn.getresponse()
            r1 = r1.status, r1.reason
            self.urlFld.SetValue(str(r1))

It just doesn't seem to work when I pass it more than one URL from myList:

for line in self.myList:
            conn = httplib.HTTPConnection(line)
            conn.request("HEAD", "/")
            r1 = conn.getresponse()
            r1 = r1.status, r1.reason
            self.urlFld.SetValue(line + "\t\t" + str(r1))

The errors I get in cmd are:

Traceback (most recent call last):
File "gui_texteditor_men.py", line 96, in checkBtnClick
conn.request("HEAD", "/")
File "C:\Python27\lib\httplib.py", line 958, in request
self._send_request(method, url, body, headers)
File "C:\Python27\lib\httplib.py", line 992, in _send_request
self.endheaders(body)
File "C:\Python27\lib\httplib.py", line 954, in endheaders
self._send_output(message_body)
File "C:\Python27\lib\httplib.py", line 814, in _send_output
self.send(msg)
File "C:\Python27\lib\httplib.py", line 776, in send
self.connect()
File "C:\Python27\lib\httplib.py", line 757, in connect
self.timeout, self.source_address)
File "C:\Python27\lib\socket.py", line 553, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
socket.gaierror: [Errno 11004] getaddrinfo failed

Edit: updated code using urlparse (I have imported urlparse).

for line in self.myList:
            url = urlparse.urlparse(line)
            conn = httplib.HTTPConnection(url.hostname)
            conn.request("HEAD", url.path)
            r1 = conn.getresponse()
            r1 = r1.status, r1.reason
            self.urlFld.AppendText(url.hostname + "\t\t" + str(r1))

This gives the following traceback:

C:\Python27\Coding>python gui_texteditor_men.py
Traceback (most recent call last):
File "gui_texteditor_men.py", line 97, in checkBtnClick
conn = httplib.HTTPConnection(url.hostname)
File "C:\Python27\lib\httplib.py", line 693, in __init__
self._set_hostport(host, port)
File "C:\Python27\lib\httplib.py", line 712, in _set_hostport
i = host.rfind(':')
AttributeError: 'NoneType' object has no attribute 'rfind'

I now have www.google.com and www.bing.com in a .txt file when it throws this error.

Edit 2 @Aya:

It looks like it failed due to the "\n" between the two URLs. I thought I had coded it to remove the "\n" with .strip(), but it seems it didn't have any effect.

Failed on u'http://www.google.com\nhttp://www.bing.com'
Traceback (most recent call last):
File "gui_texteditor_men.py", line 99, in checkBtnClick
conn.request("HEAD", url.path)
File "C:\Python27\lib\httplib.py", line 958, in request
self._send_request(method, url, body, headers)
File "C:\Python27\lib\httplib.py", line 992, in _send_request
self.endheaders(body)
File "C:\Python27\lib\httplib.py", line 954, in endheaders
self._send_output(message_body)
File "C:\Python27\lib\httplib.py", line 814, in _send_output
self.send(msg)
File "C:\Python27\lib\httplib.py", line 776, in send
self.connect()
File "C:\Python27\lib\httplib.py", line 757, in connect
self.timeout, self.source_address)
File "C:\Python27\lib\socket.py", line 553, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
socket.gaierror: [Errno 11004] getaddrinfo failed

I took another look at my .strip() call where I open the file:

if dlg.ShowModal() == wx.ID_OK:
        directory, filename = dlg.GetDirectory(), dlg.GetFilename()
        self.filePath = '/'.join((directory, filename))
        self.fileTxt.SetValue(self.filePath)
        self.urlFld.LoadFile(self.filePath)
        self.myList = self.urlFld.GetValue().strip()

and now the traceback reports "Failed on u'h'".

Thanks


Solution

If self.myList contains a list of URLs, you can't use them directly in the HTTPConnection constructor like you do here...

for line in self.myList:
    conn = httplib.HTTPConnection(line)
    conn.request("HEAD", "/")

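The HTTPConnection constructor should only be passed the hostname part of the URL, and the request method should be given the path part. Note that urlparse only recognises the hostname when the line starts with a scheme such as http://; for a bare www.google.com, url.hostname is None. You'll need to parse the URL with something like...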

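import httplib
import urlparse

for line in self.myList:
    # Split the URL into components; this assumes each line starts with a scheme such as "http://"
    url = urlparse.urlparse(line)
    conn = httplib.HTTPConnection(url.hostname)
    # An empty path is not a valid request target, so fall back to "/"
    conn.request("HEAD", url.path or "/")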
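
To illustrate the hostname/path split described above, here is a minimal sketch. The helper name split_url is hypothetical (not part of httplib or urlparse), and prepending "http://" to lines without a scheme is an assumption about what the input file contains. The try/except import is only there so the same snippet runs on Python 2 (as in the question) and Python 3.

```python
try:
    from urlparse import urlparse        # Python 2, as used in the question
except ImportError:
    from urllib.parse import urlparse    # Python 3 equivalent

def split_url(line):
    """Return (hostname, path) for one line of input (hypothetical helper)."""
    line = line.strip()
    # Assumption: a line without a scheme is meant to be plain HTTP.
    # Without "scheme://", urlparse puts everything in .path and
    # leaves .hostname as None.
    if "://" not in line:
        line = "http://" + line
    url = urlparse(line)
    # An empty path is replaced by "/" so it can be used as a request target.
    return url.hostname, url.path or "/"

host, path = split_url("www.google.com")
bare = urlparse("www.google.com")   # no scheme: hostname is None
```

The two values returned by split_url could then be passed to httplib.HTTPConnection(host) and conn.request("HEAD", path) respectively.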

Update

Can you change the code to...

for line in self.myList:
    try:
        url = urlparse.urlparse(line)
        conn = httplib.HTTPConnection(url.hostname)
        conn.request("HEAD", url.path)
        r1 = conn.getresponse()
        r1 = r1.status, r1.reason
        self.urlFld.AppendText(url.hostname + "\t\t" + str(r1))
    except:
        print 'Failed on %r' % line
        raise

...and include the full output of running it?
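
Once the failing line has been identified, it may be more convenient to record the failure and carry on with the next URL rather than re-raise. The following is only a sketch under that assumption: check_host is a hypothetical helper, and the 5-second timeout is an arbitrary choice. The import fallback lets it run on Python 3, where httplib is named http.client.

```python
import socket

try:
    import httplib                    # Python 2, as used in the question
except ImportError:
    import http.client as httplib     # Python 3 name for the same module

def check_host(host, path="/", timeout=5):
    """Send a HEAD request; return (status, reason) or (None, error text)."""
    conn = httplib.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.reason
    except (socket.error, httplib.HTTPException) as exc:
        # socket.gaierror (the "getaddrinfo failed" error) is a subclass
        # of socket.error, so DNS failures are caught here too.
        return None, str(exc)
    finally:
        conn.close()
```

A failed lookup then shows up as (None, "...getaddrinfo failed") for that one URL instead of aborting the whole loop.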

Update #2

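I'm not quite sure what self.fileTxt and self.urlFld are supposed to do, but self.urlFld.GetValue().strip() returns a single string rather than a list, so the for loop iterates over it one character at a time, which is why it now fails on u'h'. If you're just reading lines from self.filePath, you only need...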

if dlg.ShowModal() == wx.ID_OK:
    directory, filename = dlg.GetDirectory(), dlg.GetFilename()
    self.filePath = '/'.join((directory, filename))
    with open(self.filePath, 'r') as f:
        self.myList = [line.strip() for line in f if line.strip()]
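
The difference between a stripped string and a list of lines can be seen in a small sketch. The sample text below is an assumption standing in for the contents of the .txt file; the variable names are illustrative only.

```python
# Stand-in for the text loaded from the file (assumed contents).
text = u"http://www.google.com\nhttp://www.bing.com\n"

# .strip() on the whole text only trims the ends; the result is still
# one string, and iterating over a string yields single characters.
as_string = text.strip()
first_item = [item for item in as_string][0]

# Splitting into lines first gives one entry per URL; blank lines are skipped.
urls = [line.strip() for line in text.splitlines() if line.strip()]
```

Here first_item is the single character 'h', matching the "Failed on u'h'" message, while urls holds the two complete URLs.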
Licensed under: CC-BY-SA with attribution