Question

I want to check whether a particular URL exists.

I came across two methods.

url = "http://www.google.com"

1.

import urllib2
response = urllib2.urlopen(url)
response.code  # check what is the response code

2.

import httplib
conn = httplib.HTTPConnection("www.google.com")  # takes a host name, not a full URL
conn.request('HEAD', '/')
response = conn.getresponse()
if response.status == 200:  # check the status code
    pass  # do something

Both will serve my purpose, but which one is the better method?

Thanks in advance for help.


Solution

If you formulated your question correctly, then neither method is perfect.

The main problem is that you said "URL" but only check the "http" scheme. URLs, however, can have other schemes:

ftp://ftp.funet.fi/pub/standards/RFC/rfc959.txt

file:///home/somesh/.bashrc

http://www.google.com

"httplib" is not useful for these checks, since it only speaks HTTP (and HTTPS). "urllib2", on the other hand, can handle all of the schemes I mentioned, but response.code is not what you should check: when an HTTP request fails, urlopen raises an exception instead of returning a response. So you should catch the exceptions raised when the resource is not available: HTTPError or URLError in these cases.
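
As a rough illustration of the exception-based check described above, here is a minimal sketch. The helper name url_exists and its timeout default are made up for this example and are not part of urllib2. The import fallback is also an assumption, added only so that the same code runs on Python 3, where these names live in urllib.request and urllib.error.

```python
try:  # Python 2, as used in this thread
    from urllib2 import urlopen, HTTPError, URLError
except ImportError:  # Python 3 moved these names
    from urllib.request import urlopen
    from urllib.error import HTTPError, URLError


def url_exists(url, timeout=10):
    """Return True if the resource behind `url` could be opened.

    `url_exists` is a hypothetical helper written for this example.
    """
    try:
        response = urlopen(url, timeout=timeout)
    except HTTPError:
        # The server answered, but with an error status such as 404.
        # HTTPError is a subclass of URLError, so it must be caught first.
        return False
    except URLError:
        # No usable answer at all: unknown host, refused connection,
        # missing local file, and so on.
        return False
    response.close()
    return True
```

Calling url_exists('file:///home/somesh/.bashrc') would then report whether that file can be opened, and the same call works for the "http" and "ftp" examples. Both failure cases return False here only to keep the sketch short; returning the status code or the reason instead may be more useful in practice.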

OTHER TIPS

You can try something along these lines. The point is that you usually also want to handle the errors raised when there is a problem fetching the URL.

In [4]: import urllib2

In [5]: def test(url):
   ...:     try:
   ...:         response = urllib2.urlopen(url)
   ...:     except urllib2.HTTPError as e:
   ...:         return e.code,None
   ...:     return response.code,response

In [6]: test('http://www.google.com')
Out[6]: 
(200,
 <addinfourl at 154469068 whose fp = <socket._fileobject object at 0x92caa2c>>)

In [7]: test('http://www.google.com/foobar')
Out[7]: (404, None)

In reality you would also need to handle urllib2.URLError:

In [10]: def test(url):
    ...:     try:
    ...:         response = urllib2.urlopen(url)
    ...:     except urllib2.HTTPError as err:
    ...:         return err.code, None
    ...:     except urllib2.URLError as err:
    ...:         return err.reason, None
    ...:     return response.code,response

In [11]: test('http://www.google.foo')
Out[11]: (socket.gaierror(-2, 'Name or service not known'), None)
Licensed under: CC-BY-SA with attribution