Question

I'm trying to check a webpage's status with Python. I've timed several approaches, but none is much faster than the others: the worst and best differ by only 20%. I really just need the response code, not the HTML source. There are 3 response codes that I will handle: 200, 403, 404.

Method 1 is mine, but the others were found in this question: "Checking if a website is up via Python".

Method 1: Right now, I'm using mechanize to open the URL inside a try/except. If the response is a 200, it goes through fine, but if it's a 403/404, the except block runs. This works, but it's not very fast. The average time is 0.00276.

Method 2: Using urllib, I get about the same time as with mechanize. The average time is 0.00227. The call is just a one-liner:

import urllib
print urllib.urlopen("http://www.stackoverflow.com").getcode()
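The one-liner above is Python 2. On Python 3, urllib.urlopen no longer exists, and urllib.request.urlopen raises HTTPError for responses such as 403 and 404, so the code has to be read from the exception. A rough sketch of an equivalent, not the asker's code; the helper name get_status, the use of HEAD, and the timeout value are assumptions added for illustration:

```python
import urllib.error
import urllib.request


def get_status(url, timeout=5.0):
    """Return the HTTP status code for url without reading the body."""
    # HEAD asks the server for headers only (assumed to be supported by the server).
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; the code is on the exception.
        return err.code
    # Connection-level failures (urllib.error.URLError) are left to propagate.
```

A call would look like get_status("http://www.stackoverflow.com"); the returned integer can then be compared against 200, 403 and 404.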

Method 3: I think this httplib method would be fastest, but as written it only checks the domain root, not individual pages of a domain, so it didn't work in my case. The code for that is:

import httplib

# Send a HEAD request so that only the headers come back, not the page body
conn = httplib.HTTPConnection("www.python.org")
conn.request("HEAD", "/")
r1 = conn.getresponse()
print r1.status, r1.reason
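The second argument to request() is the path, and it does not have to be "/": any page on the host can be given there, so this approach can check individual pages too. A minimal Python 3 sketch (http.client is the Python 3 name for httplib); the helper name head_status and the port and timeout parameters are illustrative assumptions:

```python
import http.client


def head_status(host, path="/", port=80, timeout=5.0):
    """Return the status code of a HEAD request for path on host."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # The path selects the individual page; only headers are transferred.
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()
```

For example, head_status("www.python.org", "/about/") checks that one page rather than the site root. Redirects are not followed here, so a 301 or 302 is returned as-is.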

Method 4: This method uses requests.head and has an average time of 0.00246. The code is:

import requests

r = requests.head("http://www.stackoverflow.com")
print r.status_code  # just the numeric status code

Does anyone know a more efficient way of checking a webpage's status in Python?


Solution

The three libraries you've mentioned cover pretty much all the immediate options. Requests could be a fourth candidate.

Note that mechanize wraps urllib2, while Requests is built on urllib3.

As the comments on the question point out, these are all mature libraries, so you're unlikely to find performance improvements in other libraries or by re-implementing the check yourself.

Still, if squeezing out more speed is your goal, then trying other libraries or writing your own implementation is probably the direction to head.

Licensed under: CC-BY-SA with attribution