Question

I have a python client which pushes a great deal of data through the standard library's httlib. Users are complainging that the application is slow. I suspect that this may be partly due to the HTTP client I am using.

Could I improve performance by replacing httplib with something else?

I've seen that twisted offers a HTTP client. It seems to be very basic compared to their other protocol offerings.

PyCurl might be a valid alternative, however it's use seems to be very un-pythonic, on the other hand if it's performance is really good then I can put up with a bit of un-pythonic code.

So if you have experience of better HTTP client libraries of python please tell me about it. I'd like to know what you thought of the performance relative to httplib and what you thought of the quality of implementation.

UPDATE 0: My use of httplib is actually very limited - the replacement needs to do the following:

conn = httplib.HTTPConnection(host, port)
conn.request("POST", url, params, headers)
compressedstream = StringIO.StringIO(conn.getresponse().read())

That's all: No proxies, redirection or any fancy stuff. It's plain-old HTTP. I just need to be able to do it as fast as possible.

UPDATE 1: I'm stuck with Python2.4 and I'm working on Windows 32. Please do not tell me about better ways to use httplib - I want to know about some of the alternatives to httplib.

Was it helpful?

Solution

Often when I've had performance problems with httplib, the problem hasn't been with the httplib itself, but with how I'm using it. Here are a few common pitfalls:

(1) Don't make a new TCP connection for every web request. If you are making lots of request to the same server, instead of this pattern:

    conn = httplib.HTTPConnection("www.somewhere.com")
    conn.request("GET", '/foo')
    conn = httplib.HTTPConnection("www.somewhere.com")
    conn.request("GET", '/bar')
    conn = httplib.HTTPConnection("www.somewhere.com")
    conn.request("GET", '/baz')

Do this instead:

    conn = httplib.HTTPConnection("www.somewhere.com")
    conn.request("GET", '/foo')
    conn.request("GET", '/bar')
    conn.request("GET", '/baz')

(2) Don't serialize your requests. You can use threads or asynccore or whatever you like, but if you are making multiple requests from different servers, you can improve performance by running them in parallel.

OTHER TIPS

Users are complainging that the application is slow. I suspect that this may be partly due to the HTTP client I am using.

Could I improve performance by replacing httplib with something else?

Do you suspect it or are you sure that that it's httplib? Profile before you do anything to improve the performance of your app.

I've found my own intuition on where time is spent is often pretty bad (given that there isn't some code kernel executed millions of times). It's really disappointing to implement something to improve performance then pull up the app and see that it made no difference.

If you're not profiling, you're shooting in the dark!

PyCurl is awesome, and extremely high performance.

httplib2 is another option: http://code.google.com/p/httplib2/

I have never benchmarked or profiled it in comparison to httplib, but I would also be interested in any findings there.


Dec. 2012 update: I no longer use httplib2. now using Requests: HTTP For Humans, for any http with Python.

You seem to assume its the library. Its open source, so it would be worth checking the code to see if it is.

You mention that you're sending a lot of data over HTTP. The inefficieny might be because of the library, but HTTP isn't the most efficient protocol for sending large amounts of data. Then again, it could be the simple use of the library (are you sending a big string or list, or using a stream or generators?).

As others answered httplib2 is a good alternative because it handles headers properly and can cache responses, but I doubt this would help in POST performance.

An alternative that might actually give you a performance boost for POST, especially on Windows, is the new HTTP 1.1 client in Twisted.web

httplib2 is a very good option. Joe Gregorio has fixed many bugs of httplib.

It works on my windows machine: With Py 2.3 (without IPv6 support) this is only the IPv4 address, but with Py 2.4-2.6 the order is (on my Win XP host) the IPv6 address first, then the IPv4 address. Since the IPv6 address is checked first, this gives a timeout and causes the slow connect() call.

I have only changed "localhost" to 127.0.0.1 and it started working 10 times faster (from 1087ms to 87ms). Solution from http://www.velocityreviews.com/forums/t668272-problem-with-slow-httplib-connections-on-windows-and-maybe-otherplatforms.html

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top