Question

I have a web server based on paste.httpserver as an adapater between HTTP and WSGI. When I do performance measurements with httperf, I can do over 1,000 requests per second if I start a new request each time using --num-conn. If I instead reuse the connection using --num-call then I get about 11 requests per second, 1/100th of the speed.

If I try ab I get a timeout.

My tests are

% ./httperf --server localhost --port 8080 --num-conn 100
...
Request rate: 1320.4 req/s (0.8 ms/req)
...

and

% ./httperf --server localhost --port 8080 --num-call 100
...
Request rate: 11.2 req/s (89.4 ms/req)
...

Here's a simple reproducible server

from paste import httpserver

def echo_app(environ, start_response):
    n = 10000
    start_response("200 Ok", [("Content-Type", "text/plain"),
                              ("Content-Length", str(n))])
    return ["*" * n]

httpserver.serve(echo_app, protocol_version="HTTP/1.1")

It's a multi-threaded server, which is hard to profile. Here's a variation which is single threaded:

from paste import httpserver

class MyHandler(httpserver.WSGIHandler):
    sys_version = None
    server_version = "MyServer/0.0"
    protocol_version = "HTTP/1.1"

    def log_request(self, *args, **kwargs):
        pass


def echo_app(environ, start_response):
    n = 10000
    start_response("200 Ok", [("Content-Type", "text/plain"),
                              ("Content-Length", str(n))])
    return ["*" * n]

# WSGIServerBase is single-threaded
server = httpserver.WSGIServerBase(echo_app, ("localhost", 8080), MyHandler)
server.handle_request()

Profiling that with

% python2.6 -m cProfile -o paste.prof paste_slowdown.py

and hitting it with

%httperf --client=0/1 --server=localhost --port=8080 --uri=/ \ 
   --send-buffer=4096 --recv-buffer=16384 --num-conns=1 --num-calls=500

I get a profile like

>>> p=pstats.Stats("paste.prof")
>>> p.strip_dirs().sort_stats("cumulative").print_stats()
Sun Nov 22 21:31:57 2009    paste.prof

         109749 function calls in 46.570 CPU seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   46.571   46.571 {execfile}
        1    0.001    0.001   46.570   46.570 paste_slowdown.py:2(<module>)
        1    0.000    0.000   46.115   46.115 SocketServer.py:250(handle_request)
        1    0.000    0.000   44.675   44.675 SocketServer.py:268(_handle_request_noblock)
        1    0.000    0.000   44.675   44.675 SocketServer.py:301(process_request)
        1    0.000    0.000   44.675   44.675 SocketServer.py:318(finish_request)
        1    0.000    0.000   44.675   44.675 SocketServer.py:609(__init__)
        1    0.000    0.000   44.675   44.675 httpserver.py:456(handle)
        1    0.001    0.001   44.675   44.675 BaseHTTPServer.py:325(handle)
      501    0.006    0.000   44.674    0.089 httpserver.py:440(handle_one_request)
     2001    0.020    0.000   44.383    0.022 socket.py:373(readline)
      501   44.354    0.089   44.354    0.089 {method 'recv' of '_socket.socket' objects}
        1    1.440    1.440    1.440    1.440 {select.select}
         ....

You can see that nearly all the time is in a recv.

I decided to bail on httpref and write my own HTTP/1.1-with-keep-alive request and send it using netcat:

GET / HTTP/1.1
Location: localhost
Connection: Keep-Alive
Content-Length: 0

GET / HTTP/1.1
Location: localhost
Connection: Keep-Alive
Content-Length: 0

 ... repeat 97 more times, to have 99 keep-alives in total ...

GET / HTTP/1.1
Location: localhost
Connection: Close
Content-Length: 0

which I sent with

nc localhost 8080 < ~/src/send_to_paste.txt

Total time for 100 requests was 0.03 seconds, so it's very good performance.

This suggests that httperf is doing something wrong (but it's a widely used and respected piece of code), so I tried 'ab'

% ab -n 100 -k localhost:8080/
This is ApacheBench, Version 1.3d <$Revision: 1.73 $> apache-1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)...
Server timed out

: Operation now in progress

Instrumenting the server, it handles one request and is waiting for the second.

Any idea of what's going on?

Was it helpful?

Solution

After some effort, it seems to be either Nagle's algorithm or the delayed ACK, or the interactions between them. It goes away if I do something like

server.socket.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

How did I track it down? First, I instrumented every 'recv' in socket.py, so I could figure out which recv was waiting. I would see about 5 recv's out of 11 had a delay of almost 200ms. I couldn't figure out why there was any delay. I then used Wireshark to watch the messages and noticed that the it was actually the send from the server to the client which had the delay. That meant something in the TCP layer in the outgoing messages from my client.

A friend suggested the obvious, and I searched for "200ms socket delay" and found descriptions of this problem.

The paste trac report is at http://trac.pythonpaste.org/pythonpaste/ticket/392 along with a patch which enables TCP_NODELAY when the handler uses HTTP/1.1.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top