Question

I own a legacy Python application written as CGI. Until now this has worked OK, but the number of concurrent users will increase significantly in the very near future. Here on SO I read: "CGI is great for low-traffic websites, but it has some performance problems for anything else". I know it would have been better to start another way, but CGI is what it is now.

Could someone point me in a direction on how to keep the CGI application performing well, without having to rewrite all the code?


Solution

CGI doesn't scale because each request forks a brand-new server process, which is a lot of overhead. mod_wsgi avoids that overhead by forking one long-lived process and handing every request to that one running process.
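As a concrete reference point, wiring Apache to a persistent WSGI entry point with mod_wsgi looks roughly like this (a sketch; the process group name and paths are hypothetical):

```apache
# run the app in 2 persistent daemon processes instead of forking per request
WSGIDaemonProcess legacyapp processes=2 threads=15
WSGIProcessGroup legacyapp
WSGIScriptAlias / /var/www/legacyapp/wsgi.py
```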

Let's assume the application is the worst kind of CGI.

The worst case is that it has files like this:

my_cgi.py

# my_cgi.py -- a legacy Python 2 CGI script that writes the whole
# response (status line, headers, blank line, body) to stdout
import cgi
print "Status: 200 OK"
print "Content-Type: text/html"
print
print "<!DOCTYPE html>"
print "<html>"
# ... rest of the page ...

You can try to "wrap" the original CGI files to make it wsgi.

wsgi.py

import cStringIO
import os
import sys

def my_cgi(environ, start_response):
    page = cStringIO.StringIO()
    saved_stdout = sys.stdout
    sys.stdout = page                 # capture everything the CGI script prints
    os.environ.update(environ)
    try:
        # you may have to do something like execfile( "my_cgi.py", globals=environ )
        execfile("my_cgi.py")
    finally:
        sys.stdout = saved_stdout     # always restore stdout
    # note: page also contains the "Status:" and "Content-Type:" lines the
    # CGI script printed; in practice you must strip those off the body
    status = '200 OK'  # HTTP status
    headers = [('Content-Type', 'text/html')]  # HTTP headers
    start_response(status, headers)
    return [page.getvalue()]          # WSGI expects an iterable of strings
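If the code ever moves to Python 3, where `execfile` and `cStringIO` no longer exist, the same capture trick can be sketched with `exec` and `io.StringIO` (the function name `run_legacy_cgi` is made up for illustration):

```python
import io
import os
import sys

def run_legacy_cgi(script_path, environ, start_response):
    """Run a legacy CGI script and capture its stdout as the response body."""
    page = io.StringIO()
    saved_stdout = sys.stdout
    sys.stdout = page                         # capture everything the script prints
    # WSGI environ contains non-string values (e.g. wsgi.input); skip those
    os.environ.update({k: v for k, v in environ.items() if isinstance(v, str)})
    try:
        with open(script_path) as f:
            exec(compile(f.read(), script_path, "exec"), {"__name__": "__main__"})
    finally:
        sys.stdout = saved_stdout             # always restore stdout
    start_response("200 OK", [("Content-Type", "text/html")])
    return [page.getvalue().encode("utf-8")]  # WSGI bodies are bytes in Python 3
```

As in the Python 2 version, the captured page still contains any header lines the legacy script printed, so those would need to be stripped off in a real deployment.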

This is a first step toward rewriting your CGI application into a proper framework. It requires very little work and makes your CGI scripts much more scalable, since you won't be starting a fresh CGI process for each request.

The second step is to create a mod_wsgi server that Apache uses instead of all the CGI scripts. This server must (1) parse the URLs, and (2) call various functions like the my_cgi example function. Each function will execfile the old CGI script without forking a new process.
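The URL-parsing part of that server can be as small as a dictionary mapping paths to functions, one per legacy script. A minimal sketch (in modern Python; the route paths and handler functions are hypothetical stand-ins for wrapped CGI scripts):

```python
def home_page(environ):
    return "<html><body>home</body></html>"

def about_page(environ):
    return "<html><body>about</body></html>"

# map URL paths to handler functions -- one per legacy CGI script
ROUTES = {
    "/": home_page,
    "/about": about_page,
}

def application(environ, start_response):
    # dispatch on the request path instead of letting Apache fork a CGI process
    handler = ROUTES.get(environ.get("PATH_INFO", "/"))
    if handler is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    body = handler(environ).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html")])
    return [body]
```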

Look at werkzeug for helpful libraries.

If your application's CGI scripts have some structure (functions, classes, etc.), you can probably import those pieces and do something much, much smarter than the above. A better way is this:

wsgi.py

from my_cgi import this_func, that_func

def my_cgi(environ, start_response):
    # some_args and some_other_args stand in for whatever the legacy
    # functions actually need, e.g. values parsed out of environ
    result = this_func(some_args)
    page_text = that_func(result, some_other_args)

    status = '200 OK'  # HTTP status
    headers = [('Content-type', 'text/html')]  # HTTP headers
    start_response(status, headers)
    return [page_text]  # WSGI expects an iterable of strings

This requires more work because you have to understand the legacy application. However, this has two advantages.

  1. It makes your CGI scripts more scalable because you're not starting a fresh process for each request.

  2. It allows you to rethink your application, possibly changing it to a proper framework. Once you've done this, it's not very hard to take the next step and move to TurboGears or Pylons or web.py for a very simple framework.

OTHER TIPS

Use FastCGI. If I understand FastCGI correctly, you can do what you want by writing a very simple Python program that sits between the web server and your legacy code.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow