Question

We need to export a CSV file with model data from the Django admin of our app, which runs on Heroku. We therefore created an admin action that builds the CSV and returns it in the response. This worked fine until our client started exporting huge data sets and we ran into the 30-second timeout of the web worker.

To circumvent this problem we thought about streaming the CSV to the client instead of building it in memory first and sending it in one piece. The trigger was this piece of information:

Cedar supports long-polling and streaming responses. Your app has an initial 30 second window to respond with a single byte back to the client. After each byte sent (either received from the client or sent by your application) you reset a rolling 55 second window. If no data is sent during the 55 second window your connection will be terminated.

We therefore implemented something that looks like this to test it:

import cStringIO as StringIO
import csv
import time

from django.http import HttpResponse

def csv_view(request):  # renamed from "csv": the original name shadowed the csv module
    csvfile = StringIO.StringIO()
    csvwriter = csv.writer(csvfile)

    def read_and_flush():
        # return everything written so far and empty the buffer
        csvfile.seek(0)
        data = csvfile.read()
        csvfile.seek(0)
        csvfile.truncate()
        return data

    def data():
        for i in xrange(100000):
            csvwriter.writerow([i, "a", "b", "c"])
            time.sleep(1)  # only here to simulate slow row generation for the test
            yield read_and_flush()

    response = HttpResponse(data(), mimetype="text/csv")
    response["Content-Disposition"] = "attachment; filename=test.csv"
    return response
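For comparison: Django 1.5 and later also offer StreamingHttpResponse, which avoids the manual buffer flushing. A minimal sketch of the same test view using it (this assumes a newer Django than the code above and follows the pattern from Django's CSV-streaming documentation; the view name is made up):

import csv
from django.http import StreamingHttpResponse

class Echo(object):
    """Pseudo-buffer: csv.writer calls write(), we just hand the formatted row back."""
    def write(self, value):
        return value

def csv_stream_view(request):  # hypothetical view name
    writer = csv.writer(Echo())
    rows = (writer.writerow([i, "a", "b", "c"]) for i in xrange(100000))
    response = StreamingHttpResponse(rows, content_type="text/csv")
    response["Content-Disposition"] = "attachment; filename=test.csv"
    return response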

The HTTP headers of the download look like this (from Firebug):

HTTP/1.1 200 OK
Cache-Control: max-age=0
Content-Disposition: attachment; filename=jobentity-job2.csv
Content-Type: text/csv
Date: Tue, 27 Nov 2012 13:56:42 GMT
Expires: Tue, 27 Nov 2012 13:56:41 GMT
Last-Modified: Tue, 27 Nov 2012 13:56:41 GMT
Server: gunicorn/0.14.6
Vary: Cookie
Transfer-Encoding: chunked
Connection: keep-alive

"Transfer-encoding: chunked" would indicate that Cedar is actually streaming the content chunkwise we guess.

The problem is that the download of the CSV is still interrupted after 30 seconds, with these lines in the Heroku log:

2012-11-27T13:00:24+00:00 app[web.1]: DEBUG: exporting tasks in csv-stream for job id: 56, 
2012-11-27T13:00:54+00:00 app[web.1]: 2012-11-27 13:00:54 [2] [CRITICAL] WORKER TIMEOUT (pid:5)
2012-11-27T13:00:54+00:00 heroku[router]: at=info method=POST path=/admin/jobentity/ host=myapp.herokuapp.com fwd= dyno=web.1 queue=0 wait=0ms connect=2ms service=29480ms status=200 bytes=51092
2012-11-27T13:00:54+00:00 app[web.1]: 2012-11-27 13:00:54 [2] [CRITICAL] WORKER TIMEOUT (pid:5)
2012-11-27T13:00:54+00:00 app[web.1]: 2012-11-27 13:00:54 [12] [INFO] Booting worker with pid: 12

This should work conceptually, right? Is there anything we missed?

We really appreciate your help. Tom


Solution

I found the solution to the problem. It's not a Heroku timeout, because otherwise there would be an H12 timeout in the Heroku log (thanks to Caio of Heroku for pointing that out).

The problem was the default timeout of Gunicorn, which is 30 seconds. After adding --timeout 600 to the Procfile (on the Gunicorn line) the problem was gone.

The Procfile now looks like this:

web: gunicorn myapp.wsgi -b 0.0.0.0:$PORT --timeout 600
celeryd: python manage.py celeryd -E -B --loglevel=INFO
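The timeout can also live in a Gunicorn config file instead of the Procfile line; a small sketch (the file name is arbitrary, and the Procfile would then run gunicorn myapp.wsgi -c gunicorn.conf.py):

# gunicorn.conf.py (name is arbitrary); loaded with: gunicorn myapp.wsgi -c gunicorn.conf.py
import os

bind = "0.0.0.0:%s" % os.environ.get("PORT", "8000")  # Heroku sets $PORT
timeout = 600  # seconds before a silent worker is killed and restarted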

OTHER TIPS

This is not really a problem with your script, but rather Heroku's default 30-second web request timeout. You could read this: https://devcenter.heroku.com/articles/request-timeout and, according to that doc, move your CSV export to a background process.
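A rough sketch of that background-process route, reusing the celeryd process already in the Procfile; the task name, model, fields and storage path below are made up for illustration:

import csv
import cStringIO as StringIO

from celery import shared_task  # older django-celery setups import the task decorator from celery.task
from django.core.files.base import ContentFile
from django.core.files.storage import default_storage

from myapp.models import JobEntity  # hypothetical model


@shared_task
def export_job_csv(job_id):
    """Build the CSV outside the web request and store it for later download."""
    buf = StringIO.StringIO()
    writer = csv.writer(buf)
    for entity in JobEntity.objects.filter(job_id=job_id).iterator():
        writer.writerow([entity.pk, entity.name, entity.status])  # made-up fields
    path = "exports/job-%s.csv" % job_id
    default_storage.save(path, ContentFile(buf.getvalue()))
    return path

The admin action would then just call export_job_csv.delay(job_id) and point the user to the stored file once the task has finished.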

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow