Question

I am developing a django webserver on which another machine (with a known IP) can upload a spreadsheet to my webserver. After the spreadsheet has been updated, I want to trigger some processing/validation/analysis on the spreadsheet (which can take >5 minutes --- too long for the other server to reasonably wait for a response) and then send the other machine (with a known IP) a HttpResponse indicating that the data processing is finished.

I realize that you can't do processing.data() after returning an HttpResponse, but functionally I want code that looks something like this:

# processing.py
def spreadsheet(*args, **kwargs):
    print "[robot voice] processing spreadsheet........."
    views.finished_processing_spreadsheet()

# views.py
def upload_spreadsheet(request):
    print "save the spreadsheet somewhere"
    return HttpResponse("started processing spreadsheet")
    processing.data()

def finished_processing_spreadsheet():
    print "send good news to other server (with known IP)"

I know how to write each function individually, but how can I effectively call processing.data() after views.upload_spreadsheet has returned a response?

I tried using django's request_finished signaling framework but this does not trigger the processing.spreadsheet() method after returning the HttpResponse. I tried using a decorator on views.upload_spreadsheet with the same problem.

I have an inkling that this might have something to do with writing middleware or possibly a custom class-based view, neither of which I have any experience with so I thought I would pose the question to the universe in search of some help.

Thanks for your help!

Was it helpful?

Solution

In fact Django have a syncronous model. If you want to do real async processing, you need a message queue. The most used with django is celery, it may look a bit "overkill" but it's a good answer.

Why do we need this? because in a wsgi app, apache give the request to the executable, and, the executable returns text. It's only once when the executable finish his execution that apache aknowledge the end of the request.

OTHER TIPS

The problem with your implementation is that if the number of spreadsheets in process is equal to the number of workers: your website will not respond anymore.

You should use a background task queue, basically have 2 processes: your server and a background task manager. The server should delegate the processing of the spreadsheet to the background task manager. When the background task is done, it should inform the server somehow. For example, it can do model_with_spreadsheet.processed = datetime.datetime.now().

You should use a background job manager like django-ztask (very easy setup), celery (very powerful, probably overkill in your case) or even uwsgi spooler (which obviously requires uwsgi deployment).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top