Python GIL: is django save() blocking?

https://stackoverflow.com/questions/10631419

09-06-2021

Question

My Django app saves model instances to a remote database. Sometimes the saves come in bursts. To spare the application's main thread (*thread_A*) the cost of saving many objects to the database, I thought of handing the model objects to a separate thread (*thread_B*) through a collections.deque and having *thread_B* save them sequentially.
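The hand-off described above could be sketched roughly as follows. This is only an illustration: `BackgroundSaver`, `submit`, `close` and the `save_func` parameter are hypothetical names, not Django APIs, and a plain callable stands in for `obj.save()` so the snippet runs without a database.

```python
import threading
from collections import deque


class BackgroundSaver:
    """Hypothetical helper: thread_A submits objects, thread_B saves them in order."""

    def __init__(self, save_func):
        self._save = save_func              # assumption: e.g. lambda obj: obj.save()
        self._pending = deque()
        self._wakeup = threading.Condition()
        self._closed = False
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, obj):
        # Called from thread_A; returns as soon as the object is queued.
        with self._wakeup:
            self._pending.append(obj)
            self._wakeup.notify()

    def close(self):
        # Ask the worker to drain the queue and stop, then wait for it.
        with self._wakeup:
            self._closed = True
            self._wakeup.notify()
        self._worker.join()

    def _run(self):
        # Runs in thread_B.
        while True:
            with self._wakeup:
                while not self._pending and not self._closed:
                    self._wakeup.wait()
                if not self._pending:
                    return                  # closed and fully drained
                obj = self._pending.popleft()
            # Save outside the lock so submit() never waits on the database.
            self._save(obj)
```

With Django, `save_func` might be `lambda obj: obj.save()`; whether that actually frees thread_A depends on the GIL behaviour discussed in the answers.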

Yet I'm unsure about this scheme. save() fills in the id of the new database entry on the object, so it finishes only after the database responds, which is at the end of the transaction.

Does django.db.models.Model.save() actually release the GIL while it blocks, so that other Python threads can run during the transaction?

Solution

Django's save() does nothing special to the GIL. In fact, there is hardly anything you can do with the GIL from Python code -- whenever Python code is executing, the thread running it must hold the GIL.

There are only two ways the GIL could get released in save():

Python decides to switch threads (after sys.getcheckinterval() bytecode instructions in Python 2; since Python 3.2 the switch is time-based, see sys.getswitchinterval())
Django calls a database interface routine that is implemented to release the GIL
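The first case can be inspected on a current interpreter. A minimal sketch, assuming Python 3.2 or later, where the interval is expressed in seconds rather than in bytecode instructions:

```python
import sys

# How long a thread may keep the GIL before the interpreter asks it to
# yield to another waiting thread (0.005 seconds unless it was changed).
interval = sys.getswitchinterval()
print(f"switch interval: {interval} s")
```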

The second point could be what you are looking for -- a SQL COMMIT is executed and during that execution, the SQL backend releases the GIL. However, this depends on the SQL interface, and I'm not sure if the popular ones actually release the GIL.

Moreover, save() does a lot more than just running a few UPDATE/INSERT statements and a COMMIT; it does a lot of bookkeeping in Python, where it has to hold the GIL. In summary, I'm not sure that you will gain anything from moving save() to a different thread.

UPDATE: From looking at the sources, I learned that both the sqlite module and psycopg do release the GIL when they are calling database routines, and I guess that other interfaces do the same.
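The effect of a call that releases the GIL while it waits can be illustrated without a database. In the sketch below, `time.sleep()` (which does release the GIL) is an assumed stand-in for a driver call waiting on the server; `fake_db_call`, `run_sequential` and `run_threaded` are hypothetical names.

```python
import threading
import time


def fake_db_call(delay=0.2):
    # Stand-in for a driver routine that releases the GIL while it waits.
    time.sleep(delay)


def run_sequential(n, delay=0.2):
    # Issue the calls one after another and return the elapsed time.
    start = time.perf_counter()
    for _ in range(n):
        fake_db_call(delay)
    return time.perf_counter() - start


def run_threaded(n, delay=0.2):
    # Issue the calls from n threads and return the elapsed time.
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_db_call, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

Because the waits overlap, the threaded run should take roughly one delay instead of n of them. The Python-side bookkeeping in a real save() would not overlap this way, since it needs the GIL.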

Other tips

Generally you should never have to worry about threads in a Django application. If you're serving your application with Apache, gunicorn or almost any server other than the development server, the server will spawn multiple processes and sidestep the GIL entirely. The exception is if you're using gunicorn with gevent, in which case there will be multiple processes but also microthreads inside those processes -- in that case concurrency helps a bit, but you don't have to manage the threads yourself to take advantage of that. The only case where you need to worry about the GIL is if you're trying to spawn multiple threads to handle a single request, which is not usually a good idea.

The Django save() method does not release the GIL itself, but the database backend will, and in most cases the bulk of the time spent in save() goes to database I/O. However, it's almost impossible to properly take advantage of this in a well-designed web application. Responses from your view should be fast even when done synchronously -- if they are doing too much work to be fast, then use a delayed job with Celery or another task queue to finish up the extra work. If you try to thread in your view, you'll have to finish up that thread before sending a response to the client, which in most cases won't help anything and will just add extra overhead.

I think Python doesn't lock anything by itself, but the database does.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow