I've encountered some strange application behaviour while interacting with a database from many processes. I'm using Linux.
I have my own implementation of QueryExecutor, which uses a single connection during its lifetime:
    class QueryExecutor(object):
        def __init__(self, db_conf):
            self._db_config = db_conf
            self._conn = self._get_connection()

        def execute_query(self, query):
            # some code
            # some more code

    _QUERY_EXECUTOR = None

    def query_executor():
        global _QUERY_EXECUTOR
        if _QUERY_EXECUTOR is None:
            _QUERY_EXECUTOR = QueryExecutor(some_db_config)
        return _QUERY_EXECUTOR
The QueryExecutor instance is never modified after instantiation.
Initially there is only one process, which from time to time forks (os.fork()) several times. The new processes are workers that do some tasks and then exit. Each worker calls query_executor() so it can execute an SQL query.
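To make the setup concrete, here is a minimal, self-contained sketch of the pattern described above (FakeExecutor and the worker loop are stand-ins I made up, not the real code): the singleton is created in the parent, and every forked worker inherits that same pre-fork instance.

```python
import os

# Hypothetical stand-in for QueryExecutor: records which process
# "opened the connection" (here, just the creator's pid).
class FakeExecutor:
    def __init__(self):
        self.conn_id = os.getpid()

_EXECUTOR = None

def query_executor():
    global _EXECUTOR
    if _EXECUTOR is None:
        _EXECUTOR = FakeExecutor()
    return _EXECUTOR

query_executor()  # instantiated once, in the parent, before any fork

children = []
for _ in range(2):
    pid = os.fork()
    if pid == 0:
        # Worker: the inherited singleton still carries the parent's pid,
        # i.e. the "connection" was opened by the parent, not this worker.
        os._exit(0 if query_executor().conn_id != os.getpid() else 1)
    children.append(pid)

statuses = [os.waitpid(pid, 0)[1] for pid in children]
# Every worker exits 0: each one saw the parent's pre-fork instance.
print(all(os.waitstatus_to_exitcode(s) == 0 for s in statuses))
```

Each worker exits with status 0 because the instance it gets back was built before the fork, which mirrors the situation with the real QueryExecutor and its connection.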
I have found that SQL queries often return wrong results (it seems a query result is sometimes delivered to the wrong process). The only sensible explanation is that all processes share the same connection (according to the MySQLdb docs: threadsafety = 1, meaning threads may share the module, but not connections).
I wonder which OS mechanism leads to this situation. As far as I know, on Linux when a process forks, the parent's pages are not copied for the child; they are shared by both processes until one of them modifies a page (copy-on-write). As mentioned above, the QueryExecutor object remains unmodified after creation. I guess this is why all processes use the same QueryExecutor instance and hence the same connection.
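Beyond copy-on-write pages, fork() also makes the child inherit the parent's open file descriptors, and both descriptors refer to the same underlying kernel file object, including its position. Here is a hypothetical demo (using a plain file rather than a MySQL socket, but the mechanism is the same) showing that after fork, a read in the child advances the offset the parent sees:

```python
import os
import tempfile

# Create a scratch file with known contents.
tmp_fd, path = tempfile.mkstemp()
os.close(tmp_fd)
with open(path, "w") as f:
    f.write("abcdef")

fd = os.open(path, os.O_RDONLY)  # one kernel open-file object

pid = os.fork()
if pid == 0:
    # Child: consume the first 3 bytes, advancing the *shared* offset.
    os.read(fd, 3)
    os._exit(0)

os.waitpid(pid, 0)
# Parent: even though it never read, the offset has moved past "abc".
leftover = os.read(fd, 3).decode()
print(leftover)  # "def", not "abc"

os.close(fd)
os.remove(path)
```

A MySQL connection is ultimately a socket descriptor like this fd, so after fork both processes read from and write to the same byte stream, and one process can easily consume a response meant for the other.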
Am I right, or am I missing something? Do you have any suggestions?
Thanks in advance!
Grzegorz