Question

I stumbled on ZeroMQ while searching for an efficient IPC solution in Python. I have a couple of Python processes that need to do some CPU-intensive processing on data from a dict held by a master process. The worker processes only read from the dict; only the master process can alter it. The data in the dict will change, but always atomically, through the master process.

Ideally I'd have a piece of shared memory from which all worker processes could read the dict, but unfortunately that doesn't seem to be possible in Python.

Using a cache like Redis or memcached sounds like overkill (I don't want to go through TCP and pickling just to share something I already have in memory in native format).

So as an alternative I'd like to use ZeroMQ to push the relevant data from the master dict to subscribing workers over a ZeroMQ IPC socket. This means I'd (unfortunately) have to serialize the relevant portion of the master dict (using msgpack?) and then push it in a zmq message. I've read that it's possible to do this with zero-copy so that I don't end up copying the data twice; does that happen automatically if I pass copy=False with my msgpacked binary string? And is this the right way to tackle my problem, or do you have tips on how to do it even more efficiently?

Thanks!

Martijn

Solution

Yes: if you send your msgpacked bytes with copy=False, there will be no extra copy of the data in memory on the sending side (and the same goes for the receiving side when you also pass copy=False).
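
As a minimal sketch of what that could look like, here is a PUB/SUB pair over an ipc:// socket, assuming pyzmq and msgpack-python are installed; the socket path, dict contents, and the sleep-based slow-joiner workaround are just for illustration:

```python
import time
import multiprocessing
import msgpack
import zmq

ENDPOINT = "ipc:///tmp/master_dict.ipc"  # hypothetical socket path

def worker():
    ctx = zmq.Context.instance()
    sub = ctx.socket(zmq.SUB)
    sub.connect(ENDPOINT)
    sub.setsockopt(zmq.SUBSCRIBE, b"")      # subscribe to everything
    frame = sub.recv(copy=False)            # zmq.Frame wrapping the message buffer
    data = msgpack.unpackb(frame.buffer, raw=False)
    print("worker got:", data)

if __name__ == "__main__":
    proc = multiprocessing.Process(target=worker)
    proc.start()

    ctx = zmq.Context.instance()
    pub = ctx.socket(zmq.PUB)
    pub.bind(ENDPOINT)
    time.sleep(1.0)  # crude: let the subscriber connect (PUB/SUB drops early messages)

    master_dict = {"sensor_a": [1, 2, 3], "sensor_b": [4, 5, 6]}  # placeholder data

    # Serialize just the relevant slice of the dict; copy=False hands the
    # existing bytes buffer to zmq instead of copying it again.
    payload = msgpack.packb(master_dict["sensor_a"], use_bin_type=True)
    pub.send(payload, copy=False)

    proc.join()
```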

Make sure to do performance tests, though: the overhead of the more complicated zero-copy machinery is often greater than the cost of the copy itself until messages get fairly large (the crossover is typically around tens of kB per message).
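
A rough way to find that crossover for your own payloads is something like the following sketch (PUSH/PULL over inproc just to isolate the send/recv path; the message sizes and iteration count are arbitrary):

```python
import time
import zmq

ctx = zmq.Context.instance()
push = ctx.socket(zmq.PUSH)
pull = ctx.socket(zmq.PULL)
push.bind("inproc://bench")
pull.connect("inproc://bench")

for size in (1_000, 10_000, 100_000, 1_000_000):
    payload = b"x" * size
    for copy in (True, False):
        start = time.perf_counter()
        for _ in range(1_000):
            push.send(payload, copy=copy)
            pull.recv(copy=copy)   # returns bytes (copy=True) or a zmq.Frame (copy=False)
        elapsed = time.perf_counter() - start
        print(f"{size:>9} bytes, copy={copy}: {elapsed:.3f}s for 1000 round trips")
```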

An alternative approach is to use the built-in multiprocessing module's facilities for shared data. It's not the most elegant option, but for fairly simple cases it can get the job done.
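
For example, a Manager-backed dict keeps the plain dict API while letting the master mutate it and the workers read it; the keys and values here are placeholders. Note that every access goes through the manager process over a pipe, so it's slower than true shared memory:

```python
import multiprocessing

def worker(shared, key):
    # Reads are proxied to the manager process, but the calling code
    # still looks like ordinary dict access.
    print(key, "->", shared.get(key))

if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        shared = manager.dict({"sensor_a": [1, 2, 3], "sensor_b": [4, 5, 6]})

        procs = [multiprocessing.Process(target=worker, args=(shared, k))
                 for k in ("sensor_a", "sensor_b")]
        for p in procs:
            p.start()

        # Only the master updates the dict; replacing a whole value at a key
        # is atomic from the workers' point of view.
        shared["sensor_a"] = [7, 8, 9]

        for p in procs:
            p.join()
```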

Licensed under: CC-BY-SA with attribution