Question

So my... err... app does the following:

  • listens on a queue for 'work'
  • spawns about 100 workers per server (across ~3 servers), each listening on the queue
  • each worker basically does some networky stuff (ssh, snmp, etc.; I/O intensive), then churns the output (very CPU intensive)

I have it all working under multiprocessing and it works great. However, each worker is using way more memory than I would like (about 30MB RES, 450MB VIRT according to top). So I have two questions:

  • What is the best way for me to determine why the overhead is so high? I'm guessing COW isn't working too well... what modules could I use to get a snapshot of all of the main thread's memory prior to multiprocessing, so I can try to reduce the initial footprint?

  • Given that most of my processes are CPU bound, would there be a benefit to porting my code over to gevent/twisted? I would like to make use of the dual hex-cores in each server.

thanks!


Solution 2

There was a great talk at PyCon which explains the subject of memory usage in Python. It's definitely half an hour well spent.

The bottom line is that to really know how much memory is used, you should not be looking at top's output, but rather check how much free memory the machine has before and after running your 100 workers.
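If you do want a number from inside the process rather than from top, the standard-library `resource` module reports the peak resident set size; a minimal sketch (note the units are platform-dependent: kilobytes on Linux, bytes on macOS):

```python
import resource

# Peak resident set size of this process so far.  Unlike top's RES/VIRT
# columns, this is reported by the kernel for this exact process.
# Units: kilobytes on Linux, bytes on macOS.
peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(peak_rss)
```

Comparing this value before and after your workers start gives a more honest picture than summing per-process RES figures, since shared pages get counted once per process by top.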

OTHER TIPS

CPython uses reference counting to implement memory management for all Python objects. The way this works is that each Python object is represented as a struct, and each struct has a field in it giving the reference count. Whenever a new reference is made to the object, the reference count in that field is incremented. Whenever a reference to the object is given up, the reference count in that field is decremented. Once the reference count drops to zero, the interpreter knows the Python object is no longer needed and can free the memory allocated to the struct representing it.

Lots of things change the reference count of an object. Passing it to a function or assigning it to a (local or global) variable or an attribute of an object will increment the reference count (so will lots of other operations). The reverse of these decrements the reference count: for example, returning from a function decrements the reference count of all locals.

The reason all that is relevant to your question is that it should give you some idea of why the copy-on-write behavior you get out of fork() isn't going to help you save a whole lot of memory. Almost immediately, the CPython runtime is going to visit a large portion of the memory pages (the base unit of memory copy-on-write considers - often 4kB, perhaps larger) and replace lots of 2s with 3s or 4s with 3s or whatever. This will force much of the memory for the process to be copied.
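A minimal sketch of the effect (POSIX only, since it uses `fork`): even a "read-only" pass over data in the child writes to each object's refcount field — here via `sys.getrefcount`, which briefly takes a reference — dirtying the pages those objects live on and forcing the kernel to copy them:

```python
import os
import sys

# Parent builds some data before forking; the child will only "read" it.
data = [object() for _ in range(10_000)]

pid = os.fork()
if pid == 0:
    # Child: passing each object to getrefcount increments and then
    # decrements its refcount field -- a write to the page the object
    # lives on, so copy-on-write must copy that page anyway.
    for item in data:
        sys.getrefcount(item)
    os._exit(0)

_, status = os.waitpid(pid, 0)
```

This is why a forked CPython worker's resident size creeps up toward the parent's even when the worker never mutates the shared data.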

An event-driven system will help with this by letting you do many I/O-bound tasks concurrently. You can still use multiple processes (at least with Twisted) to take advantage of the extra CPU resources you have at your disposal. A single, event-driven process can do all of the necessary networking and then hand off the resulting data to worker processes that get to use the rest of your CPUs. You can be more precise in what code you run in those extra processes, though. From your question, I suspect you think that your workers don't need everything that has been loaded into your "main" process. Using Twisted's process management APIs, they won't have to spend any memory on those things.
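Twisted's APIs aren't shown here, but the same shape can be sketched with the standard library: one event-driven process (asyncio) does the networky I/O concurrently and hands the CPU-intensive churn to a small pool of worker processes sized to your cores. The `churn` function and the payloads are hypothetical stand-ins:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def churn(payload):
    # Hypothetical stand-in for the CPU-intensive output-churning step.
    return sum(ord(c) for c in payload)

async def handle(loop, pool, payload):
    # Imagine the coroutine has just finished its networky I/O for one
    # job; hand the CPU-bound churn to a worker process and await it.
    return await loop.run_in_executor(pool, churn, payload)

async def main():
    loop = asyncio.get_running_loop()
    # Size max_workers to your core count; 2 keeps the sketch small.
    with ProcessPoolExecutor(max_workers=2) as pool:
        return await asyncio.gather(
            *(handle(loop, pool, p) for p in ("abc", "def")))

if __name__ == "__main__":
    print(asyncio.run(main()))  # [294, 303]
```

The worker processes here import only what `churn` needs, which is the memory win the answer describes: the fat "main" process state never gets duplicated into them.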

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow