Slow multiprocessing with cython

https://stackoverflow.com/questions/23208297

07-07-2023

Question

I'm currently working on a minimax tree-based AI in Python. To squeeze extra performance out of the AI, I've been using Cython to optimize the bottlenecks, and I have attempted to parallelize the tree building with multiprocessing.

The issue I have is that the AI is actually slower when multiprocessing is combined with Cython. I know there is overhead with multiprocessing, which can sometimes make it slower. However, it's only slower when Cython is used. When equivalent Python code is used, multiprocessing provides a 2-3 times performance increase.

I've run several tests to rule out any obvious problems. For example, I've run tests both with and without alpha-beta pruning enabled (which could under some circumstances perform better without multiprocessing), but it makes no difference. I've already set up the Cython objects to be picklable, and the multiprocessed Cython AI builds a proper tree. The multiprocessing implementation I'm using (pass only the root's children to a pool.map function) DOES increase performance, but only when pure Python code is used.

Is there some quirk of Cython that I'm missing? Some additional overhead to using Cython code (or C extensions in general) with multiprocessing? Or is this a problem with Cython itself?
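For readers unfamiliar with the pattern, the approach described above (hand only the root's children to pool.map, then attach the returned branches to the root) might look roughly like the sketch below. This is not the original code: Node, successors, evaluate, build and build_root are hypothetical stand-ins, and the toy game is invented purely for illustration.

```python
from functools import partial
from multiprocessing import Pool


class Node:
    """Hypothetical tree node; a plain class, so pickle can handle it."""

    def __init__(self, state):
        self.state = state
        self.children = []
        self.score = None


def successors(state):
    """Hypothetical move generator: every state has three successors."""
    return [state * 3 + k for k in (1, 2, 3)]


def evaluate(state):
    """Hypothetical static evaluation of a leaf position."""
    return (state * 7919) % 1000 - 500


def build(state, depth, maximizing):
    """Recursively build a minimax subtree and score it bottom-up."""
    node = Node(state)
    if depth == 0:
        node.score = evaluate(state)
        return node
    node.children = [build(s, depth - 1, not maximizing)
                     for s in successors(state)]
    pick = max if maximizing else min
    node.score = pick(c.score for c in node.children)
    return node


def build_root(state, depth, map_fn=map):
    """Build each root child via map_fn, then attach the branches to the root."""
    root = Node(state)
    worker = partial(build, depth=depth - 1, maximizing=False)
    # With pool.map, every returned branch is pickled in the worker
    # and unpickled here before it can be attached.
    root.children = list(map_fn(worker, successors(state)))
    root.score = max(c.score for c in root.children)
    return root


if __name__ == "__main__":
    with Pool() as pool:
        root = build_root(0, 5, map_fn=pool.map)
    print(root.score)
```

Passing the built-in map instead of pool.map gives the single-process version, which makes it easy to time both on the same input.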

Edit: Here are some example timings:

Given a depth of 7 and no Alpha-Beta pruning: (all times in seconds)

Cython, No Multiprocessing:
12.457

Cython, Multiprocessing:
15.440

No Cython, No Multiprocessing:
26.010

No Cython, Multiprocessing:
17.609

After much testing I've found the cause of the overhead. @Veedrac is right that there is extra overhead with C extensions, and the slowness of Python masked the overhead when Cython was not used. Specifically, the overhead occurred when returning branches from the worker processes and adding them to the root node. This explains why the overhead was not constant, and actually scaled up as the depth of the tree increased.

I had actually suspected this, and tested for it before. However, it appears the code I previously used to test for this overhead was buggy. I've now fixed the multiprocessing to return only the necessary information, and the overhead has been eliminated. Cython with multiprocessing now runs very quickly.
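To illustrate why returning whole branches gets more expensive with depth, here is a small sketch that compares the pickled size of a full subtree with the pickled size of a short summary. It assumes, hypothetically, that the parent only needs the move and its score; the post does not say what the "necessary information" actually was. Node, build and summarize are made-up names.

```python
import pickle


class Node:
    """Hypothetical tree node standing in for the real (Cython) one."""

    def __init__(self, state, children=(), score=0):
        self.state = state
        self.children = list(children)
        self.score = score


def build(state, depth):
    """Build a full ternary subtree with toy scores."""
    if depth == 0:
        return Node(state, score=state % 17)
    kids = [build(state * 3 + k, depth - 1) for k in (1, 2, 3)]
    return Node(state, kids, max(k.score for k in kids))


def summarize(node):
    """Assumed minimal result: just the move (state) and its score."""
    return node.state, node.score


for depth in (2, 4, 6):
    subtree = build(1, depth)
    full = len(pickle.dumps(subtree))
    small = len(pickle.dumps(summarize(subtree)))
    print(f"depth {depth}: full subtree {full} bytes, summary {small} bytes")
```

The summary stays a few dozen bytes regardless of depth, while the full subtree grows with the number of nodes, which is consistent with overhead that scales with tree depth.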

Solution

Cython can incur conversion costs if values cross between C and Python types too often, which could contribute. There's also the fact that the speedup from multiprocessing will be larger for the pure Python version: its compute time is longer, so the same overhead is easier to hide.
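As a rough back-of-the-envelope check of the "hidden overhead" idea, the four timings from the question can be fitted to a very simple model. The model is an assumption, not something stated in the post: parallel time = serial time / p + overhead, with the same effective parallel factor p and the same fixed overhead for both the Python and the Cython runs.

```python
# Timings from the question (seconds), depth 7, no alpha-beta pruning.
cython_serial, cython_parallel = 12.457, 15.440
python_serial, python_parallel = 26.010, 17.609

# Assumed model: parallel = serial / p + overhead.
# Subtracting the two equations eliminates the overhead term.
p = (python_serial - cython_serial) / (python_parallel - cython_parallel)
overhead = cython_parallel - cython_serial / p

print(f"effective parallel factor p: {p:.2f}")
print(f"fixed overhead: {overhead:.2f} s")
print(f"overhead exceeds Cython serial time: {overhead > cython_serial}")
```

Under this (crude) model the fixed overhead alone is larger than the whole serial Cython run, so multiprocessing cannot win there, while the slower Python run still comes out ahead.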

One suggestion is to use nogil functions (called with the GIL released) and see whether threading has lower overhead than multiprocessing.
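On the Python side, the threaded variant could be driven along these lines. This is only a sketch: search_subtree is a pure-Python placeholder for a Cython function that releases the GIL, so as written it will not run faster than the serial version. Any speedup would come from the real function running without the GIL. The function names and the toy game are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def search_subtree(state, depth=4):
    """Placeholder for a Cython search routine that releases the GIL.

    Toy negamax over a made-up ternary game tree.
    """
    if depth == 0:
        return (state * 7919) % 1000 - 500
    return max(-search_subtree(state * 3 + k, depth - 1) for k in (1, 2, 3))


def search_root_threaded(root_children, workers=4):
    """Search each root child in its own thread and collect the scores."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Threads share the process's memory, so results are handed back
        # directly instead of being pickled between processes.
        return list(pool.map(search_subtree, root_children))


scores = search_root_threaded([1, 2, 3])
print(scores)
```

The appeal of this design is that nothing has to be serialized on the way back to the root, which is exactly where the overhead in the question came from.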

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow