What is a multithreading performance impact for data intensive tasks on a massive mulitcore machine? [closed]

https://stackoverflow.com/questions/18972080

29-06-2022
|

Question

I'm reading a post about multi-threading performance issue on massive multicore machine: http://www.reddit.com/r/Python/comments/1mn12l/what_you_do_not_like_in_python/ccbc5h8 An author of that post claims that in a massive multicore systems multithreading applications has much bigger performance impact then multiprocessing ones.

AFIAK multithreading is cheaper then multiprocessing now (both in terms of system management and context switching). For simplicity let's assume that we don't need to use locks.

If we don't use locks to protect shared memory, are there any system limitations to manage multithreading applications and their access to resources?

Is there any non userspace implementation related reason, when multithreading has a huge performance impact (which the post author had)?
In other words: What is a system level property which cause data intensive multithread application to perform badly compared to similar multiprocess solution?

I'm aware about semantic difference between threads and processes.

Solution

Threads share a view of memory that processes do not. If you have a case where executors frequently need to make changes to the view of memory, a multi-threaded approach can be slower than a multi-process approach because of contention for the locks that internally protect the view of memory.

Threads also share file descriptors. If you have a case where files are frequently opened and closed, threads could wind up blocking each other for access to the process file descriptor table. A multi-process approach won't have this issue.

There can also be internal synchronization overhead in library functions. In a single-thread case, locks that protect process-level structures can be no-ops. In a multi-threaded case, these locks may require expensive atomic operations.

Lastly, multi-threaded processes may require frequent access to thread-local storage to implement things like errno. On some platforms, these accesses can be expensive and can be avoided in a single-threaded process.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow