Question

I do know the difference between concurrency (CPU process swapping) and parallelism (processes running in real-time parallel on multiple cores). What I wish to know is what role threads and processes play in all of this. I am aware that every OS is different, and that CPU scheduling varies from OS to OS and VM to VM. In general, threads have much less overhead, and CPU swapping is generally quicker for threads, compared to processes. But when I read about multi-process computing, everyone seems to agree that this is the only alternative for computing on multiple CPUs in parallel.

Does this mean that threads are not capable of running in real parallel on multiple CPU cores -- or does it mean that multi-process computing is the only viable option if you need to run calculations on multiple physical CPU chips, such as cluster network supercomputers?

I would appreciate a clarification!


Solution

First, to clarify the terminology that you are using:

  • a process is an entity managed by an operating system, typically the execution of a program;
  • a thread is an entity within a process that executes instructions sequentially.

In this context, a process has state that is maintained by the operating system to record details of registers, memory, permissions, etc. This state is typically larger than that of a thread, and therefore the overhead of managing processes (as you say) is greater. See Wikipedia for more details.

So, to answer your question, threads and processes (as defined above) can be executed in parallel on multiple processors, if the operating system or underlying architecture by which they are executed supports it.

Conventional parallel processors are shared-memory machines, and Linux is a representative conventional operating system. Linux supports the parallel execution of both processes and threads on shared-memory (symmetric) multiprocessors, but it does not support the execution of processes (or threads) across multiple processors unless those processors share memory. There have been a number of distributed operating systems designed to support the execution of processes or threads over multiple processors without shared memory, but these never caught on; see Wikipedia.

Conventional cluster-based systems (such as supercomputers) employ parallel execution between processors using MPI. MPI is a communication interface between processes that execute in operating system instances on different processors; it doesn't support other process operations such as scheduling. (At the risk of complicating things further, because MPI processes are executed by operating systems, a single processor can run multiple MPI processes and/or a single MPI process can also execute multiple threads!)

Finally, a simple (although unconventional) example, where threads and processes have a slightly different meaning, is the XMOS processor architecture. This allows multiple processor chips to be connected together and for multiple threads of sequential execution to execute over and communicate between them, without an operating system.

OTHER TIPS

everyone seems to agree that this is the only alternative for computing on multiple CPUs in parallel.

I have never heard this. In any case it is not true.

Does this mean that threads are not capable of running in real parallel on multiple CPU cores

The thread is the unit of scheduling in most OSes. Processes are not scheduling units; at most, they come into play as inputs to scheduling heuristics. Threads run on CPUs (in parallel), not processes.
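One observable consequence: because each thread is dispatched independently, blocking one thread does not block the process's other threads. A small Python sketch (timings are approximate):

```python
import threading
import time

def napper():
    time.sleep(0.5)  # blocks only this thread; the kernel keeps scheduling the rest

start = time.monotonic()
threads = [threading.Thread(target=napper) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start
# ~0.5 s rather than ~2.0 s: the four sleeps overlap because the
# scheduler dispatches threads, not whole processes.
print(elapsed < 1.8)  # True
```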

or does it mean that multi-process computing is the only viable option if you need to run calculations on multiple physical CPU chips, such as cluster network supercomputers?

No. Processes do not enhance the scheduling capabilities of the OS.

The question was not asked very precisely, but I hope this clarifies the important points.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow