Question

I often hear people talking about parallel computing and distributed computing, but I'm under the impression that there is no clear boundary between the two, and people tend to confuse them pretty easily, while I believe they are very different:

  • Parallel computing is more tightly coupled to multi-threading, or how to make full use of a single CPU.
  • Distributed computing refers to the notion of divide and conquer, executing sub-tasks on different machines and then merging the results.

However, since we stepped into the Big Data era, the distinction seems to be blurring, and most systems today use a combination of parallel and distributed computing.

An example I use in my day-to-day job is Hadoop with the Map/Reduce paradigm, a clearly distributed system with workers executing tasks on different machines, but also taking full advantage of each machine with some parallel computing.

I would like some advice on how exactly to make the distinction in today's world, and on whether we can still talk about parallel computing or whether there is no longer a clear distinction. To me it seems distributed computing has grown a lot over the past years, while parallel computing seems to stagnate, which could probably explain why I hear much more talk about distributing computations than about parallelizing them.


Solution

This is partly a matter of terminology, and as such, only requires that you and the person you're talking to clarify it beforehand. However, there are different topics that are more strongly associated with parallelism, concurrency, or distributed systems.

Parallelism is generally concerned with accomplishing a particular computation as fast as possible by exploiting multiple processors. The scale may range from multiple arithmetic units inside a single processor, to multiple processors sharing memory, to distributing the computation across many computers. In terms of models of computation, parallelism is generally about using multiple simultaneous threads of computation internally in order to compute a final result. Parallelism is also sometimes used for real-time reactive systems, which contain many processors that share a single master clock; such systems are fully deterministic.
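To make the idea of one computation spread over several processors concrete, here is a minimal Python sketch (not from the original answers); the function name, the four-way split, and the problem itself are illustrative choices.

```python
# Illustrative sketch: one large computation split across worker processes,
# then merged into a single result. Chunking scheme and names are arbitrary.
from multiprocessing import Pool

def partial_sum_of_squares(chunk):
    # Each worker process handles one chunk independently.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]   # split the work four ways
    with Pool(processes=4) as pool:
        partials = pool.map(partial_sum_of_squares, chunks)
    print(sum(partials))                      # merge partial results into one answer
```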

Concurrency is the study of computations with multiple threads of computation. Concurrency tends to come from the architecture of the software rather than from the architecture of the hardware. Software may be written to use concurrency in order to exploit hardware parallelism, but often the need is inherent in the software's behavior, to react to different asynchronous events (e.g. a computation thread that works independently of a user interface thread, or a program that reacts to hardware interrupts by switching to an interrupt handler thread).
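As an illustration of concurrency that comes from the software's structure rather than from the hardware, here is a small hedged sketch: a background thread performs a long computation while the main thread stays free to react to events (simulated here with a simple polling loop); all names are illustrative.

```python
# Illustrative sketch: concurrency driven by the program's behavior.
# A worker thread computes while the main thread remains responsive.
import threading
import time

def background_computation(result):
    total = 0
    for i in range(10_000_000):
        total += i
    result.append(total)

if __name__ == "__main__":
    result = []
    worker = threading.Thread(target=background_computation, args=(result,))
    worker.start()
    while worker.is_alive():
        print("main thread still responsive while the computation runs...")
        time.sleep(0.5)
    worker.join()
    print("computation finished:", result[0])
```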

Distributed computing studies separate processors connected by communication links. Whereas parallel processing models often (but not always) assume shared memory, distributed systems rely fundamentally on message passing. Distributed systems are inherently concurrent. Like concurrency, distribution is often part of the goal, not solely part of the solution: if resources are in geographically distinct locations, the system is inherently distributed. Systems in which partial failures (of processor nodes or of communication links) are possible fall under this domain.
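The following sketch (again illustrative, not from the original answers) shows the message-passing style in miniature: two processes share no memory and cooperate only by exchanging messages over a socket. The port number and message format are arbitrary, and the same pattern would work across machines.

```python
# Illustrative sketch: two processes with no shared memory, communicating
# only by messages over a socket (localhost here, but nothing below is
# specific to a single machine).
import socket
import time
from multiprocessing import Process

HOST, PORT = "127.0.0.1", 50007

def worker():
    # "Remote" node: receives a task as a message, sends back a result message.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            numbers = conn.recv(4096).decode().split(",")
            total = sum(int(n) for n in numbers)
            conn.sendall(str(total).encode())

if __name__ == "__main__":
    p = Process(target=worker)
    p.start()
    time.sleep(0.5)                            # crude wait for the server to be ready
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(b"1,2,3,4,5")              # send the task as a message
        print("result from remote node:", cli.recv(4096).decode())
    p.join()
```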

OTHER TIPS

As pointed out by @Raphael, Distributed Computing is a subset of Parallel Computing; in turn, Parallel Computing is a subset of Concurrent Computing.

Concurrency refers to the sharing of resources in the same time frame. For instance, several processes share the same CPU (or CPU cores) or share memory or an I/O device. Operating systems manage shared resources. Multiprocessor machines and distributed systems are architectures in which concurrency control plays an important role. Concurrency occurs at both the hardware and software level. Multiple devices operate at the same time, processors have internal parallelism and work on several instructions simultaneously, systems have multiple processors, and systems interact through network communication. Concurrency occurs at the applications level in signal handling, in the overlap of I/O and processing, in communication, and in the sharing of resources between processes or among threads in the same process.

Two processes (or threads) executing on the same system so that their execution is interleaved in time are concurrent: the processes (threads) are sharing the CPU resource. I like the following definition: two processes (threads) executing on the same system are concurrent if and only if the second process (thread) begins execution before the first process (thread) has terminated.

Concurrency becomes parallelism when processes (or threads) execute on different CPUs (or cores of the same CPU). Parallelism in this case is not “virtual” but “real”.
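A small sketch of that distinction, assuming CPython, where CPU-bound threads of one process are interleaved on a single core while separate processes can run on different cores; the workload and timings are purely illustrative.

```python
# Illustrative sketch contrasting "virtual" and "real" parallelism for a
# CPU-bound task: the two threads are interleaved (concurrency), while the
# two processes may run on different cores (parallelism).
import time
from threading import Thread
from multiprocessing import Process

def burn_cpu(n=5_000_000):
    total = 0
    for i in range(n):
        total += i

def run_pair(worker_cls):
    workers = [worker_cls(target=burn_cpu) for _ in range(2)]
    start = time.time()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.time() - start

if __name__ == "__main__":
    print("two threads   (interleaved):", run_pair(Thread))
    print("two processes (parallel)   :", run_pair(Process))
```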

When those CPUs belong to the same machine, we refer to the computation as "parallel"; when the CPUs belong to different machines, which may be geographically spread, we refer to the computation as "distributed".

Therefore, Distributed Computing is a subset of Parallel Computing, which is a subset of Concurrent Computing.

Of course, it is true that, in general, parallel and distributed computing are regarded as different. Parallel computing is related to tightly-coupled applications, and is used to achieve one of the following goals:

  1. Solve compute-intensive problems faster;
  2. Solve larger problems in the same amount of time;
  3. Solve same-size problems with higher accuracy in the same amount of time.

In the past, the first goal was the main reason for parallel computing: accelerating the solution of a problem. Right now, and when possible, scientists mainly use parallel computing to achieve either the second goal (e.g., they are willing to spend the same amount of time $T$ they spent in the past solving in parallel a problem of size $x$ to solve now a problem of size $5x$) or the third one (i.e., they are willing to spend the same amount of time $T$ they spent in the past solving in parallel a problem of size $x$ to solve now a problem of size $x$, but with higher accuracy, using a much more complex model with more equations, variables and constraints). Parallel computing may use shared memory, message passing or both (e.g., shared memory intra-node using OpenMP, message passing inter-node using MPI); it may use GPU accelerators as well. Since the application runs on one parallel supercomputer, we usually do not take into account issues such as failures, network partitions, etc., since the probability of these events is, for practical purposes, close to zero. However, large parallel applications such as climate change simulations, which may run for several months, are usually concerned with failures, and use checkpoint/restart mechanisms to avoid starting the simulation again from the beginning if a problem arises.
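As a hedged illustration of the message-passing style mentioned above, here is a minimal sketch using the mpi4py binding (assuming it is installed and the script is launched with an MPI runner, e.g. `mpiexec -n 4 python sum_squares.py`); the problem and sizes are arbitrary.

```python
# Illustrative sketch: each MPI rank computes a partial result locally, then
# the partial results are combined with a message-passing reduction.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

N = 1_000_000
# Each rank takes a strided slice of the index range.
local_sum = sum(i * i for i in range(rank, N, size))

# Reduction: partial sums travel over the interconnect to rank 0.
total = comm.reduce(local_sum, op=MPI.SUM, root=0)

if rank == 0:
    print("sum of squares:", total)
```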

Distributed computing is related to loosely-coupled applications, in which the goal (for distributed supercomputing) is to solve problems that are otherwise too large, or whose execution may be divided among different components that could benefit from execution on different architectures. There are several models, including client-server, peer-to-peer, etc. The issues arising in distributed computing, such as security, failures, network partitions, etc., must be taken into account at design time, since in this context failures are the rule and not the exception.

Finally, Grid and Cloud computing are both subsets of distributed computing. The grid computing paradigm emerged as a new field distinguished from traditional distributed computing because of its focus on large-scale resource sharing and innovative high-performance applications. The resources being shared usually belong to multiple, different administrative domains (so-called Virtual Organizations). Grid computing, while heavily used by scientists in the last decade, is traditionally difficult for ordinary users. Cloud computing tries to bridge the gap by allowing ordinary users to easily exploit multiple machines, which are co-located in the same data center rather than geographically distributed, through the use of virtual machines that users can assemble to run their applications. Owing to the hardware, in particular the usual lack of a high-performance network interconnect (such as InfiniBand), clouds are not targeted at running parallel MPI applications. Distributed applications running on clouds are usually implemented to exploit the Map/Reduce paradigm. Incidentally, many people think of Map/Reduce as a parallel data-flow model.
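To show the Map/Reduce flow this paragraph refers to, here is a toy sketch in which a process pool on one machine stands in for cluster workers; the word-count task, the documents, and the in-memory shuffle are illustrative simplifications of what a system like Hadoop actually does.

```python
# Illustrative sketch of the Map/Reduce flow: map over input splits,
# shuffle by key, reduce each group.
from collections import defaultdict
from multiprocessing import Pool

def map_phase(document):
    # Emit (word, 1) pairs for one input split.
    return [(word, 1) for word in document.split()]

def reduce_phase(item):
    word, counts = item
    return word, sum(counts)

if __name__ == "__main__":
    documents = ["the cat sat", "the dog sat", "the cat ran"]
    with Pool(processes=3) as pool:
        # Map: run independently on each split.
        mapped = pool.map(map_phase, documents)
        # Shuffle: group intermediate pairs by key.
        grouped = defaultdict(list)
        for pairs in mapped:
            for word, count in pairs:
                grouped[word].append(count)
        # Reduce: combine the values for each key.
        counts = dict(pool.map(reduce_phase, list(grouped.items())))
    print(counts)   # e.g. {'the': 3, 'cat': 2, 'sat': 2, 'dog': 1, 'ran': 1}
```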

I'm not sure I understand the question. The distinction between parallel and distributed processing is still there. The fact that you can take advantage of both in the same computation doesn't change what the concepts mean.

And I don't know what news you are following, but I'm quite sure parallel processing is not stagnating, especially since I think it's useful far more often.

If you need to process terabytes of data, distributed computing (possibly combined with parallel computing) is the way to go. But if you need to compute something on a desktop or a smartphone, parallel computing alone will probably give you the best results, considering that an internet connection might not always be available, and when it is, it can be slow.

Here is a recent paper that is worth reading:

Michel Raynal: "Parallel Computing vs. Distributed Computing: A Great Confusion?", Proc. Euro-Par 2015, doi:10.1007/978-3-319-27308-2_4

Abstract:

This short position paper discusses the fact that, from a teaching point of view, parallelism and distributed computing are often confused, while, when looking at their deep nature, they address distinct fundamental issues. Hence, appropriate curricula should be separately designed for each of them. The “everything is in everything (and reciprocally)” attitude does not seem to be a relevant approach to teach students the important concepts which characterize parallelism on the one side, and distributed computing on the other side.

In the Introduction section of the book [1], the authors provide another perspective (different from the ones in other answers) on the comparison between distributed computing and parallel computing.

In broad terms, the goal of parallel processing is to employ all processors to perform one large task. In contrast, each processor in a distributed system generally has its own semi-independent agenda, but for various reasons, including sharing of resources, availability, and fault tolerance, processors need to coordinate their actions.

From this perspective, the Map/Reduce paradigm mainly falls into the parallel computing context. However, if we want the nodes involved to reach consensus on a common leader, by using, for example, the Paxos algorithm, then we are considering a typical problem in distributed computing.

[1] Hagit Attiya and Jennifer Welch: "Distributed Computing: Fundamentals, Simulations, and Advanced Topics", 2004.

Put briefly: "parallel" refers to shared-memory multiprocessors, whereas "distributed" refers to private-memory multicomputers. That is, the former is a single multicore or superscalar machine, whereas the latter is a geographically distributed network of computers. The latter implies looser coupling and thus more availability and fault tolerance, at the cost of lower performance. Performance suffers because every round trip requires data (de-)serialization and delivery over longer distances, whereas in a parallel processor you can simply pass a reference to an in-memory object to another CPU.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange