Question

Let's consider the following simple scenario for an MPI application: a root process broadcasts (MPI_Bcast) some parameters (several tens of bytes, fixed size), then all processes perform some computations, and then the root gathers the results (MPI_Gather, possibly a very large data set). After the root saves the data, the program ends.
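
For concreteness, here is a minimal sketch of that scenario, assuming MPI_COMM_WORLD throughout; the parameter contents, the CHUNK size, and the stand-in computation are illustrative, not part of the question:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 1024  /* per-rank result size; "possibly very large" in practice */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Root broadcasts a few tens of bytes of fixed-size parameters. */
    double params[4] = { 0 };
    if (rank == 0) { params[0] = 1.0; params[1] = 2.0; }
    MPI_Bcast(params, 4, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Every rank computes its share of the results. */
    double *local = malloc(CHUNK * sizeof(double));
    for (int i = 0; i < CHUNK; i++)
        local[i] = params[0] * rank + params[1] * i;  /* stand-in computation */

    /* Root gathers all results - this is the potentially huge transfer. */
    double *all = NULL;
    if (rank == 0)
        all = malloc((size_t)size * CHUNK * sizeof(double));
    MPI_Gather(local, CHUNK, MPI_DOUBLE, all, CHUNK, MPI_DOUBLE, 0,
               MPI_COMM_WORLD);

    if (rank == 0) {
        /* ... save results to disk ... */
        free(all);
    }
    free(local);
    MPI_Finalize();
    return 0;
}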

Under what circumstances, if any (number of processes, latency, etc.), would using a communicator created with a virtual star topology provide better performance than using MPI_COMM_WORLD, and why? Does a communicator use lazy initialization for the actual channels (i.e. only open a pipe, socket, etc. the first time it is required)? Is this behavior implementation dependent?

Note: I'm using Open MPI 1.4.3-2 and plain C.


Solution

Communicator topologies are a convenience mapping feature; they are not required to change the way actual communication happens. Even if a star (or any other graph) topology contains processes that are not connected in the topological sense, nothing prevents them from sending messages to each other, as long as they know each other's ranks in the communicator. MPI implementations may use the topology as a hint to optimise the communication paths, but that would make them very complex pieces of code, and at least Open MPI does not do it in its collective algorithms (the not very well tested, and hence usually disabled, hierarch collective component takes the hardware hierarchy into consideration, but not virtual topologies).
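
To illustrate, here is a hedged sketch that builds a star over MPI_COMM_WORLD with MPI_Graph_create (rank 0 as the hub is my choice, not mandated by anything) and then sends a message directly between two leaves that are not neighbours in the graph - which is perfectly legal:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Star: rank 0 is the hub, ranks 1..size-1 are leaves.
     * index[] holds cumulative neighbour counts, edges[] the neighbours. */
    int *index = malloc(size * sizeof(int));
    int *edges = malloc(2 * (size - 1) * sizeof(int));

    index[0] = size - 1;              /* hub has size-1 neighbours */
    for (int i = 1; i < size; i++)
        index[i] = index[i - 1] + 1;  /* each leaf has exactly one */

    for (int i = 0; i < size - 1; i++)
        edges[i] = i + 1;             /* hub -> every leaf */
    for (int i = 1; i < size; i++)
        edges[size - 2 + i] = 0;      /* each leaf -> hub */

    MPI_Comm star;
    MPI_Graph_create(MPI_COMM_WORLD, size, index, edges, 0, &star);

    /* Leaves 1 and 2 are not connected in the star, yet direct
     * point-to-point communication between them still works. */
    if (size >= 3) {
        int token = 42;
        if (rank == 1)
            MPI_Send(&token, 1, MPI_INT, 2, 0, star);
        else if (rank == 2) {
            MPI_Recv(&token, 1, MPI_INT, 1, 0, star, MPI_STATUS_IGNORE);
            printf("leaf 2 got %d directly from leaf 1\n", token);
        }
    }

    MPI_Comm_free(&star);
    free(index);
    free(edges);
    MPI_Finalize();
    return 0;
}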

Topologies can influence communication through rank reordering, if one passes reorder = 1 to the communicator constructor (e.g. MPI_Graph_create or MPI_Cart_create). This gives the MPI implementation the freedom to renumber the process ranks so that their physical placement matches the supplied topology scheme as closely as possible, given the physical topology of the underlying hardware. There are hardware platforms with dedicated networks for collective operations. For example, IBM Blue Gene/P has a global interrupt network that allows for a fast MPI_BARRIER implementation, and a specialised collective network that speeds up some collective operations (including broadcasts). But these are only usable on MPI_COMM_WORLD; the fall-back software implementation is used for any other communicator.
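
A small self-contained sketch of the reorder mechanism, using a 1-D Cartesian communicator only because it needs less setup than a graph (whether any renumbering actually happens is entirely up to the implementation and the machine):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* One non-periodic dimension spanning all processes. */
    int dims[1] = { size };
    int periods[1] = { 0 };
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 1 /* reorder */, &cart);

    /* With reorder = 1 the rank in the new communicator may differ
     * from the rank in MPI_COMM_WORLD. */
    int cart_rank;
    MPI_Comm_rank(cart, &cart_rank);
    if (cart_rank != world_rank)
        printf("world rank %d was renumbered to %d\n", world_rank, cart_rank);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}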

Is this behavior implementation dependent?

Yes, it is implementation and system dependent (for implementations that support multiple hardware or communication systems). The same goes for the rest of your question: whether channels are set up eagerly or lazily is up to the implementation; Open MPI, for example, establishes most point-to-point connections lazily, on first use.
