Question

How many TCP connections will be used for sending data by an MPI program if the MPI implementation is MPICH2? If you also know about PMI connections, count them separately.

For example, suppose I have 4 processes and 2 additional communicators (COMM1 containing the 1st and 2nd processes, COMM2 containing the 3rd and 4th); data is sent between each possible pair of processes, in every possible communicator.

I use a recent MPICH2 + Hydra + the default PMI. The OS is Linux, the network is switched Ethernet. Every process is on a separate PC.

So, here are the data paths (pairs of processes); a sketch of how this setup could be coded follows the list:

1 <-> 2 (in MPI_COMM_WORLD and COMM1)
1 <-> 3 (only in MPI_COMM_WORLD)
1 <-> 4 (only in MPI_COMM_WORLD)
2 <-> 3 (only in MPI_COMM_WORLD)
2 <-> 4 (only in MPI_COMM_WORLD)
3 <-> 4 (in MPI_COMM_WORLD and COMM2)
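
For illustration, here is a minimal sketch of how such a setup could look (this is my own assumed code, not an existing program; processes 1..4 in the list above correspond to ranks 0..3 here, and the MPI_Comm_split color logic and exchange pattern are assumptions):

```c
/* Assumed sketch: 4 ranks in MPI_COMM_WORLD, with COMM1 = {rank 0, rank 1}
 * and COMM2 = {rank 2, rank 3} created via MPI_Comm_split.  Every pair
 * exchanges data in MPI_COMM_WORLD, and each pair additionally exchanges
 * data inside its sub-communicator. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 4)                          /* the example assumes exactly 4 ranks */
        MPI_Abort(MPI_COMM_WORLD, 1);

    /* color 0 -> COMM1 (ranks 0,1), color 1 -> COMM2 (ranks 2,3) */
    MPI_Comm subcomm;
    MPI_Comm_split(MPI_COMM_WORLD, rank / 2, rank, &subcomm);

    /* Point-to-point exchange with every other rank in MPI_COMM_WORLD. */
    for (int peer = 0; peer < size; peer++) {
        if (peer == rank)
            continue;
        int sendbuf = rank, recvbuf;
        MPI_Sendrecv(&sendbuf, 1, MPI_INT, peer, 0,
                     &recvbuf, 1, MPI_INT, peer, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    /* Exchange with the single partner inside the 2-rank sub-communicator. */
    int subrank;
    MPI_Comm_rank(subcomm, &subrank);
    int sendbuf = rank, recvbuf;
    MPI_Sendrecv(&sendbuf, 1, MPI_INT, 1 - subrank, 1,
                 &recvbuf, 1, MPI_INT, 1 - subrank, 1,
                 subcomm, MPI_STATUS_IGNORE);

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}
```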

I think one of these cases applies:

  • Case 1:

Only 6 TCP connections will be used; data sent in COMM1 and MPI_COMM_WORLD will share a single TCP connection.

  • Case 2:

8 TCP connections: 6 in MPI_COMM_WORLD (all-to-all = full mesh) + 1 for 1 <-> 2 in COMM1 + 1 for 3 <-> 4 in COMM2.

  • Some other variant that I didn't think of.

Solution

Which communicators are being used doesn't affect the number of TCP connections that are established. For --with-device=ch3:nemesis:tcp (the default configuration), you will use one bidirectional TCP connection between each pair of processes that directly communicate via point-to-point MPI routines. In your example, this means 6 connections. If you use collectives, additional connections may be established under the hood. Connections are established lazily, only as needed, but once established they stay open until MPI_Finalize (and sometimes also MPI_Comm_disconnect) is called.
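
As a rough way to observe the lazy behaviour described above (my own sketch, not part of the original answer), you could run a program in which all ranks talk only to rank 0 and, while it sleeps, count the ESTABLISHED sockets of the MPI processes on each node with a tool such as ss or lsof; with lazy establishment you would expect connections only for the pairs (0, i), not a full mesh:

```c
/* Sketch: every rank communicates only with rank 0.  While the program
 * sits in the sleep() you can inspect sockets on each node (e.g. with
 * `ss -tnp` or `lsof -i TCP`) and count ESTABLISHED connections owned
 * by the MPI processes. */
#include <mpi.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int buf = rank;
    if (rank == 0) {
        /* Rank 0 receives one message from every other rank. */
        for (int src = 1; src < size; src++)
            MPI_Recv(&buf, 1, MPI_INT, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else {
        /* Non-zero ranks only ever talk to rank 0, so e.g. ranks 1 and 2
         * should never need a TCP connection to each other. */
        MPI_Send(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    sleep(60);   /* time window to inspect the sockets on each node */

    MPI_Finalize();
    return 0;
}
```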

Off the top of my head I don't know how many connections are used per process for PMI, although I'm fairly sure it should be one connection from each MPI process to the hydra_pmi_proxy process, plus some other number (probably logarithmic) of connections among the hydra_pmi_proxy and mpiexec processes.

OTHER TIPS

I can't answer your question completely, but here's something to consider. In MVAPICH2, we developed a tree-based connection mechanism for the PMI, so each node has at most log(n) TCP connections. Since every open socket counts against the open-file-descriptor limit on most OSes, it's likely that an MPI library would use a logical topology over the ranks to limit the number of TCP connections.
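
Purely to illustrate the log(n) claim (a sketch under the assumption of a binomial-tree layout, not MVAPICH2 code): the most-connected node of a binomial tree over n ranks has about ceil(log2(n)) children, so no node needs more than roughly log2(n) sockets for that traffic.

```c
/* Illustration only (not MVAPICH2 code): upper bound on per-node connections
 * if the processes were organized in a binomial tree.  The root of a binomial
 * tree over n nodes has ceil(log2(n)) children, and every other node has
 * fewer, so about log2(n) sockets per node suffice. */
#include <stdio.h>

static int max_tree_connections(int n)
{
    int c = 0;
    while ((1 << c) < n)   /* smallest c with 2^c >= n, i.e. ceil(log2(n)) */
        c++;
    return c;
}

int main(void)
{
    int sizes[] = { 4, 16, 128, 1024, 65536 };
    for (int i = 0; i < 5; i++)
        printf("n = %6d  ->  at most %2d connections per node\n",
               sizes[i], max_tree_connections(sizes[i]));
    return 0;
}
```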

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow