The peculiarities of using TCP/IP with Open MPI are described in the FAQ. I'll try to give an executive summary here.
Open MPI uses a greedy approach when it comes to utilising network interfaces for data exchange. In particular, the TCP/IP BTL (Byte Transfer Layer) and OOB (Out-Of-Band) components tcp will try to use all configured network interfaces with matching address families. In your case each node has many interfaces with addresses from the IPv4 address family:
comp01-mpi                      comp02-mpi
-----------------------------------------------------------
eth0       192.168.0.101/24     eth0       192.168.0.102/24
eth0.2002  10.111.2.36/24       eth0.2002  10.111.2.37/24
eth1       192.168.1.101/24     eth1       192.168.1.102/24
eth2       10.111.1.36/23       eth2       10.111.1.37/23
lo         127.0.0.1/8          lo         127.0.0.1/8
Open MPI assumes that each interface on comp02-mpi is reachable from any interface on comp01-mpi and vice versa. This is never the case with the loopback interface lo, therefore by default Open MPI excludes lo. Network sockets are then opened lazily (i.e. on demand) when information has to be transported.
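To get a rough feel for which interface pairs can plausibly talk to each other directly, one can check whether two addresses fall into the same IPv4 network. The sketch below is only my own illustration and not how Open MPI itself decides reachability; the helper names ip_to_int and same_subnet are made up, and routing between networks is ignored entirely.

```shell
# Convert a dotted-quad IPv4 address into a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if address $1 and address $2 share a network of prefix length $3.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( a & mask )) -eq $(( b & mask )) ]
}

# eth0 on comp01-mpi vs eth0 on comp02-mpi (both /24)
same_subnet 192.168.0.101 192.168.0.102 24 && echo "eth0 <-> eth0: same network"
# eth0 on comp01-mpi vs eth1 on comp02-mpi (both /24)
same_subnet 192.168.0.101 192.168.1.102 24 || echo "eth0 <-> eth1: different networks"
```

Addresses in different networks are not necessarily unreachable, but they are the first candidates to look at when a connection attempt never completes.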
What happens in your case is that when transporting messages, Open MPI chops them into fragments and then tries to send the different fragments over different connections in order to maximise the bandwidth. By default the fragments are 128 KiB in size, which holds only 32768 int elements (assuming a 4-byte int); the very first (eager) fragment is 64 KiB and holds half as many, i.e. 16384. It might happen that the assumption that each interface on comp01-mpi is reachable from each interface on comp02-mpi (and vice versa) is wrong, e.g. if some of them are connected to separate isolated networks. In that case the library gets stuck trying to make a connection that can never be established, and the program hangs. This would usually show up for messages of more than 16384 int elements, since those no longer fit into the eager fragment.
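The element counts follow directly from the fragment sizes. Here is the arithmetic spelled out, assuming a 4-byte int (that size is my assumption, check sizeof(int) on your platform):

```shell
int_size=4                      # assumption: sizeof(int) is 4 bytes
frag_bytes=$(( 128 * 1024 ))    # default fragment size quoted above
eager_bytes=$(( 64 * 1024 ))    # size of the first (eager) fragment

frag_ints=$(( frag_bytes / int_size ))
eager_ints=$(( eager_bytes / int_size ))

echo "int elements per 128 KiB fragment:      $frag_ints"
echo "int elements per 64 KiB eager fragment: $eager_ints"
```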
To prevent the above-mentioned situation, one can restrict the interfaces or networks that Open MPI uses for TCP/IP communication. The btl_tcp_if_include MCA parameter can be used to provide the library with the list of interfaces that it should use. The btl_tcp_if_exclude parameter can be used to instruct the library which interfaces to exclude. The latter is set to lo by default, so if one would like to exclude specific interface(s), one should explicitly add lo to the list as well.
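As a small illustration of the point about lo, here is a sketch that builds the exclude list so that lo is always part of it. The function name make_exclude_list is hypothetical, the excluded interfaces are just examples picked from the table above, and the command is only printed, not run.

```shell
# Build a comma-separated exclude list that always starts with "lo".
make_exclude_list() {
    list=lo
    for ifname in "$@"; do
        # Skip "lo" if the caller passed it, so it is not listed twice.
        [ "$ifname" = lo ] || list="$list,$ifname"
    done
    echo "$list"
}

exclude=$(make_exclude_list eth1 eth0.2002)

# Print the resulting command line; "..." stands for the rest of your arguments.
echo "mpiexec --mca btl_tcp_if_exclude $exclude ..."
```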
Everything from above also applies to the out-of-band communication used to transport special information. The parameters for selecting or deselecting interfaces for OOB are oob_tcp_if_include and oob_tcp_if_exclude respectively. Those are usually set together with the BTL parameters. Therefore you should try setting those to combinations that actually work. Start by narrowing the selection down to a single interface:
mpiexec --mca btl_tcp_if_include eth0 --mca oob_tcp_if_include eth0 ...
If it doesn't work with eth0, try other interfaces.
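To go through the interfaces systematically, something like the following sketch can generate one candidate command line per interface. The function name candidate_commands is made up, the interface names are taken from the table above, and the commands are only printed so that you can append your own arguments in place of "..." and run them one at a time.

```shell
# Print one mpiexec command line per interface name given as an argument.
candidate_commands() {
    for ifname in "$@"; do
        echo "mpiexec --mca btl_tcp_if_include $ifname --mca oob_tcp_if_include $ifname ..."
    done
}

candidate_commands eth0 eth1 eth2
```

The first interface for which the program runs to completion with large messages is a good candidate to keep.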
The presence of the virtual interface eth0.2002 is going to further confuse Open MPI 1.6.2 and newer.