Question

I have a problem when trying to use Slurm sbatch or srun jobs with MPI over InfiniBand.

Open MPI is installed, and if I launch the following test program (called hello) with mpirun -n 30 ./hello, it works.

// compilation: mpicc -o hello hello.c
#include <mpi.h>
#include <stdio.h>
int main ( int argc, char * argv [] )
{
   int myrank, nproc;
   MPI_Init ( &argc, &argv );
   MPI_Comm_size ( MPI_COMM_WORLD, &nproc );   // total number of ranks
   MPI_Comm_rank ( MPI_COMM_WORLD, &myrank );  // rank of this process
   printf ( "hello from rank %d of %d\n", myrank, nproc );
   MPI_Barrier ( MPI_COMM_WORLD );             // wait until every rank has printed
   MPI_Finalize ();
   return 0;
}

So:

user@master:~/hello$ mpicc -o hello hello.c
user@master:~/hello$ mpirun -n 30 ./hello
--------------------------------------------------------------------------
[[5627,1],2]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: usNIC
  Host: master

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
hello from rank 25 of 30
hello from rank 1 of 30
hello from rank 6 of 30
[...]
hello from rank 17 of 30

When I try to launch it through SLURM I get segmentation faults like this:

user@master:~/hello$ srun -n 20 ./hello
[node05:01937] *** Process received signal ***
[node05:01937] Signal: Segmentation fault (11)
[node05:01937] Signal code: Address not mapped (1)
[node05:01937] Failing at address: 0x30
[node05:01937] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7fcf6bf7ecb0]
[node05:01937] [ 1] /opt/cluster/spool/openMPI/1.8/gcc/lib/openmpi/mca_btl_openib.so(+0x244c6)[0x7fcf679b64c6]
[node05:01937] [ 2] /opt/cluster/spool/openMPI/1.8/gcc/lib/openmpi/mca_btl_openib.so(+0x254cb)[0x7fcf679b74cb]
[node05:01937] [ 3] /opt/cluster/spool/openMPI/1.8/gcc/lib/openmpi/mca_btl_openib.so(ompi_btl_openib_connect_base_select_for_local_port+0xb1)[0x7fcf679b2141]
[node05:01937] [ 4] /opt/cluster/spool/openMPI/1.8/gcc/lib/openmpi/mca_btl_openib.so(+0x10ad0)[0x7fcf679a2ad0]
[node05:01937] [ 5] /opt/cluster/spool/openMPI/1.8/gcc/lib/libmpi.so.1(mca_btl_base_select+0x114)[0x7fcf6c209b34]
[node05:01937] [ 6] /opt/cluster/spool/openMPI/1.8/gcc/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x7fcf67bca652]
[node05:01937] [ 7] /opt/cluster/spool/openMPI/1.8/gcc/lib/libmpi.so.1(mca_bml_base_init+0x69)[0x7fcf6c209359]
[node05:01937] [ 8] /opt/cluster/spool/openMPI/1.8/gcc/lib/openmpi/mca_pml_ob1.so(+0x5975)[0x7fcf65d1b975]
[node05:01937] [ 9] /opt/cluster/spool/openMPI/1.8/gcc/lib/libmpi.so.1(mca_pml_base_select+0x35c)[0x7fcf6c21a0bc]
[node05:01937] [10] /opt/cluster/spool/openMPI/1.8/gcc/lib/libmpi.so.1(ompi_mpi_init+0x4ed)[0x7fcf6c1cb89d]
[node05:01937] [11] /opt/cluster/spool/openMPI/1.8/gcc/lib/libmpi.so.1(MPI_Init+0x16b)[0x7fcf6c1eb56b]
[node05:01937] [12] /home/user/hello/./hello[0x400826]
[node05:01937] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7fcf6bbd076d]
[node05:01937] [14] /home/user/hello/./hello[0x400749]
[node05:01937] *** End of error message ***
[node05:01938] *** Process received signal ***
[node05:01938] Signal: Segmentation fault (11)
[node05:01938] Signal code: Address not mapped (1)
[node05:01938] Failing at address: 0x30
[node05:01940] *** Process received signal ***
[node05:01940] Signal: Segmentation fault (11)
[node05:01940] Signal code: Address not mapped (1)
[node05:01940] Failing at address: 0x30
[node05:01938] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f68b2e10cb0]
[node05:01938] [ 1] /opt/cluster/spool/openMPI/1.8/gcc/lib/openmpi/mca_btl_openib.so(+0x244c6)[0x7f68ae8484c6]
[node05:01938] [ 2] /opt/cluster/spool/openMPI/1.8/gcc/lib/openmpi/mca_btl_openib.so(+0x254cb)[0x7f68ae8494cb]
[node05:01940] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f8af1d82cb0]
[node05:01940] [ 1] /opt/cluster/spool/openMPI/1.8/gcc/lib/openmpi/mca_btl_openib.so(+0x244c6)[0x7f8aed7ba4c6]
[node05:01940] [ 2] /opt/cluster/spool/openMPI/1.8/gcc/lib/openmpi/mca_btl_openib.so(+0x254cb)[0x7f8aed7bb4cb]
[node05:01940] [ 3] /opt/cluster/spool/openMPI/1.8/gcc/lib/openmpi/mca_btl_openib.so(ompi_btl_openib_connect_base_select_for_local_port+0xb1)[0x7f8aed7b6141]
[node05:01940] [ 4] /opt/cluster/spool/openMPI/1.8/gcc/lib/openmpi/mca_btl_openib.so(+0x10ad0)[0x7f8aed7a6ad0]
[node05:01938] [ 3] /opt/cluster/spool/openMPI/1.8/gcc/lib/openmpi/mca_btl_openib.so(ompi_btl_openib_connect_base_select_for_local_port+0xb1)[0x7f68ae844141]
[node05:01938] [ 4] /opt/cluster/spool/openMPI/1.8/gcc/lib/openmpi/mca_btl_openib.so(+0x10ad0)[0x7f68ae834ad0]
[node05:01938] [ 5] /opt/cluster/spool/openMPI/1.8/gcc/lib/libmpi.so.1(mca_btl_base_select+0x114)[0x7f68b309bb34]
[node05:01938] [ 6] /opt/cluster/spool/openMPI/1.8/gcc/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x7f68aea5c652]
[node05:01940] [ 5] /opt/cluster/spool/openMPI/1.8/gcc/lib/libmpi.so.1(mca_btl_base_select+0x114)[0x7f8af200db34]
[node05:01940] [ 6] /opt/cluster/spool/openMPI/1.8/gcc/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x7f8aed9ce652]
[node05:01938] [ 7] /opt/cluster/spool/openMPI/1.8/gcc/lib/libmpi.so.1(mca_bml_base_init+0x69)[0x7f68b309b359]
[node05:01938] [ 8] /opt/cluster/spool/openMPI/1.8/gcc/lib/openmpi/mca_pml_ob1.so(+0x5975)[0x7f68acbad975]
[node05:01940] [ 7] /opt/cluster/spool/openMPI/1.8/gcc/lib/libmpi.so.1(mca_bml_base_init+0x69)[0x7f8af200d359]
[node05:01940] [ 8] /opt/cluster/spool/openMPI/1.8/gcc/lib/openmpi/mca_pml_ob1.so(+0x5975)[0x7f8aebb1f975]
[node05:01940] [ 9] /opt/cluster/spool/openMPI/1.8/gcc/lib/libmpi.so.1(mca_pml_base_select+0x35c)[0x7f8af201e0bc]
[node05:01938] [ 9] /opt/cluster/spool/openMPI/1.8/gcc/lib/libmpi.so.1(mca_pml_base_select+0x35c)[0x7f68b30ac0bc]
[node05:01938] [10] /opt/cluster/spool/openMPI/1.8/gcc/lib/libmpi.so.1(ompi_mpi_init+0x4ed)[0x7f68b305d89d]
[node05:01940] [10] /opt/cluster/spool/openMPI/1.8/gcc/lib/libmpi.so.1(ompi_mpi_init+0x4ed)[0x7f8af1fcf89d]
[node05:01938] [11] /opt/cluster/spool/openMPI/1.8/gcc/lib/libmpi.so.1(MPI_Init+0x16b)[0x7f68b307d56b]
[node05:01938] [12] /home/user/hello/./hello[0x400826]
[node05:01940] [11] /opt/cluster/spool/openMPI/1.8/gcc/lib/libmpi.so.1(MPI_Init+0x16b)[0x7f8af1fef56b]
[node05:01940] [12] /home/user/hello/./hello[0x400826]
[node05:01938] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f68b2a6276d]
[node05:01938] [14] /home/user/hello/./hello[0x400749]
[node05:01938] *** End of error message ***
[node05:01940] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f8af19d476d]
[node05:01940] [14] /home/user/hello/./hello[0x400749]
[node05:01940] *** End of error message ***
[node05:01939] *** Process received signal ***
[node05:01939] Signal: Segmentation fault (11)
[node05:01939] Signal code: Address not mapped (1)
[node05:01939] Failing at address: 0x30
[...]etc

Does anyone know what the problem is? I built Open MPI with Slurm support and installed the same versions of the compilers and libraries everywhere; in fact, all the libraries are on an NFS share that is mounted on each node.

Remarks:

It should use InfiniBand, as it is installed. But when I launch Open MPI with mpirun I notice the following message:

[[5627,1],2]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: usNIC
  Host: cluster

which I guess means "not running through InfiniBand". I have installed the InfiniBand drivers and set up IP over InfiniBand. Slurm is configured to use the InfiniBand IP addresses: is that a correct configuration?

Thanks in advance. Best regards

EDIT :

I have just tried compiling it with MPICH2 instead of Open MPI, and it works with Slurm. So the problem is probably related to Open MPI and not to the Slurm configuration?

EDIT 2: Actually, I have seen that with Open MPI 1.6.5 (instead of 1.8) and the sbatch command instead of srun, my script is executed (i.e. it returns the thread number, rank and host). But it shows warnings related to the OpenFabrics device and the allocation of registered memory:

The OpenFabrics (openib) BTL failed to initialize while trying to
allocate some locked memory.  This typically can indicate that the
memlock limits are set too low.  For most HPC installations, the
memlock limits should be set to "unlimited".  The failure occured
here:

  Local host:    node05
  OMPI source:   btl_openib_component.c:1216
  Function:      ompi_free_list_init_ex_new()
  Device:        mlx4_0
  Memlock limit: 65536

You may need to consult with your system administrator to get this
problem fixed.  This FAQ entry on the Open MPI web site may also be
helpful:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   node05
  Local device: mlx4_0
--------------------------------------------------------------------------
Hello world from process 025 out of 048, processor name node06
Hello world from process 030 out of 048, processor name node06
Hello world from process 032 out of 048, processor name node06
Hello world from process 046 out of 048, processor name node07
Hello world from process 031 out of 048, processor name node06
Hello world from process 041 out of 048, processor name node07
Hello world from process 034 out of 048, processor name node06
Hello world from process 044 out of 048, processor name node07
Hello world from process 033 out of 048, processor name node06
Hello world from process 045 out of 048, processor name node07
Hello world from process 026 out of 048, processor name node06
Hello world from process 043 out of 048, processor name node07
Hello world from process 024 out of 048, processor name node06
Hello world from process 038 out of 048, processor name node07
Hello world from process 014 out of 048, processor name node05
Hello world from process 027 out of 048, processor name node06
Hello world from process 036 out of 048, processor name node07
Hello world from process 019 out of 048, processor name node05
Hello world from process 028 out of 048, processor name node06
Hello world from process 040 out of 048, processor name node07
Hello world from process 023 out of 048, processor name node05
Hello world from process 042 out of 048, processor name node07
Hello world from process 018 out of 048, processor name node05
Hello world from process 039 out of 048, processor name node07
Hello world from process 021 out of 048, processor name node05
Hello world from process 047 out of 048, processor name node07
Hello world from process 037 out of 048, processor name node07
Hello world from process 015 out of 048, processor name node05
Hello world from process 035 out of 048, processor name node06
Hello world from process 020 out of 048, processor name node05
Hello world from process 029 out of 048, processor name node06
Hello world from process 016 out of 048, processor name node05
Hello world from process 017 out of 048, processor name node05
Hello world from process 022 out of 048, processor name node05
Hello world from process 012 out of 048, processor name node05
Hello world from process 013 out of 048, processor name node05
Hello world from process 000 out of 048, processor name node04
Hello world from process 001 out of 048, processor name node04
Hello world from process 002 out of 048, processor name node04
Hello world from process 003 out of 048, processor name node04
Hello world from process 006 out of 048, processor name node04
Hello world from process 009 out of 048, processor name node04
Hello world from process 011 out of 048, processor name node04
Hello world from process 004 out of 048, processor name node04
Hello world from process 007 out of 048, processor name node04
Hello world from process 008 out of 048, processor name node04
Hello world from process 010 out of 048, processor name node04
Hello world from process 005 out of 048, processor name node04
[node04:04390] 47 more processes have sent help message help-mpi-btl-openib.txt / init-fail-no-mem
[node04:04390] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[node04:04390] 47 more processes have sent help message help-mpi-btl-openib.txt / error in device init

What I understand from that is that a) v1.6.5 has better error handling and b) I have to configure Open MPI and/or the InfiniBand drivers to allow more registered memory. I have seen this page, and apparently I only need to modify the Open MPI settings? I still have to test it...
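
The "Memlock limit: 65536" line in the warning above can be checked directly on a node. The following is a minimal sketch, assuming a bash shell; the helper name describe_memlock is hypothetical, and `ulimit -l` reports the limit in KiB (so a limit of 65536 bytes is shown as 64).

```shell
#!/bin/bash
# describe_memlock is a hypothetical helper, not part of Open MPI or Slurm.
# $1 is a limit as printed by `ulimit -l`: a number of KiB, or "unlimited".
describe_memlock() {
    case "$1" in
        unlimited) echo "memlock: unlimited (what the Open MPI warning recommends)" ;;
        *)         echo "memlock: $1 KiB (likely too low for the openib BTL)" ;;
    esac
}

# Report the limit inherited by this shell.
describe_memlock "$(ulimit -l)"
```

Running the same script through srun or sbatch shows the limit that job steps actually inherit, which may differ from the limit seen in an interactive login shell on the same node.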


Solution

Two things: to run "srun ... mpi_app", you need to do special things in Open MPI. See http://www.open-mpi.org/faq/?category=slurm for how to run Open MPI jobs under SLURM.
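
As a sketch of the other launch path, mpirun can be called inside a Slurm allocation instead of starting the binary directly with srun. The batch script below is illustrative only: the job name, task count, output file pattern, and the launch_line helper are assumptions, and ./hello is the test program from the question. SLURM_NTASKS is the variable Slurm sets inside an allocation.

```shell
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --ntasks=20
#SBATCH --output=hello-%j.out

# launch_line is a hypothetical helper: it prints the mpirun command for a
# given binary, using the task count of the allocation (1 outside Slurm).
launch_line() {
    printf 'mpirun -n %s %s\n' "${SLURM_NTASKS:-1}" "$1"
}

echo "launching: $(launch_line ./hello)"

# Only launch when mpirun and the test binary are actually available.
if command -v mpirun >/dev/null 2>&1 && [ -x ./hello ]; then
    mpirun -n "${SLURM_NTASKS:-1}" ./hello
fi
```

Such a script would be submitted with sbatch. Launching directly with srun instead generally requires an Open MPI build that was configured with PMI support (--with-pmi), which is the "special thing" the FAQ covers.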

The usNIC message seems like a legitimate bug that you should report to the Open MPI users' mailing list:

http://www.open-mpi.org/community/lists/ompi.php

In particular, I would like to see some details in order to figure out why you're getting the warning message about usNIC (I'm guessing you're not running on a Cisco UCS platform with usNIC installed, but if you have IB installed, you shouldn't see this message).
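
Until the cause is found, the usNIC warning can usually be silenced by excluding that BTL. This is a minimal sketch, assuming Open MPI's convention that an MCA parameter can be set through an OMPI_MCA_-prefixed environment variable and that a leading ^ excludes the listed components. It only hides the message and does not address the openib segfault.

```shell
#!/bin/bash
# Exclude the usnic BTL for every Open MPI process started from this shell.
# Equivalent to passing "--mca btl ^usnic" on the mpirun command line.
export OMPI_MCA_btl='^usnic'

echo "OMPI_MCA_btl=${OMPI_MCA_btl}"
```

Setting the variable in the environment, rather than on the mpirun command line, means it also applies to processes started directly with srun.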

OTHER TIPS

  1. My solution: upgrade to Slurm 14.03.2-1, OpenMPI 1.8.1.

  2. Bizarrely, I ran into exactly this problem on some of my nodes (segfault on btl openib) after an Infiniband network reorganisation. I was using Slurm 2.6.9 and OpenMPI 1.8.

On the racks with Dell/AMD Opteron/Mellanox it would segfault (and it was working before the network reorganisation).

Racks with HP/Intel/Mellanox continued to work pre- and post-reorg.

This may have something to do with the Infiniband topology.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow