Question

Thank you for reading my post. I just started using Open MPI. I installed Open MPI 1.6.5 on my Mac (OS X 10.5.8) and on my Linux machine (Mint 14). Both computers can compile and run very simple programs, such as Hello World or sending integers from one process to another. However, whenever I attempt to send an array using MPI_Bcast() or MPI_Send(), I get a segmentation fault.

#include <iostream>
#include <stdlib.h>
#include <mpi.h>
using namespace std;

int main(int argc,char** argv)
{
    int np,nid;
    float *a;

    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&np);
    MPI_Comm_rank(MPI_COMM_WORLD,&nid); 

    if (nid == 0)
    {
        a = (float*) calloc(9,sizeof(float));
        for (int i = 0; i < 9; i++)
        {
            a[i] = i;
        }
    }

    MPI_Bcast(a,9,MPI_FLOAT,0,MPI_COMM_WORLD);  

    MPI_Finalize();
    return 0;
}    

Here is the error message:

[rsove-M11BB:02854] *** Process received signal ***
[rsove-M11BB:02854] Signal: Segmentation fault (11)
[rsove-M11BB:02854] Signal code: Address not mapped (1)
[rsove-M11BB:02854] Failing at address: (nil)
[rsove-M11BB:02855] *** Process received signal ***
[rsove-M11BB:02855] Signal: Segmentation fault (11)
[rsove-M11BB:02855] Signal code: Address not mapped (1)
[rsove-M11BB:02855] Failing at address: (nil)
[rsove-M11BB:02854] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7fddf08f64a0]
[rsove-M11BB:02854] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x142953) [0x7fddf0a02953]
[rsove-M11BB:02854] [ 2] /usr/local/openmpi/lib/libmpi.so.1(opal_convertor_unpack+0x105) [0x7fddf12a0b35]
[rsove-M11BB:02854] [ 3] /usr/local/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_match+0x415) [0x7fddece38ee5]
[rsove-M11BB:02854] [ 4] /usr/local/openmpi/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x23d) [0x7fddec61477d]
[rsove-M11BB:02854] [ 5] /usr/local/openmpi/lib/libmpi.so.1(opal_progress+0x5a) [0x7fddf12ac2ea]
[rsove-M11BB:02854] [ 6] /usr/local/openmpi/lib/libmpi.so.1(ompi_request_default_wait+0x11d) [0x7fddf11fce2d]
[rsove-M11BB:02854] [ 7] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_generic+0x4d6) [0x7fddeb73e346]
[rsove-M11BB:02854] [ 8] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_binomial+0xcb) [0x7fddeb73e85b]
[rsove-M11BB:02854] [ 9] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_dec_fixed+0xcc) [0x7fddeb735b5c]
[rsove-M11BB:02854] [10] /usr/local/openmpi/lib/openmpi/mca_coll_sync.so(mca_coll_sync_bcast+0x79) [0x7fddeb951799]
[rsove-M11BB:02854] [11] /usr/local/openmpi/lib/libmpi.so.1(MPI_Bcast+0x148) [0x7fddf12094d8]
[rsove-M11BB:02854] [12] Test(main+0xb4) [0x408f90]
[rsove-M11BB:02854] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fddf08e176d]
[rsove-M11BB:02854] [14] Test() [0x408df9]
[rsove-M11BB:02854] *** End of error message ***
[rsove-M11BB:02855] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7fa4c67be4a0]
[rsove-M11BB:02855] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x142953) [0x7fa4c68ca953]
[rsove-M11BB:02855] [ 2] /usr/local/openmpi/lib/libmpi.so.1(opal_convertor_unpack+0x105) [0x7fa4c7168b35]
[rsove-M11BB:02855] [ 3] /usr/local/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_match+0x415) [0x7fa4c2d00ee5]
[rsove-M11BB:02855] [ 4] /usr/local/openmpi/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x23d) [0x7fa4c24dc77d]
[rsove-M11BB:02855] [ 5] /usr/local/openmpi/lib/libmpi.so.1(opal_progress+0x5a) [0x7fa4c71742ea]
[rsove-M11BB:02855] [ 6] /usr/local/openmpi/lib/libmpi.so.1(ompi_request_default_wait+0x11d) [0x7fa4c70c4e2d]
[rsove-M11BB:02855] [ 7] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_generic+0x59c) [0x7fa4c160640c]
[rsove-M11BB:02855] [ 8] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_binomial+0xcb) [0x7fa4c160685b]
[rsove-M11BB:02855] [ 9] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_dec_fixed+0xcc) [0x7fa4c15fdb5c]
[rsove-M11BB:02855] [10] /usr/local/openmpi/lib/openmpi/mca_coll_sync.so(mca_coll_sync_bcast+0x79) [0x7fa4c1819799]
[rsove-M11BB:02855] [11] /usr/local/openmpi/lib/libmpi.so.1(MPI_Bcast+0x148) [0x7fa4c70d14d8]
[rsove-M11BB:02855] [12] Test(main+0xb4) [0x408f90]
[rsove-M11BB:02855] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fa4c67a976d]
[rsove-M11BB:02855] [14] Test() [0x408df9]
[rsove-M11BB:02855] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 2854 on node rsove-M11BB exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

The strange thing is that when I run the same code on my friend's computer, it compiles and runs without any problem.

Thanks in advance for your help.

Solution

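This is a very common mistake. MPI_Bcast requires that an already allocated buffer is passed as its first argument on every rank, not only on the root: the root sends from that buffer and all other ranks receive into it. In your code only rank 0 allocates a, so the other ranks pass an uninitialised pointer and the library crashes while unpacking the received data (opal_convertor_unpack in the backtrace). Using an uninitialised pointer is undefined behaviour, which is why the same program can appear to work on another machine. The code therefore has to be modified, e.g. like this: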

// Allocate the array everywhere
a = (float*) calloc(9,sizeof(float));
// Initialise the array at rank 0 only
if (nid == 0)
{
    for (int i = 0; i < 9; i++)
    {
        a[i] = i;
    }
}
Licensed under: CC-BY-SA with attribution