Question

I would like to gather data from arrays of doubles and organize it at the same time. Say we have 2 MPI ranks:

if(rank == 0)
   P = {0,1,4,5,8,9};
else
   P = {2,3,6,7,10,11}; 

How could I gather the information in P and place it in order, i.e., P on the master should contain P = [0 1 2 ... 9 10 11]?

I could gather P as it is and then reorganize it on the root; however, this approach would not be very efficient as P grows. I have tried creating an MPI_Type_vector, but I have not managed to get it right yet. Any ideas?


Solution

It depends a little bit on what you mean by "in order". If you mean that, as in the above example, each vector is made up of blocks of data and you want those blocks interleaved in a fixed known order, yes, you can certainly do this. (The question could also be read to be asking if you can do a sort as part of the gather; that's rather harder.)

You have the right approach; you want to send the data as is, but receive the data into specified chunks broken up by processor. Here, the data type you want to receive into looks like this:

MPI_Datatype vectype;
MPI_Type_vector(NBLOCKS, BLOCKSIZE, size*BLOCKSIZE, MPI_CHAR, &vectype);

That is, for a given processor's input, you're going to receive it into NBLOCKS blocks of size BLOCKSIZE, with a stride of size*BLOCKSIZE elements (the number of processors times the block size) between the starts of consecutive blocks. You could receive into that type as is; to gather into it, however, you need to set the extent so that the data from each processor is gathered into the right place:

MPI_Datatype gathertype;
MPI_Type_create_resized(vectype, 0, BLOCKSIZE*sizeof(char), &gathertype);
MPI_Type_commit(&gathertype);

The reason for that resizing is given in, for instance, this answer, and likely elsewhere on this site as well.

Putting this together into sample code gives us the following:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv) {

    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int BLOCKSIZE=2;      /* each block of data is 2 items */
    const int NBLOCKS  =3;      /* each task has 3 such blocks */

    char locdata[NBLOCKS*BLOCKSIZE];
    for (int i=0; i<NBLOCKS*BLOCKSIZE; i++)
        locdata[i] = 'A' + (char)rank;  /* rank 0 = 'AAA..A'; rank 1 = 'BBB..B', etc */

    MPI_Datatype vectype, gathertype;
    MPI_Type_vector(NBLOCKS, BLOCKSIZE, size*BLOCKSIZE, MPI_CHAR, &vectype);
    MPI_Type_create_resized(vectype, 0, BLOCKSIZE*sizeof(char), &gathertype);
    MPI_Type_commit(&gathertype);

    char *globaldata = NULL;
    if (rank == 0) globaldata = malloc((NBLOCKS*BLOCKSIZE*size+1)*sizeof(char));

    MPI_Gather(locdata, BLOCKSIZE*NBLOCKS, MPI_CHAR,
               globaldata, 1, gathertype,
               0, MPI_COMM_WORLD);

    if (rank == 0) {
        globaldata[NBLOCKS*BLOCKSIZE*size] = '\0';
        printf("Assembled data:\n");
        printf("<%s>\n", globaldata);
        free(globaldata);
    }

    MPI_Type_free(&gathertype);
    MPI_Type_free(&vectype);    /* the base type must be freed too */
    MPI_Finalize();

    return 0;
}

Running gives:

$ mpirun -np 3 ./vector
Assembled data:
<AABBCCAABBCCAABBCC>
$ mpirun -np 7 ./vector
Assembled data:
<AABBCCDDEEFFGGAABBCCDDEEFFGGAABBCCDDEEFFGG>
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow