Duplicate the functionality of MPI_Gather using MPI_Gatherv, but specify 0 as the chunk size for the root rank. Something like this:
int rank, size, disp = 0;
int *cnts, *displs;
MPI_Comm_size(MPI_COMM_WORLD, &size);
cnts = malloc(size * sizeof(int));
displs = malloc(size * sizeof(int));
for (rank = 0; rank < size; rank++)
{
    /* every rank except the root contributes count elements */
    cnts[rank] = (rank != root) ? count : 0;
    displs[rank] = disp;
    disp += cnts[rank];
}
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Gatherv(data, cnts[rank], data_type,
            big_data, cnts, displs, data_type,
            root, MPI_COMM_WORLD);
free(displs); free(cnts);
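The count/displacement bookkeeping from the loop above can also be pulled out into a small helper, which makes it easy to check without running MPI at all. This is only a sketch: build_gatherv_layout is a made-up name, not an MPI function, and it assumes every non-root rank sends the same count.

```c
#include <assert.h>

/* Hypothetical helper (not part of MPI): fill cnts[] and displs[], each of
 * length size, so that every rank except root contributes count elements
 * and root contributes none. Returns the total number of elements the
 * receive buffer must hold. */
static int build_gatherv_layout(int size, int root, int count,
                                int *cnts, int *displs)
{
    assert(size > 0 && root >= 0 && root < size && count >= 0);
    int disp = 0;
    for (int r = 0; r < size; r++) {
        cnts[r] = (r != root) ? count : 0;  /* root sends nothing */
        displs[r] = disp;                   /* chunks are packed back to back */
        disp += cnts[r];
    }
    return disp;
}
```

The return value is (size - 1) * count, which is how large the receive buffer on the root has to be in this variant.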
Note that MPI_Gatherv could be significantly slower than MPI_Gather, because the MPI implementation will most likely be unable to optimise the communication path and will fall back to a simple linear implementation of the gather operation. So it might still make sense to use MPI_Gather and provide some dummy data in the root process.
You could also supply MPI_IN_PLACE as the send buffer in the root process, so that the root does not send data to itself. You would still have to reserve space for the root's data in the receive buffer, though, because the in-place operation expects the root's data to already be at the correct position inside the receive buffer:
if (rank != root)
    MPI_Gather(data, count, data_type,
               NULL, count, data_type, root, MPI_COMM_WORLD);
else
    MPI_Gather(MPI_IN_PLACE, count, data_type,
               big_data, count, data_type, root, MPI_COMM_WORLD);
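With plain MPI_Gather (dummy data or MPI_IN_PLACE), the receive buffer still contains a slot of count elements for the root at offset root * count. If you need the other ranks' data contiguous, one option is to squeeze that slot out afterwards. A rough sketch, assuming int elements; drop_root_chunk is a hypothetical helper, not an MPI call:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of MPI): buf holds size * count ints as
 * produced by MPI_Gather on the root, with the root's own (unused) chunk at
 * offset root * count. Shift the later chunks down over that slot.
 * Returns the number of meaningful elements left in buf. */
static int drop_root_chunk(int *buf, int size, int root, int count)
{
    assert(size > 0 && root >= 0 && root < size && count >= 0);
    int tail = (size - 1 - root) * count;   /* elements after the root's slot */
    if (tail > 0)
        memmove(buf + (size_t)root * count,
                buf + (size_t)(root + 1) * count,
                (size_t)tail * sizeof *buf);
    return (size - 1) * count;
}
```

Whether this extra copy is cheaper than using MPI_Gatherv depends on the MPI implementation and the message sizes, so it is worth measuring both.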