A bit too much code to fit in a comment:
I'd suggest just doing this as a single MPI_Allgatherv():

    // Displacement of each rank's block within the gathered array
    std::vector<int> disps(n_proc);
    disps[0] = 0;
    for (int i = 1; i < n_proc; i++)
        disps[i] = disps[i-1] + points_per_proc[i-1];

    // Total number of points across all ranks
    int totdata = disps[n_proc-1] + points_per_proc[n_proc-1];
    std::vector<double> temp(totdata);

    // Every rank sends its own points and receives everyone else's
    MPI_Allgatherv(&Xcoord_top[my_rank][0], (int)Xcoord_top[my_rank].size(),
                   MPI_DOUBLE, &temp[0], &points_per_proc[0], &disps[0],
                   MPI_DOUBLE, MPI_COMM_WORLD);
and now the data contributed by rank i is in temp[disps[i]] ... temp[disps[i] + points_per_proc[i] - 1].
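For example, to walk over just the points that rank i contributed (process_point() here is a hypothetical placeholder for whatever you do with each value):

    for (int j = disps[i]; j < disps[i] + points_per_proc[i]; j++)
        process_point(temp[j]);   // hypothetical per-point operation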
There are at least three problems with the code as originally posted:
- It could well deadlock: MPI_Send() is allowed to block until the message has been received. That can be fixed by using non-blocking sends, e.g. MPI_Isend() followed by MPI_Waitall(), rather than MPI_Send(); see the sketch after this list.
- It will almost certainly process the receives out of order: there is no guarantee that the ith iteration receives from the ith processor, so the message lengths may be wrong, producing an error that aborts the program. That can be fixed by specifying the source as rank i rather than MPI_ANY_SOURCE.
- It is inefficient, using linear point-to-point sends and receives instead of optimized collectives like broadcasts or gathers. That can be fixed by using a collective, such as the allgather above.