This code is probably the culprit:
if (rank == 0) {
    //printf("\nrank: %d\n", rank);
    for (i = 1; i < size; i++) // until all slaves have handed back the processed data
    {
        MPI_Recv(&aux, 1, MPI_INT, MPI_ANY_SOURCE, SLAVE_TO_MASTER_TAG+1, MPI_COMM_WORLD, &status);
        MPI_Recv(&qty, 1, MPI_INT, MPI_ANY_SOURCE, SLAVE_TO_MASTER_TAG+2, MPI_COMM_WORLD, &status);
        MPI_Recv(&(*im).array[aux], qty*3, MPI_BYTE, MPI_ANY_SOURCE, SLAVE_TO_MASTER_TAG, MPI_COMM_WORLD, &status);
    }
}
Since you are (ab)using MPI_ANY_SOURCE, you are essentially creating the perfect conditions for message reception races. It is entirely possible that the first MPI_Recv matches a message from rank i, the second one matches a message from rank j, and the third one matches a message from rank k, where i, j, and k are all different. You can therefore receive the wrong number of pixels into the wrong image slot. Moreover, if rank k happens to send more pixels than the value of qty received from rank j specifies, you'll get a truncation error (which is exactly what you are seeing). A word of advice: never use MPI_ANY_SOURCE frivolously; use it only when you are absolutely sure the algorithm is correct and no races can occur.
Either rewrite the code as:
if (rank == 0) {
    //printf("\nrank: %d\n", rank);
    for (i = 1; i < size; i++) // until all slaves have handed back the processed data
    {
        MPI_Recv(&aux, 1, MPI_INT, i, SLAVE_TO_MASTER_TAG+1, MPI_COMM_WORLD, &status);
        MPI_Recv(&qty, 1, MPI_INT, i, SLAVE_TO_MASTER_TAG+2, MPI_COMM_WORLD, &status);
        MPI_Recv(&(*im).array[aux], qty*3, MPI_BYTE, i, SLAVE_TO_MASTER_TAG, MPI_COMM_WORLD, &status);
    }
}
or even better as:
if (rank == 0) {
    //printf("\nrank: %d\n", rank);
    for (i = 1; i < size; i++) // until all slaves have handed back the processed data
    {
        MPI_Recv(&aux, 1, MPI_INT, MPI_ANY_SOURCE, SLAVE_TO_MASTER_TAG+1,
                 MPI_COMM_WORLD, &status);
        MPI_Recv(&qty, 1, MPI_INT, status.MPI_SOURCE, SLAVE_TO_MASTER_TAG+2,
                 MPI_COMM_WORLD, &status);
        MPI_Recv(&(*im).array[aux], qty*3, MPI_BYTE, status.MPI_SOURCE, SLAVE_TO_MASTER_TAG,
                 MPI_COMM_WORLD, &status);
    }
}
That way, all three receives in each iteration will get their messages from the same process and the race condition is eliminated. The second version works by first receiving a message from any rank, then reading the status.MPI_SOURCE field of the returned status to learn which rank actually sent it, and pinning the two subsequent receives to that rank.
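For either fix to work, each slave must send its three messages to rank 0 with the tags the master expects. A minimal sketch of the sender side, assuming the same variable names (aux, qty, im) and tag offsets as in the receiving loop above; the exact layout of your slave code may differ:

```c
if (rank != 0) {
    // aux - starting offset of this slave's chunk in the image array
    // qty - number of pixels this slave processed
    MPI_Send(&aux, 1, MPI_INT, 0, SLAVE_TO_MASTER_TAG+1, MPI_COMM_WORLD);
    MPI_Send(&qty, 1, MPI_INT, 0, SLAVE_TO_MASTER_TAG+2, MPI_COMM_WORLD);
    MPI_Send(&(*im).array[aux], qty*3, MPI_BYTE, 0, SLAVE_TO_MASTER_TAG, MPI_COMM_WORLD);
}
```

Because MPI guarantees non-overtaking delivery between a given sender and receiver on the same communicator, pinning the source after the first receive is enough to keep the three messages of one slave matched together, even when several slaves are sending concurrently.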