Question

I'm trying to do a simple test of MPI's RMA operations using MPI_Win_lock and MPI_Win_unlock. The program just lets process 0 update an integer value in process 1, and process 1 then displays it.

The program below runs correctly (at least the result seems correct to me):

#include "mpi.h"
#include "stdio.h"
#define root 0

int main(int argc, char *argv[])
{
  int myrank, nprocs;
  int send, recv, err;
  MPI_Win nwin;
  int *st;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

  MPI_Alloc_mem(1*sizeof(int), MPI_INFO_NULL, &st);

  st[0] = 0;
  if (myrank != root) {
    MPI_Win_create(st, 1*sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &nwin);
  }
  else {
    MPI_Win_create(NULL, 0, sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &nwin);
  }

  if (myrank == root) {
    st[0] = 1;
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, nwin);
    MPI_Put(st, 1, MPI_INT, 1, 0, 1, MPI_INT, nwin);
    MPI_Win_unlock(1, nwin);
    MPI_Win_free(&nwin);
  }
  else { // rank 1
    MPI_Win_free(&nwin);
    printf("Rank %d, st = %d\n", myrank, st[0]);
  }

  MPI_Free_mem(st);
  MPI_Finalize();
  return 0;
}

The output I get is Rank 1, st = 1. But curiously, if I swap the two lines in the else block for rank 1 to

  else { // rank 1
    printf("Rank %d, st = %d\n", myrank, st[0]);
    MPI_Win_free(&nwin);
  }

The output is Rank 1, st = 0.

I cannot figure out the reason behind this. The reason I need to call MPI_Win_free after reading the data is that, in my real program, everything runs inside a while loop and rank 0 decides when to stop the loop. When the stop condition is satisfied, rank 0 updates the flag (st) on rank 1. I would like to put MPI_Win_free outside the while loop so that the window is freed only once, after the loop. It now seems that I cannot do this and need to create and free the window on every iteration of the loop?


Solution

I'll be honest, MPI RMA is not my speciality, but I'll give this a shot:

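The problem is that you're running into a race condition. When you do the MPI_PUT operation, it sends the data from rank 0 to rank 1 to be put into the buffer at some point in the future. You don't have any control over that from rank 0's perspective. Nothing in your program orders rank 1's read of st[0] relative to that update, so rank 1 can print the value before the data has arrived.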

On rank 1's side, you're not doing anything to complete the operation. I know that RMA (or one-sided) operations sound like they shouldn't require any intervention on the target side, but they do require a bit. When you use one-sided operations, you have to have something on the receiving side that also synchronizes the data. In this case, you're trying to use MPI put/get operations in combination with non-MPI load/store operations without any synchronization between them. This is erroneous and results in the race condition you're seeing. When you call MPI_WIN_FREE first, which is a collective call, all of the outstanding operations on the window are completed, so your data is correct.

You can find out lots more about passive target synchronization (which is what you're doing here) with this question: MPI with C: Passive RMA synchronization.

Licensed under: CC-BY-SA with attribution