Question

How do I access a global vector from an individual thread in MPI?

I'm using a library - specifically, an ODE solver library - called CVODE (part of SUNDIALS). The library works with MPI, so multiple threads run the same code in parallel. Each thread sends a piece of data to the thread "next to" it. But I want one of the threads (rank=0) to print out the state of the data at certain points.

The library includes functions so that each thread can access its own data (the local vector). But there is no method to access the global vector.

I need to output the values of all of the equations at specific times. To do so, I would need access to the global vector. Anyone know how to get at all of the data in an MPI vector (using CVODE, if possible)?

For example, here is my code that each thread runs

  for (iout=1, tout=T1; iout <= NOUT; iout++, tout += DTOUT) {
    flag = CVode(cvode_mem, tout, u, &t, CV_NORMAL);
    if(check_flag(&flag, "CVode", 1, my_pe)) break;
    if (my_pe == 0) PrintData(t, u);
  }
...
static void PrintData(realtype t, N_Vector u) {
   /* I want to print data from all threads in here */
}

In function f (the function I'm solving), I pass data back and forth using MPI_Send and MPI_Recv. But I can't really do that in PrintData because the other processes have run ahead. Also, I don't want to add messaging overhead. I want to access the global vector in PrintData, and then just print out what's needed. Is it possible?

Edit: While waiting for a better answer, I programmed each thread passing the data back to the 0th thread. I don't think that's adding too much messaging overhead, but I'd still like to hear from you experts if there's a better method (I'm sure there aren't any worse ones! :D ).

Edit 2: Although angainor's solution is surely superior, I stuck with the one I had created. For future reference of anyone who has the same question, here are the basics of how I did it:

/* Is called by all threads */
static void PrintData(realtype t, N_Vector u, UserData data) {

  /* ... declarations and such ... */

  for (n=1; n<=my_length; n++) {
    mass_num = my_base + n;
    z[mass_num - 1] = udata[n-1];
    z[mass_num - 1 + N] = udata[n - 1 + my_length];
  }

  if (my_pe != 0) {
    MPI_Send(z, 2*N, PVEC_REAL_MPI_TYPE, 0, my_pe, comm);
  } else {
    for (i=1; i<npes; i++) {
      MPI_Recv(z1, 2*N, PVEC_REAL_MPI_TYPE, i, i, comm, &status);
      for (n=0; n<2*N; n++)
        z[n] = z[n] + z1[n];
    }

    /* ... now I can print it out however I like ... */
  }

  return;
}

Solution

When using MPI, the individual "threads" do not have access to a 'global' vector. They are not threads; they are processes that can run on different physical computers and therefore cannot have direct access to global data.

To do what you want, you can either send the vector to one of the MPI processes (as you did) and print it there, or have each process print its local part in sequence. Use a function like this:

void MPI_write_ivector(int thrid, int nthr, int vec_dim, int *v)
{
  int i;
  int curthr = 0;

  MPI_Barrier(MPI_COMM_WORLD);
  while(curthr!=nthr){
    if(curthr==thrid){
      printf("thread %i writing\n", thrid);
      for(i=0; i<vec_dim; i++) printf("%d\n", v[i]);
      fflush(stdout);
      curthr++;
      MPI_Bcast(&curthr, 1, MPI_INT, thrid, MPI_COMM_WORLD);
    } else {
      MPI_Bcast(&curthr, 1, MPI_INT, curthr, MPI_COMM_WORLD);
    }
  }
}

All MPI processes should call it at the same time since there is a barrier and broadcast inside. Essentially, the procedure makes sure that all the MPI processes print their vector part in order, starting from rank 0. The data is not messed up since only one process writes at any given time.

In MPI_write_ivector above, a broadcast is used because it gives more flexibility in the order in which the processes print their results: the process that is currently writing can decide who goes next. Alternatively, you could skip the broadcast and use only a barrier:

void MPI_write_ivector(int thrid, int nthr, int vec_dim, int *v)
{
  int i;
  int curthr = 0;

  while(curthr!=nthr){
    if(curthr==thrid){
      printf("thread %i writing\n", thrid);
      for(i=0; i<vec_dim; i++) printf("%d\n", v[i]);
      fflush(stdout);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    curthr++;
  }
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow