Question

I am writing an MPI program in Python (mpi4py). Many processes compute partial results and send both the index and the update to the master task. The code that gathers all the data on the master is:

if rank == 0:
    cb = dict((v, 0) for v in graph)
    # print("initial is", cb)
    while True:
        neww = comm.recv(source=ANY_SOURCE, tag=1)
        newdeltaw = comm.recv(source=ANY_SOURCE, tag=2)
        print("newdelw is", newdeltaw, "neww is", neww)
        cb[neww] = cb[neww] + newdeltaw
        print("cb=", cb)

But there is a race condition here which affects my results for large numbers of processes: in the line cb[neww] = cb[neww] + newdeltaw, the data for neww and newdeltaw may come from different processes. How do I prevent this?


Solution

While MPI has an in-order guarantee in the sense that two messages from rank 1 to rank 0 will be received by rank 0 in the order they were sent - one message cannot overtake the other - MPI says nothing, and can say nothing, about how they will be interleaved with messages from other processes. So you can easily get situations like:

  rank 1 messages to rank 0: [src 1, msg A, tag 1], [src 1, msg B, tag 2]  
  rank 2 messages to rank 0: [src 2, msg C, tag 1], [src 2, msg D, tag 2]

  rank 0 message queue: [src 1, msg A, tag 1], [src 2, msg C, tag 1], [src 2, msg D, tag 2], [src 1, msg B, tag 2] 

So rank 0, receiving a message with tag 1, will get rank 1's msg A, but then receiving one with tag 2 will get rank 2's msg D. (Note that the message queue above satisfies the in-order guarantee but doesn't help us here.)
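To make that queue picture concrete: rank 0 can ask which source the next pending tag-1 message comes from before actually receiving anything. This is just a minimal sketch, not part of the original answer, assuming the usual comm = MPI.COMM_WORLD setup:

if rank == 0:
    st = MPI.Status()
    # block until some tag-1 message is pending, but don't receive it yet
    comm.Probe(source=MPI.ANY_SOURCE, tag=1, status=st)
    src = st.Get_source()
    # now both receives are pinned to that one sender
    neww = comm.recv(source=src, tag=1)
    newdeltaw = comm.recv(source=src, tag=2)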

There are a few ways around this. One is to filter the messages received for newdeltaw not just by tag but also by source, to make sure the update comes from the same task that sent the neww:

from mpi4py import MPI
import numpy

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    cb = numpy.zeros(size)
    rstat = MPI.Status()
    for i in range((size-1)*3):
        # receive the index (tag 1) from whichever worker sent it, noting the source
        neww = comm.recv(source=MPI.ANY_SOURCE, tag=1, status=rstat)
        src = rstat.Get_source()
        # receive the matching update (tag 2) from that same source only
        newdeltaw = comm.recv(source=src, tag=2)
        print("newdelw is", newdeltaw, "neww is", neww)
        cb[neww] = cb[neww] + newdeltaw
        print("cb=", cb)
else:
    data = rank
    for i in range(3):
        comm.send(rank, dest=0, tag=1)
        comm.send(data, dest=0, tag=2)

This way, only the tag-2 newdeltaw message from the matching source is received, avoiding the inconsistency.
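With the toy data above (each worker sends its own rank as the update, three times), running this on four processes - e.g. with something like mpiexec -n 4 python gather.py, where the script name is just for illustration - should leave cb as [0, 3, 6, 9].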

Another approach is to avoid splitting the messages at all, by putting both pieces of data into the same message:

# (same MPI setup as above: comm, rank, size)
if rank == 0:
    cb = numpy.zeros(size)
    for i in range((size-1)*3):
        # both values arrive together as a single pickled tuple
        (neww, newdeltaw) = comm.recv(source=MPI.ANY_SOURCE, tag=1)
        print("newdelw is", newdeltaw, "neww is", neww)
        cb[neww] = cb[neww] + newdeltaw
        print("cb=", cb)
else:
    data = rank
    for i in range(3):
        comm.send((rank, data), dest=0, tag=1)

This bundles both pieces of data into one message, so they can't be separated. Note that once this is working, you can use mpi4py's more efficient lower-level (uppercase) routines, which communicate NumPy buffers directly and avoid pickling the tuples:

# (same MPI setup as above: comm, rank, size)
if rank == 0:
    cb = numpy.zeros(size)
    for i in range((size-1)*3):
        dataarr = numpy.zeros(2, dtype='i')
        # receive both integers into one buffer; no pickling involved
        comm.Recv([dataarr, MPI.INT], source=MPI.ANY_SOURCE, tag=1)
        neww = dataarr[0]        # the sender packs [index, update]
        newdeltaw = dataarr[1]
        print("newdelw is", newdeltaw, "neww is", neww)
        cb[neww] = cb[neww] + newdeltaw
        print("cb=", cb)
else:
    data = rank
    for i in range(3):
        senddata = numpy.array([rank, data], dtype='i')
        comm.Send([senddata, MPI.INT], dest=0, tag=1)
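The uppercase Send/Recv calls used here operate on buffer-like objects (the NumPy arrays), so the index and update travel as raw integers instead of being pickled and unpickled on every message, which can matter when many small updates are sent.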

Finally, you can avoid the master/worker approach entirely, have all processes accumulate their partial results locally, and then combine everything at the end with a reduction:

from mpi4py import MPI
import numpy

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# each process accumulates its own contributions locally...
cb = numpy.zeros(size, dtype='i')
totals = numpy.zeros(size, dtype='i')

data = rank
for i in range(3):
    cb[rank] = cb[rank] + data

# ...then the per-index arrays are summed element-wise onto rank 0
comm.Reduce([cb, MPI.INT], [totals, MPI.INT], op=MPI.SUM, root=0)

if rank == 0:
    print("result is", totals)
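If you would rather keep the dict-keyed-by-graph-node accumulator from the question instead of a NumPy array, the same combine-at-the-end idea still applies, using gather rather than Reduce since Python dicts don't reduce element-wise out of the box. This is just a minimal sketch, not from the original answer, with toy per-rank updates standing in for the real graph computation:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# each process builds its own dict of partial updates, keyed by node
partial = {}
for i in range(3):
    partial[rank] = partial.get(rank, 0) + rank   # stand-in for real (neww, newdeltaw) updates

# collect every process's dict on rank 0 and merge them there
allparts = comm.gather(partial, root=0)
if rank == 0:
    cb = {}
    for p in allparts:
        for node, delta in p.items():
            cb[node] = cb.get(node, 0) + delta
    print("cb =", cb)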
Licensed under: CC-BY-SA with attribution