Question

Assume that I have a very large array which I wish to send or receive with MPI (v1). In order to index this array, I use an unsigned long integer.

Now, all MPI function calls I have seen use int types for their "count" arguments, such as in this example:

MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)

But what if, in my implementation, I need to send or receive an array with more elements than an int can hold? The compiler, naturally, gives me an "invalid conversion" error when I try to pass an unsigned long as the "count" argument. I thought about doing a cast, but I am worried that this will truncate my variable, so I am at a loss as to what to do.

Solution

Doing a cast is not the solution as it will simply truncate the long count. There are two obstacles to overcome here - an easy one and a hard one.
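On the cast itself: a minimal sketch of a range check that makes the truncation risk explicit before a value is narrowed to int. Both function names are hypothetical helpers for illustration and are not part of MPI.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Returns true when n can be passed as an int "count" without loss. */
bool count_fits_in_int(unsigned long n)
{
    return n <= (unsigned long)INT_MAX;
}

/* Narrows n to int, aborting (in debug builds) instead of silently
   truncating when n is out of range. */
int checked_count(unsigned long n)
{
    assert(count_fits_in_int(n));
    return (int)n;
}
```

A count that fails this check has to be handled by one of the techniques described below rather than by a cast.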

The easy obstacle is the int type of the count argument. You can get past it by creating a contiguous datatype of smaller size and then sending the data as multiples of the new datatype. Example code follows:

// Data to send
int data[1000];

// Create a contiguous datatype of 100 ints
MPI_Datatype dt100;
MPI_Type_contiguous(100, MPI_INT, &dt100);
MPI_Type_commit(&dt100);

// Send the data as 10 elements of the new type
MPI_Send(data, 10, dt100, ...);

Since the count argument of MPI_Type_contiguous is int, with this technique you can send up to (2^31 - 1)^2 = 2^62 - 2^32 + 1 elements. If this is not enough, you can create a new contiguous datatype from the dt100 datatype, e.g.:
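The limit quoted above can be checked with plain 64-bit arithmetic. This is a small sketch that assumes a 32-bit int; elements_covered is a hypothetical helper name, not an MPI function.

```c
#include <assert.h>
#include <stdint.h>

/* Number of primitive elements covered by `count` items of a contiguous
   datatype that holds `block_len` primitive elements each.
   Assumes int is 32 bits wide, so both arguments are modelled as int32_t. */
uint64_t elements_covered(int32_t block_len, int32_t count)
{
    assert(block_len > 0 && count >= 0);
    return (uint64_t)block_len * (uint64_t)count;
}
```

With block_len = 100 and count = 10 this gives the 1000 ints of the example above; with both arguments set to INT32_MAX it gives 2^62 - 2^32 + 1.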

// Create a contiguous datatype of 100 dt100's (effectively 100x100 elements)
MPI_Datatype dt10000;
MPI_Type_contiguous(100, dt100, &dt10000);
MPI_Type_commit(&dt10000);

If your original data size is not a multiple of the size of the new datatype, you could create a structure datatype whose first element is an array of int(data_size / cont_type_length) elements of the contiguous datatype and whose second element is an array of data_size % cont_type_length elements of the primitive datatype. An example follows:

// Data to send
int data[260];

// Create a structure type
MPI_Datatype dt260;

int blklens[2];
MPI_Datatype oldtypes[2];
MPI_Aint offsets[2];

blklens[0] = 2; // That's int(260 / 100)
offsets[0] = 0;
oldtypes[0] = dt100;

blklens[1] = 60; // That's 260 % 100
offsets[1] = blklens[0] * 100L * sizeof(int); // Offsets are in BYTES!
oldtypes[1] = MPI_INT;

MPI_Type_create_struct(2, blklens, offsets, oldtypes, &dt260);
MPI_Type_commit(&dt260);

// Send the data
MPI_Send(data, 1, dt260, ...);

MPI_Aint is an integer type large enough to hold offsets beyond what int can represent on LP64 systems. Note that the receiver must construct the same datatype and use it in the same way in the MPI_Recv call. Receiving an arbitrary non-integer number of elements of the contiguous datatype is a bit problematic, though.
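The block lengths and the byte offset used in the structure datatype can be computed for any data size. The following is a sketch only: split_layout and compute_split are hypothetical names, and int64_t stands in for MPI_Aint so that the code compiles without an MPI installation.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: the values that fill blklens[] and offsets[1]. */
typedef struct {
    int     full_blocks;  /* blklens[0]: number of contiguous-type elements */
    int     remainder;    /* blklens[1]: leftover primitive elements */
    int64_t tail_offset;  /* offsets[1]: where the leftovers start, in bytes */
} split_layout;

split_layout compute_split(uint64_t data_size, int block_len, size_t elem_size)
{
    assert(block_len > 0);
    uint64_t full = data_size / (uint64_t)block_len;
    assert(full <= (uint64_t)INT_MAX);  /* blklens[0] must still fit in an int */

    split_layout s;
    s.full_blocks = (int)full;
    s.remainder   = (int)(data_size % (uint64_t)block_len);
    /* All factors are widened to 64 bits before multiplying. */
    s.tail_offset = (int64_t)(full * (uint64_t)block_len * (uint64_t)elem_size);
    return s;
}
```

For data_size = 260, block_len = 100 and 4-byte elements this yields 2 full blocks, a remainder of 60 and a byte offset of 800, matching the example above when sizeof(int) is 4.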

That's the easy obstacle. The not-so-easy one comes when your MPI implementation does not use long counts internally. In that case MPI would usually crash, send only part of the data, or behave in some other unexpected way. Such an MPI implementation could be crashed even without constructing a special datatype, simply by sending INT_MAX elements of type MPI_INT, as the total message size would be (2^31 - 1) * 4 = 2^33 - 4 bytes. If that is the case, your only escape is to split the message manually and send/receive it in a loop.
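The manual splitting could look roughly like the sketch below. To keep it compilable without MPI, the actual transfer is hidden behind a hypothetical callback type, transfer_fn, which in real code would wrap MPI_Send or MPI_Recv; log_transfer is a dummy callback that only records what it was asked to move. None of these names come from MPI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a wrapper around MPI_Send / MPI_Recv:
   moves `count` elements starting at `buf`, returns 0 on success. */
typedef int (*transfer_fn)(const char *buf, int count, void *ctx);

/* Transfers `total` elements in pieces of at most `max_chunk` elements. */
int transfer_in_chunks(const void *data, size_t total, size_t elem_size,
                       int max_chunk, transfer_fn xfer, void *ctx)
{
    assert(max_chunk > 0 && elem_size > 0);
    const char *p = (const char *)data;
    size_t done = 0;
    while (done < total) {
        size_t left = total - done;
        /* The chunk length always fits in an int because it is <= max_chunk. */
        int n = left < (size_t)max_chunk ? (int)left : max_chunk;
        int rc = xfer(p + done * elem_size, n, ctx);
        if (rc != 0)
            return rc;  /* stop at the first failed chunk */
        done += (size_t)n;
    }
    return 0;
}

/* Dummy callback for illustration: counts calls and elements. */
typedef struct { size_t calls; size_t elements; } xfer_log;

int log_transfer(const char *buf, int count, void *ctx)
{
    (void)buf;
    xfer_log *log = (xfer_log *)ctx;
    log->calls += 1;
    log->elements += (size_t)count;
    return 0;
}
```

The receiver would run the same loop with the same max_chunk so that every chunk sent is matched by a receive of equal size.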

Other tips

A quick, hacky solution is to do a reinterpret_cast<int>() of your unsigned counter in the sender and the reverse cast in the receiver. However, I think a better solution is to make a struct that contains the pointer and the count with the correct types, and to follow the advice of this answer to create your own custom datatype using MPI_Type_create_struct.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow