In the function MPI_Iallgatherv
int MPI_Iallgatherv(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
                    void *recvbuf, const int recvcounts[], const int displs[],
                    MPI_Datatype recvtype, MPI_Comm comm, MPI_Request *request)
the argument recvcounts[] is an input argument, so, as far as I understand, the program must know in advance how many elements will be received from each process. It is possible to set values larger than the number of elements that are actually sent, as in the following example:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if(size != 3) {
        printf("This application must run with 3 MPI processes.\n");
        MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
    }

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int counts[3] = {1000, 1000, 1000}; // Max n. of elems that can be received from each process
    int displs[3] = {0, 1000, 2000};    // Displacement of each process's block in the receive buffer
    int buffer[3000] = {0}; // Receiving buffer, zero-initialised so that slots not filled by the gather are defined when printed
    // Buffer containing the data to be sent by this process
    int send_values[1000];
    int send_count;
    switch(rank) {
        case 0:
        {
            send_count = 1;
            send_values[0] = 3;
            break;
        }
        case 1:
        {
            send_count = 2;
            send_values[0] = 1;
            send_values[1] = 4;
            break;
        }
        case 2:
        {
            send_count = 3;
            send_values[0] = 1;
            send_values[1] = 5;
            send_values[2] = 9;
            break;
        }
    }
    MPI_Request recv_mpi_request;
    MPI_Iallgatherv(send_values, send_count, MPI_INT, buffer, counts, displs,
                    MPI_INT, MPI_COMM_WORLD, &recv_mpi_request);
    MPI_Status status;
    MPI_Waitall(1, &recv_mpi_request, &status);
printf("Values gathered on process %d:", rank);
for(int i=0; i<3; i++) {
for(int j=0; j<3; j++) {
printf(" %d", buffer[i*1000+j]);
}
printf("t");
}
printf("n");
    MPI_Finalize();
    return EXIT_SUCCESS;
}
However, I cannot find any elegant way to retrieve the number of elements that each process actually sends, and I also wonder whether the communication time is affected by the fact that the predefined values in counts are much larger than the actual counts.

I know that in principle I could use two separate MPI communications: a first one to gather the counts, with one entry per process, and a second one to transfer the data using the sizes obtained in the first step (sketched below). However, each MPI transfer has a time overhead that is independent of the message size and that is not negligible for the application I am working on. Is there a reasonable way to obtain the actual counts with a single call to MPI_Iallgatherv? Can someone help me?
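For reference, the two-step version I would like to avoid would look roughly like the following. This is only a minimal sketch that reuses the variables of the example above (send_count, send_values, buffer) and replaces the single MPI_Iallgatherv call; the variable names actual_counts, actual_displs and data_request are just placeholders I made up.

    // Step 1: gather the actual count of every process (one int per process)
    int actual_counts[3];
    MPI_Allgather(&send_count, 1, MPI_INT, actual_counts, 1, MPI_INT, MPI_COMM_WORLD);

    // Step 2: build displacements from the gathered counts and transfer the data
    int actual_displs[3];
    actual_displs[0] = 0;
    for(int i = 1; i < 3; i++) {
        actual_displs[i] = actual_displs[i-1] + actual_counts[i-1];
    }

    MPI_Request data_request;
    MPI_Iallgatherv(send_values, send_count, MPI_INT, buffer, actual_counts, actual_displs,
                    MPI_INT, MPI_COMM_WORLD, &data_request);
    MPI_Wait(&data_request, MPI_STATUS_IGNORE);

This works, but it costs two collective operations per exchange, which is exactly the overhead I am trying to avoid.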