=============================
Scatter and gather operations [TOC]
=============================

One common communication pattern is

 * to divide a vector or a matrix among all participating processes
   (_scatter_) and, as soon as the individual processes have computed
   their results,
 * to collect the results in one big vector or matrix (_gather_).

Scatter and gather operations could be done using `MPI_Send` and
`MPI_Recv` where a master process (usually the one with rank 0)
repeatedly invokes `MPI_Send` at the beginning to send out the
individual partitions of the vector or matrix and repeatedly invokes
`MPI_Recv` at the joining phase to collect the results. This is,
however, inefficient as the latency periods add up. It is better to
parallelize the send and receive operations on the side of the master.
Fortunately, MPI offers the operations `MPI_Scatter` and `MPI_Gather`
which perform a series of send or receive operations in parallel:

---- CODE (type=cpp) ----------------------------------------------------------
int MPI_Scatter(const void* sendbuf, int sendcount, MPI_Datatype sendtype,
                void* recvbuf, int recvcount, MPI_Datatype recvtype,
                int root, MPI_Comm comm);
int MPI_Gather(const void* sendbuf, int sendcount, MPI_Datatype sendtype,
               void* recvbuf, int recvcount, MPI_Datatype recvtype,
               int root, MPI_Comm comm);
-------------------------------------------------------------------------------

In both cases, `root` is the rank of the process that distributes or
collects the partitions of a matrix. This is usually 0.

In the case of `MPI_Scatter`, each of the processes (including the
root!) receives exactly `sendcount` elements. More precisely, the
process with rank _i_ receives the elements
$i \cdot sendcount, \dots, (i+1) \cdot sendcount - 1$. Typically,
`sendcount` equals `recvcount`. But as with the counts of `MPI_Send`
and `MPI_Recv`, `recvcount` may be larger than `sendcount`. As usual,
compatibility between `sendtype` and `recvtype` is required.

Similarly, `recvcount` specifies for `MPI_Gather` how many elements
are to be received by the process with rank `root` from each of the
processes (including itself).

As in the case of `MPI_Bcast`, the same call of `MPI_Scatter` can be
executed by all processes. However, if it appears wasteful to allocate
a large matrix for each of the processes, the calls can be separated:
`MPI_Scatter` ignores the parameters `sendbuf`, `sendcount`, and
`sendtype` for non-root processes. Likewise, `MPI_Gather` considers
the parameters `recvbuf`, `recvcount`, and `recvtype` for the root
process only. If the program text is separated for these two cases,
you are free to use `nullptr` (in C++) or 0 (in C) for the unused
parameters.

Exercise
========
The following test program initializes a matrix in process 0 and
distributes it among the processes where each of them receives `share`
rows. After each of the processes has updated its partition of the
matrix, the updated rows are to be collected and printed. Add the
necessary `MPI_Scatter` and `MPI_Gather` invocations (a minimal sketch
of the general call pattern is shown at the end of this page).

Source code
===========
:import:session07/scatter-gather.cpp [fold]

:navigate: up -> doc:index
   next -> doc:session07/page02
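As a point of orientation, here is a minimal, self-contained sketch of
the scatter/gather call pattern. It is not the solution to
`scatter-gather.cpp`: the matrix dimensions, the element type `double`,
and the names `A`, `partition`, `share`, and `N` are assumptions made
for illustration only. Note how the root passes its full matrix as
`sendbuf` of `MPI_Scatter` and as `recvbuf` of `MPI_Gather`, while the
other processes may pass `nullptr` for these ignored parameters.

---- CODE (type=cpp) ----------------------------------------------------------
#include <cstdio>
#include <vector>
#include <mpi.h>

int main(int argc, char** argv) {
   MPI_Init(&argc, &argv);
   int nof_processes; MPI_Comm_size(MPI_COMM_WORLD, &nof_processes);
   int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   const int share = 2;                       /* rows per process (assumed) */
   const int N = 4;                           /* columns per row (assumed) */
   const int nof_rows = share * nof_processes;

   /* only the root allocates and initializes the full matrix;
      every process allocates just its partition of share * N elements */
   std::vector<double> A;
   if (rank == 0) {
      A.resize(nof_rows * N);
      for (int i = 0; i < nof_rows * N; ++i) A[i] = i;
   }
   std::vector<double> partition(share * N);

   /* distribute share * N elements to each process, including the root */
   MPI_Scatter(rank == 0? A.data(): nullptr, share * N, MPI_DOUBLE,
      partition.data(), share * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

   /* each process updates its own rows */
   for (auto& x: partition) x += rank;

   /* collect share * N elements from each process at the root */
   MPI_Gather(partition.data(), share * N, MPI_DOUBLE,
      rank == 0? A.data(): nullptr, share * N, MPI_DOUBLE,
      0, MPI_COMM_WORLD);

   /* the root prints the collected matrix */
   if (rank == 0) {
      for (int i = 0; i < nof_rows; ++i) {
         for (int j = 0; j < N; ++j) std::printf(" %g", A[i * N + j]);
         std::printf("\n");
      }
   }
   MPI_Finalize();
}
-------------------------------------------------------------------------------

The sketch keeps the matrix in row-major order so that each contiguous
block of `share * N` elements corresponds to `share` complete rows,
which is what makes a single `MPI_Scatter`/`MPI_Gather` pair sufficient.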