=============================
Scatter and gather operations [TOC]
=============================

One common communication pattern is

 * to divide a vector or a matrix among all participating processes
   (_scatter_) and, as soon as the individual processes have computed
   their results,
 * to collect the results in one big vector or matrix (_gather_).

Scatter and gather operations could be done using `MPI_Send` and
`MPI_Recv` where a master process (usually the one with rank 0)
repeatedly invokes `MPI_Send` at the beginning to send out the
individual partitions of the vector or matrix and repeatedly invokes
`MPI_Recv` at the joining phase to collect the results. This is,
however, inefficient as the latency periods add up. It is better to
parallelize the send and receive operations on the side of the master.
Fortunately, MPI offers the operations `MPI_Scatter` and `MPI_Gather`
which perform a series of send or receive operations in parallel:

---- CODE (type=cpp) ----------------------------------------------------------
int MPI_Scatter(const void* sendbuf, int sendcount, MPI_Datatype sendtype,
                void* recvbuf, int recvcount, MPI_Datatype recvtype,
                int root, MPI_Comm comm);
int MPI_Gather(const void* sendbuf, int sendcount, MPI_Datatype sendtype,
               void* recvbuf, int recvcount, MPI_Datatype recvtype,
               int root, MPI_Comm comm);
-------------------------------------------------------------------------------

In both cases, `root` is the rank of the process that distributes or
collects the partitions of a matrix. This is usually 0.

In the case of `MPI_Scatter`, each of the processes (including the
root!) receives exactly `sendcount` elements. More precisely, the
process with rank _i_ receives the elements
$i \cdot sendcount, \dots, (i+1) \cdot sendcount - 1$. Typically,
`sendcount` equals `recvcount`. But as with the counts of `MPI_Send`
and `MPI_Recv`, `recvcount` may be larger than `sendcount`. As usual,
compatibility between `sendtype` and `recvtype` is required.

Similarly, `recvcount` specifies for `MPI_Gather` how many elements
are to be received by the process with rank `root` from each of the
processes (including itself).

As in the case of `MPI_Bcast`, the same call of `MPI_Scatter` can be
executed by all processes. However, if it appears wasteful to allocate
a large matrix for each of the processes, the calls can be separated:
`MPI_Scatter` ignores the parameters `sendbuf`, `sendcount`, and
`sendtype` for non-root processes. Likewise, `MPI_Gather` considers
the parameters `recvbuf`, `recvcount`, and `recvtype` for the root
process only. If the program text is separated for these two cases,
you are free to use `nullptr` (in C++) or 0 (in C) for the unused
parameters.

Exercise
========
The following test program initializes a matrix in process 0 and
distributes it among the processes where each of them receives `share`
rows. After each of the processes has updated its partition of the
matrix, the updated rows are to be collected and printed. Add the
necessary `MPI_Scatter` and `MPI_Gather` invocations (a minimal sketch
of the general call pattern is shown at the end of this page).

Source code
===========
:import:session07/scatter-gather.cpp [fold]

:navigate: up -> doc:index
   next -> doc:session07/page02
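As a point of orientation, here is a minimal, self-contained sketch of
the scatter/gather call pattern. It is not the solution to
`scatter-gather.cpp`: the matrix dimensions, the element type `double`,
and the names `A`, `partition`, `share`, and `N` are assumptions made
for illustration only. Note how the root passes its full matrix as
`sendbuf` of `MPI_Scatter` and as `recvbuf` of `MPI_Gather`, while the
other processes may pass `nullptr` for these ignored parameters.

---- CODE (type=cpp) ----------------------------------------------------------
#include <cstdio>
#include <vector>
#include <mpi.h>

int main(int argc, char** argv) {
   MPI_Init(&argc, &argv);
   int nof_processes; MPI_Comm_size(MPI_COMM_WORLD, &nof_processes);
   int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   const int share = 2;                       /* rows per process (assumed) */
   const int N = 4;                           /* columns per row (assumed) */
   const int nof_rows = share * nof_processes;

   /* only the root allocates and initializes the full matrix;
      every process allocates just its partition of share * N elements */
   std::vector<double> A;
   if (rank == 0) {
      A.resize(nof_rows * N);
      for (int i = 0; i < nof_rows * N; ++i) A[i] = i;
   }
   std::vector<double> partition(share * N);

   /* distribute share * N elements to each process, including the root */
   MPI_Scatter(rank == 0? A.data(): nullptr, share * N, MPI_DOUBLE,
      partition.data(), share * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

   /* each process updates its own rows */
   for (auto& x: partition) x += rank;

   /* collect share * N elements from each process at the root */
   MPI_Gather(partition.data(), share * N, MPI_DOUBLE,
      rank == 0? A.data(): nullptr, share * N, MPI_DOUBLE,
      0, MPI_COMM_WORLD);

   /* the root prints the collected matrix */
   if (rank == 0) {
      for (int i = 0; i < nof_rows; ++i) {
         for (int j = 0; j < N; ++j) std::printf(" %g", A[i * N + j]);
         std::printf("\n");
      }
   }
   MPI_Finalize();
}
-------------------------------------------------------------------------------

The sketch keeps the matrix in row-major order so that each contiguous
block of `share * N` elements corresponds to `share` complete rows,
which is what makes a single `MPI_Scatter`/`MPI_Gather` pair sufficient.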