Twodimensional partitioning for the Jacobi solver
Content 
If we want to partition the matrix for the Jacobi solver in two dimensions then each process needs to communicate with its four neighbors within the grid. In case of multidimensional grids, MPI tells us the ranks of our neighbors:
// get the process numbers of our neighbors int left, right, upper, lower; MPI_Cart_shift(grid, 0, 1, &upper, &lower); MPI_Cart_shift(grid, 1, 1, &left, &right);
The function MPI_Cart_shift delivers the neighbors in the dimension that is specified by the second parameter. The third parameter specifies the distance (1 for the immediate neighbor). The ranks of the neighboring processes will be stored in the variables whose addresses are given in the forth and fifth parameter. If there is no neighbor (at the border of the matrix), MPI_PROC_NULL is returned.
During each iteration each process has to communicate bidirectionally with each of its neighbors. In each of these interchanges a row or a column is to be exchanged in both directions. These are eight input and output operations in total per process. The latency periods would add up significantly if this would not be done in parallel. It is even advisable to parallelize communication and the computation of the next iteration. This is possible with following approach:
Let \(A\) and \(B\) be two matrices where \(A\) stores the state from the last iteration and where \(B\) is used to compute the state for the next iteration. We define the outer border as the union of those sections which we receive from our neighbors and the inner border as that area that is to be sent to our neighbors. Then an iterative step could be organized as follows:

Compute the inner border of \(B\) in dependence of \(A\).

Initiate following asynchronous input and output operations:

Transfer the inner border to our four neighbors.

Transfer the outer border from our four neighbors. (Note that we ignore the corner elements as they are not used for the 5point difference star.)


Compute the core of \(B\), i.e. the block matrix that is enclosed by the inner border.

Wait until all asynchronous input and output operations are completed.
Exercise
Implement the Jacobi solver with twodimensional partitioning as described.