===============
Sample solution [TOC]
===============

The sample solution is explained here step by step. The full source
text of the solution is at the end of this page.

First, we organize all processes in a two-dimensional grid. The
associated communication domain has the name `grid`:

---- CODE (type=cpp) ----------------------------------------------------------
/* create two-dimensional Cartesian grid for our processes */
int dims[2] = {0, 0};
int periods[2] = {false, false};
MPI_Dims_create(nof_processes, 2, dims);
MPI_Comm grid;
MPI_Cart_create(MPI_COMM_WORLD,
   2,        // number of dimensions
   dims,     // actual dimensions
   periods,  // both dimensions are non-periodic
   true,     // reorder is permitted
   &grid     // newly created communication domain
);
MPI_Comm_rank(grid, &rank); // update rank (could have changed)
-------------------------------------------------------------------------------

Next, we declare the full matrix ($100 \times 100$). To keep it simple,
this matrix is declared by all processes, although only the process
with rank 0 actually needs it in its entirety.

---- CODE (type=cpp) ----------------------------------------------------------
/* initialize the entire matrix, including its borders */
Matrix A(100, 100, Order::RowMajor);
if (rank == 0) {
   for (auto [i, j, Aij]: A) {
      if (j == 0) {
         Aij = std::sin(PI * ((double)i/(A.numRows()-1)));
      } else if (j == A.numCols() - 1) {
         Aij = std::sin(PI * ((double)i/(A.numRows()-1))) * E_POWER_MINUS_PI;
      } else {
         Aij = 0;
      }
   }
}
-------------------------------------------------------------------------------

Then we declare two matrices which represent the partition associated
with the current process. We need two matrices so that we can proceed
from `B1` to `B2` in one Jacobi iteration and from `B2` to `B1` in the
next one. The `overlap` variable specifies the width of the border.
For the 5-point difference star an overlap of 1 suffices.

---- CODE (type=cpp) ----------------------------------------------------------
/* get our position within the grid */
int coords[2];
MPI_Cart_coords(grid, rank, 2, coords);

/* create matrices B1, B2 for our subarea */
int overlap = 1;
UniformSlices rows(dims[0], A.numRows() - 2*overlap);
UniformSlices columns(dims[1], A.numCols() - 2*overlap);
Matrix B1(rows.size(coords[0]) + 2*overlap,
   columns.size(coords[1]) + 2*overlap,
   Order::RowMajor);
Matrix B2(rows.size(coords[0]) + 2*overlap,
   columns.size(coords[1]) + 2*overlap,
   Order::RowMajor);
-------------------------------------------------------------------------------
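`UniformSlices` distributes a given number of rows or columns as evenly
as possible among the processes of one grid dimension. For illustration,
the arithmetic behind such a partitioning could look like the following
sketch, where the first `N % nof_slices` slices receive one extra
element. This is a hypothetical stand-in for illustration only, not the
actual class of the course library:

---- CODE (type=cpp) ----------------------------------------------------------
#include <cstddef>

/* hypothetical sketch of a uniform partitioning: N elements are split
   into nof_slices slices; the first N % nof_slices slices get one
   element more than the others */
struct UniformPartitionSketch {
   std::size_t nof_slices;
   std::size_t N;

   std::size_t size(std::size_t i) const {
      return N / nof_slices + (i < N % nof_slices? 1: 0);
   }
   std::size_t offset(std::size_t i) const {
      return i * (N / nof_slices) + (i < N % nof_slices? i: N % nof_slices);
   }
};
-------------------------------------------------------------------------------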
Now the individual partitions along with the overlapping borders can be
distributed among the available processes. This is done by the
previously developed function `scatter_by_block`. We must not forget to
copy the global border from `B1` to `B2` for processes at a border
position, as the global border would otherwise never be set in `B2`.
To keep it simple, we use the `copy` function for this.

---- CODE (type=cpp) ----------------------------------------------------------
/* distribute main body of A including left and right border */
scatter_by_block(A, B1, 0, grid, dims, coords, overlap);

copy(B1, B2); /* actually just the border needs to be copied */
-------------------------------------------------------------------------------

Then we determine our neighbors as already shown:

---- CODE (type=cpp) ----------------------------------------------------------
/* get the process numbers of our neighbors */
int left, right, upper, lower;
MPI_Cart_shift(grid, 0, 1, &upper, &lower);
MPI_Cart_shift(grid, 1, 1, &left, &right);
-------------------------------------------------------------------------------

For the exchange with our neighbors we need two datatypes, one for the
rows and one for the columns:

---- CODE (type=cpp) ----------------------------------------------------------
/* compute type for inner rows and cols without the border */
auto B_inner_row = B1.block(0, 1).dim(overlap, B1.numCols() - 2*overlap);
MPI_Datatype rowtype = get_type(B_inner_row);
auto B_inner_col = B1.block(0, 1).dim(B1.numRows() - 2*overlap, overlap);
MPI_Datatype coltype = get_type(B_inner_col);
-------------------------------------------------------------------------------

To simplify the input and output operations, it is helpful to create a
list of transfer operations. One transfer operation is defined by the
communication partner (`rank`), the starting address of the area to be
transferred, and the associated datatype, which is either `rowtype` or
`coltype`:

---- CODE (type=cpp) ----------------------------------------------------------
struct IOTransfer {
   int rank;
   void* addr;
   MPI_Datatype datatype;
};
-------------------------------------------------------------------------------

These transfer operations are split into send and receive operations
that are to be executed by `MPI_Isend` and `MPI_Irecv`, respectively.
The list `B1_in_vectors` maintains the specifications for all receive
operations into the outer border of `B1`. Likewise, `B1_out_vectors`
lists all send operations from the inner border of `B1`. Similar lists
exist for `B2`. We need them for both matrices, as we iterate from `B1`
to `B2` and also from `B2` to `B1`. The second variant is simply
derived by substituting `B1` with `B2`.

---- CODE (type=cpp) ----------------------------------------------------------
IOTransfer B1_in_vectors[] = {
   {left, &B1(1, 0), coltype},
   {upper, &B1(0, 1), rowtype},
   {right, &B1(1, B1.numCols()-overlap), coltype},
   {lower, &B1(B1.numRows()-overlap, 1), rowtype},
};
IOTransfer B1_out_vectors[] = {
   {left, &B1(1, 1), coltype},
   {upper, &B1(1, 1), rowtype},
   {right, &B1(1, B1.numCols()-2*overlap), coltype},
   {lower, &B1(B1.numRows()-2*overlap, 1), rowtype},
};
-------------------------------------------------------------------------------

Now it is time for the main iterative loop. As the global
synchronization using `MPI_Reduce` and `MPI_Bcast` does not come cheap,
we perform this check only every tenth iteration.
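As an aside, the `MPI_Reduce`/`MPI_Bcast` pair used in the loop below
could equivalently be expressed with a single `MPI_Allreduce`, which
combines the reduction with the broadcast. A minimal sketch that uses
`MPI_DOUBLE` directly instead of `get_type`:

---- CODE (type=cpp) ----------------------------------------------------------
/* sketch: combine the reduction and the broadcast in one collective;
   maxdiff and eps are the variables from the loop below */
double global_max;
MPI_Allreduce(&maxdiff, &global_max, 1, MPI_DOUBLE, MPI_MAX,
   MPI_COMM_WORLD);
if (global_max < eps) break; /* as in the loop below */
-------------------------------------------------------------------------------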
The iterative step itself is done by `jacobi_iteration`, which receives
the matrices and the matching transfer vectors as parameters:

---- CODE (type=cpp) ----------------------------------------------------------
/* main loop for the Jacobi iterations */
double eps = 1e-6;
unsigned int iterations;
for (iterations = 0; ; ++iterations) {
   double maxdiff = jacobi_iteration(B1, B2, B2_in_vectors, B2_out_vectors);
   maxdiff = jacobi_iteration(B2, B1, B1_in_vectors, B1_out_vectors);
   if (iterations % 10 == 0) {
      double global_max;
      MPI_Reduce(&maxdiff, &global_max, 1, get_type(maxdiff),
         MPI_MAX, 0, MPI_COMM_WORLD);
      MPI_Bcast(&global_max, 1, get_type(maxdiff), 0, MPI_COMM_WORLD);
      if (global_max < eps) break;
   }
}
if (rank == 0) fmt::printf("%d iterations\n", iterations);
-------------------------------------------------------------------------------

The signature of `jacobi_iteration` looks as follows. Note that
`in_vectors` and `out_vectors` are passed by reference as this helps to
simplify the corresponding loops. (If they were passed as plain
pointers, the length information would be lost.)

---- CODE (type=cpp) ----------------------------------------------------------
template<typename T, template<typename> typename Matrix,
   Require<Ge<Matrix<T>>> = true>
T jacobi_iteration(const Matrix<T>& A, Matrix<T>& B,
      const IOTransfer (&in_vectors)[4],
      const IOTransfer (&out_vectors)[4]) {
   assert(A.numRows() > 2 && A.numCols() > 2);
   T maxdiff = 0;
   /* ... */
   return maxdiff;
}
-------------------------------------------------------------------------------

To avoid needless repetition of the 5-point difference star formula, we
introduce a lambda expression as a local function:

---- CODE (type=cpp) ----------------------------------------------------------
auto jacobi = [&](std::size_t i, std::size_t j) {
   B(i, j) = 0.25 * (A(i - 1, j) + A(i + 1, j) +
      A(i, j - 1) + A(i, j + 1));
   T diff = std::fabs(A(i, j) - B(i, j));
   if (diff > maxdiff) maxdiff = diff;
};
-------------------------------------------------------------------------------

Firstly, we compute our inner border:

---- CODE (type=cpp) ----------------------------------------------------------
/* compute the border first which is sent in advance to our neighbors */
for (std::size_t i = 1; i + 1 < B.numRows(); ++i) {
   jacobi(i, 1); jacobi(i, B.numCols()-2);
}
for (std::size_t j = 2; j + 2 < B.numCols(); ++j) {
   jacobi(1, j); jacobi(B.numRows()-2, j);
}
-------------------------------------------------------------------------------

Secondly, we initiate the communication with our neighbors using the
transfer vectors:

---- CODE (type=cpp) ----------------------------------------------------------
/* send border to our neighbors and initiate the receive operations */
MPI_Request requests[8]; int ri = 0;
for (auto& in_vector: in_vectors) {
   MPI_Irecv(in_vector.addr, 1, in_vector.datatype,
      in_vector.rank, 0, MPI_COMM_WORLD, &requests[ri++]);
}
for (auto& out_vector: out_vectors) {
   MPI_Isend(out_vector.addr, 1, out_vector.datatype,
      out_vector.rank, 0, MPI_COMM_WORLD, &requests[ri++]);
}
-------------------------------------------------------------------------------
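As a further aside: the sample solution completes these eight requests
further below with individual `MPI_Wait` calls; they could just as well
be completed with a single `MPI_Waitall`. A minimal sketch:

---- CODE (type=cpp) ----------------------------------------------------------
/* sketch: complete all pending requests in one call instead of
   waiting for each request individually */
MPI_Waitall(8, requests, MPI_STATUSES_IGNORE);
-------------------------------------------------------------------------------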
Thirdly, we compute our inner block while the initiated communication
proceeds in parallel:

---- CODE (type=cpp) ----------------------------------------------------------
/* compute the inner block in parallel to the initiated communication */
for (std::size_t i = 2; i + 2 < B.numRows(); ++i) {
   for (std::size_t j = 2; j + 2 < B.numCols(); ++j) {
      jacobi(i, j);
   }
}
-------------------------------------------------------------------------------

Finally, we wait until the communication is completed:

---- CODE (type=cpp) ----------------------------------------------------------
/* wait for the initiated I/O to finish */
for (auto& request: requests) {
   MPI_Status status;
   MPI_Wait(&request, &status);
}
-------------------------------------------------------------------------------

Full solution
=============

:import:session07/jacobi-2d.cpp [fold]

---- SHELL (path=session07) ---------------------------------------------------
mpic++ -g -std=c++17 -I/home/numerik/pub/pp/ss19/lib $(pkg-config --cflags gdk-pixbuf-2.0) -o jacobi-2d jacobi-2d.cpp $(pkg-config --libs gdk-pixbuf-2.0)
mpirun -np 4 jacobi-2d
-------------------------------------------------------------------------------

---- SHELL (path=session07,hostname=heim) -------------------------------------
OMPI_CXX=g++-8.3 mpic++ -g -std=c++17 -I/home/numerik/pub/pp/ss19/lib $(pkg-config --cflags gdk-pixbuf-2.0) -o jacobi-2d jacobi-2d.cpp $(pkg-config --libs gdk-pixbuf-2.0) -Wno-literal-suffix
mpirun -np 4 jacobi-2d
-------------------------------------------------------------------------------

:navigate: up -> doc:index
           back -> doc:session07/page11