============================================ Unblocked LU: Variant based on GEMV and TRSV [TOC] ============================================ In Session 12 we implemented the LU factorization based on the BLAS operation GER (rank 1 update). In this session we consider another variant that is based on the BLAS operations GEMV (matrix vector product) and TRSV (triangular solver). Note that the term variant is used here to express that both algorithms (and implementation) carry out the same underlying floating point operations. However, because of the different BLAS building blocks the operations are carried out in a different order. So it is possible that each variant of an algorithm achieves a different different performance. That is because: - Depending on an architecture different BLAS operations can be realized more or less efficient. - Different BLAS operations might be implemented more or less efficient. Algorithm ========= ---- BOX ----------------------------------------------------------------------- +--------------+-----------------------------------------------------------+ | Input: | An $m \times n$ matrix $A$. | +--------------+-----------------------------------------------------------+ | Output: | - $A$ is overwritten with the LU factorization. | | | - $p$ a pivot vector of length $k := \text{min}\{m, n\}$. | +--------------+-----------------------------------------------------------+ | Return value:| - $-1$ on success and otherwise the index of the diagonal | | | element that was exactly zero during the factorization | +--------------+-----------------------------------------------------------+ - For $j=0, \dots, k-1$ - Consider for the first $j+1$ columns of $A$ the partition ---- LATEX ----------------------------------------------------------------- \left(\begin{array}{ccc} a_{0,0} & \dots & a_{0,j} \\ \vdots & & \vdots \\ a_{m,0} & \dots & a_{m,j} \\ \end{array}\right) = \left(\begin{array}{c|c} A_{0,0} & \vec{a_{0,j}} \\ \hline A_{j,0} & \vec{a_{j,j}} \\ \end{array}\right) ---------------------------------------------------------------------------- - *TRSV operation* Compute $\vec{a_{0,j}} \leftarrow L_{\text{unit}}^{-1}(A_{0,0}) \vec{a_{0,j}}$ where $L_{\text{unit}}( \cdot )$ denotes that a matrix argument is considered as a lower unit triangular matrix. - *GEMV operation* Compute $\vec{a_{j,j}} \leftarrow \vec{a_{j,j}} - A_{j,0} \vec{a_{0,j}}$. - *Partial pivoting* Find pivot index $p_j$ from $\vec{a_{j,j}}$, i.e. ---- LATEX ----------------------------------------------------------------- p_j = j + \text{argmax}_{0 \leq \ell < m} a_{j,l} ---------------------------------------------------------------------------- If $p_j \neq j$ interchange in $A$ rows with index $j$ and $p_j$. - Consider for $\vec{a_{j,j}}$ the sub-partition ---- LATEX ----------------------------------------------------------------- \vec{a_{j,j}} = \left(\begin{array}{c} a_{j,j} \\ \hline \vec{a_{j+1,j}} \end{array}\right) ---------------------------------------------------------------------------- - If $a_{j,j}$ is exactly zero return $j$ - *Scale operation* Compute $\vec{a_{j+1,j}} \leftarrow \frac{1}{a_{j,j}} \vec{a_{j+1,j}}$ - For $j=k, \dots, n-1$ - Consider for $A$ the partition ---- LATEX ----------------------------------------------------------------- A = \left(\begin{array}{c|c|c|c} A_{0,0} & \vec{a_{0,k}} & \cdots & \vec{a_{0,n-1}} \\ \end{array}\right) ---------------------------------------------------------------------------- - *TRSV operation* Compute $\vec{a_{0,j}} \leftarrow L_{\text{unit}}^{-1}(A_{0,0}) \vec{a_{0,j}}$ -------------------------------------------------------------------------------- Exercise ======== - Use the source file `ulmblas.c` from Session 12 as starting point. Or copy `ulmblas.c` from ` /home/numerik/pub/hpc/ss18/ulmblas/session13b/ulmblas.c` - Modify the code as follows, so that we can use a compiler switch to select the implementation for `dgetrf`: ---- CODE (type=c) ----------------------------------------------------------- /* Implementation of other BLAS functions */ ptrdiff_t dgetrf_ger(size_t m, size_t n, double *A, ptrdiff_t incRowA, ptrdiff_t incColA, size_t *p, ptrdiff_t incP) { /* Implementation from session 12 */ } ptrdiff_t dgetrf_gemv(size_t m, size_t n, double *A, ptrdiff_t incRowA, ptrdiff_t incColA, size_t *p, ptrdiff_t incP) { /* TODO: Your implementation based on dgemv and dtrsv */ } #define GETRF_GER 1 #define GETRF_GEMV 2 #ifndef GETRF #define GETRF 0 #endif // this pragma makes gcc stop immediately on '#error' #pragma GCC diagnostic error "-Wfatal-errors" ptrdiff_t dgetrf(size_t m, size_t n, double *A, ptrdiff_t incRowA, ptrdiff_t incColA, size_t *p, ptrdiff_t incP) { #if GETRF == GETRF_GER return dgetrf_ger(m, n, A, incRowA, incColA, p, incP); #elif GETRF == GETRF_GEMV return dgetrf_gemv(m, n, A, incRowA, incColA, p, incP); #else #error "Compile with either -DGETRF=GETRF_GER or -DGETRF=GETRF_GEMV" #endif } ------------------------------------------------------------------------------ - Implement function `dgetrf_gemv`. Use the GEMV/TRSV based algorithm. Test your implementation as described in the next section. Test and Benchmark ================== ---- SHELL (path=session14, hide) ---------------------------------------------- rm -rf getrf mkdir getrf cd getrf cp /home/numerik/pub/hpc/ss18/ulmblas/*.[hc] . cp /home/numerik/pub/hpc/ss18/ulmblas/session14/ulmblas.* . cp ../getrf.plot . -------------------------------------------------------------------------------- - Compile with either `-DGETRF=GETRF_GER` or `-DGETRF=GETRF_GER`. Otherwise you get *the following error*: ---- SHELL (path=session14/getrf, hostname=heim) ----------------------------- gcc -Wall -std=c11 -I. -O3 -o test_dgetrf test_dgetrf.c ulmaux.c ulmblas.c ------------------------------------------------------------------------------ - Create different executables and check results: - You can create an executable `test_dgetrf_ger` for testing and benchmarking the variant based on GER: ---- SHELL (path=session14/getrf, hostname=heim,fold) ---------------------- gcc -Wall -std=c11 -I. -O3 -o test_dgetrf_ger -DGETRF=GETRF_GER +++ test_dgetrf.c ulmaux.c ulmblas.c ./test_dgetrf_ger check ---------------------------------------------------------------------------- - and `test_dgetrf_gemv` for benchmarking the variant based on GEMV and TRSV: ---- SHELL (path=session14/getrf, hostname=heim,fold) ---------------------- gcc -Wall -std=c11 -I. -O3 -o test_dgetrf_gemv -DGETRF=GETRF_GEMV +++ test_dgetrf.c ulmaux.c ulmblas.c ./test_dgetrf_gemv check ---------------------------------------------------------------------------- - Benchmark both variants: ---- SHELL (path=session14/getrf, hostname=heim,fold) ---------------------- ./test_dgetrf_ger bench > bench.dgetrf_ger ./test_dgetrf_gemv bench > bench.dgetrf_gemv cat bench.dgetrf_ger cat bench.dgetrf_gemv ---------------------------------------------------------------------------- You can use the Gnuplot script :import: session14/getrf.plot for plotting the performance: ---- SHELL(path=session14/getrf,hostname=heim) ----------------------------- gnuplot getrf.plot ---------------------------------------------------------------------------- gives ---- IMAGE ----------------------- session14/getrf/bench.getrf.svg ----------------------------------