Unblocked LU: Variant based on GEMV and TRSV

Content

In Session 12 we implemented the LU factorization based on the BLAS operation GER (rank 1 update). In this session we consider another variant that is based on the BLAS operations GEMV (matrix vector product) and TRSV (triangular solver).

Note that the term variant is used here to express that both algorithms (and implementation) carry out the same underlying floating point operations. However, because of the different BLAS building blocks the operations are carried out in a different order. So it is possible that each variant of an algorithm achieves a different different performance. That is because:

Algorithm

Input:

An \(m \times n\) matrix \(A\).

Output:

  • \(A\) is overwritten with the LU factorization.

  • \(p\) a pivot vector of length \(k := \text{min}\{m, n\}\).

Return value:

  • \(-1\) on success and otherwise the index of the diagonal element that was exactly zero during the factorization

  • For \(j=0, \dots, k-1\)

    • Consider for the first \(j+1\) columns of \(A\) the partition

      \[\left(\begin{array}{ccc} a_{0,0} & \dots & a_{0,j} \\ \vdots & & \vdots \\ a_{m,0} & \dots & a_{m,j} \\ \end{array}\right)=\left(\begin{array}{c|c} A_{0,0} & \vec{a_{0,j}} \\ \hline A_{j,0} & \vec{a_{j,j}} \\ \end{array}\right)\]
    • TRSV operation

      Compute \(\vec{a_{0,j}} \leftarrow L_{\text{unit}}^{-1}(A_{0,0}) \vec{a_{0,j}}\) where \(L_{\text{unit}}( \cdot )\) denotes that a matrix argument is considered as a lower unit triangular matrix.

    • GEMV operation

      Compute \(\vec{a_{j,j}} \leftarrow \vec{a_{j,j}} - A_{j,0} \vec{a_{0,j}}\).

    • Partial pivoting

      Find pivot index \(p_j\) from \(\vec{a_{j,j}}\), i.e.

      \[p_j = j + \text{argmax}_{0 \leq \ell < m} a_{j,l}\]

      If \(p_j \neq j\) interchange in \(A\) rows with index \(j\) and \(p_j\).

    • Consider for \(\vec{a_{j,j}}\) the sub-partition

      \[\vec{a_{j,j}} =\left(\begin{array}{c} a_{j,j} \\ \hline \vec{a_{j+1,j}} \end{array}\right)\]
    • If \(a_{j,j}\) is exactly zero return \(j\)

    • Scale operation

      Compute \(\vec{a_{j+1,j}} \leftarrow \frac{1}{a_{j,j}} \vec{a_{j+1,j}}\)

  • For \(j=k, \dots, n-1\)

    • Consider for \(A\) the partition

      \[A = \left(\begin{array}{c|c|c|c} A_{0,0} & \vec{a_{0,k}} & \cdots & \vec{a_{0,n-1}} \\ \end{array}\right)\]
    • TRSV operation

      Compute \(\vec{a_{0,j}} \leftarrow L_{\text{unit}}^{-1}(A_{0,0}) \vec{a_{0,j}}\)

Exercise

Test and Benchmark