Unblocked LU factorization: Another variant

Content

In the lecture we introduced algorithms for an unblocked LU factorization by considering the following partitioning of matrix \(A\):

\[A =\left(\begin{array}{l|l|l} A_{0,0} & a_{0,j} & A_{0,j+1} \\ \hline a_{j,0}^T & a_{j,j} & a_{j,j+1}^T \\ \hline A_{j+1,0} & a_{j+1,j} & A_{j+1,j+1} \end{array}\right)\]

Variant 1

In session 13 we ended up with the following algorithm:

  • for \(j=0, \dots, \min\{m,n\}-1\)

    • Find pivot \(p_j\) in \(\left(\begin{array}{c} a_{j,j} \\ a_{j+1,j} \end{array}\right)\)

      • if \(j \neq p_j\) interchange in \(A\) row \(j\) and row \(p_j\)

    • if element \(a_{j,j}\) is exactly zero, return \(j\) (the matrix is numerically singular).

    • Scaling: \(a_{j+1,j} \leftarrow \frac{1}{a_{j,j}} a_{j+1,j}\)

    • Rank-1 update: \(A_{j+1,j+1} \leftarrow A_{j+1,j+1} - a_{j+1,j} a_{j,j+1}^T\)

This variant is based on a rank-1 update.
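
For reference, a single step of this variant (after the row interchange) realizes the block factorization below; this is only a sketch of the identity behind the scaling and the rank-1 update:

\[\left(\begin{array}{c|c} a_{j,j} & a_{j,j+1}^T \\ \hline a_{j+1,j} & A_{j+1,j+1} \end{array}\right) = \left(\begin{array}{c|c} 1 & 0 \\ \hline l_{j+1,j} & I \end{array}\right) \left(\begin{array}{c|c} a_{j,j} & a_{j,j+1}^T \\ \hline 0 & A_{j+1,j+1} - l_{j+1,j}\, a_{j,j+1}^T \end{array}\right), \qquad l_{j+1,j} = \frac{1}{a_{j,j}}\, a_{j+1,j}\]

The trailing block on the right is the Schur complement that the rank-1 update computes in place.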

Variant 2

Another variant can be derived, leading to the following algorithm:

  • for \(j=0, \dots, \min\{m,n\}-1\)

    • Solve: \(a_{0,j} \leftarrow L_{\text{unit}}^{-1}(A_{0,0}) a_{0,j}\)

    • Matrix-vector product \( \left(\begin{array}{c} a_{j,j} \\ a_{j+1,j} \end{array}\right) \leftarrow \left(\begin{array}{c} a_{j,j} \\ a_{j+1,j} \end{array}\right) - \left(\begin{array}{c} a_{j,0}^T \\ A_{j+1,0} \end{array}\right) a_{0,j} \)

    • Find pivot \(p_j\) in \(\left(\begin{array}{c} a_{j,j} \\ a_{j+1,j} \end{array}\right)\)

      • if \(j \neq p_j\) interchange in \(A\) row \(j\) and row \(p_j\)

    • if element \(a_{j,j}\) is exactly zero, return \(j\) (the matrix is numerically singular).

    • Scaling: \(a_{j+1,j} \leftarrow \frac{1}{a_{j,j}} a_{j+1,j}\)

  • if \(\min\{m,n\} < n\)

    • Solve: \(A_{0,m} \leftarrow L_{\text{unit}}^{-1}(A_{0,0})\, A_{0,m}\), where we now consider the partitioning \( A = \left(\begin{array}{l|l} A_{0,0} & A_{0,m} \end{array}\right)\)

This variant is based on a matrix-vector product.
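
To see where the triangular solve and the matrix-vector product come from, compare block column \(j\) of \(A = LU\), with \(L\) unit lower triangular and \(U\) upper triangular partitioned conformally with \(A\) (pivoting is omitted in this sketch):

\[\left(\begin{array}{c} a_{0,j} \\ \hline a_{j,j} \\ \hline a_{j+1,j} \end{array}\right) = \left(\begin{array}{c|c|c} L_{0,0} & 0 & 0 \\ \hline l_{j,0}^T & 1 & 0 \\ \hline L_{j+1,0} & l_{j+1,j} & L_{j+1,j+1} \end{array}\right) \left(\begin{array}{c} u_{0,j} \\ \hline u_{j,j} \\ \hline 0 \end{array}\right)\]

Hence \(u_{0,j} = L_{0,0}^{-1}\, a_{0,j}\) (the triangular solve with the unit lower triangular factor already stored in \(A_{0,0}\)), \(u_{j,j} = a_{j,j} - l_{j,0}^T u_{0,j}\) and \(l_{j+1,j} = \frac{1}{u_{j,j}}\left(a_{j+1,j} - L_{j+1,0}\, u_{0,j}\right)\), i.e. exactly the matrix-vector product followed by the scaling.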

Notes on using the library

The library provides matrix and vector types. The matrix types were extended for triangular (actually trapezoidal) matrices with full storage. Currently we have:

  • general matrix: GeMatrix<T> (with memory allocation and deallocation), GeMatrixView<T> (writeable view), GeMatrixConstView<T> (read-only view)

  • triangular/trapezoidal matrix: TrMatrixView<T> (writeable view), TrMatrixConstView<T> (read-only view); there is no allocating type, triangular matrices exist only as views

  • dense vector: DenseVector<T> (with memory allocation and deallocation), DenseVectorView<T> (writeable view), DenseVectorConstView<T> (read-only view)

Creating triangular views from general matrices

Note that triangular matrices can only be created as views from a general matrix. For example

GeMatrix<double>     A(4,8);
// ...

TrMatrixView<double> L = A.view(UpLo::LowerUnit);
TrMatrixView<double> U = A.view(UpLo::Upper);       // upper, non-unit
// ...

or simply

GeMatrix<double>     A(4,8);
// ...

auto L = A.view(UpLo::LowerUnit);
auto U = A.view(UpLo::Upper);       // upper, non-unit
// ...

Using constView instead of view creates a TrMatrixConstView object.
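
For read-only access the corresponding calls look as follows (a minimal sketch; the setup is the same as above):

GeMatrix<double>           A(4,8);
// ...

TrMatrixConstView<double>  U = A.constView(UpLo::Upper);
auto                       L = A.constView(UpLo::LowerUnit);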

Creating views referencing blocks and subvectors

Assume that A is some general or triangular matrix with full storage. For convenience we provide the following methods for creating matrix and vector views (a short usage sketch follows the list):

A.block(i,j)

Selecting a block starting at A(i,j) with the remaining rows (i.e. A.numRows()-i) and columns (i.e. A.numCols()-j).

A.dim(m,n)

Selecting a block starting at A(0,0) with m rows and n columns.

A.col(i,j)

Selecting the part of column j starting at A(i,j), i.e. the remaining A.numRows()-i entries.

A.row(i,j)

Selecting the part of row i starting at A(i,j), i.e. the remaining A.numCols()-j entries.

For dense vectors we provide similar methods (which, at least currently, use the same names as the matrix methods):

x.block(i)

Selecting a sub-vector starting at x(i), i.e. the remaining x.length()-i entries.

x.dim(n)

Selecting a sub-vector starting at x(0) of length n.
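
To illustrate (this is just a sketch, not part of the library; the matrix dimensions and the index j are arbitrary), the blocks of the partitioning at the top of this page translate into these view methods as follows:

GeMatrix<double> A(7, 9);
std::size_t      j = 3;

auto A00 = A.dim(j, j);           // leading j x j block A_{0,0}
auto a0j = A.col(0, j).dim(j);    // a_{0,j}: first j entries of column j
auto aj1 = A.col(j+1, j);         // a_{j+1,j}: column j below the diagonal
auto ajT = A.row(j, j+1);         // a_{j,j+1}^T: row j right of the diagonal
auto A22 = A.block(j+1, j+1);     // trailing block A_{j+1,j+1}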

Traits for getting the element type

You will see that in many cases the element type of matrices and vectors is defined through the function signature. But in some cases you still need traits, as in the following example (a short usage note follows the code):

template <typename VectorX,
          Require< Dense<VectorX> >=true >
void
foo(VectorX &&x)
{
    using T = ElementType<VectorX>;             // getting the element type

    // ...
}
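
For example (a hypothetical call, just to illustrate the trait; foo is the function above):

DenseVector<double> x(8);

foo(x);              // inside foo: T is double
foo(x.block(2));     // also double: a view has the same element type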

Example for using the triangular solver

Solving \(Lx = b\) where \(L\) is a lower unit triangular matrix can be done with the provided sv function. Note that this function takes \(L\) and \(b\) and then overwrites \(b\) with the solution \(x\).

In case \(L\) is stored in the strict lower triangular part of \(A\), this can be implemented by

sv(A.view(UpLo::LowerUnit), b);

Of course you can create views of views. Like in

sv(A.dim(j,j).view(UpLo::LowerUnit), A.col(0,j).dim(j));

Figure out what happens here before you start with the exercise!

Exercise

With lu.hpp we provide an implementation of the first variant and a skeleton for the second variant:

#ifndef HPC_MATVEC_LU_HPP
#define HPC_MATVEC_LU_HPP

#include <cassert>
#include <cstddef>
#include <type_traits>

#include <hpc/matvec/iamax.hpp>
#include <hpc/matvec/rank1.hpp>
#include <hpc/matvec/scal.hpp>
#include <hpc/matvec/sv.hpp>
#include <hpc/matvec/sm.hpp>
#include <hpc/matvec/swap.hpp>

#include <hpc/matvec/densevector.hpp>
#include <hpc/matvec/traits.hpp>
#include <hpc/ulmblas/axpy.hpp>

//#include <hpc/mklblas/mv.hpp>
//#include <hpc/mklblas/sv.hpp>
//#include <hpc/mklblas/mm.hpp>
//#include <hpc/mklblas/sm.hpp>

namespace hpc { namespace matvec {

template <typename MatrixA, typename VectorPiv,
          Require< Ge<MatrixA>,
                   Dense<VectorPiv>
                 > = true>
std::ptrdiff_t
lu_unblk_var1(MatrixA &&A, VectorPiv &&piv)
{
    using T = ElementType<MatrixA>;

    std::size_t m  = A.numRows();
    std::size_t n  = A.numCols();
    std::size_t mn = std::min(m,n);

    assert(piv.length()>=mn);

    for (std::size_t j=0; j<mn; ++j) {
        // pivoting: find the entry with largest absolute value
        // in column j on or below the diagonal
        auto jp = j + iamax(A.col(j,j));
        if (jp!=j) {
            swap(A.row(j,0), A.row(jp,0));
        }
        piv(j) = jp;
        if (A(j,j)==T(0)) {
            return j;
        }

        // apply the Gauss transformation: scale the subdiagonal part of
        // column j, then rank-1 update the trailing block
        scal(T(1)/A(j,j), A.col(j+1,j));
        rank1(T(-1), A.col(j+1,j), A.row(j,j+1),
              A.block(j+1,j+1));
    }
    return -1;
}

template <typename MatrixA, typename VectorPiv,
          Require< Ge<MatrixA>,
                   Dense<VectorPiv>
                 > = true>
std::ptrdiff_t
lu_unblk_var2(MatrixA &&A, VectorPiv &&piv)
{
    // TODO: Your code
    return -1;
}


} } // namespace matvec, hpc

#endif // HPC_MATVEC_LU_HPP

Implement the second variant. Use hpc/test/lu_unblk.cpp for benchmarking and testing:

#include <complex>
#include <functional>
#include <random>

#include <printf.hpp>

#include <hpc/matvec.hpp>
#include <hpc/matvec/test/error.hpp>
#include <hpc/matvec/test/rand.hpp>
#include <hpc/matvec/test/walltime.hpp>

#include "lu.hpp"

namespace hpc { namespace matvec {

template <typename T, template<typename> class MatrixA,
          typename lu_func,
          Require< Ge<MatrixA<T>> > = true>
std::pair<double, double>
test_lu(const MatrixA<T> &A0, lu_func lu)
{
    GeMatrix<T>              A(A0.numRows(), A0.numCols(), Order::ColMajor);
    DenseVector<std::size_t> piv(A.numRows());

    copy(A0, A);
    std::ptrdiff_t res;

    test::WallTime<double> timer;

    timer.tic();
    res = lu(A, piv);
    double time = timer.toc();
    double err  = test::error_estimate::getrf(A0, A, piv);

    if (res!=-1) {
        fmt::printf("Matrix is (numerically) singular\n");
    }
    return std::pair<double, double>(err, time);
}

} } // namespace matvec, hpc

#define MIN_M 500
#define MIN_N 2
#define INC_M 0
#define INC_N 1
#define MAX_M 500
#define MAX_N 500

int
main()
{
    using namespace hpc::matvec;

    GeMatrix<double> A(MAX_M, MAX_N);

    test::rand(A);

    auto lu1 = lu_unblk_var1<GeMatrix<double> &, DenseVector<std::size_t> &>;
    auto lu2 = lu_unblk_var2<GeMatrix<double> &, DenseVector<std::size_t> &>;

    fmt::printf("%5s %5s "
                "%10s %10s %10s "
                "%10s %10s %10s\n",
                "M", "N",
                "Error 1", "Time 1", "MFLOPS 1",
                "Error 2", "Time 2", "MFLOPS 2");


    for (std::size_t m=MIN_M, n=MIN_N;
         m<=MAX_M && n<=MAX_N;
         m+=INC_M, n+=INC_N)
    {
        double maxMN = std::max(m,n);
        double minMN = std::min(m,n);
        double flops = maxMN*minMN*minMN
                     - ((minMN*minMN*minMN) / 3.0)
                     - (minMN*minMN) / 2.0;
        flops /= 1000000.0;

        auto A0   = A.dim(m, n);
        auto tst1 = test_lu(A0, lu1);
        auto tst2 = test_lu(A0, lu2);

        fmt::printf("%5d %5d "
                    "%10.2e %10.2f %10.2f "
                    "%10.2e %10.2f %10.2f\n",
                    m, n,
                    tst1.first, tst1.second, flops/tst1.second,
                    tst2.first, tst2.second, flops/tst2.second);
    }
}

Note that by default the benchmark tests matrices with a fixed number of rows and an increasing number of columns. Matrix shapes of this kind are relevant when the unblocked LU factorization is used within the blocked factorization.

Compile with

heim$ g++-7.2 -Wall -std=c++17 -I /home/numerik/pub/hpc/ws18/session20/ -O3 -mavx -o test_lu_unblk test_lu_unblk.cpp
heim$