We hope that the poor results of the previous LU factorization benchmarks are caused by our triangular solver (sm), which is not yet optimized. To test this hypothesis, we use some BLAS functions from MKL instead of our own.

Based on the results, we can later determine which of our own BLAS functions need further optimization, and to what extent.

## HPC library requirements for this session

We use the library from /home/numerik/pub/hpc/ws18/session21 as the starting point for this session.

Copy the files lu.hpp, test_lu_blk.cpp and Makefile into a local directory. For example:

## Using the MKL-BLAS functions

In hpc/mklblas we provide interfaces for the following BLAS functions:

• mklblas::mv for the matrix-vector product

• mklblas::sv for the triangular solver with a single right-hand side (i.e. for solving $$Ax=b$$ where $$A$$ is triangular and $$x$$ and $$b$$ are vectors).

• mklblas::mm for the matrix-matrix product

• mklblas::sm for the triangular solver of matrix equations (i.e. for solving $$AX=B$$ where $$A$$ is triangular and $$X$$ and $$B$$ are matrices).
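To make the roles of sv and sm concrete, the following sketch shows what a triangular solver computes, using forward substitution for a lower triangular $$L$$: it overwrites $$B$$ with the solution $$X$$ of $$LX=B$$, and the sv case is simply a $$B$$ with a single column. Matrix and lower_sm are hypothetical names for this example only; they are not the interfaces of hpc/mklblas or ulmBLAS, and the plain loops make no attempt at the blocking an optimized sm would use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal row-major dense matrix, used only for this illustration
// (hypothetical; not the matrix type of the hpc library).
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;

    Matrix(std::size_t m, std::size_t n) : rows(m), cols(n), data(m * n, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Hypothetical naive "sm" for lower triangular L: overwrites B with the
// solution X of L*X = B by forward substitution, one column at a time.
// Only the lower triangle of L is read. If unitDiag is true, the diagonal
// is taken to be 1 (as for the L factor of an LU factorization) and is
// not accessed. A single-column B is the "sv" case.
void lower_sm(const Matrix& L, Matrix& B, bool unitDiag) {
    assert(L.rows == L.cols && L.rows == B.rows);
    for (std::size_t k = 0; k < B.cols; ++k) {        // each right-hand side
        for (std::size_t i = 0; i < L.rows; ++i) {    // rows top to bottom
            double s = B(i, k);
            for (std::size_t j = 0; j < i; ++j) {     // subtract known terms
                s -= L(i, j) * B(j, k);
            }
            B(i, k) = unitDiag ? s : s / L(i, i);
        }
    }
}
```

For example, with $$L = \begin{pmatrix} 2 & 0 \\ 1 & 4 \end{pmatrix}$$ and $$b = (4, 10)^T$$ the call lower_sm(L, B, false) yields $$x = (2, 2)^T$$.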

The signatures of all these functions match those of our self-written functions. So to use MKL-BLAS instead of ulmBLAS, we simply write mklblas::mm(...) instead of mm(...).
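Because the signatures match, switching back ends is purely a matter of name qualification. One way to keep such a switch in a single place is a namespace alias, sketched below with two hypothetical stand-in namespaces (ulm_demo and mkl_demo) and a simplified mm signature ($$C \leftarrow C + AB$$ for square row-major matrices). Neither namespace reflects the actual parameter lists of ulmBLAS or hpc/mklblas.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two hypothetical back ends with identical signatures, standing in for
// ulmBLAS and the hpc/mklblas wrappers (the real signatures differ).
namespace ulm_demo {
// C <- C + A*B for row-major n x n matrices (i-j-l loop order).
inline void mm(std::size_t n, const std::vector<double>& A,
               const std::vector<double>& B, std::vector<double>& C) {
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t l = 0; l < n; ++l)
                C[i * n + j] += A[i * n + l] * B[l * n + j];
}
}  // namespace ulm_demo

namespace mkl_demo {
// Same signature and result, different loop order (i-l-j); in practice
// this is where a call into an optimized library would go.
inline void mm(std::size_t n, const std::vector<double>& A,
               const std::vector<double>& B, std::vector<double>& C) {
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t l = 0; l < n; ++l)
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += A[i * n + l] * B[l * n + j];
}
}  // namespace mkl_demo

// The back end is chosen in exactly one place: code below only calls
// blas::mm, so switching means editing this single alias.
namespace blas = mkl_demo;

// Example client code: returns A*A.
inline std::vector<double> square(std::size_t n, const std::vector<double>& A) {
    std::vector<double> C(n * n, 0.0);
    blas::mm(n, A, A, C);
    return C;
}
```

Writing mklblas::mm(...) directly at each call site, as described above, achieves the same; the alias only pays off when many call sites should switch together.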

## Exercise

• Modify the implementation in lu.hpp (in your local directory) so that it uses MKL instead of ulmBLAS wherever possible.

• MKL provides not only BLAS functions but also its own LU factorization. Compare our LU factorization (using MKL-BLAS) against the LU factorization from MKL.
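As a rough orientation for the first exercise item, the following sketch shows the typical structure of a blocked right-looking LU factorization: each step factors a small diagonal block, then performs two triangular solves (the kind of work sm does) and one matrix-matrix update (the kind of work mm does). It is a hypothetical, self-contained illustration with plain loops and without pivoting (so it assumes all leading principal minors are nonzero); it is not the code in lu.hpp, whose structure and function names may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical blocked right-looking LU *without pivoting* for a row-major
// n x n matrix stored in a flat vector, a(i, j) = A[i*n + j].
// On return, the strict lower triangle holds L (unit diagonal implied)
// and the upper triangle holds U.
void lu_blk_nopiv(std::size_t n, std::vector<double>& A, std::size_t bs) {
    assert(A.size() == n * n && bs > 0);
    auto a = [&](std::size_t i, std::size_t j) -> double& { return A[i * n + j]; };

    for (std::size_t k = 0; k < n; k += bs) {
        const std::size_t e = std::min(k + bs, n);  // current block: [k, e)

        // (1) Unblocked LU of the diagonal block A11.
        for (std::size_t p = k; p < e; ++p)
            for (std::size_t i = p + 1; i < e; ++i) {
                a(i, p) /= a(p, p);
                for (std::size_t j = p + 1; j < e; ++j)
                    a(i, j) -= a(i, p) * a(p, j);
            }

        // (2) A12 <- L11^{-1} * A12: unit lower triangular solve ("sm" work).
        for (std::size_t j = e; j < n; ++j)
            for (std::size_t i = k; i < e; ++i)
                for (std::size_t p = k; p < i; ++p)
                    a(i, j) -= a(i, p) * a(p, j);

        // (3) A21 <- A21 * U11^{-1}: upper triangular solve from the right
        //     ("sm" work).
        for (std::size_t i = e; i < n; ++i)
            for (std::size_t j = k; j < e; ++j) {
                for (std::size_t p = k; p < j; ++p)
                    a(i, j) -= a(i, p) * a(p, j);
                a(i, j) /= a(j, j);
            }

        // (4) A22 <- A22 - A21 * A12: matrix-matrix update ("mm" work).
        for (std::size_t i = e; i < n; ++i)
            for (std::size_t p = k; p < e; ++p)
                for (std::size_t j = e; j < n; ++j)
                    a(i, j) -= a(i, p) * a(p, j);
    }
}
```

Per step, the update (4) costs on the order of $$bs\,(n-e)^2$$ operations, while each solve in (2) and (3) costs on the order of $$bs^2\,(n-e)$$. For $$bs \ll n$$ most of the floating-point work is therefore in mm, so if a slow sm dominates the run time, the cause is more likely its implementation than its share of the work.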

For this purpose, hpc/test contains a benchmark program that already works (but still uses ulmBLAS). Just use the Makefile contained there:

    heim$ ./test_lu_blk
        M     N      Error 1 MKL (Time 1)     MFLOPS 1      Error 2 ULM (Time 2)     MFLOPS 2 Ratio T1/T2*100
       10    10     4.98e-02         0.00         0.45     4.98e-02         0.00       140.15        30821.43
       20    20     2.49e-02         0.00         6.66     2.49e-02         0.00       432.39         6488.03
       30    30     1.66e-02         0.00       286.21     3.29e-02         0.00       913.82          319.29
       40    40     1.25e-02         0.00      1093.92     2.47e-02         0.00      1262.07          115.37
       50    50     1.01e-02         0.00      1703.36     1.97e-02         0.00      1506.84           88.46

Note that you have to compile and run the benchmarks on the E44 computers!
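To read the last column: assuming both runs are credited with the same flop count (they factor matrices of the same size), $$T_1/T_2 = \text{MFLOPS}_2/\text{MFLOPS}_1$$, so a ratio above 100 means the MKL factorization took longer than ours and a ratio below 100 means it was faster. The sketch below recomputes the ratio from the two MFLOPS columns. BenchRow, ratio_percent and lu_flops are hypothetical helpers; lu_flops uses the standard leading-order count $$MN^2 - N^3/3$$ for an $$M \times N$$ LU factorization with $$M \ge N$$ (i.e. $$\tfrac{2}{3}N^3$$ for square matrices), and whether test_lu_blk uses exactly this formula is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// One row of the benchmark output (sizes and the two MFLOPS columns).
struct BenchRow {
    std::size_t m, n;
    double mflops1;  // MKL LU factorization
    double mflops2;  // our LU factorization (ULM)
};

// With equal flop counts, T1/T2 = (flops/MFLOPS1)/(flops/MFLOPS2)
// = MFLOPS2/MFLOPS1; returned in percent like the "Ratio" column.
inline double ratio_percent(const BenchRow& r) {
    assert(r.mflops1 > 0.0);
    return r.mflops2 / r.mflops1 * 100.0;
}

// Assumed leading-order flop count of an M x N LU factorization, M >= N.
inline double lu_flops(double m, double n) {
    assert(m >= n);
    return m * n * n - n * n * n / 3.0;
}
```

For the row with $$M=N=50$$, ratio_percent gives about 88.46, matching the table up to the rounding of the printed MFLOPS values. With lu_flops(50, 50) of roughly $$8.3 \cdot 10^4$$ at about 1700 MFLOPS, one factorization takes around 50 microseconds, which presumably explains why both time columns print as 0.00.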