# Using MKL BLAS for the LU factorization

#### Content

We suspect that the previous benchmarks for the LU factorization were so poor because our implementation of the triangular solver (i.e. sm) is not optimized. To test this hypothesis we replace some of our own BLAS functions with their counterparts from MKL.

Based on the results we can then determine which of our own BLAS functions need further optimization, and to what extent.

## HPC library requirements for this session

We use the library from /home/numerik/pub/hpc/ws18/session21 as the starting point for this session.

Copy the files lu.hpp, test_lu_blk.cpp and Makefile into a local directory. For example:

    heim$ rm -rf ex01
    heim$ mkdir ex01
    heim$ cp /home/numerik/pub/hpc/ws18/session21/lib/test/test_lu_blk.cpp ex01/
    heim$ cp /home/numerik/pub/hpc/ws18/session21/lib/test/Makefile ex01/
    heim$ cp /home/numerik/pub/hpc/ws18/session21/lib/hpc/matvec/lu.hpp ex01/
    heim$ ls ex01
    Makefile  lu.hpp  test_lu_blk.cpp
    heim$

## Using the MKL-BLAS functions

In hpc/mklblas we provide interfaces for the following BLAS functions:

• mklblas::mv for the matrix-vector product

• mklblas::sv for the triangular solver with a single right-hand side (i.e. for solving $$Ax=b$$ where $$A$$ is triangular and $$x$$ and $$b$$ are vectors).

• mklblas::mm for the matrix-matrix product

• mklblas::sm for the triangular solver of matrix equations (i.e. for $$AX=B$$ where $$A$$ is triangular and $$X$$ and $$B$$ are matrices).
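To make the two triangular solvers concrete, here is a minimal sketch of what an sv-like and an sm-like routine compute for a lower triangular $$A$$. The `Matrix` type and the function names `solve_lower_vector` and `solve_lower_matrix` are invented for this illustration; they are not the interfaces of hpc/mklblas, and the real routines additionally handle upper triangular, transposed and unit-diagonal cases.

```cpp
#include <cstddef>
#include <vector>

// Toy column-major matrix, for illustration only (not the hpc library's type).
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t m, std::size_t n) : rows(m), cols(n), data(m * n, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i + j * rows]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i + j * rows]; }
};

// What an sv-like routine computes: overwrite b with the solution x of L x = b,
// where L is lower triangular with a nonzero diagonal (forward substitution).
void solve_lower_vector(const Matrix& L, std::vector<double>& b) {
    for (std::size_t i = 0; i < L.rows; ++i) {
        for (std::size_t j = 0; j < i; ++j) {
            b[i] -= L(i, j) * b[j];
        }
        b[i] /= L(i, i);
    }
}

// What an sm-like routine computes: overwrite B with the solution X of L X = B.
// Each column of B is an independent right-hand side.
void solve_lower_matrix(const Matrix& L, Matrix& B) {
    for (std::size_t k = 0; k < B.cols; ++k) {
        for (std::size_t i = 0; i < L.rows; ++i) {
            for (std::size_t j = 0; j < i; ++j) {
                B(i, k) -= L(i, j) * B(j, k);
            }
            B(i, k) /= L(i, i);
        }
    }
}
```

The matrix variant is just the vector variant applied to every column of $$B$$. An optimized sm reorganizes this work into blocks so that most of it is done by matrix-matrix products, which is why its performance matters so much for a blocked LU factorization.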

The signatures of all these functions match those of our own functions. To use MKL-BLAS instead of ulmBLAS we therefore simply write mklblas::mm(...) instead of mm(...).
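Because only the namespace differs, one possible way to switch backends without touching every call site is a namespace alias. The following sketch illustrates the idea with two stand-in namespaces, `backend_a` and `backend_b`, a toy `Matrix` type and a made-up `mm` signature; none of these names come from the hpc library, and the macro `USE_BACKEND_B` is likewise hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Toy column-major matrix, for illustration only (not the hpc library's type).
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t m, std::size_t n) : rows(m), cols(n), data(m * n, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i + j * rows]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i + j * rows]; }
};

namespace backend_a {
// C <- beta*C + alpha*A*B, computed with one dot product per entry.
inline void mm(double alpha, const Matrix& A, const Matrix& B, double beta, Matrix& C) {
    for (std::size_t i = 0; i < C.rows; ++i) {
        for (std::size_t j = 0; j < C.cols; ++j) {
            double s = 0.0;
            for (std::size_t l = 0; l < A.cols; ++l) {
                s += A(i, l) * B(l, j);
            }
            C(i, j) = beta * C(i, j) + alpha * s;
        }
    }
}
}  // namespace backend_a

namespace backend_b {
// Same signature and same result, but a column-oriented loop order.
inline void mm(double alpha, const Matrix& A, const Matrix& B, double beta, Matrix& C) {
    for (std::size_t j = 0; j < C.cols; ++j) {
        for (std::size_t i = 0; i < C.rows; ++i) {
            C(i, j) *= beta;
        }
        for (std::size_t l = 0; l < A.cols; ++l) {
            for (std::size_t i = 0; i < C.rows; ++i) {
                C(i, j) += alpha * A(i, l) * B(l, j);
            }
        }
    }
}
}  // namespace backend_b

// Select the backend once; callers below only ever write blas::mm(...).
#ifdef USE_BACKEND_B
namespace blas = backend_b;
#else
namespace blas = backend_a;
#endif

// A caller written once against the alias.
Matrix product(const Matrix& A, const Matrix& B) {
    Matrix C(A.rows, B.cols);
    blas::mm(1.0, A, B, 0.0, C);
    return C;
}
```

For the exercise itself it is enough to edit the calls in lu.hpp by hand; an alias like this only pays off if you want to switch back and forth between the two implementations while benchmarking.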

## Exercise

• In lu.hpp (in your local directory), modify the implementation so that it uses MKL-BLAS instead of ulmBLAS wherever possible.

• MKL provides not only BLAS functions but also its own LU factorization. Compare our LU factorization (using MKL-BLAS) against the LU factorization from MKL.

For this purpose we provide in hpc/test a benchmark program that already works (but still uses ulmBLAS). Just use the makefile provided there:

    heim$ make clean
    rm -f test_lu_blk test_lu_blk.o
    rm -f core
    heim$ make
    g++-7.2 -std=c++17 -Wall -O3 -mavx -m64 -g -I/home/numerik/pub/hpc/ws18/session21/lib -I/opt/intel/compilers_and_libraries/linux/mkl/include -DMKL_ILP64 -L/opt/intel/compilers_and_libraries/linux/lib/intel64 -L/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64 -Wl,-rpath -Wl,/opt/intel/compilers_and_libraries/linux/lib/intel64 -Wl,-rpath -Wl,/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64  test_lu_blk.cpp  -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lm -lpthread -o test_lu_blk
    heim$ ./test_lu_blk
       M    N   Error 1  MKL (Time 1)  MFLOPS 1   Error 2  ULM (Time 2)  MFLOPS 2  Ratio T1/T2*100
      10   10  4.98e-02          0.00      0.45  4.98e-02          0.00    140.15         30821.43
      20   20  2.49e-02          0.00      6.66  2.49e-02          0.00    432.39          6488.03
      30   30  1.66e-02          0.00    286.21  3.29e-02          0.00    913.82           319.29
      40   40  1.25e-02          0.00   1093.92  2.47e-02          0.00   1262.07           115.37
      50   50  1.01e-02          0.00   1703.36  1.97e-02          0.00   1506.84            88.46
      60   60  8.42e-03          0.00   2453.12  1.65e-02          0.00   1670.29            68.09
      70   70  7.27e-03          0.00   3102.56  1.42e-02          0.00    571.57            18.42
      80   80  6.37e-03          0.00   4075.37  1.24e-02          0.00    746.35            18.31
      90   90  5.67e-03          0.00   3692.25  1.10e-02          0.00    965.55            26.15
     100  100  1.45e-02          0.00   5615.91  9.91e-03          0.00   1196.38            21.30
     110  110  1.31e-02          0.00   5460.28  9.02e-03          0.00   1407.87            25.78
     120  120  1.20e-02          0.00   7050.30  8.26e-03          0.00   1571.31            22.29
     130  130  1.10e-02          0.00   7344.47  1.10e-02          0.00   1349.57            18.38
     140  140  1.02e-02          0.00   8098.55  1.02e-02          0.00   1562.87            19.30
     150  150  9.55e-03          0.00   8496.04  9.54e-03          0.00   1720.74            20.25
     160  160  8.95e-03          0.00   9517.21  8.92e-03          0.00   1953.02            20.52
     170  170  8.42e-03          0.00   9927.95  8.38e-03          0.00   2198.47            22.14
     180  180  7.95e-03          0.00  10737.70  7.90e-03          0.00   2407.45            22.42
     190  190  7.42e-03          0.00  11331.98  9.68e-03          0.00   2713.01            23.94
     200  200  7.05e-03          0.00  12227.46  9.19e-03          0.00   2415.21            19.75
     210  210  6.72e-03          0.00  12476.90  8.74e-03          0.00   2601.43            20.85
     220  220  6.41e-03          0.00  13377.73  8.32e-03          0.00   2851.40            21.31
     230  230  6.14e-03          0.00  14186.77  7.96e-03          0.00   3152.10            22.22
     240  240  5.90e-03          0.00  14869.15  7.61e-03          0.00   3347.61            22.51
     250  250  5.66e-03          0.00  15099.93  7.30e-03          0.00   3547.16            23.49
     260  260  1.36e-02          0.00  14581.32  7.01e-03          0.00   3192.15            21.89
     270  270  5.23e-03          0.00  15246.67  6.74e-03          0.00   3383.52            22.19
     280  280  5.05e-03          0.00  15753.52  6.48e-03          0.00   3521.66            22.35
     290  290  4.86e-03          0.00  16015.47  6.25e-03          0.00   3728.99            23.28
     300  300  4.71e-03          0.00  16251.94  6.03e-03          0.00   3920.19            24.12
     310  310  4.55e-03          0.00  16444.72  5.83e-03          0.00   4090.02            24.87
     320  320  4.41e-03          0.00  16569.87  5.64e-03          0.01   4307.26            25.99
     330  330  4.28e-03          0.00  17138.07  5.45e-03          0.01   3915.17            22.84
     340  340  4.15e-03          0.00  17204.70  5.28e-03          0.01   4099.04            23.83
     350  350  4.03e-03          0.00  17315.86  5.12e-03          0.01   4240.12            24.49
     360  360  3.92e-03          0.00  18309.95  4.97e-03          0.01   3872.52            21.15
     370  370  3.80e-03          0.00  17871.13  4.84e-03          0.01   4015.67            22.47
     380  380  3.71e-03          0.00  18371.37  4.70e-03          0.01   4213.92            22.94
     390  390  3.61e-03          0.00  18242.49  4.57e-03          0.01   3899.84            21.38
     400  400  3.52e-03          0.00  18947.24  4.45e-03          0.01   4076.12            21.51
     410  410  3.43e-03          0.00  17636.56  4.34e-03          0.01   4532.34            25.70
     420  420  3.35e-03          0.00  18986.09  4.24e-03          0.01   4326.21            22.79
     430  430  3.27e-03          0.00  17966.02  4.13e-03          0.01   4533.37            25.23
     440  440  3.20e-03          0.00  19900.82  4.03e-03          0.01   4646.43            23.35
     450  450  3.13e-03          0.00  18578.64  3.94e-03          0.01   4389.49            23.63
     460  460  3.07e-03          0.00  19960.95  3.85e-03          0.01   4527.57            22.68
     470  470  3.01e-03          0.00  18805.02  3.76e-03          0.01   4638.55            24.67
     480  480  2.95e-03          0.00  20400.03  3.68e-03          0.02   4866.16            23.85
     490  490  2.90e-03          0.00  19051.34  3.59e-03          0.02   4921.62            25.83
     500  500  2.84e-03          0.00  20317.22  3.52e-03          0.02   5181.16            25.50
     510  510  2.78e-03          0.00  18897.37  3.44e-03          0.02   5277.46            27.93
     520  520  2.73e-03          0.00  20143.77  3.37e-03          0.02   4914.66            24.40
     530  530  2.68e-03          0.00  20039.90  3.31e-03          0.02   5123.43            25.57
     540  540  2.63e-03          0.01  20032.25  3.25e-03          0.02   5212.66            26.02
     550  550  2.58e-03          0.01  20231.66  3.18e-03          0.02   5369.01            26.54
     560  560  2.54e-03          0.01  20997.02  3.12e-03          0.02   5513.49            26.26
     570  570  2.49e-03          0.01  20470.64  3.06e-03          0.02   5431.58            26.53
     580  580  2.44e-03          0.01  20924.60  3.01e-03          0.02   5305.82            25.36
     590  590  2.41e-03          0.01  20712.72  2.95e-03          0.03   5380.54            25.98
     600  600  2.37e-03          0.01  20777.24  2.90e-03          0.03   5480.61            26.38
     610  610  2.33e-03          0.01  20773.60  2.85e-03          0.03   5550.32            26.72
     620  620  2.29e-03          0.01  21348.38  2.80e-03          0.03   5691.80            26.66
     630  630  2.26e-03          0.01  20737.03  2.75e-03          0.03   5755.92            27.76
     640  640  2.23e-03          0.01  20905.47  2.70e-03          0.03   6035.20            28.87
     650  650  2.21e-03          0.01  20982.13  2.66e-03          0.03   5510.40            26.26
     660  660  2.17e-03          0.01  20994.36  2.61e-03          0.03   5712.66            27.21
     670  670  2.14e-03          0.01  21125.77  2.57e-03          0.03   5752.91            27.23
     680  680  2.11e-03          0.01  21686.22  2.53e-03          0.04   5815.78            26.82
     690  690  2.08e-03          0.01  21035.37  2.49e-03          0.04   6036.86            28.70
     700  700  2.05e-03          0.01  21716.30  2.45e-03          0.04   6256.06            28.81
     710  710  2.03e-03          0.01  21262.20  2.42e-03          0.04   5711.21            26.86
     720  720  2.18e-03          0.01  21072.35  2.38e-03          0.04   5843.55            27.73
     730  730  2.15e-03          0.01  21441.31  2.35e-03          0.04   5996.51            27.97
     740  740  2.11e-03          0.01  21230.98  2.31e-03          0.04   6109.28            28.78
     750  750  2.08e-03          0.01  21469.41  2.28e-03          0.05   6113.50            28.48
     760  760  2.05e-03          0.01  21483.78  2.24e-03          0.05   6200.96            28.86
     770  770  2.47e-03          0.01  21542.33  2.21e-03          0.05   5848.71            27.15
     780  780  1.99e-03          0.02  20874.62  2.18e-03          0.05   6050.32            28.98
     790  790  1.96e-03          0.02  21646.46  2.15e-03          0.05   6086.87            28.12
     800  800  1.94e-03          0.02  21616.29  2.12e-03          0.05   6200.59            28.68
     810  810  1.91e-03          0.02  21850.28  3.67e-03          0.06   6274.75            28.72
     820  820  1.89e-03          0.02  21161.17  3.61e-03          0.06   6421.13            30.34
     830  830  1.86e-03          0.02  21808.18  3.56e-03          0.06   6426.87            29.47
     840  840  1.84e-03          0.02  21797.43  3.51e-03          0.06   6079.92            27.89
     850  850  1.83e-03          0.02  21963.13  3.47e-03          0.07   6167.14            28.08
     860  860  1.81e-03          0.02  21795.61  3.42e-03          0.07   6344.46            29.11
     870  870  1.79e-03          0.02  22170.41  3.38e-03          0.07   6369.84            28.73
     880  880  1.76e-03          0.02  22096.84  3.33e-03          0.07   6468.27            29.27
     890  890  1.74e-03          0.02  22028.85  3.29e-03          0.07   6586.14            29.90
     900  900  1.72e-03          0.02  21888.17  3.24e-03          0.08   6354.02            29.03
     910  910  1.70e-03          0.02  21757.11  3.20e-03          0.08   6275.55            28.84
     920  920  1.68e-03          0.02  22257.05  3.16e-03          0.08   6421.53            28.85
     930  930  1.66e-03          0.02  22318.38  3.12e-03          0.08   6479.19            29.03
     940  940  1.65e-03          0.03  22078.68  3.08e-03          0.08   6649.70            30.12
     950  950  1.63e-03          0.03  22146.29  3.04e-03          0.09   6649.63            30.03
     960  960  1.61e-03          0.03  22160.27  3.00e-03          0.09   6814.72            30.75
     970  970  1.59e-03          0.03  22548.50  2.96e-03          0.09   6443.70            28.58
     980  980  1.58e-03          0.03  22326.28  2.93e-03          0.09   6620.16            29.65
     990  990  1.56e-03          0.03  22536.76  2.90e-03          0.10   6588.17            29.23
    1000 1000  1.54e-03          0.03  22374.93  2.86e-03          0.10   6661.60            29.77
    1010 1010  1.53e-03          0.03  22528.63  2.83e-03          0.10   6714.60            29.80
    1020 1020  1.51e-03          0.03  22279.68  2.80e-03          0.10   6886.15            30.91
    1030 1030  4.59e-03          0.03  21958.44  2.76e-03          0.11   6421.78            29.25
    1040 1040  1.48e-03          0.03  21836.91  2.73e-03          0.11   6607.24            30.26
    1050 1050  1.47e-03          0.03  22328.60  2.70e-03          0.12   6699.48            30.00
    1060 1060  1.45e-03          0.04  22220.09  2.67e-03          0.12   6856.50            30.86
    1070 1070  1.44e-03          0.04  22434.79  2.64e-03          0.12   6777.02            30.21
    1080 1080  1.42e-03          0.04  22290.85  2.61e-03          0.12   6911.83            31.01
    1090 1090  1.41e-03          0.04  22512.44  2.58e-03          0.13   6553.16            29.11
    1100 1100  1.39e-03          0.04  22400.54  2.55e-03          0.13   6788.23            30.30
    1110 1110  1.38e-03          0.04  22322.49  2.52e-03          0.13   6783.44            30.39
    1120 1120  1.37e-03          0.04  22429.97  2.49e-03          0.14   6879.08            30.67
    1130 1130  1.35e-03          0.04  22124.37  2.46e-03          0.14   6914.50            31.25
    1140 1140  1.34e-03          0.04  22424.09  2.43e-03          0.14   7056.51            31.47
    1150 1150  1.33e-03          0.05  22128.67  2.41e-03          0.15   6979.42            31.54
    1160 1160  1.32e-03          0.05  22407.17  2.39e-03          0.15   6760.18            30.17
    1170 1170  1.30e-03          0.05  22222.17  2.36e-03          0.16   6777.41            30.50
    1180 1180  1.29e-03          0.05  22588.56  2.34e-03          0.16   7015.83            31.06
    1190 1190  1.28e-03          0.05  22649.24  2.32e-03          0.16   7002.08            30.92
    1200 1200  1.27e-03          0.05  22699.55  2.30e-03          0.16   7047.86            31.05
    1210 1210  1.26e-03          0.05  22866.10  2.27e-03          0.17   7129.04            31.18
    1220 1220  1.25e-03          0.05  22737.81  2.25e-03          0.17   6954.63            30.59
    1230 1230  1.24e-03          0.05  22885.10  2.23e-03          0.18   6853.79            29.95
    1240 1240  1.23e-03          0.06  22699.75  2.20e-03          0.18   6962.43            30.67
    1250 1250  1.22e-03          0.06  22952.78  2.18e-03          0.19   6978.91            30.41
    1260 1260  1.21e-03          0.06  22237.87  2.16e-03          0.19   7203.31            32.39
    1270 1270  1.20e-03          0.06  22956.14  2.14e-03          0.19   7172.54            31.24
    1280 1280  1.19e-03          0.06  22340.35  2.12e-03          0.19   7251.18            32.46
    1290 1290  1.18e-03          0.06  22910.99  2.10e-03          0.21   6959.80            30.38
    1300 1300  1.17e-03          0.06  22780.45  2.08e-03          0.20   7141.46            31.35
    1310 1310  1.16e-03          0.07  22690.85  2.06e-03          0.21   7070.77            31.16
    1320 1320  1.15e-03          0.07  22898.34  2.04e-03          0.21   7165.26            31.29
    1330 1330  1.14e-03          0.07  23019.41  2.02e-03          0.22   7164.10            31.12
    1340 1340  1.13e-03          0.07  22697.39  2.00e-03          0.22   7381.44            32.52
    1350 1350  1.12e-03          0.07  23109.40  1.98e-03          0.23   7067.29            30.58
    1360 1360  1.11e-03          0.07  22808.11  1.96e-03          0.24   7084.39            31.06
    1370 1370  1.10e-03          0.07  23168.73  1.95e-03          0.24   7141.59            30.82
    1380 1380  1.09e-03          0.08  22976.34  1.93e-03          0.24   7330.02            31.90
    1390 1390  1.09e-03          0.08  23205.27  1.91e-03          0.25   7285.97            31.40
    1400 1400  1.08e-03          0.08  22981.20  1.89e-03          0.25   7331.34            31.90
    1410 1410  1.07e-03          0.08  23245.48  1.88e-03          0.27   6989.07            30.07
    1420 1420  1.06e-03          0.08  23097.30  1.86e-03          0.26   7283.75            31.54
    1430 1430  1.05e-03          0.08  23296.32  1.84e-03          0.27   7217.83            30.98
    1440 1440  1.04e-03          0.09  23135.12  1.83e-03          0.27   7315.23            31.62
    1450 1450  1.04e-03          0.09  23371.27  1.81e-03          0.28   7326.49            31.35
    1460 1460  1.03e-03          0.09  23113.46  1.80e-03          0.28   7489.57            32.40
    1470 1470  1.02e-03          0.09  23424.20  1.78e-03          0.29   7403.03            31.60
    1480 1480  1.01e-03          0.09  23277.84  1.76e-03          0.30   7180.53            30.85
    1490 1490  1.01e-03          0.10  22940.15  1.75e-03          0.31   7136.97            31.11
    1500 1500  1.00e-03          0.10  23270.43  1.74e-03          0.30   7454.82            32.04
    1510 1510  9.96e-04          0.10  23506.08  1.72e-03          0.31   7371.84            31.36
    1520 1520  9.89e-04          0.10  23354.40  1.71e-03          0.31   7429.02            31.81
    1530 1530  9.81e-04          0.10  23448.43  1.69e-03          0.32   7430.68            31.69
    1540 1540  1.41e-03          0.11  23007.06  1.68e-03          0.34   7220.78            31.39
    1550 1550  1.05e-03          0.11  23398.06  1.66e-03          0.34   7224.99            30.88
    1560 1560  1.04e-03          0.11  22697.94  1.65e-03          0.34   7356.43            32.41
    1570 1570  1.04e-03          0.11  23498.16  1.64e-03          0.35   7379.64            31.41
    1580 1580  1.03e-03          0.11  23248.78  1.62e-03          0.35   7578.12            32.60
    1590 1590  1.02e-03          0.11  23522.04  1.61e-03          0.36   7532.45            32.02
    1600 1600  1.02e-03          0.12  22913.38  1.60e-03          0.36   7638.29            33.34
    1610 1610  1.01e-03          0.12  23584.08  1.58e-03          0.38   7293.22            30.92
    1620 1620  1.00e-03          0.12  23387.95  1.57e-03          0.38   7445.84            31.84
    1630 1630  9.97e-04          0.12  23580.80  1.56e-03          0.39   7399.37            31.38
    1640 1640  9.90e-04          0.13  22951.65  1.55e-03          0.39   7469.35            32.54
    1650 1650  9.83e-04          0.13  23625.90  1.53e-03          0.40   7550.39            31.96
    1660 1660  9.78e-04          0.13  23388.92  1.52e-03          0.39   7728.59            33.04
    1670 1670  9.74e-04          0.13  23712.11  1.51e-03          0.42   7381.66            31.13
    1680 1680  9.67e-04          0.14  22958.66  1.50e-03          0.43   7420.17            32.32
    1690 1690  9.60e-04          0.14  23729.48  1.49e-03          0.43   7493.12            31.58
    1700 1700  9.54e-04          0.14  23548.74  1.48e-03          0.43   7648.24            32.48
    1710 1710  9.50e-04          0.14  23715.34  1.47e-03          0.44   7499.20            31.62
    1720 1720  9.44e-04          0.14  23531.33  1.46e-03          0.44   7629.56            32.42
    1730 1730  9.40e-04          0.15  23767.68  1.45e-03          0.47   7342.72            30.89
    1740 1740  9.36e-04          0.15  23499.06  1.44e-03          0.47   7497.03            31.90
    1750 1750  9.30e-04          0.15  23761.66  1.70e-03          0.47   7519.82            31.65
    1760 1760  9.24e-04          0.15  23591.76  1.69e-03          0.48   7607.43            32.25
    1770 1770  9.20e-04          0.16  23323.38  1.67e-03          0.49   7606.04            32.61
    1780 1780  9.14e-04          0.16  23661.86  1.66e-03          0.49   7739.53            32.71
    1790 1790  9.09e-04          0.16  23796.57  1.65e-03          0.50   7607.19            31.97
    1800 1800  9.03e-04          0.16  23680.20  1.64e-03          0.52   7475.86            31.57
    1810 1810  8.97e-04          0.17  23882.60  1.62e-03          0.53   7473.13            31.29
    1820 1820  8.92e-04          0.17  23096.76  1.61e-03          0.52   7664.43            33.18
    1830 1830  8.88e-04          0.17  23884.92  1.60e-03          0.54   7622.84            31.91
    1840 1840  8.84e-04          0.17  23764.65  1.59e-03          0.54   7688.67            32.35
    1850 1850  8.78e-04          0.18  23898.15  1.58e-03          0.55   7712.88            32.27
    1860 1860  8.73e-04          0.18  23819.65  1.57e-03          0.56   7616.66            31.98
    1870 1870  8.68e-04          0.18  23611.23  1.56e-03          0.58   7525.30            31.87
    1880 1880  8.63e-04          0.19  23688.01  1.55e-03          0.58   7600.23            32.08
    1890 1890  8.58e-04          0.19  23933.25  1.54e-03          0.59   7574.61            31.65
    1900 1900  8.56e-04          0.19  23486.95  1.53e-03          0.58   7832.02            33.35
    1910 1910  8.53e-04          0.19  23976.65  1.52e-03          0.60   7753.44            32.34
    1920 1920  8.49e-04          0.20  23663.22  1.51e-03          0.60   7868.23            33.25
    1930 1930  8.44e-04          0.20  23610.85  1.50e-03          0.63   7616.40            32.26
    1940 1940  8.38e-04          0.20  23877.17  1.48e-03          0.63   7721.92            32.34
    1950 1950  8.34e-04          0.21  24057.06  1.47e-03          0.65   7643.83            31.77
    1960 1960  8.30e-04          0.21  23773.88  1.46e-03          0.65   7722.55            32.48
    1970 1970  8.26e-04          0.21  24064.04  1.45e-03          0.66   7712.25            32.05
    1980 1980  8.22e-04          0.22  23865.79  1.44e-03          0.65   7920.03            33.19
    1990 1990  8.18e-04          0.22  23801.86  1.44e-03          0.69   7593.11            31.90
    2000 2000  8.14e-04          0.22  23926.54  1.43e-03          0.70   7649.66            31.97
    heim$
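To read the MFLOPS and ratio columns, it helps to know how such numbers are typically derived from a measured time. The sketch below assumes the leading-order operation count $$\frac{2}{3}n^3$$ for the LU factorization of an $$n \times n$$ matrix; the exact formula used by test_lu_blk.cpp is not shown here, so treat this as an approximation. The function names are made up for this illustration.

```cpp
#include <cstddef>

// Leading-order flop count of an n x n LU factorization: (2/3) * n^3.
// Assumption: lower-order terms are ignored.
double lu_flops_approx(std::size_t n) {
    double d = static_cast<double>(n);
    return 2.0 * d * d * d / 3.0;
}

// Million floating point operations per second for a given run time.
double mflops(double flops, double seconds) {
    return flops / seconds / 1e6;
}

// The last column of the benchmark: time 1 as a percentage of time 2.
double ratio_percent(double t1, double t2) {
    return t1 / t2 * 100.0;
}
```

For the last row (M = N = 2000, Time 1 = 0.22) this estimate gives roughly 24000 MFLOPS, close to the reported 23926.54; the printed times are rounded to two decimals, so the match cannot be exact. Because both implementations perform the same number of operations, the ratio column also equals MFLOPS 2 divided by MFLOPS 1 times 100. A ratio near 32 therefore means that MKL needs about a third of the time our implementation needs.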

Note that you have to compile and run the benchmarks on the E44 computers!