Matrix-Matrix Product Experiments with uBLAS

Session 1

Pure C++ Implementation

Session 2

Some optimizations

Session 3

Using OpenMP

Session 4

Taking advantage of uBLAS

Session 5

Using GCC Vector-Extensions for Micro-Kernels

Session 6

Notes on the GEMM Algorithm:

  • Developing a micro-kernel

Session 7

Application for a fast Matrix-Matrix Product: LU-Factorization

Session 8

There is still some work to do: Comparison with Intel MKL and Eigen

The implementation of the GEMM algorithm is based on the paper "BLIS: A Framework for Rapidly Instantiating BLAS Functionality" and is adapted from ulmBLAS.
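As a rough orientation for what "micro-kernel" means in this context, here is a minimal sketch of the packing-plus-micro-kernel idea behind a BLIS-style GEMM. It is not the code from the sessions: the names `ugemm` and `gemm`, the block sizes `MR` and `NR`, and the row-major storage are assumptions made for illustration, and the outer cache-blocking loops of the full algorithm are left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative register-block sizes (assumed, not tuned values).
constexpr std::size_t MR = 4, NR = 4;

// Micro-kernel: adds the product of a packed MR x kc panel of A and a
// packed kc x NR panel of B to the leading mr x nr part of a block of C.
// A is packed as kc columns of MR values, B as kc rows of NR values.
void ugemm(std::size_t kc, const double *A, const double *B,
           double *C, std::size_t ldc, std::size_t mr, std::size_t nr)
{
    double AB[MR * NR] = {};   // small accumulator, a candidate for registers
    for (std::size_t l = 0; l < kc; ++l)
        for (std::size_t i = 0; i < MR; ++i)
            for (std::size_t j = 0; j < NR; ++j)
                AB[i * NR + j] += A[l * MR + i] * B[l * NR + j];

    // Write back only the valid part; edge panels are zero-padded.
    for (std::size_t i = 0; i < mr; ++i)
        for (std::size_t j = 0; j < nr; ++j)
            C[i * ldc + j] += AB[i * NR + j];
}

// C (m x n) += A (m x k) * B (k x n); all matrices row-major and dense.
void gemm(std::size_t m, std::size_t n, std::size_t k,
          const double *A, const double *B, double *C)
{
    std::vector<double> Ap(MR * k), Bp(NR * k);

    for (std::size_t j0 = 0; j0 < n; j0 += NR) {
        const std::size_t nr = std::min(NR, n - j0);
        // Pack a column panel of B into contiguous memory.
        for (std::size_t l = 0; l < k; ++l)
            for (std::size_t j = 0; j < NR; ++j)
                Bp[l * NR + j] = (j < nr) ? B[l * n + j0 + j] : 0.0;

        for (std::size_t i0 = 0; i0 < m; i0 += MR) {
            const std::size_t mr = std::min(MR, m - i0);
            // Pack a row panel of A into contiguous memory.
            for (std::size_t l = 0; l < k; ++l)
                for (std::size_t i = 0; i < MR; ++i)
                    Ap[l * MR + i] = (i < mr) ? A[(i0 + i) * k + l] : 0.0;

            ugemm(k, Ap.data(), Bp.data(), C + i0 * n + j0, n, mr, nr);
        }
    }
}
```

In this sketch all architecture-specific tuning would be concentrated in `ugemm`; the surrounding loops only pack data so that the kernel reads contiguous memory.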

The tarball test_ublas.tgz contains the following files:

$shell> tar cfz test_ublas.tgz session*/*.hpp session*/*.cc session*/plot*
$shell> tar tfvz test_ublas.tgz
-rw-r--r-- lehn/num       7469 2016-02-02 19:52 session1/gemm.hpp
-rw-rw-r-- lehn/num      17181 2016-02-02 18:17 session2/avx.hpp
-rw-rw-r-- lehn/num      33544 2016-02-02 19:53 session2/fma.hpp
-rw-rw-r-- lehn/num       8714 2016-02-02 19:51 session2/gemm.hpp
-rw-rw-r-- lehn/num      17181 2016-02-02 18:14 session3/avx.hpp
-rw-rw-r-- lehn/num      33544 2016-02-02 19:53 session3/fma.hpp
-rw-rw-r-- lehn/num       8794 2016-02-02 19:51 session3/gemm.hpp
-rw-rw-r-- lehn/num      17181 2016-02-02 18:17 session4/avx.hpp
-rw-rw-r-- lehn/num      33544 2016-02-02 19:53 session4/fma.hpp
-rw-rw-r-- lehn/num       9797 2016-02-02 19:50 session4/gemm.hpp
-rw-rw-r-- lehn/num      17181 2016-02-02 18:17 session5/avx.hpp
-rw-rw-r-- lehn/num      33544 2016-02-02 19:53 session5/fma.hpp
-rw-rw-r-- lehn/num       1898 2016-02-02 17:11 session5/gccvec.hpp
-rw-rw-r-- lehn/num       3353 2016-02-02 01:34 session5/gccvec2.hpp
-rw-rw-r-- lehn/num      10291 2016-02-02 19:50 session5/gemm.hpp
-rw-rw-r-- lehn/num      13537 2016-02-02 16:05 session7/avx.hpp
-rw-rw-r-- lehn/num      24390 2016-02-02 16:05 session7/fma.hpp
-rw-rw-r-- lehn/num       1537 2016-02-12 21:01 session7/gccvec.hpp
-rw-rw-r-- lehn/num       3353 2016-02-02 16:05 session7/gccvec2.hpp
-rw-rw-r-- lehn/num      16946 2016-02-12 21:27 session7/gemm.hpp
-rw-rw-r-- lehn/num       6221 2016-02-13 09:20 session7/lu.hpp
-rw-rw-r-- lehn/num      13537 2016-02-13 12:55 session8/avx.hpp
-rw-rw-r-- lehn/num      24390 2016-02-13 12:55 session8/fma.hpp
-rw-rw-r-- lehn/num       1537 2016-02-13 12:55 session8/gccvec.hpp
-rw-rw-r-- lehn/num       3353 2016-02-13 12:55 session8/gccvec2.hpp
-rw-rw-r-- lehn/num      16946 2016-02-13 12:55 session8/gemm.hpp
-rw-rw-r-- lehn/num       6756 2016-02-14 02:51 session8/lu.hpp
-rw-r--r-- lehn/num       5159 2016-02-02 18:36 session1/matprod.cc
-rw-rw-r-- lehn/num       5158 2016-02-02 18:35 session2/matprod.cc
-rw-rw-r-- lehn/num       5179 2016-02-02 18:16 session3/matprod.cc
-rw-rw-r-- lehn/num       5401 2016-01-27 00:35 session4/matprod.cc
-rw-rw-r-- lehn/num       6913 2016-01-27 00:53 session4/symatprod.cc
-rw-rw-r-- lehn/num       5356 2016-01-31 11:28 session5/matprod.cc
-rw-rw-r-- lehn/num       4301 2016-02-11 16:34 session7/bench_lu.cc
-rw-rw-r-- lehn/num       5833 2016-02-14 10:37 session8/bench2_lu.cc
-rw-rw-r-- lehn/num       5833 2016-02-14 20:07 session8/bench_lu.cc
-rw-rw-r-- lehn/num       5001 2016-02-13 14:53 session8/bench_mkl_lu.cc
-rw-rw-r-- lehn/num        381 2016-01-22 14:31 session1/plot.session1.mflops
-rw-rw-r-- lehn/num        377 2016-01-22 14:32 session1/plot.session1.time
-rw-rw-r-- lehn/num        397 2016-01-22 14:32 session1/plot.session1.time_log
-rw-rw-r-- lehn/num        496 2016-01-23 00:34 session2/plot.session2.mflops
-rw-rw-r-- lehn/num        492 2016-01-23 00:35 session2/plot.session2.time
-rw-rw-r-- lehn/num        512 2016-01-23 00:35 session2/plot.session2.time_log
-rw-rw-r-- lehn/num        608 2016-01-23 11:03 session3/plot.session3.mflops
-rw-rw-r-- lehn/num        604 2016-01-23 11:03 session3/plot.session3.time
-rw-rw-r-- lehn/num        624 2016-01-23 11:03 session3/plot.session3.time_log
-rw-rw-r-- lehn/num        732 2016-01-27 00:42 session4/plot.session4.gemm.mflops
-rw-rw-r-- lehn/num        728 2016-01-27 00:42 session4/plot.session4.gemm.time
-rw-rw-r-- lehn/num        748 2016-01-27 00:42 session4/plot.session4.gemm.time_log
-rw-rw-r-- lehn/num        777 2016-01-27 01:45 session4/plot.session4.symm.mflops
-rw-rw-r-- lehn/num        773 2016-01-27 01:44 session4/plot.session4.symm.time
-rw-rw-r-- lehn/num        793 2016-01-27 01:44 session4/plot.session4.symm.time_log
-rw-rw-r-- lehn/num        949 2016-02-01 00:29 session5/plot.mt.session5.mflops
-rw-rw-r-- lehn/num        945 2016-02-01 00:30 session5/plot.mt.session5.time
-rw-rw-r-- lehn/num        965 2016-02-01 00:30 session5/plot.mt.session5.time_log
-rw-rw-r-- lehn/num        608 2016-02-01 00:31 session5/plot.session5.mflops
-rw-rw-r-- lehn/num        604 2016-02-01 00:31 session5/plot.session5.time
-rw-rw-r-- lehn/num        624 2016-02-01 00:46 session5/plot.session5.time_log
-rw-rw-r-- lehn/num       1007 2016-02-11 19:44 session7/plot.session7-mt.lu
-rw-rw-r-- lehn/num       1029 2016-02-11 19:45 session7/plot.session7-mt.lu.log
-rw-rw-r-- lehn/num       1021 2016-02-12 23:56 session7/plot.session7-mt.lu.mflops
-rw-rw-r-- lehn/num        577 2016-02-11 19:41 session7/plot.session7.lu
-rw-rw-r-- lehn/num        597 2016-02-11 19:42 session7/plot.session7.lu.log
-rw-rw-r-- lehn/num        589 2016-02-11 19:43 session7/plot.session7.lu.mflops
-rw-rw-r-- lehn/num       1007 2016-02-11 20:54 session8/plot.session7-mt.lu
-rw-rw-r-- lehn/num       1029 2016-02-11 20:54 session8/plot.session7-mt.lu.log
-rw-rw-r-- lehn/num       1021 2016-02-11 20:54 session8/plot.session7-mt.lu.mflops
-rw-rw-r-- lehn/num        601 2016-02-14 10:39 session8/plot.session8.lu
-rw-rw-r-- lehn/num        622 2016-02-14 10:39 session8/plot.session8.lu.log
-rw-rw-r-- lehn/num        613 2016-02-14 10:40 session8/plot.session8.lu.mflops
-rw-rw-r-- lehn/num        547 2016-02-14 09:21 session8/plot.session8.lu.mflops-log
$shell>