Matrix-Matrix Product Experiments with uBLAS
Pure C++ Implementation
Some optimizations
Using OpenMP
Taking advantage of uBLAS
Using GCC Vector-Extensions for Micro-Kernels
Notes on the GEMM Algorithm:
Application for a fast Matrix-Matrix Product: LU-Factorization
There is still some work to do: Comparison with Intel MKL and Eigen
The implementation of the GEMM algorithm is based on "BLIS: A Framework for Rapidly Instantiating BLAS Functionality" and adapted from ulmBLAS.
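For orientation, the blocking idea behind a BLIS-style GEMM can be sketched in a few lines. This is a minimal illustration and not the code from the sessions or from ulmBLAS: the names gemm_blocked and gemm_naive, the block sizes MC, KC and NC, the row-major storage and the plain-loop stand-in for the micro-kernel are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical block sizes for illustration only; real values are tuned
// to the cache hierarchy of the target machine.
constexpr std::size_t MC = 64, KC = 64, NC = 128;

// Computes C <- C + A*B for row-major matrices:
// A is m x k, B is k x n, C is m x n.
void gemm_blocked(std::size_t m, std::size_t n, std::size_t k,
                  const std::vector<double>& A,
                  const std::vector<double>& B,
                  std::vector<double>& C)
{
    std::vector<double> Ap(MC * KC);  // packed block of A
    std::vector<double> Bp(KC * NC);  // packed panel of B

    for (std::size_t jc = 0; jc < n; jc += NC) {          // partition n
        const std::size_t nc = std::min(NC, n - jc);
        for (std::size_t pc = 0; pc < k; pc += KC) {      // partition k
            const std::size_t kc = std::min(KC, k - pc);
            // Pack B(pc:pc+kc, jc:jc+nc) into contiguous memory.
            for (std::size_t p = 0; p < kc; ++p)
                for (std::size_t j = 0; j < nc; ++j)
                    Bp[p * nc + j] = B[(pc + p) * n + (jc + j)];
            for (std::size_t ic = 0; ic < m; ic += MC) {  // partition m
                const std::size_t mc = std::min(MC, m - ic);
                // Pack A(ic:ic+mc, pc:pc+kc) into contiguous memory.
                for (std::size_t i = 0; i < mc; ++i)
                    for (std::size_t p = 0; p < kc; ++p)
                        Ap[i * kc + p] = A[(ic + i) * k + (pc + p)];
                // Plain loops standing in for the macro- and micro-kernel.
                for (std::size_t i = 0; i < mc; ++i)
                    for (std::size_t p = 0; p < kc; ++p) {
                        const double a = Ap[i * kc + p];
                        for (std::size_t j = 0; j < nc; ++j)
                            C[(ic + i) * n + (jc + j)] += a * Bp[p * nc + j];
                    }
            }
        }
    }
}

// Unblocked reference with the same semantics, useful for checking results.
void gemm_naive(std::size_t m, std::size_t n, std::size_t k,
                const std::vector<double>& A,
                const std::vector<double>& B,
                std::vector<double>& C)
{
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t p = 0; p < k; ++p)
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += A[i * k + p] * B[p * n + j];
}
```

The three outer loops partition n, k and m so that the packed blocks stay in cache while they are reused. In an optimized implementation the innermost plain loops would be replaced by a register-blocked micro-kernel, for example one written with AVX or FMA instructions.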
The tarball test_ublas.tgz contains the following files:
$shell> tar cfz test_ublas.tgz session*/*.hpp session*/*.cc session*/plot*
$shell> tar tfvz test_ublas.tgz
-rw-r--r-- lehn/num 7469 2016-02-02 19:52 session1/gemm.hpp
-rw-rw-r-- lehn/num 17181 2016-02-02 18:17 session2/avx.hpp
-rw-rw-r-- lehn/num 33544 2016-02-02 19:53 session2/fma.hpp
-rw-rw-r-- lehn/num 8714 2016-02-02 19:51 session2/gemm.hpp
-rw-rw-r-- lehn/num 17181 2016-02-02 18:14 session3/avx.hpp
-rw-rw-r-- lehn/num 33544 2016-02-02 19:53 session3/fma.hpp
-rw-rw-r-- lehn/num 8794 2016-02-02 19:51 session3/gemm.hpp
-rw-rw-r-- lehn/num 17181 2016-02-02 18:17 session4/avx.hpp
-rw-rw-r-- lehn/num 33544 2016-02-02 19:53 session4/fma.hpp
-rw-rw-r-- lehn/num 9797 2016-02-02 19:50 session4/gemm.hpp
-rw-rw-r-- lehn/num 17181 2016-02-02 18:17 session5/avx.hpp
-rw-rw-r-- lehn/num 33544 2016-02-02 19:53 session5/fma.hpp
-rw-rw-r-- lehn/num 1898 2016-02-02 17:11 session5/gccvec.hpp
-rw-rw-r-- lehn/num 3353 2016-02-02 01:34 session5/gccvec2.hpp
-rw-rw-r-- lehn/num 10291 2016-02-02 19:50 session5/gemm.hpp
-rw-rw-r-- lehn/num 13537 2016-02-02 16:05 session7/avx.hpp
-rw-rw-r-- lehn/num 24390 2016-02-02 16:05 session7/fma.hpp
-rw-rw-r-- lehn/num 1537 2016-02-12 21:01 session7/gccvec.hpp
-rw-rw-r-- lehn/num 3353 2016-02-02 16:05 session7/gccvec2.hpp
-rw-rw-r-- lehn/num 16946 2016-02-12 21:27 session7/gemm.hpp
-rw-rw-r-- lehn/num 6221 2016-02-13 09:20 session7/lu.hpp
-rw-rw-r-- lehn/num 13537 2016-02-13 12:55 session8/avx.hpp
-rw-rw-r-- lehn/num 24390 2016-02-13 12:55 session8/fma.hpp
-rw-rw-r-- lehn/num 1537 2016-02-13 12:55 session8/gccvec.hpp
-rw-rw-r-- lehn/num 3353 2016-02-13 12:55 session8/gccvec2.hpp
-rw-rw-r-- lehn/num 16946 2016-02-13 12:55 session8/gemm.hpp
-rw-rw-r-- lehn/num 6756 2016-02-14 02:51 session8/lu.hpp
-rw-r--r-- lehn/num 5159 2016-02-02 18:36 session1/matprod.cc
-rw-rw-r-- lehn/num 5158 2016-02-02 18:35 session2/matprod.cc
-rw-rw-r-- lehn/num 5179 2016-02-02 18:16 session3/matprod.cc
-rw-rw-r-- lehn/num 5401 2016-01-27 00:35 session4/matprod.cc
-rw-rw-r-- lehn/num 6913 2016-01-27 00:53 session4/symatprod.cc
-rw-rw-r-- lehn/num 5356 2016-01-31 11:28 session5/matprod.cc
-rw-rw-r-- lehn/num 4301 2016-02-11 16:34 session7/bench_lu.cc
-rw-rw-r-- lehn/num 5833 2016-02-14 10:37 session8/bench2_lu.cc
-rw-rw-r-- lehn/num 5833 2016-02-14 20:07 session8/bench_lu.cc
-rw-rw-r-- lehn/num 5001 2016-02-13 14:53 session8/bench_mkl_lu.cc
-rw-rw-r-- lehn/num 381 2016-01-22 14:31 session1/plot.session1.mflops
-rw-rw-r-- lehn/num 377 2016-01-22 14:32 session1/plot.session1.time
-rw-rw-r-- lehn/num 397 2016-01-22 14:32 session1/plot.session1.time_log
-rw-rw-r-- lehn/num 496 2016-01-23 00:34 session2/plot.session2.mflops
-rw-rw-r-- lehn/num 492 2016-01-23 00:35 session2/plot.session2.time
-rw-rw-r-- lehn/num 512 2016-01-23 00:35 session2/plot.session2.time_log
-rw-rw-r-- lehn/num 608 2016-01-23 11:03 session3/plot.session3.mflops
-rw-rw-r-- lehn/num 604 2016-01-23 11:03 session3/plot.session3.time
-rw-rw-r-- lehn/num 624 2016-01-23 11:03 session3/plot.session3.time_log
-rw-rw-r-- lehn/num 732 2016-01-27 00:42 session4/plot.session4.gemm.mflops
-rw-rw-r-- lehn/num 728 2016-01-27 00:42 session4/plot.session4.gemm.time
-rw-rw-r-- lehn/num 748 2016-01-27 00:42 session4/plot.session4.gemm.time_log
-rw-rw-r-- lehn/num 777 2016-01-27 01:45 session4/plot.session4.symm.mflops
-rw-rw-r-- lehn/num 773 2016-01-27 01:44 session4/plot.session4.symm.time
-rw-rw-r-- lehn/num 793 2016-01-27 01:44 session4/plot.session4.symm.time_log
-rw-rw-r-- lehn/num 949 2016-02-01 00:29 session5/plot.mt.session5.mflops
-rw-rw-r-- lehn/num 945 2016-02-01 00:30 session5/plot.mt.session5.time
-rw-rw-r-- lehn/num 965 2016-02-01 00:30 session5/plot.mt.session5.time_log
-rw-rw-r-- lehn/num 608 2016-02-01 00:31 session5/plot.session5.mflops
-rw-rw-r-- lehn/num 604 2016-02-01 00:31 session5/plot.session5.time
-rw-rw-r-- lehn/num 624 2016-02-01 00:46 session5/plot.session5.time_log
-rw-rw-r-- lehn/num 1007 2016-02-11 19:44 session7/plot.session7-mt.lu
-rw-rw-r-- lehn/num 1029 2016-02-11 19:45 session7/plot.session7-mt.lu.log
-rw-rw-r-- lehn/num 1021 2016-02-12 23:56 session7/plot.session7-mt.lu.mflops
-rw-rw-r-- lehn/num 577 2016-02-11 19:41 session7/plot.session7.lu
-rw-rw-r-- lehn/num 597 2016-02-11 19:42 session7/plot.session7.lu.log
-rw-rw-r-- lehn/num 589 2016-02-11 19:43 session7/plot.session7.lu.mflops
-rw-rw-r-- lehn/num 1007 2016-02-11 20:54 session8/plot.session7-mt.lu
-rw-rw-r-- lehn/num 1029 2016-02-11 20:54 session8/plot.session7-mt.lu.log
-rw-rw-r-- lehn/num 1021 2016-02-11 20:54 session8/plot.session7-mt.lu.mflops
-rw-rw-r-- lehn/num 601 2016-02-14 10:39 session8/plot.session8.lu
-rw-rw-r-- lehn/num 622 2016-02-14 10:39 session8/plot.session8.lu.log
-rw-rw-r-- lehn/num 613 2016-02-14 10:40 session8/plot.session8.lu.mflops
-rw-rw-r-- lehn/num 547 2016-02-14 09:21 session8/plot.session8.lu.mflops-log
$shell>