====================
uBLAS-fying the GEMM                                                    [TOC:2]
====================

With uBLAS it is pretty straightforward to generalize the GEMM operation:

- SYMM and HEMM come for free.  Unlike the traditional BLAS, there are no
  requirements on the underlying storage scheme.  Having a packed symmetric
  matrix is just fine.
- Support for operations like
  $C = (\alpha_1 A_1 + \alpha_2 A_2) \cdot (\beta_1 B_1 + \beta_2 B_2)$
  comes for free.  No temporaries get created and you don't lose performance.

However, due to my lack of deep knowledge of uBLAS the implementation is not
as elegant as it could be.  So things will become even better.

Tar-Ball for this Session
=========================
The tar-ball __session4.tgz__ contains the files:

---- SHELL ---------------------------------------------------------------------
tar tfvz session4.tgz
--------------------------------------------------------------------------------

:links: session4.tgz -> http://www.mathematik.uni-ulm.de/~lehn/test_ublas/session4.tgz

Compile and Run Benchmark
=========================
For a quick benchmark we again limit $m$ to $500$ with `-DM_MAX=500`.  This
time we also enable the optimized micro kernel for AVX with `-DHAVE_AVX`.

There are two demos.

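
The two `-D` flags are plain preprocessor definitions.  How the session code
consumes them is not shown on this page, so the following is only a minimal
sketch of the usual pattern.  The default value `4000`, the helper names
`use_avx_kernel` and `num_sizes`, and the idea that $m$ is stepped from
`first` in increments of `step` are all assumptions made for illustration.

```cpp
#include <cstddef>

// Hypothetical default: used only if the compiler was not called with
// -DM_MAX=...  The value in the session code may differ.
#ifndef M_MAX
#define M_MAX 4000
#endif

// Upper bound for the matrix dimension m as a typed constant.
constexpr std::size_t m_max = M_MAX;

// True if the build requested the AVX micro kernel with -DHAVE_AVX.
constexpr bool
use_avx_kernel()
{
#ifdef HAVE_AVX
    return true;
#else
    return false;
#endif
}

// Number of problem sizes m = first, first+step, first+2*step, ... that do
// not exceed m_max (hypothetical helper; step must be positive).
constexpr std::size_t
num_sizes(std::size_t first, std::size_t step)
{
    return (first > m_max) ? 0 : (m_max - first)/step + 1;
}
```

With `-DM_MAX=500` and sizes starting at $100$ in steps of $100$ this sketch
would cover five problem sizes.  Without the flag the hypothetical default
applies.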
Demo for GEMM
-------------

---- SHELL (path=session3,hostname=heim) ---------------------------------------
g++ -Ofast -mavx -Wall -std=c++11 -DNDEBUG -DHAVE_AVX -fopenmp -DM_MAX=500 -I ../boost_1_60_0 matprod.cc
./a.out
--------------------------------------------------------------------------------

Demo for SYMM
-------------

---- SHELL (path=session3,hostname=heim) ---------------------------------------
g++ -Ofast -mavx -Wall -std=c++11 -DNDEBUG -DHAVE_AVX -fopenmp -DM_MAX=500 -I ../boost_1_60_0 symatprod.cc
./a.out
--------------------------------------------------------------------------------

Core Function for Matrix-Matrix Product
=======================================

:import: session4/gemm.hpp

Benchmark Results for GEMM
==========================
The benchmark tests $C = A B$ for general matrices $A$, $B$ and $C$.  Please
also test for cases where $A$ or $B$ are expressions.

Benchmark Program
-----------------

:import: session4/matprod.cc

Results
-------

---- SHELL (path=session4,hostname=heim) ---------------------------------------
g++ -Ofast -mavx -Wall -std=c++11 -DHAVE_AVX -fopenmp -DNDEBUG -I ../boost_1_60_0 matprod.cc
./a.out > report.gemm.session4
cat report.gemm.session4
gnuplot plot.session4.gemm.mflops
gnuplot plot.session4.gemm.time
gnuplot plot.session4.gemm.time_log
--------------------------------------------------------------------------------

MFLOPS
------

---- IMAGE ----------------------------
session4/bench.session4.gemm.mflops.svg
---------------------------------------

Time
----

---- IMAGE --------------------------
session4/bench.session4.gemm.time.svg
-------------------------------------

Time with Logarithmic scale
---------------------------

---- IMAGE ------------------------------
session4/bench.session4.gemm.time_log.svg
-----------------------------------------

Benchmark SYMM
==============
The benchmark tests $C = A B$ for a symmetric matrix $A$ in packed storage
format.
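
To picture what packed storage means here: only one triangle of the symmetric
matrix is kept, so an $n \times n$ matrix needs $n(n+1)/2$ entries instead of
$n^2$.  The sketch below stores the lower triangle row by row and adds a naive
reference product that can be used to check results.  The names
`PackedSymmetric` and `symm_ref` are made up for this illustration and are not
part of uBLAS or of the session code.  That the benchmark uses exactly this
layout is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type: symmetric n x n matrix that stores only the
// lower triangle, row by row, in an array of n*(n+1)/2 entries.
struct PackedSymmetric
{
    std::size_t         n;
    std::vector<double> data;

    explicit PackedSymmetric(std::size_t n_)
        : n(n_), data(n_*(n_+1)/2, 0.0)
    {
    }

    // Position of element (i,j) with i >= j in the packed array.
    static std::size_t
    index(std::size_t i, std::size_t j)
    {
        return i*(i+1)/2 + j;
    }

    // (i,j) and (j,i) share one stored entry, so indices in the upper
    // triangle are simply swapped.
    double &
    operator()(std::size_t i, std::size_t j)
    {
        return (i>=j) ? data[index(i,j)] : data[index(j,i)];
    }

    double
    operator()(std::size_t i, std::size_t j) const
    {
        return (i>=j) ? data[index(i,j)] : data[index(j,i)];
    }
};

// Reference product C = A*B where B and C are dense n x k matrices in
// row-major order.  Naive triple loop: meant for checking, not for speed.
std::vector<double>
symm_ref(const PackedSymmetric &A, const std::vector<double> &B, std::size_t k)
{
    std::vector<double> C(A.n*k, 0.0);
    for (std::size_t i=0; i<A.n; ++i) {
        for (std::size_t l=0; l<A.n; ++l) {
            for (std::size_t j=0; j<k; ++j) {
                C[i*k+j] += A(i,l)*B[l*k+j];
            }
        }
    }
    return C;
}
```

A blocked GEMM only has to read the elements of $A$ through such an accessor
while packing its buffers, which is why no dense copy of $A$ is required.
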
Please also test for cases where $A$ or $B$ are expressions.

Benchmark Program
-----------------

:import: session4/symatprod.cc

Results
-------

---- SHELL (path=session4,hostname=heim) ---------------------------------------
g++ -Ofast -mavx -Wall -std=c++11 -DHAVE_AVX -fopenmp -DNDEBUG -DM_MAX=1500 -I ../boost_1_60_0 symatprod.cc
./a.out > report.symm.session4
cat report.symm.session4
gnuplot plot.session4.symm.mflops
gnuplot plot.session4.symm.time
gnuplot plot.session4.symm.time_log
--------------------------------------------------------------------------------

MFLOPS
------

---- IMAGE ----------------------------
session4/bench.session4.symm.mflops.svg
---------------------------------------

Time
----

---- IMAGE --------------------------
session4/bench.session4.symm.time.svg
-------------------------------------

Time with Logarithmic scale
---------------------------

---- IMAGE ------------------------------
session4/bench.session4.symm.time_log.svg
-----------------------------------------

:navigate: back -> doc:session3/page01
           next -> doc:session5/page01