=============================
Using Optimized Micro-Kernels                                            [TOC:2]
=============================

Changes:

- The benchmark program `matprod.cc` was not modified.
- In `gemm.hpp` the memory for buffers is now aligned.
- If you compile with `-DHAVE_AVX`, the file `avx.hpp` gets included.  It
  contains inline-assembly code for AVX.
- If you compile with `-DHAVE_FMA`, the file `fma.hpp` gets included.  It
  contains inline-assembly code for FMA.
- Default block sizes can be overridden with:
    - `-DBS_D_MC=512`
    - `-DBS_D_KC=512`
    - ...

Tar-Ball for this Session
=========================
The tar-ball __session2.tgz__ contains the files:

---- SHELL ---------------------------------------------------------------------
tar tfvz session2.tgz
--------------------------------------------------------------------------------

:links: session2.tgz -> http://www.mathematik.uni-ulm.de/~lehn/test_ublas/session2.tgz

Compile and Run Benchmark
=========================
For a quick benchmark we again limit $m$ to $500$ with `-DM_MAX=500`.
This time we also enable the optimized micro-kernel for AVX with `-DHAVE_AVX`:

---- SHELL (path=session2,hostname=heim) ---------------------------------------
g++ -Ofast -mavx -Wall -std=c++11 -DNDEBUG -DHAVE_AVX -DM_MAX=500 -I ../boost_1_60_0 matprod.cc
./a.out
--------------------------------------------------------------------------------

Main Program with Benchmark and Test
====================================

:import: session2/matprod.cc

Core Function for Matrix-Matrix Product
=======================================

:import: session2/gemm.hpp

Micro-Kernel for AVX
====================

:import: session2/avx.hpp

Micro-Kernel for FMA
====================

:import: session2/fma.hpp

Benchmark Results
=================

---- SHELL (path=session2,hostname=heim) ---------------------------------------
g++ -Ofast -mavx -Wall -std=c++11 -DHAVE_AVX -DNDEBUG -I ../boost_1_60_0 matprod.cc
./a.out > report.session2
cat report.session2
gnuplot plot.session2.mflops
gnuplot plot.session2.time
gnuplot plot.session2.time_log
--------------------------------------------------------------------------------

MFLOPS
------

---- IMAGE -----------------------
session2/bench.session2.mflops.svg
----------------------------------

Time
----

---- IMAGE -----------------------
session2/bench.session2.time.svg
----------------------------------

Time with Logarithmic Scale
---------------------------

---- IMAGE -------------------------
session2/bench.session2.time_log.svg
------------------------------------

:navigate: back -> doc:session1/page01
           next -> doc:session3/page01
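
The list of changes at the top of this page mentions two mechanisms: aligned buffers in `gemm.hpp` and default block sizes that can be replaced on the command line.  The following is only a minimal sketch of how such mechanisms are commonly written, not the code from `gemm.hpp`.  The helper name `malloc_aligned`, the 32-byte alignment (the width of an AVX register) and the fallback values of 256 are assumptions made for illustration; only the macro names `BS_D_MC` and `BS_D_KC` come from the text above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Fallback block sizes. The value 256 is a placeholder chosen for this
// sketch; passing e.g. -DBS_D_MC=512 to the compiler replaces it.
#ifndef BS_D_MC
#define BS_D_MC 256
#endif
#ifndef BS_D_KC
#define BS_D_KC 256
#endif

// Hypothetical helper: allocate n elements of type T so that the returned
// address is a multiple of `alignment` bytes. posix_memalign requires the
// alignment to be a power of two and a multiple of sizeof(void *).
// Returns nullptr on failure; release the memory with std::free.
template <typename T>
T *
malloc_aligned(std::size_t n, std::size_t alignment = 32)
{
    void *p = nullptr;
    if (posix_memalign(&p, alignment, n * sizeof(T)) != 0) {
        return nullptr;
    }
    return static_cast<T *>(p);
}
```

A packing buffer for an `MC x KC` block of doubles could then be obtained with `malloc_aligned<double>(BS_D_MC * BS_D_KC)`.  Because the sketch relies on `posix_memalign`, it assumes a POSIX system; it also stays within C++11, matching the `-std=c++11` flag used on this page.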