==============================================
Using GCC Vector-Extensions for Micro-Kernels                            [TOC:2]
==============================================

Tar-Ball for this Session
=========================

The tar-ball __session5.tgz__ contains the files:

---- SHELL ---------------------------------------------------------------------
tar tfvz session5.tgz
--------------------------------------------------------------------------------

:links: session5.tgz -> http://www.mathematik.uni-ulm.de/~lehn/test_ublas/session5.tgz

Compiling for FMA or AVX
========================

- If you have FMA (i.e. AVX2 with fused multiply-add), compile with
  `g++ -Wall -DNDEBUG -mfma -Ofast -I ../../boost_1_60_0/ -std=c++11 -DHAVE_GCCVEC -DBS_D_NR=12 -DBS_D_NC=4092 matprod.cc`

- If you have AVX but not FMA, compile with
  `g++ -Wall -DNDEBUG -mavx -Ofast -I ../../boost_1_60_0/ -std=c++11 -DHAVE_GCCVEC matprod.cc`

Compile and Run Benchmark
=========================

The machine on which this page is generated only supports AVX (no FMA). Also
note that the GCC vector extensions require a recent version of GCC.
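If you are not sure whether your compiler handles the vector extensions,
compiling a small kernel is a quick check. The following is a minimal sketch
and not the kernel from `session5/gccvec.hpp`: the names `vec4d` and
`ugemm_sketch`, the block sizes `MR = NR = 4` and the packing layout of the
panels are assumptions made only for this illustration. Each column of the
`MR x NR` result block is kept in one vector of four doubles and updated with an
element-wise multiply-add in every step of the inner loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical block sizes for this sketch (not the values used in session5).
constexpr std::size_t MR = 4;
constexpr std::size_t NR = 4;

// Four doubles = 32 bytes: one AVX register when compiled with -mavx.
typedef double vec4d __attribute__((vector_size(4*sizeof(double))));

// Hypothetical micro-kernel: C (MR x NR, column-major, leading dimension ldc)
// += A*B, where A is a packed MR x kc panel (column l starts at A + l*MR) and
// B is a packed kc x NR panel (row l starts at B + l*NR).
void ugemm_sketch(std::size_t kc, const double *A, const double *B,
                  double *C, std::size_t ldc)
{
    const vec4d zero = {0.0, 0.0, 0.0, 0.0};
    vec4d AB[NR];                        // one accumulator per column of C
    for (std::size_t j = 0; j < NR; ++j) {
        AB[j] = zero;
    }

    for (std::size_t l = 0; l < kc; ++l) {
        vec4d a;
        std::memcpy(&a, A + l*MR, sizeof(a));    // load column l of A
        for (std::size_t j = 0; j < NR; ++j) {
            const double bj = B[l*NR + j];
            const vec4d b = {bj, bj, bj, bj};     // broadcast B(l,j)
            AB[j] += a * b;                       // element-wise multiply-add
        }
    }

    // Add the accumulated block to C; vectors can be subscripted like arrays.
    for (std::size_t j = 0; j < NR; ++j) {
        for (std::size_t i = 0; i < MR; ++i) {
            C[i + j*ldc] += AB[j][i];
        }
    }
}
```

With `-mavx` GCC can keep each accumulator in a register. Without it, GCC still
accepts the code but synthesizes the 32-byte vector operations from narrower
ones supported by the target.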
For the benchmarks on this page we use GCC 5.3:

---- SHELL (path=session5,hostname=heim) ---------------------------------------
g++-5.3 -Wall -DNDEBUG -mavx -Ofast -I ../../boost_1_60_0/ -std=c++11 -DHAVE_GCCVEC -DM_MAX=500 matprod.cc
./a.out
--------------------------------------------------------------------------------

Micro-Kernel
============

:import: session5/gccvec.hpp

Modified `gemm.hpp`
===================

:import: session5/gccvec.hpp

Benchmark Results for GEMM: Single-Threaded
===========================================

Results
-------

---- SHELL (path=session5,hostname=heim) ---------------------------------------
g++-5.3 -Wall -DNDEBUG -mavx -Ofast -I ../../boost_1_60_0/ -std=c++11 -DHAVE_GCCVEC matprod.cc
./a.out > report.session5
cat report.session5
gnuplot plot.session5.mflops
gnuplot plot.session5.time
gnuplot plot.session5.time_log
--------------------------------------------------------------------------------

MFLOPS
------

---- IMAGE ----------------------------
session5/bench.session5.mflops.svg
---------------------------------------

Time
----

---- IMAGE --------------------------
session5/bench.session5.time.svg
-------------------------------------

Time with Logarithmic scale
---------------------------

---- IMAGE ------------------------------
session5/bench.session5.time_log.svg
-----------------------------------------

Benchmark Results for GEMM: Multi-Threaded
==========================================

Results
-------

---- SHELL (path=session5,hostname=heim) ---------------------------------------
g++-5.3 -Wall -DNDEBUG -mavx -Ofast -I ../../boost_1_60_0/ -std=c++11 -DHAVE_GCCVEC -fopenmp matprod.cc
./a.out > report.mt.session5
cat report.mt.session5
gnuplot plot.mt.session5.mflops
gnuplot plot.mt.session5.time
gnuplot plot.mt.session5.time_log
--------------------------------------------------------------------------------

MFLOPS
------

---- IMAGE ----------------------------
session5/bench.mt.session5.mflops.svg
---------------------------------------

Time
----
---- IMAGE --------------------------
session5/bench.mt.session5.time.svg
-------------------------------------

Time with Logarithmic scale
---------------------------

---- IMAGE ------------------------------
session5/bench.mt.session5.time_log.svg
-----------------------------------------

:navigate: back -> doc:session4/page01
           next -> doc:session6/page01