Benchmark
heim$ gcc -Wall -std=c11 -O3 -I. -o test_dgemm test_dgemm.c ulmaux.c ulmblas.c dgemm_micro_avx_4x8.s
heim$ ./test_dgemm bench | tee gemm_s18.dat
#colmajorA = 1
#colmajorB = 1
#colmajorC = 1
#alpha = 1.000000
#beta = 1.000000
#    m     n     k  time ref  mflops ref  time 1  mflops 1      err
   100   100   100      0.00     1040.00    0.00  11636.36  1.2e-03
   200   200   200      0.01     2240.00    0.00  16290.91  5.6e-04
   300   300   300      0.02     2250.00    0.00  15709.09  4.0e-04
   400   400   400      0.06     2258.82    0.01  16640.00  3.5e-04
   500   500   500      0.11     2343.75    0.02  15909.09  2.9e-04
   600   600   600      0.19     2273.68    0.03  15709.09  2.4e-04
   700   700   700      0.30     2286.67    0.04  15830.77  2.1e-04
   800   800   800      0.45     2292.54    0.06  16168.42  1.9e-04
   900   900   900      0.65     2254.64    0.09  15621.43  1.7e-04
  1000  1000  1000      0.90     2230.48    0.12  16216.22  1.5e-04
Passed 10 of 10 tests.
heim$
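The two MFLOPS columns can be compared row by row to see how much faster the tested implementation is than the reference. The following is a minimal sketch, not part of the session code: the function name parse_speedup is hypothetical, and it assumes each data row has exactly the eight whitespace-separated columns shown in the header line (m, n, k, time ref, mflops ref, time 1, mflops 1, err).

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of test_dgemm): parse one line of the
 * benchmark output and compute the speedup "mflops 1" / "mflops ref".
 *
 * Returns 1 and stores m and the speedup if the line is a data row.
 * Returns 0 for comment lines starting with '#', for the final
 * "Passed ..." summary, and for any line that does not contain eight
 * numeric fields.
 */
int
parse_speedup(const char *line, int *m, double *speedup)
{
    int    mm, n, k;
    double time_ref, mflops_ref, time_1, mflops_1, err;

    /* sscanf stops at the first field it cannot convert, so lines such
       as "#alpha = 1.000000" yield fewer than 8 conversions. */
    if (sscanf(line, "%d %d %d %lf %lf %lf %lf %lf",
               &mm, &n, &k, &time_ref, &mflops_ref,
               &time_1, &mflops_1, &err) != 8) {
        return 0;
    }

    /* Guard against a division by zero on a malformed row. */
    if (mflops_ref <= 0.0) {
        return 0;
    }

    *m       = mm;
    *speedup = mflops_1 / mflops_ref;
    return 1;
}
```

Applied to the last row above (M=N=K=1000), this gives 16216.22 / 2230.48, a speedup of roughly 7.3 over the reference implementation. Feeding each line of gemm_s18.dat read with fgets to this function would skip the header and summary lines automatically.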
Plot for Benchmark
The plot was produced with
heim$ gnuplot gemm.plot
heim$
using the following script:
set terminal svg size 1400, 500
set output "bench.gemm.svg"
set xlabel "Matrix dimension M=N=K"
set ylabel "MFLOPS"
set yrange [0:20000]
set title "GEMM (col major)"
set key outside
set pointsize 0.5
plot "gemm_s16.dat" using 1:5 with linespoints lt 2 lw 3 title "Reference Implementation", \
     "gemm_s15.dat" using 1:7 with linespoints lt 3 lw 3 title "Session 15: Simple Cache Optimization", \
     "gemm_s16.dat" using 1:7 with linespoints lt 5 lw 3 title "Session 16: Cache Optimized", \
     "gemm_s17.dat" using 1:7 with linespoints lt 6 lw 3 title "Session 17: Cache Optimized with GCC Vector Extensions", \
     "gemm_s18.dat" using 1:7 with linespoints lt 7 lw 3 title "Session 18: Cache Optimized with Assembler Code"