Benchmark

heim$ gcc -Wall -std=c11 -O3 -I. -o test_dgemm test_dgemm.c ulmaux.c ulmblas.c dgemm_micro_avx_4x8.s
heim$ ./test_dgemm bench | tee gemm_s18.dat
#colmajorA = 1
#colmajorB = 1
#colmajorC = 1
#alpha     = 1.000000
#beta      = 1.000000
#   m    n    k  time ref mflops ref     time 1   mflops 1     err 
  100  100  100       0.00    1040.00       0.00   11636.36 1.2e-03 
  200  200  200       0.01    2240.00       0.00   16290.91 5.6e-04 
  300  300  300       0.02    2250.00       0.00   15709.09 4.0e-04 
  400  400  400       0.06    2258.82       0.01   16640.00 3.5e-04 
  500  500  500       0.11    2343.75       0.02   15909.09 2.9e-04 
  600  600  600       0.19    2273.68       0.03   15709.09 2.4e-04 
  700  700  700       0.30    2286.67       0.04   15830.77 2.1e-04 
  800  800  800       0.45    2292.54       0.06   16168.42 1.9e-04 
  900  900  900       0.65    2254.64       0.09   15621.43 1.7e-04 
 1000 1000 1000       0.90    2230.48       0.12   16216.22 1.5e-04 
Passed 10 of 10 tests.
heim$ 
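The MFLOPS columns match the usual operation count for GEMM. It performs 2·m·n·k floating point operations (one multiply and one add per term of each inner product), which are divided by the run time in seconds and by 10^6. For example, for m = n = k = 1000 and a reference time of about 0.9 s, this gives roughly 2200 MFLOPS. The sketch below shows this arithmetic under that assumed convention; it does not reproduce the code in test_dgemm.c. The helper names gemm_mflops and speedup are hypothetical and not part of ulmBLAS.

```c
#include <stddef.h>

/* Hypothetical helper (not part of ulmBLAS or test_dgemm.c):
 * assumes GEMM is counted as 2*m*n*k floating point operations,
 * i.e. one multiply and one add per inner-product term.
 * Returns millions of floating point operations per second. */
double
gemm_mflops(size_t m, size_t n, size_t k, double seconds)
{
    return 2.0 * (double)m * (double)n * (double)k / (seconds * 1e6);
}

/* Hypothetical helper: ratio of two MFLOPS values, e.g. the
 * "mflops 1" column divided by the "mflops ref" column. */
double
speedup(double mflops_impl, double mflops_ref)
{
    return mflops_impl / mflops_ref;
}
```

With the values from the last row, speedup(16216.22, 2230.48) is about 7.27. The assembler version is therefore roughly seven times faster than the reference implementation for m = n = k = 1000.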

Plot for Benchmark

The plot was produced with

heim$ gnuplot gemm.plot
heim$ 

using the following script, saved as gemm.plot:

set terminal svg size 1400, 500
set output "bench.gemm.svg"
set xlabel "Matrix dimension M=N=K"
set ylabel "MFLOPS"
set yrange [0:20000]
set title "GEMM (col major)"
set key outside
set pointsize 0.5
plot "gemm_s16.dat" using 1:5  with linespoints lt 2 lw 3 title "Reference Implementation", \
     "gemm_s15.dat" using 1:7  with linespoints lt 3 lw 3 title "Session 15: Simple Cache Optimization", \
     "gemm_s16.dat" using 1:7  with linespoints lt 5 lw 3 title "Session 16: Cache Optimized", \
     "gemm_s17.dat" using 1:7  with linespoints lt 6 lw 3 title "Session 17: Cache Optimized with GCC Vector Extensions", \
     "gemm_s18.dat" using 1:7  with linespoints lt 7 lw 3 title "Session 18: Cache Optimized with Assembler Code"
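In gemm.plot, `using 1:5` plots column 1 (the matrix dimension) against column 5 (the reference MFLOPS). `using 1:7` plots column 1 against column 7 (the MFLOPS of the optimized implementation). Gnuplot treats lines starting with `#` as comments, and this includes the header that test_dgemm writes. The hypothetical C function get_column below sketches this column selection; it is not part of gnuplot or ulmBLAS. It extracts one 1-based, whitespace-separated column from a line of the .dat files and rejects comment lines and lines without enough numbers.

```c
#include <stdlib.h>

/* Hypothetical helper mirroring gnuplot's "using X:Y" column selection.
 * Extracts the whitespace-separated column `col` (1-based) from `line`.
 * Returns 1 and stores the number in *value on success; returns 0 for
 * comment lines ('#'), empty lines, or lines with fewer than `col`
 * numeric columns. */
int
get_column(const char *line, int col, double *value)
{
    const char *p = line;
    char *end;

    /* Skip leading blanks, then reject comments and empty lines. */
    while (*p == ' ' || *p == '\t') {
        ++p;
    }
    if (*p == '#' || *p == '\0' || *p == '\n') {
        return 0;
    }

    /* strtod skips leading whitespace itself; parse columns one by one. */
    for (int i = 1; i <= col; ++i) {
        double x = strtod(p, &end);
        if (end == p) {
            return 0;  /* no number here: not enough numeric columns */
        }
        if (i == col) {
            *value = x;
            return 1;
        }
        p = end;
    }
    return 0;  /* col < 1 */
}
```

For the first data row of gemm_s18.dat, column 5 yields 1040.00 and column 7 yields 11636.36. These are the two values plotted at x = 100 for the reference and the Session 18 curves.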