Benchmarks
Content |
Goals:
-
Development of simple benchmarks that allow to compare the performance of different numerical algorithms.
-
Using macros to configure parameters at compile time from the command line. This allows to setup Makefiles that test and benchmark various configurations.
-
Comparing not just the runtimes of different numerical algorithms but comparing also the results.
-
Generate output tables that permit post processing.
-
Using tools like gnuplot to visualize the results.
Example from the lecture
The following program compares the wall time of matrix initializations (function initMatrix) with different storage orders, i.e. row-major vs. col-major. Two different arrays, i.e. buffer1 and buffer2, are used for the matrices. Note:
-
In this example the arrays are located on the data segment (buffer1 and buffer2 are defined as global arrays).
-
Figure out which storage orders are used when we call initMatrix.
The function asumDiff sums up the absolute values of the differences of two equally-dimensioned matrices. This allows to check whether both algorithms deliver numerically the same results.
#include <stdio.h> #include <stdlib.h> #include <sys/times.h> #include <unistd.h> #include <math.h> #ifndef MINDIM_M #define MINDIM_M 1000 #endif #ifndef MINDIM_N #define MINDIM_N 1000 #endif #ifndef MINDIM_K #define MINDIM_K 1000 #endif #ifndef MAXDIM_M #define MAXDIM_M 7000 #endif #ifndef MAXDIM_N #define MAXDIM_N 7000 #endif #ifndef MAXDIM_K #define MAXDIM_K 7000 #endif #ifndef INC_M #define INC_M 100 #endif #ifndef INC_N #define INC_N 100 #endif #ifndef INC_K #define INC_K 100 #endif #ifndef MIN_T #define MIN_T 1 #endif double buffer1[MAXDIM_M*MAXDIM_N]; double buffer2[MAXDIM_M*MAXDIM_N]; /* return real time in seconds since start of the process */ double wallTime() { static int ticks_per_second = 0; if (!ticks_per_second) { ticks_per_second = sysconf(_SC_CLK_TCK); } struct tms timebuf; /* times returns the number of real time ticks passed since start */ return (double) times(&timebuf) / ticks_per_second; } double asumDiff(size_t m, size_t n, const double *A, size_t incRowA, size_t incColA, const double *B, size_t incRowB, size_t incColB) { double diff = 0; for (size_t i = 0; i < m; ++i) { for (size_t j = 0; j < n; ++j) { diff += fabs(B[i*incRowB+j*incColB] - A[i*incRowA+j*incColA]); } } return diff; } void initMatrix(size_t m, size_t n, double *A, size_t incRowA, size_t incColA) { for (size_t j = 0; j < n; ++j) { for (size_t i = 0; i < m; ++i) { A[i*incRowA+j*incColA] = j*n+i+1; } } } void printMatrix(size_t m, size_t n, const double *A, size_t incRowA, size_t incColA) { for (size_t i = 0; i < m; ++i) { printf(" "); for (size_t j = 0; j < n; ++j) { printf("%4.1lf ", A[i*incRowA+j*incColA]); } printf("\n"); } printf("\n"); } int main() { printf(" M N t1 t2 t2/t1 diff\n"); printf(" col-maj row-maj\n"); printf("============================================\n"); for (size_t m = MINDIM_M, n = MINDIM_N; m < MAXDIM_M && n < MAXDIM_N; m += INC_M, n += INC_N) { size_t runs = 0; double t1 = 0; do { double t0 = wallTime(); initMatrix(m, n, buffer1, 1, m); t1 += wallTime() - t0; ++runs; } while (t1 < MIN_T); t1 /= runs; runs = 0; double t2 = 0; do { double t0 = wallTime(); initMatrix(m, n, buffer2, n, 1); t2 += wallTime() - t0; ++runs; } while (t2 < MIN_T); t2 /= runs; double diff = asumDiff(m, n, buffer1, 1, m, buffer2, n, 1); printf("%4zd %4zd %7.2lf %7.2lf %7.2lf %10.2le\n", m, n, t1, t2, t2/t1, diff); } }
Compilation and execution
In this run, the macro parameter MIN_T was set to 0 on the command line to speed up the generation of this page.
theon$ gcc -Wall -DMIN_T=0 -o bench_initmatrix bench_initmatrix.c theon$ ./bench_initmatrix M N t1 t2 t2/t1 diff col-maj row-maj ============================================ 1000 1000 0.02 0.02 1.00 0.00e+00 1100 1100 0.01 0.02 2.00 0.00e+00 1200 1200 0.01 0.02 2.00 0.00e+00 1300 1300 0.01 0.02 2.00 0.00e+00 1400 1400 0.01 0.02 2.00 0.00e+00 1500 1500 0.01 0.03 3.00 0.00e+00 1600 1600 0.01 0.05 5.00 0.00e+00 1700 1700 0.01 0.04 4.00 0.00e+00 1800 1800 0.02 0.03 1.50 0.00e+00 1900 1900 0.01 0.04 4.00 0.00e+00 2000 2000 0.01 0.07 7.00 0.00e+00 2100 2100 0.02 0.05 2.50 0.00e+00 2200 2200 0.02 0.06 3.00 0.00e+00 2300 2300 0.02 0.07 3.50 0.00e+00 2400 2400 0.02 0.12 6.00 0.00e+00 2500 2500 0.02 0.10 5.00 0.00e+00 2600 2600 0.02 0.11 5.50 0.00e+00 2700 2700 0.02 0.12 6.00 0.00e+00 2800 2800 0.03 0.17 5.67 0.00e+00 2900 2900 0.03 0.15 5.00 0.00e+00 3000 3000 0.03 0.16 5.33 0.00e+00 3100 3100 0.04 0.17 4.25 0.00e+00 3200 3200 0.04 0.21 5.25 0.00e+00 3300 3300 0.04 0.22 5.50 0.00e+00 3400 3400 0.04 0.23 5.75 0.00e+00 3500 3500 0.04 0.32 8.00 0.00e+00 3600 3600 0.05 0.31 6.20 0.00e+00 3700 3700 0.05 0.30 6.00 0.00e+00 3800 3800 0.05 0.30 6.00 0.00e+00 3900 3900 0.06 0.31 5.17 0.00e+00 4000 4000 0.05 0.34 6.80 0.00e+00 4100 4100 0.05 0.37 7.40 0.00e+00 4200 4200 0.06 0.37 6.17 0.00e+00 4300 4300 0.06 0.39 6.50 0.00e+00 4400 4400 0.07 0.42 6.00 0.00e+00 4500 4500 0.06 0.43 7.17 0.00e+00 4600 4600 0.08 0.44 5.50 0.00e+00 4700 4700 0.07 0.47 6.71 0.00e+00 4800 4800 0.08 0.48 6.00 0.00e+00 4900 4900 0.08 0.51 6.37 0.00e+00 5000 5000 0.08 0.53 6.63 0.00e+00 5100 5100 0.09 0.56 6.22 0.00e+00 5200 5200 0.09 0.58 6.44 0.00e+00 5300 5300 0.09 0.60 6.67 0.00e+00 5400 5400 0.09 0.62 6.89 0.00e+00 5500 5500 0.10 0.64 6.40 0.00e+00 5600 5600 0.11 0.66 6.00 0.00e+00 5700 5700 0.11 0.75 6.82 0.00e+00 5800 5800 0.11 0.72 6.55 0.00e+00 5900 5900 0.11 0.75 6.82 0.00e+00 6000 6000 0.12 0.77 6.42 0.00e+00 6100 6100 0.13 0.79 6.08 0.00e+00 6200 6200 0.13 0.81 6.23 0.00e+00 6300 6300 0.13 0.85 6.54 0.00e+00 6400 6400 0.14 0.88 6.29 0.00e+00 6500 6500 0.14 0.90 6.43 0.00e+00 6600 6600 0.15 1.04 6.93 0.00e+00 6700 6700 0.15 0.90 6.00 0.00e+00 6800 6800 0.15 0.94 6.27 0.00e+00 6900 6900 0.17 0.96 5.65 0.00e+00 theon$
Running a benchmark
We show by example how to generate a plot for a benchmark. Your task will be afterwards to change these plots by showing different characteristics (speedup) and by displaying data differently (e.g. by applying a logarithmic scale).
Redirecting output into a file
As printf writes it output to standard output (i.e. stdout), we are free to redirect the output to a file at the command line of the shell:
theon$ ./bench_initmatrix >initmatrix.data theon$
Writing a gnuplot script
gnuplot can be used interactively. But it appears more practical to write scripts as this documents how diagrams were generated and as scripts can be easily corrected, adapted, and duplicated. Following gnuplot script shows how the runtime depends on the storage organization for a particular algorithm.
set terminal svg size 900, 500 set output "initmatrix.svg" set xlabel "Matrix dim A: M=N" set title "matrix initialization" set key outside set pointsize 0.5 plot "initmatrix.data" using 1:3 with linespoints lt 2 lw 3 title "col-major", \ "initmatrix.data" using 1:4 with linespoints lt 3 lw 3 title "row-major"
Generating a plot
heim$ gnuplot initmatrix.gnuplot heim$
Exercises
You might want to use google for the following tasks:
-
Label the y axis.
-
Change the font size for the title and the axes
-
Try a logarithmic scale for x and y axis. When and why does it make sense to use a logarithmic scale?
-
Create a new plot for the speedup. Which scale is to be prefered?