# Testing the Matrix-Matrix Product

#### Content

Before we implement a GEMM operation with some simple cache optimization we need to setup a testing framework. We first state a simple criterion for the numerical comparison of two GEMM results. See Testing GEMM for more background on this topic.

Recall that the GEMM operation

$C \leftarrow \beta C + \alpha A B$

overwrites $$C$$ with the result of the operation. In the following we will denote

• with $$C_0$$ the initial value of $$C$$ and

• with $$\widehat{C}$$ a trusted result of the GEMM operation.

For a matrix $$C$$ produced by another GEMM operation we then require that

$\frac{\|C - \widehat{C}\,\|_\infty}{ \text{eps}\cdot\left( \max\{m,n,k\} \cdot |\alpha| \cdot \|A\|_\infty \|B\|_\infty + |\beta|\cdot\|C_0\|_\infty \right)}\leq \tau = \mathcal{O}(1)$

holds. In practice a test suite will have a certain default value (e.g. 1) for the threshold $$\tau$$ that can be user-defined through compiler flags.

## Exercise: Implement some tools for the error estimator

We will need an implementation for each of the operations below. If possible, in each operation the elements should be accessed cache friendly (so we can use the as efficient building blocks for other operations):

• dgeaxpy: Computing the update $$B \leftarrow \alpha A + B$$ for $$m \times n$$ matrices $$A$$ and $$B$$.

Note: If $$B$$ is stored in column major then elements of both matrices should be accessed column-wise. Otherwise elements should be accessed only row-wise.

• dgescal: Compute the scaling $$A \leftarrow \alpha A$$ for a $$m \times n$$ matrix.

Make sure that special cases for $$\alpha$$ are treated efficient and correct.

• dgecopy: Copy matrices, i.e. $$B \leftarrow A$$ for two $$m \times n$$ matrices.

Note: If $$B$$ is stored in column major then elements of both matrices should be accessed column-wise. Otherwise elements should be accessed only row-wise.

• dgenorm_inf: Computing the infinity-norm of an $$m \times n$$ matrix.

## Testing the tools

Use the following skeleton for your implementation and for testing. But make sure that you have a good understanding of this code:

• What cases for the storage order of $$A$$ and $$B$$ are tested?

• How could all possible cases for the storage order be tested?

Also note that function initMatrix was modified:

• Elements are accessed cache friendly (you can use this pattern!).

• If the additional argument withNan is true the matrix gets initialized with Nan (not a number) entries.

• Otherwise the matrix get initalized with random values.

#include <stdlib.h>
#include <stdio.h>
#include <stddef.h>
#include <math.h>
#include <stdbool.h>

//------------------------------------------------------------------------------

#define MY_ABS(x)   ((x)<0 ? -(x) : (x))

//------------------------------------------------------------------------------

void
initMatrix(size_t m, size_t n, double *A, ptrdiff_t incRowA, ptrdiff_t incColA,
bool withNan)
{
// if A is row major initialize A^T
if (MY_ABS(incRowA) > MY_ABS(incColA)) {
initMatrix(n, m, A, incColA, incRowA, withNan);
return;
}
// if A is col major
if (withNan) {
for (size_t j=0; j<n; ++j) {
for (size_t i=0; i<m; ++i) {
A[i*incRowA+j*incColA] = nan("");
}
}
} else {
for (size_t j=0; j<n; ++j) {
for (size_t i=0; i<m; ++i) {
double rValue = ((double)rand() - RAND_MAX/2)*2/RAND_MAX;
A[i*incRowA+j*incColA] = rValue;
}
}
}
}

void
printMatrix(size_t m, size_t n,
const double *A, ptrdiff_t incRowA, ptrdiff_t incColA)
{
for (size_t i=0; i<m; ++i) {
for (size_t j=0; j<n; ++j) {
printf("%10.3lf ", A[i*incRowA+j*incColA]);
}
printf("\n");
}
printf("\n");
}

//------------------------------------------------------------------------------

void
dgeaxpy(size_t m, size_t n, double alpha,
const double *A, ptrdiff_t incRowA, ptrdiff_t incColA,
double *B, ptrdiff_t incRowB, ptrdiff_t incColB)
{
}

void
dgecopy(size_t m, size_t n,
const double *A, ptrdiff_t incRowA, ptrdiff_t incColA,
double *B, ptrdiff_t incRowB, ptrdiff_t incColB)
{
}

void
dgescal(size_t m, size_t n, double alpha,
double *A, ptrdiff_t incRowA, ptrdiff_t incColA)
{
}

double
dgenorm_inf(size_t m, size_t n,
const double *A, ptrdiff_t incRowA, ptrdiff_t incColA)
{
return 0.0;
}

//------------------------------------------------------------------------------

#ifndef DIM_M
#define DIM_M 4
#endif

#ifndef DIM_N
#define DIM_N 5
#endif

double A[DIM_M*DIM_N];
double B[DIM_M*DIM_N];

const bool rowMajorA[] = {false, true};
const bool rowMajorB[] = {false, false};
const size_t numTests = sizeof(rowMajorA)/sizeof(bool);

int
main()
{
for (size_t test=0; test<numTests; ++test) {
ptrdiff_t incRowA = rowMajorA[test] ? DIM_N : 1;
ptrdiff_t incColA = rowMajorA[test] ? 1 : DIM_M;

ptrdiff_t incRowB = rowMajorB[test] ? DIM_N : 1;
ptrdiff_t incColB = rowMajorB[test] ? 1 : DIM_M;

printf("Matrix A is stored %s\n", rowMajorA[test] ? "row major"
: "col major");
printf("Matrix B is stored %s\n", rowMajorB[test] ? "row major"
: "col major");
srand(0);
initMatrix(DIM_M, DIM_N, A, incRowA, incColA, false);
initMatrix(DIM_M, DIM_N, B, incRowB, incColB, false);

printf("A =\n");
printMatrix(DIM_M, DIM_N, A, incRowA, incColA);
printf("B =\n");
printMatrix(DIM_M, DIM_N, B, incRowB, incColB);

printf("B = 2*B\n");
// TODO Call dgescal for: B <- 2*B
printf("B =\n");
printMatrix(DIM_M, DIM_N, B, incRowB, incColB);

printf("||A||_inf = %lf\n", 0.0 /* TODO/FIXME Call dgenorm_inf */);
printf("||B||_inf = %lf\n", 0.0 /* TODO/FIXME Call dgenorm_inf */);

// TODO: Call dgeaxpy for: B <- B - A
printf("||B-A||_inf = %lf\n",
dgenorm_inf(DIM_M, DIM_N, B, incRowB, incColB));

// TODO: Call dgecopy for: B <- A
printf("B = A\n");
printf("B =\n");
printMatrix(DIM_M, DIM_N, B, incRowB, incColB);

// TODO: Call dgeaxpy for: B <- B - A
printf("||B-A||_inf = %lf\n",
dgenorm_inf(DIM_M, DIM_N, B, incRowB, incColB));
}
}