GEMM Micro Kernel

The GEMM Micro Kernel computes the GEMM operation \(C \leftarrow \beta C + \alpha A B\) where

\(A\) is a \(M_r \times k\) matrix with row increment 1 (i.e. col major)
\(B\) is a \(k \times N_r\) matrix with col increment 1 (i.e. row major)
\(C\) is a \(M_r \times N_r\) matrix which can be row or col major.
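
As a sketch of what these storage conventions mean for element access, assume zero-based indices and contiguously stored panels, i.e. column increment \(M_r\) for \(A\) and row increment \(N_r\) for \(B\). This contiguity is implied rather than stated above, and \(\text{incRow}_C\), \(\text{incCol}_C\) are illustrative names for the increments of \(C\):

\[
A_{i,\ell} = A[\,i + \ell \cdot M_r\,], \qquad
B_{\ell,j} = B[\,\ell \cdot N_r + j\,], \qquad
C_{i,j} = C[\,i \cdot \text{incRow}_C + j \cdot \text{incCol}_C\,]
\]

If \(C\) happens to be stored contiguously, row major corresponds to \(\text{incRow}_C = N_r\), \(\text{incCol}_C = 1\) and col major to \(\text{incRow}_C = 1\), \(\text{incCol}_C = M_r\).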

Exercise: Test Framework

The parameters \(M_r\) and \(N_r\) are defined through macros. You can choose arbitrary values, but it is usually a good idea to choose different values for the two. (Why?)
In main the following test case should be set up:
- Allocate a \(M_r \times k\) matrix \(A\).
- Allocate a \(k \times N_r\) matrix \(B\).
- Choose \(k\) such that \(M_r\), \(N_r\) and \(k\) are pairwise different.
- Allocate two \(M_r \times N_r\) matrices \(C_0\) and \(C_1\).
- Initialize all matrices such that \(C_0\) and \(C_1\) are equal.
- Print matrices \(A\), \(B\) and \(C_0\).
- For some fixed values of \(\alpha\) and \(\beta\) compute \(C_0 \leftarrow \beta C_0 + \alpha A B\) with a reference implementation for GEMM.
- Print \(C_0\)
- Print \(C_1\)

Note: Print the name of the matrix before you print its value.
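
The test case above could be sketched as follows. This is only one possible layout, not a prescribed solution. The helper names init_matrix, print_matrix, dgemm_ref and run_test_case, the macro names MR and NR with their values, the value k = 5, the values of alpha and beta, and the use of stack arrays instead of dynamic allocation are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdio.h>

#ifndef MR
#define MR 4    /* assumed value for M_r */
#endif
#ifndef NR
#define NR 6    /* assumed value for N_r, different from MR */
#endif

/* Element (i,j) of a matrix with the given row and column increments. */
#define ELEM(X, i, j, incRow, incCol) \
    (X)[(ptrdiff_t)(i)*(incRow) + (ptrdiff_t)(j)*(incCol)]

/* Fill X with 1, 2, 3, ... counted row by row, whatever the storage order. */
void
init_matrix(size_t m, size_t n, double *X, ptrdiff_t incRow, ptrdiff_t incCol)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j)
            ELEM(X, i, j, incRow, incCol) = (double)(i*n + j + 1);
}

/* Print the name of the matrix, then its entries row by row. */
void
print_matrix(const char *name, size_t m, size_t n,
             const double *X, ptrdiff_t incRow, ptrdiff_t incCol)
{
    printf("%s =\n", name);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j)
            printf(" %8.2f", ELEM(X, i, j, incRow, incCol));
        printf("\n");
    }
}

/* Reference GEMM: C <- beta*C + alpha*A*B, plain triple loop. */
void
dgemm_ref(size_t m, size_t n, size_t k, double alpha,
          const double *A, ptrdiff_t incRowA, ptrdiff_t incColA,
          const double *B, ptrdiff_t incRowB, ptrdiff_t incColB,
          double beta,
          double *C, ptrdiff_t incRowC, ptrdiff_t incColC)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0;
            for (size_t l = 0; l < k; ++l)
                sum += ELEM(A, i, l, incRowA, incColA)
                     * ELEM(B, l, j, incRowB, incColB);
            /* For beta == 0 the old entry of C is not read at all. */
            double old = (beta == 0.0)
                       ? 0.0
                       : beta * ELEM(C, i, j, incRowC, incColC);
            ELEM(C, i, j, incRowC, incColC) = old + alpha * sum;
        }
    }
}

/* The test case from the list above; K = 5 is an assumed value for k. */
void
run_test_case(void)
{
    enum { K = 5 };
    double A[MR*K], B[K*NR], C0[MR*NR], C1[MR*NR];
    double alpha = 1.5, beta = 2.5;     /* assumed fixed values */

    init_matrix(MR, K, A, 1, MR);       /* A: col major */
    init_matrix(K, NR, B, NR, 1);       /* B: row major */
    init_matrix(MR, NR, C0, NR, 1);     /* C0 and C1: row major, equal */
    init_matrix(MR, NR, C1, NR, 1);

    print_matrix("A", MR, K, A, 1, MR);
    print_matrix("B", K, NR, B, NR, 1);
    print_matrix("C0", MR, NR, C0, NR, 1);

    dgemm_ref(MR, NR, K, alpha, A, 1, MR, B, NR, 1, beta, C0, NR, 1);

    print_matrix("C0", MR, NR, C0, NR, 1);
    print_matrix("C1", MR, NR, C1, NR, 1);
}
```

Calling run_test_case() from main prints all matrices with their names. At this stage \(C_1\) is still unchanged, so it can later receive the result of the micro kernel and be compared against \(C_0\).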

Exercise: Implement and call the micro kernel

The signature of the micro kernel is defined by

void
dgemm_micro(size_t k, double alpha,
            const double *A, const double *B,
            double beta,
            double *C, ptrdiff_t incRowC, ptrdiff_t incColC);

Why are there no dimensions \(m\) and \(n\) for \(C\)?
Why are there no row and column increments for \(A\) and \(B\)?
What are the fastest variants to realize the GEMM operation in this case?
Implement the operation as follows:
- Use a buffer \(AB\) on the stack with length \(M_r \cdot N_r\).
- Zero initialize the buffer.
- Compute \(AB \leftarrow A B\) (using one of the two optimal variants!)
- Compute \(C \leftarrow \beta C\). Recall that the case \(\beta=0\) needs special treatment. (Why?)
- Compute \(C \leftarrow C + \alpha AB\).
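
The steps above can be sketched as follows. This is a hedged illustration, not the reference solution. It assumes that \(M_r\) and \(N_r\) are available as macros named MR and NR (names and values are assumptions), that the panels \(A\) and \(B\) are stored contiguously, and that the buffer \(AB\) is kept row major. The loop order shown, with the index over \(k\) outermost, is one plausible reading of an "optimal variant" because it reads both \(A\) and \(B\) contiguously. Swapping the two inner loops and storing \(AB\) col major would be the other.

```c
#include <stddef.h>

#ifndef MR
#define MR 4    /* assumed value for M_r */
#endif
#ifndef NR
#define NR 6    /* assumed value for N_r */
#endif

void
dgemm_micro(size_t k, double alpha,
            const double *A, const double *B,
            double beta,
            double *C, ptrdiff_t incRowC, ptrdiff_t incColC)
{
    /* Buffer on the stack, stored row major here: AB(i,j) = AB[i*NR + j]. */
    double AB[MR*NR];

    /* Zero initialize the buffer. */
    for (size_t i = 0; i < MR*NR; ++i)
        AB[i] = 0;

    /* AB <- A*B as k rank-1 updates: column l of A times row l of B. */
    for (size_t l = 0; l < k; ++l)
        for (size_t i = 0; i < MR; ++i)
            for (size_t j = 0; j < NR; ++j)
                AB[i*NR + j] += A[l*MR + i] * B[l*NR + j];

    /* C <- beta*C. For beta == 0 overwrite with zeros, because
       multiplying would keep NaN or Inf values already stored in C. */
    if (beta == 0.0) {
        for (ptrdiff_t i = 0; i < MR; ++i)
            for (ptrdiff_t j = 0; j < NR; ++j)
                C[i*incRowC + j*incColC] = 0;
    } else if (beta != 1.0) {
        for (ptrdiff_t i = 0; i < MR; ++i)
            for (ptrdiff_t j = 0; j < NR; ++j)
                C[i*incRowC + j*incColC] *= beta;
    }

    /* C <- C + alpha*AB */
    for (ptrdiff_t i = 0; i < MR; ++i)
        for (ptrdiff_t j = 0; j < NR; ++j)
            C[i*incRowC + j*incColC] += alpha * AB[i*NR + j];
}
```

In the test framework the kernel would be applied to \(C_1\) with the same \(\alpha\) and \(\beta\) as the reference implementation, for example dgemm_micro(k, alpha, A, B, beta, C1, NR, 1) for a row major \(C_1\), after which \(C_1\) should agree with \(C_0\).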