GEMM Micro Kernel
The GEMM Micro Kernel computes the GEMM operation \(C \leftarrow \beta C + \alpha A B\) where
-
\(A\) is an \(M_r \times k\) matrix with row increment 1 (i.e. column major),
-
\(B\) is a \(k \times N_r\) matrix with column increment 1 (i.e. row major),
-
\(C\) is an \(M_r \times N_r\) matrix which can be row or column major.
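The storage conventions above can be illustrated with a few index helpers. This is only a sketch: the macro names `MR` and `NR`, their values, and the helper names `panelA`, `panelB` and `elemC` are assumptions made for illustration and are not part of the exercise.

```c
#include <stddef.h>

/* Assumed values, chosen only for illustration. */
#define MR 4
#define NR 6

/* A(i,l) of the MR x k matrix: row increment 1, column increment MR. */
static double panelA(const double *A, size_t i, size_t l)
{
    return A[i + l * MR];
}

/* B(l,j) of the k x NR matrix: column increment 1, row increment NR. */
static double panelB(const double *B, size_t l, size_t j)
{
    return B[l * NR + j];
}

/* Address of C(i,j) for arbitrary row and column increments. */
static double *elemC(double *C, size_t i, size_t j,
                     ptrdiff_t incRowC, ptrdiff_t incColC)
{
    return C + (ptrdiff_t)i * incRowC + (ptrdiff_t)j * incColC;
}
```

With these conventions a column of \(A\) and a row of \(B\) each occupy consecutive memory locations, while \(C\) is column major for `incRowC = 1, incColC = MR` and row major for `incRowC = NR, incColC = 1`.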
Exercise: Test Framework
-
The parameters \(M_r\) and \(N_r\) are defined through macros. You can choose arbitrary values. However, it is usually a good idea to choose \(M_r \neq N_r\). (Why?)
-
In main the following test case should be set up:
-
Allocate an \(M_r \times k\) matrix \(A\).
-
Allocate a \(k \times N_r\) matrix \(B\).
-
Also make sure that \(M_r\), \(N_r\) and \(k\) are pairwise different.
-
Allocate two \(M_r \times N_r\) matrices \(C_0\) and \(C_1\).
-
Initialize all matrices. After the initialization, \(C_0\) and \(C_1\) should be equal.
-
Print matrices \(A\), \(B\) and \(C_0\).
-
For fixed values of \(\alpha\) and \(\beta\), compute \(C_0 \leftarrow \beta C_0 + \alpha A B\) with a reference implementation for GEMM.
-
Print \(C_0\)
-
Print \(C_1\)
-
Note: Print the name of the matrix before you print its value.
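The steps of this exercise can be sketched as follows. This is a minimal sketch under stated assumptions, not the intended solution: the helpers `init_matrix`, `print_matrix` and `dgemm_ref`, the driver `test_case`, and the values of `MR` and `NR` are all hypothetical. In particular, `dgemm_ref` is a simple stand-in for whatever reference GEMM implementation is already available.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define MR 4   /* assumed value */
#define NR 6   /* assumed value, different from MR */

/* Fill the m x n matrix X with start, start+1, ... in row-wise order. */
void init_matrix(size_t m, size_t n, double *X,
                 ptrdiff_t incRow, ptrdiff_t incCol, double start)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j)
            X[(ptrdiff_t)i * incRow + (ptrdiff_t)j * incCol]
                = start + (double)(i * n + j);
}

/* Print the name of the matrix, then its entries row by row. */
void print_matrix(const char *name, size_t m, size_t n, const double *X,
                  ptrdiff_t incRow, ptrdiff_t incCol)
{
    printf("%s =\n", name);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j)
            printf("%9.2f ", X[(ptrdiff_t)i * incRow + (ptrdiff_t)j * incCol]);
        printf("\n");
    }
    printf("\n");
}

/* Hypothetical reference GEMM: C <- beta*C + alpha*A*B, general increments. */
void dgemm_ref(size_t m, size_t n, size_t k, double alpha,
               const double *A, ptrdiff_t incRowA, ptrdiff_t incColA,
               const double *B, ptrdiff_t incRowB, ptrdiff_t incColB,
               double beta, double *C, ptrdiff_t incRowC, ptrdiff_t incColC)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double ab = 0.0;
            for (size_t l = 0; l < k; ++l)
                ab += A[(ptrdiff_t)i * incRowA + (ptrdiff_t)l * incColA]
                    * B[(ptrdiff_t)l * incRowB + (ptrdiff_t)j * incColB];
            double *c = &C[(ptrdiff_t)i * incRowC + (ptrdiff_t)j * incColC];
            /* For beta == 0 the old value of C is not read at all. */
            *c = (beta == 0.0 ? 0.0 : beta * *c) + alpha * ab;
        }
    }
}

/* Set up, print and update the test matrices. Returns 0 on success. */
int test_case(size_t k, double alpha, double beta)
{
    double *A  = malloc(MR * k * sizeof *A);
    double *B  = malloc(k * NR * sizeof *B);
    double *C0 = malloc(MR * NR * sizeof *C0);
    double *C1 = malloc(MR * NR * sizeof *C1);
    if (!A || !B || !C0 || !C1) {
        free(A); free(B); free(C0); free(C1);
        return -1;
    }
    init_matrix(MR, k, A, 1, MR, 1.0);     /* A: column major */
    init_matrix(k, NR, B, NR, 1, 1.0);     /* B: row major    */
    init_matrix(MR, NR, C0, 1, MR, 1.0);   /* C0 and C1 get   */
    init_matrix(MR, NR, C1, 1, MR, 1.0);   /* identical data  */

    print_matrix("A", MR, k, A, 1, MR);
    print_matrix("B", k, NR, B, NR, 1);
    print_matrix("C0", MR, NR, C0, 1, MR);

    dgemm_ref(MR, NR, k, alpha, A, 1, MR, B, NR, 1, beta, C0, 1, MR);

    print_matrix("C0", MR, NR, C0, 1, MR);
    print_matrix("C1", MR, NR, C1, 1, MR);

    free(A); free(B); free(C0); free(C1);
    return 0;
}
```

Calling, for example, `test_case(8, 1.5, 2.0)` from `main` runs the whole sequence with \(k = 8\), which differs from the assumed values of `MR` and `NR`. At this stage \(C_1\) is printed unchanged, because nothing has been applied to it yet.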
Exercise: Implement and call the micro kernel
The signature of the micro kernel is defined by
void dgemm_micro(size_t k, double alpha, const double *A, const double *B, double beta, double *C, ptrdiff_t incRowC, ptrdiff_t incColC);
-
Why are there no dimensions \(m\) and \(n\) for \(C\)?
-
Why are there no row and column increments for \(A\) and \(B\)?
-
What are the fastest variants to realize the GEMM operation in this case?
-
Implement the operation as follows:
-
Use a buffer \(AB\) on the stack with length \(M_r \cdot N_r\).
-
Zero initialize the buffer.
-
Compute \(AB \leftarrow A B\) (using one of the two optimal variants!)
-
Compute \(C \leftarrow \beta C\). Recall that the case \(\beta=0\) needs special treatment. (Why?)
-
Compute \(C \leftarrow C + \alpha AB\).
-