=================
GEMM Micro Kernel
=================

The GEMM Micro Kernel computes the GEMM operation
$C \leftarrow \beta C + \alpha A B$ where

- $A$ is an $M_r \times k$ matrix with row increment 1 (i.e. col major)
- $B$ is a $k \times N_r$ matrix with col increment 1 (i.e. row major)
- $C$ is an $M_r \times N_r$ matrix which can be row or col major.

Exercise: Test Framework
========================

- Parameters $M_r$, $N_r$ are defined through macros.  You can choose arbitrary
  values.  However, it is usually a good idea to use different values for
  $M_r$ and $N_r$.  (Why?)
- In `main` the following test case should be set up:
    - Allocate an $M_r \times k$ matrix $A$.
    - Allocate a $k \times N_r$ matrix $B$.
    - Also make sure that $M_r$, $N_r$ and $k$ are pairwise different.
    - Allocate two $M_r \times N_r$ matrices $C_0$ and $C_1$.
    - Initialize all matrices such that $C_0$ and $C_1$ are equal.
    - Print matrices $A$, $B$ and $C_0$.
    - For some fixed values of $\alpha$ and $\beta$ compute
      $C_0 \leftarrow \beta C_0 + \alpha A B$ with a reference implementation
      for GEMM.
    - Print $C_0$
    - Print $C_1$

Note: Print the name of the matrix before you print its value.

Exercise: Implement and call the micro kernel
=============================================

The signature of the micro kernel is defined by

---- CODE(type=c) --------------------------------------------------------------
void
dgemm_micro(size_t k, double alpha,
            const double *A, const double *B,
            double beta,
            double *C, ptrdiff_t incRowC, ptrdiff_t incColC);
--------------------------------------------------------------------------------

- Why are there no dimensions $m$ and $n$ for $C$?
- Why are there no row and column increments for $A$ and $B$?
- What are the fastest variants to realize the GEMM operation in this case?
- Implement the operation as follows:
    - Use a buffer $AB$ on the stack with length $M_r \cdot N_r$.
    - Zero initialize the buffer.
    - Compute $AB \leftarrow A B$ (using one of the two optimal variants!)
    - Compute $C \leftarrow \beta C$.  Recall that the case $\beta=0$ needs
      special treatment.  (Why?)
    - Compute $C \leftarrow C + \alpha AB$.
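
The implementation steps listed above can be sketched as follows.  This is only
one possible solution under stated assumptions, not a reference implementation:
the macro names `MR` and `NR` and their values 4 and 3 are hypothetical, the
buffer `AB` is stored col major by choice, and the loop order (a sum of $k$
rank-1 updates, so that $A$ and $B$ are both read contiguously) is one
candidate for the "fastest variant" asked about above.

```c
#include <stddef.h>

/* Assumed block sizes (hypothetical values; the exercise leaves them open). */
#ifndef MR
#define MR 4
#endif
#ifndef NR
#define NR 3
#endif

/* Sketch: C <- beta*C + alpha*A*B for an MR x k col major A
   and a k x NR row major B. */
void
dgemm_micro(size_t k, double alpha,
            const double *A, const double *B,
            double beta,
            double *C, ptrdiff_t incRowC, ptrdiff_t incColC)
{
    double    AB[MR*NR];        /* stack buffer, kept col major here */
    ptrdiff_t i, j;
    size_t    l;

    for (i=0; i<MR*NR; ++i) {
        AB[i] = 0;
    }

    /* AB <- A*B as k rank-1 updates: column l of A times row l of B. */
    for (l=0; l<k; ++l) {
        const double *a = A + l*MR;     /* column l of A, MR entries */
        const double *b = B + l*NR;     /* row l of B, NR entries    */

        for (j=0; j<NR; ++j) {
            for (i=0; i<MR; ++i) {
                AB[i+j*MR] += a[i]*b[j];
            }
        }
    }

    /* C <- beta*C.  For beta == 0 the entries are overwritten instead of
       scaled, because 0*NaN would still be NaN. */
    if (beta==0.0) {
        for (j=0; j<NR; ++j) {
            for (i=0; i<MR; ++i) {
                C[i*incRowC+j*incColC] = 0;
            }
        }
    } else if (beta!=1.0) {
        for (j=0; j<NR; ++j) {
            for (i=0; i<MR; ++i) {
                C[i*incRowC+j*incColC] *= beta;
            }
        }
    }

    /* C <- C + alpha*AB */
    for (j=0; j<NR; ++j) {
        for (i=0; i<MR; ++i) {
            C[i*incRowC+j*incColC] += alpha*AB[i+j*MR];
        }
    }
}
```

For a col major $C_1$ stored without padding the call would pass `incRowC = 1`
and `incColC = MR`; for a row major $C_1$ it would pass `incRowC = NR` and
`incColC = 1`.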