=================
GEMM Macro Kernel
=================

The GEMM macro kernel computes the GEMM operation $C \leftarrow \beta C +
\alpha A B$ from packed blocks of $A$ and $B$, which it multiplies using the
GEMM micro kernel. The matrix dimensions are as follows:

- $A$ is an $m \times k$ matrix with $m \leq M_c$ and $k \leq K_c$.

- $B$ is a $k \times n$ matrix with $n \leq N_c$.

- $C$ is an $m \times n$ matrix.


Exercise: Test Framework
========================

- The parameters $M_c$, $N_c$, $K_c$ as well as $M_r$, $N_r$ are defined
  through macros. We require that $M_r$ divides $M_c$ and $N_r$ divides $N_c$.
  Apart from these restrictions you can choose arbitrary values. However, it
  is usually a good idea to use pairwise different values. (Why?)

- In `main` the following test case should be set up:

  - Allocate an $m \times k$ matrix $A$ with $m < M_c$ and $k < K_c$. The
    strict inequality is chosen on purpose! (Why?)

  - Allocate a $k \times n$ matrix $B$ with $n < N_c$.

  - Also make sure that $m$, $n$ and $k$ are pairwise different.

  - Allocate two $m \times n$ matrices $C_0$ and $C_1$.

  - Initialize all matrices. $C_0$ and $C_1$ must be equal after
    initialization.

  - Print the matrices $A$, $B$ and $C_0$.

  - For some fixed values of $\alpha$ and $\beta$ compute
    $C_0 \leftarrow \beta C_0 + \alpha A B$ with a reference implementation
    of GEMM.

  - Print $C_0$.

  - Print $C_1$.

  Note: Print the name of each matrix before you print its values.


Exercise: Add a call of the GEMM Macro Kernel
=============================================

The macro kernel assumes that it receives already packed blocks $A$ and $B$.
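To make "already packed" concrete, here is a minimal sketch of one possible packing routine. It is not taken from this page: the names `pack_A` and `pack_B`, the macro values, and the buffer layout are assumptions. The assumed layout stores $A$ as consecutive panels of $M_r$ rows (the $M_r$ entries of one column are contiguous) and $B$ as consecutive panels of $N_r$ columns (the $N_r$ entries of one row are contiguous), with incomplete panels padded by zeros.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed example values; the exercise lets you choose your own. */
#define MC 8
#define NC 12
#define KC 6
#define MR 4
#define NR 3

/* Hypothetical helper: pack the m x k block A into panels of MR rows.
   Inside a panel the MR entries of one column are stored contiguously;
   rows beyond m are filled with zeros. */
void
pack_A(size_t m, size_t k, const double *A,
       ptrdiff_t incRowA, ptrdiff_t incColA, double *Ap)
{
    assert(m <= MC && k <= KC);
    size_t mp = (m + MR - 1) / MR;            /* number of panels */

    for (size_t ip = 0; ip < mp; ++ip) {
        for (size_t l = 0; l < k; ++l) {
            for (size_t i0 = 0; i0 < MR; ++i0) {
                size_t i = ip*MR + i0;
                Ap[ip*MR*k + l*MR + i0] = (i < m)
                    ? A[(ptrdiff_t)i*incRowA + (ptrdiff_t)l*incColA]
                    : 0.0;
            }
        }
    }
}

/* Hypothetical helper: pack the k x n block B into panels of NR columns.
   Inside a panel the NR entries of one row are stored contiguously;
   columns beyond n are filled with zeros. */
void
pack_B(size_t k, size_t n, const double *B,
       ptrdiff_t incRowB, ptrdiff_t incColB, double *Bp)
{
    assert(k <= KC && n <= NC);
    size_t np = (n + NR - 1) / NR;            /* number of panels */

    for (size_t jp = 0; jp < np; ++jp) {
        for (size_t l = 0; l < k; ++l) {
            for (size_t j0 = 0; j0 < NR; ++j0) {
                size_t j = jp*NR + j0;
                Bp[jp*NR*k + l*NR + j0] = (j < n)
                    ? B[(ptrdiff_t)l*incRowB + (ptrdiff_t)j*incColB]
                    : 0.0;
            }
        }
    }
}
```

With this layout every panel occupies exactly $M_r \cdot k$ or $k \cdot N_r$ elements, so the position of a panel inside the buffer depends only on its index and on $k$.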
The signature for the macro kernel is given as

---- CODE(type=c) --------------------------------------------------------------
void
dgemm_macro(size_t m, size_t n, size_t k,
            double alpha,
            const double *A, const double *B,
            double beta,
            double *C, ptrdiff_t incRowC, ptrdiff_t incColC);
--------------------------------------------------------------------------------

- Why are there no row and column increments for $A$ and $B$?

- Add in `main` a call of the macro kernel as follows:

  - Allocate buffers $A_p$ and $B_p$ of length $M_c \cdot K_c$ and
    $K_c \cdot N_c$ respectively.

  - Pack $A$ and $B$ into these buffers.

  - Call the macro kernel.

  - Deallocate the buffers.

Do not continue until this compiles and runs without crashing.


Exercise: Implement the macro kernel
====================================

Implement the macro kernel:

- $A$ gets partitioned into horizontal panels of dimension $m_r \times k$ with
  $m_r \leq M_r$.

- $B$ gets partitioned into vertical panels of dimension $k \times n_r$ with
  $n_r \leq N_r$.

- Panels of $A$ are multiplied with panels of $B$ using the micro kernel.

- Note that the micro kernel requires that the panels of $A$ and $B$ have
  dimensions $M_r \times k$ and $k \times N_r$ respectively! In case of
  $m_r < M_r$ or $n_r < N_r$ recall that the buffers contain panels that were
  extended through zero padding to dimensions $M_r \times k$ and
  $k \times N_r$ respectively. So in this case use the following workaround:

  - Compute $AB \leftarrow \alpha A_i B_j$ where $A_i$ and $B_j$ denote
    panels.

  - Compute $C_{i,j} \leftarrow \beta C_{i,j}$ (GESCAL) where $C_{i,j}$
    denotes the corresponding block of $C$.

  - Compute $C_{i,j} \leftarrow C_{i,j} + AB|_{\text{dim}(C_{i,j})}$ (GEAXPY)
    where $AB|_{\text{dim}(C_{i,j})}$ denotes the upper-left part of $AB$ that
    is relevant for updating $C_{i,j}$.


:navigate: up -> doc:index
           back -> doc:session04/page16
           next -> doc:session04/page18
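The panel loop and the workaround for incomplete blocks described in the macro kernel exercise can be sketched as follows. This is one possible arrangement, not a reference solution. The macro values, the packed layout (panels of $A$ with $M_r$ contiguous entries per column, panels of $B$ with $N_r$ contiguous entries per row), and the signature of the stand-in `dgemm_micro` are assumptions; replace the stand-in with your own micro kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed example values; the exercise lets you choose your own. */
#define MC 8
#define NC 12
#define KC 6
#define MR 4
#define NR 3

/* Stand-in micro kernel with an assumed signature: computes
   C <- beta*C + alpha*A*B for a full MR x NR block of C. */
static void
dgemm_micro(size_t k, double alpha, const double *A, const double *B,
            double beta, double *C, ptrdiff_t incRowC, ptrdiff_t incColC)
{
    double AB[MR*NR] = {0.0};

    for (size_t l = 0; l < k; ++l)
        for (size_t j = 0; j < NR; ++j)
            for (size_t i = 0; i < MR; ++i)
                AB[i + j*MR] += A[l*MR + i] * B[l*NR + j];

    for (size_t j = 0; j < NR; ++j)
        for (size_t i = 0; i < MR; ++i) {
            double *c = &C[(ptrdiff_t)i*incRowC + (ptrdiff_t)j*incColC];
            /* for beta == 0 the old value of C is never read */
            *c = (beta == 0.0 ? 0.0 : beta * *c) + alpha*AB[i + j*MR];
        }
}

void
dgemm_macro(size_t m, size_t n, size_t k, double alpha,
            const double *A, const double *B, double beta,
            double *C, ptrdiff_t incRowC, ptrdiff_t incColC)
{
    assert(m <= MC && n <= NC && k <= KC);
    size_t mp = (m + MR - 1) / MR;       /* number of panels of A */
    size_t np = (n + NR - 1) / NR;       /* number of panels of B */

    for (size_t j = 0; j < np; ++j) {
        /* only the last panel can be narrower than NR */
        size_t nr = (j + 1 < np || n % NR == 0) ? NR : n % NR;

        for (size_t i = 0; i < mp; ++i) {
            /* only the last panel can be shorter than MR */
            size_t mr = (i + 1 < mp || m % MR == 0) ? MR : m % MR;

            const double *Ai = &A[i*MR*k];
            const double *Bj = &B[j*NR*k];
            double *Cij = &C[(ptrdiff_t)(i*MR)*incRowC
                             + (ptrdiff_t)(j*NR)*incColC];

            if (mr == MR && nr == NR) {
                dgemm_micro(k, alpha, Ai, Bj, beta, Cij, incRowC, incColC);
            } else {
                /* AB <- alpha*Ai*Bj into a local MR x NR buffer */
                double AB[MR*NR];
                dgemm_micro(k, alpha, Ai, Bj, 0.0, AB, 1, MR);

                /* Cij <- beta*Cij, then Cij <- Cij + AB restricted to
                   mr x nr; both steps are fused into one pass here */
                for (size_t jj = 0; jj < nr; ++jj)
                    for (size_t ii = 0; ii < mr; ++ii) {
                        double *c = &Cij[(ptrdiff_t)ii*incRowC
                                         + (ptrdiff_t)jj*incColC];
                        *c = (beta == 0.0 ? 0.0 : beta * *c)
                           + AB[ii + jj*MR];
                    }
            }
        }
    }
}
```

Writing incomplete blocks through the local buffer `AB` keeps the micro kernel free of bounds checks: it always works on a full $M_r \times N_r$ block, and only the macro kernel decides which part of the result reaches $C$.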