=================
GEMM Micro Kernel
=================

The GEMM Micro Kernel computes the GEMM operation $C \leftarrow \beta C + \alpha A B$ where

- $A$ is an $M_r \times k$ matrix with row increment 1 (i.e. column major),
- $B$ is a $k \times N_r$ matrix with column increment 1 (i.e. row major),
- $C$ is an $M_r \times N_r$ matrix that can be row or column major.
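
The storage formats above translate into simple index formulas. The following is a minimal sketch, assuming that $A$ and $B$ are stored contiguously (column increment $M_r$ for $A$, row increment $N_r$ for $B$). The helper names `getA`, `getB` and `refC` and the values chosen for `MR` and `NR` are hypothetical and only serve as an illustration.

```c
#include <stddef.h>

#define MR 4    /* assumed value, chosen only for illustration */
#define NR 6    /* assumed value, chosen only for illustration */

/* A is M_r x k with row increment 1, so column l starts at offset l*MR. */
static double
getA(const double *A, size_t i, size_t l)
{
    return A[i + l*MR];
}

/* B is k x N_r with column increment 1, so row l starts at offset l*NR. */
static double
getB(const double *B, size_t l, size_t j)
{
    return B[l*NR + j];
}

/* C may be row or column major, so both increments are passed explicitly. */
static double *
refC(double *C, ptrdiff_t incRowC, ptrdiff_t incColC, size_t i, size_t j)
{
    return &C[(ptrdiff_t)i*incRowC + (ptrdiff_t)j*incColC];
}
```

With these formulas a column of $A$ and a row of $B$ each occupy consecutive memory locations.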

Exercise: Test Framework
========================

- Parameters $M_r$ and $N_r$ are defined through macros. You can choose arbitrary
  values.  However, it is usually a good idea to choose $M_r \neq N_r$. (Why?)
- In `main` the following test case should be set up:
    - Allocate an $M_r \times k$ matrix $A$.
    - Allocate a $k \times N_r$ matrix $B$.
    - Make sure that $M_r$, $N_r$ and $k$ are pairwise different.
    - Allocate two $M_r \times N_r$ matrices $C_0$ and $C_1$.
    - Initialize all matrices.  $C_0$ and $C_1$ must be initialized with identical values.
    - Print matrices $A$, $B$ and $C_0$.
    - For fixed values of $\alpha$ and $\beta$ compute $C_0 \leftarrow \beta C_0 + \alpha A B$
      with a reference implementation for GEMM.
    - Print $C_0$.
    - Print $C_1$.

Note: Print the name of the matrix before you print its value.
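
The test case described above could be sketched as follows. This is only one possible layout, not a reference solution. The names `initMatrix`, `printMatrix`, `dgemm_ref` and `run_test`, the macro `K`, the numeric values and the choice of a column major $C$ are all assumptions. `run_test` is meant to be called from `main`.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define MR 4    /* assumed values: MR, NR and K are pairwise different */
#define NR 6
#define K  5

/* Fill an m x n matrix with 1, 2, 3, ... in row-wise order. */
static void
initMatrix(size_t m, size_t n, double *X, ptrdiff_t incRow, ptrdiff_t incCol)
{
    for (size_t i=0; i<m; ++i) {
        for (size_t j=0; j<n; ++j) {
            X[(ptrdiff_t)i*incRow + (ptrdiff_t)j*incCol] = (double)(i*n + j + 1);
        }
    }
}

/* Print the name of the matrix first, then its values row by row. */
static void
printMatrix(const char *name, size_t m, size_t n,
            const double *X, ptrdiff_t incRow, ptrdiff_t incCol)
{
    printf("%s =\n", name);
    for (size_t i=0; i<m; ++i) {
        for (size_t j=0; j<n; ++j) {
            printf(" %9.2f", X[(ptrdiff_t)i*incRow + (ptrdiff_t)j*incCol]);
        }
        printf("\n");
    }
    printf("\n");
}

/* Simple reference GEMM: C <- beta*C + alpha*A*B for general increments. */
static void
dgemm_ref(size_t m, size_t n, size_t k, double alpha,
          const double *A, ptrdiff_t incRowA, ptrdiff_t incColA,
          const double *B, ptrdiff_t incRowB, ptrdiff_t incColB,
          double beta,
          double *C, ptrdiff_t incRowC, ptrdiff_t incColC)
{
    for (size_t i=0; i<m; ++i) {
        for (size_t j=0; j<n; ++j) {
            double ab = 0;
            for (size_t l=0; l<k; ++l) {
                ab += A[(ptrdiff_t)i*incRowA + (ptrdiff_t)l*incColA]
                     *B[(ptrdiff_t)l*incRowB + (ptrdiff_t)j*incColB];
            }
            double *c = &C[(ptrdiff_t)i*incRowC + (ptrdiff_t)j*incColC];
            /* beta == 0: C is overwritten, its old value is never read */
            *c = (beta == 0.0) ? alpha*ab : beta*(*c) + alpha*ab;
        }
    }
}

/* Returns 0 on success and 1 if an allocation failed. */
int
run_test(void)
{
    double *A  = malloc(MR*K*sizeof(double));
    double *B  = malloc(K*NR*sizeof(double));
    double *C0 = malloc(MR*NR*sizeof(double));
    double *C1 = malloc(MR*NR*sizeof(double));
    int    failed = !A || !B || !C0 || !C1;

    if (!failed) {
        initMatrix(MR, K, A, 1, MR);      /* A: row increment 1 */
        initMatrix(K, NR, B, NR, 1);      /* B: column increment 1 */
        initMatrix(MR, NR, C0, 1, MR);    /* C0 and C1 start out equal */
        initMatrix(MR, NR, C1, 1, MR);

        printMatrix("A", MR, K, A, 1, MR);
        printMatrix("B", K, NR, B, NR, 1);
        printMatrix("C0", MR, NR, C0, 1, MR);

        dgemm_ref(MR, NR, K, 1.5, A, 1, MR, B, NR, 1, 2.5, C0, 1, MR);

        printMatrix("C0", MR, NR, C0, 1, MR);
        /* the micro kernel would later be applied to C1 at this point */
        printMatrix("C1", MR, NR, C1, 1, MR);
    }
    free(A);
    free(B);
    free(C0);
    free(C1);
    return failed;
}
```

Until a micro kernel is applied to $C_1$, the last two printed matrices differ. Afterwards they should agree up to rounding errors.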


Exercise: Implement and call the micro kernel
=============================================

The signature of the micro kernel is defined by

---- CODE(type=c) --------------------------------------------------------------
void
dgemm_micro(size_t k, double alpha,
            const double *A, const double *B,
            double beta,
            double *C, ptrdiff_t incRowC, ptrdiff_t incColC);
--------------------------------------------------------------------------------

- Why are there no dimensions $m$ and $n$ for $C$?
- Why are there no row and column increments for $A$ and $B$?
- What are the fastest variants to realize the GEMM operation in this case?
- Implement the operation as follows:
   - Use a buffer $AB$ on the stack with length $M_r \cdot N_r$.
   - Zero initialize the buffer.
   - Compute $AB \leftarrow A B$ (using one of the two optimal variants!)
   - Compute $C \leftarrow \beta C$. Recall that the case $\beta=0$ needs special
     treatment. (Why?)
   - Compute $C \leftarrow C + \alpha AB$.
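
The steps listed above could be sketched as follows. This is a minimal, unoptimized sketch and not a reference solution. The values of `MR` and `NR`, the column major layout of the buffer `AB` and the chosen loop order (index $l$ outermost, so that each step is a rank-1 update with one column of $A$ and one row of $B$) are assumptions.

```c
#include <stddef.h>

#define MR 4    /* assumed value, chosen only for illustration */
#define NR 6    /* assumed value, chosen only for illustration */

void
dgemm_micro(size_t k, double alpha,
            const double *A, const double *B,
            double beta,
            double *C, ptrdiff_t incRowC, ptrdiff_t incColC)
{
    double AB[MR*NR];    /* buffer on the stack, stored column major here */

    /* Zero initialize the buffer. */
    for (size_t i=0; i<MR*NR; ++i) {
        AB[i] = 0;
    }

    /* AB <- A*B as a sequence of k rank-1 updates. */
    for (size_t l=0; l<k; ++l) {
        for (size_t j=0; j<NR; ++j) {
            for (size_t i=0; i<MR; ++i) {
                AB[i + j*MR] += A[i + l*MR]*B[l*NR + j];
            }
        }
    }

    /* C <- beta*C, where beta == 0 overwrites C without reading it. */
    if (beta == 0.0) {
        for (size_t j=0; j<NR; ++j) {
            for (size_t i=0; i<MR; ++i) {
                C[(ptrdiff_t)i*incRowC + (ptrdiff_t)j*incColC] = 0;
            }
        }
    } else if (beta != 1.0) {
        for (size_t j=0; j<NR; ++j) {
            for (size_t i=0; i<MR; ++i) {
                C[(ptrdiff_t)i*incRowC + (ptrdiff_t)j*incColC] *= beta;
            }
        }
    }

    /* C <- C + alpha*AB */
    for (size_t j=0; j<NR; ++j) {
        for (size_t i=0; i<MR; ++i) {
            C[(ptrdiff_t)i*incRowC + (ptrdiff_t)j*incColC] += alpha*AB[i + j*MR];
        }
    }
}
```

In a test program the kernel would be applied to $C_1$ with the same $\alpha$ and $\beta$ that were used for the reference implementation, so that the result can be compared with $C_0$.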


:navigate: up    -> doc:index
           back  -> doc:session04/page14
           next  -> doc:session04/page16