==================================
Simple Cache Optimization for GEMM
==================================

The function `dgemm_simple_blk` is supposed to compute the matrix product
block-wise (compare the lecture notes):

- Blocks of $A$ have a maximal dimension of `DGEMM_BLK_M` times `DGEMM_BLK_K`
- Blocks of $B$ have a maximal dimension of `DGEMM_BLK_K` times `DGEMM_BLK_N`
- Blocks of $C$ have a maximal dimension of `DGEMM_BLK_M` times `DGEMM_BLK_N`
- The products of blocks are computed with an unblocked gemm implementation.
- In order to fit into the cache, each block gets buffered in a (locally) allocated
  array.
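
The scheme described above can be sketched as follows. This is only a minimal
illustration under stated assumptions, not the reference solution: the names
`gemm_blk_sketch`, `pack_block`, `gemm_unblk` and the macros `BLK_M`, `BLK_K`,
`BLK_N` are hypothetical, the block sizes are arbitrary small values, the
parameter list (row and column increments per matrix) is assumed, and the loop
order may differ from the one used in the lecture.

```c
/* Illustrative block sizes (assumption); not the values from the lecture. */
#define BLK_M 4
#define BLK_K 3
#define BLK_N 5

/* Copy an m x n block of X (arbitrary strides) into a row-major buffer. */
static void
pack_block(int m, int n, const double *X, int incRow, int incCol, double *buf)
{
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            buf[i*n + j] = X[i*incRow + j*incCol];
        }
    }
}

/* Unblocked product on buffers: C_ <- C_ + A_ * B_ (all row-major). */
static void
gemm_unblk(int m, int n, int k,
           const double *A_, const double *B_, double *C_)
{
    for (int i = 0; i < m; ++i) {
        for (int l = 0; l < k; ++l) {
            for (int j = 0; j < n; ++j) {
                C_[i*n + j] += A_[i*k + l] * B_[l*n + j];
            }
        }
    }
}

/* Hypothetical blocked gemm: C <- beta*C + alpha*A*B. */
void
gemm_blk_sketch(int m, int n, int k, double alpha,
                const double *A, int incRowA, int incColA,
                const double *B, int incRowB, int incColB,
                double beta,
                double *C, int incRowC, int incColC)
{
    /* Locally allocated buffers, one per block. */
    double A_[BLK_M*BLK_K], B_[BLK_K*BLK_N], C_[BLK_M*BLK_N];

    /* Scale C once; beta == 0 overwrites C, as is the BLAS convention. */
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            double *c = &C[i*incRowC + j*incColC];
            *c = (beta == 0.0) ? 0.0 : beta * (*c);
        }
    }

    for (int i0 = 0; i0 < m; i0 += BLK_M) {
        /* Blocks at the border may be smaller than the maximal size. */
        int mb = (m - i0 < BLK_M) ? m - i0 : BLK_M;
        for (int j0 = 0; j0 < n; j0 += BLK_N) {
            int nb = (n - j0 < BLK_N) ? n - j0 : BLK_N;

            /* The buffer for the block of C accumulates A*B only. */
            for (int t = 0; t < mb*nb; ++t) {
                C_[t] = 0.0;
            }
            for (int l0 = 0; l0 < k; l0 += BLK_K) {
                int kb = (k - l0 < BLK_K) ? k - l0 : BLK_K;
                pack_block(mb, kb, &A[i0*incRowA + l0*incColA],
                           incRowA, incColA, A_);
                pack_block(kb, nb, &B[l0*incRowB + j0*incColB],
                           incRowB, incColB, B_);
                gemm_unblk(mb, nb, kb, A_, B_, C_);
            }
            /* Add the scaled block result back into C. */
            for (int i = 0; i < mb; ++i) {
                for (int j = 0; j < nb; ++j) {
                    C[(i0+i)*incRowC + (j0+j)*incColC] += alpha * C_[i*nb + j];
                }
            }
        }
    }
}
```

In this sketch, copying each block into a contiguous buffer means the unblocked
kernel only touches unit-stride memory, regardless of how $A$, $B$ and $C$ are
stored. Whether the buffers actually fit into the cache depends on the chosen
block sizes.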


Exercise
========
Implement function `dgemm_simple_blk`:

:import: session04/blas3_gemm_simple_blk_ex.c

:navigate: up    -> doc:index
           back  -> doc:session04/page08
           next  -> doc:session04/page10