===================
Packing Blocks of A
===================

In the cache optimized GEMM-Operation $\beta C + \alpha A B \to C$ the matrix
$A$ gets partitioned into blocks of maximal dimension $M_c \times K_c$.  Each
block $A_{i,l}$ of $A$ gets packed into col-major horizontal panels with $M_r$
rows.

Assume that $p$ is a buffer for $M_c \cdot K_c$ elements and $X$ a matrix block
with dimension $m \times k$ where $m \leq M_c$ and $k \leq K_c$.  If $m$ is not
a multiple of $M_r$, the last panel gets padded with zeros; the buffer is large
enough for this provided $M_c$ is a multiple of $M_r$.  Then the following
algorithm (which uses zero-based indices) can be used for packing:

---- BOX -----------------------------------------------------------------------

`dgepack_A(X, p)`
~~~~~~~~~~~~~~~~~
+------------------------------------------------------------------------------+
|- Input:                                                                      |
|    - matrix $X = \left(x_{i,l}\right)$ with dimension $m \times k$           |
|      (assume $m \leq M_c$, $k \leq K_c$)                                     |
|    - array $p$ with length $M_c \cdot K_c$                                   |
|- On Return:                                                                  |
|    - $p$ contains $X$ packed in horizontal col-major panels with $M_r$ rows  |
+------------------------------------------------------------------------------+
| - for $l$ with $0 \leq l < k$                                                |
|     - for $i_1$ with $0 \leq i_1 < \left\lceil \frac{m}{M_r} \right\rceil$   |
|         - for $i_0$ with $0 \leq i_0 < M_r$                                  |
|             - $i \leftarrow i_1 \cdot M_r + i_0$                             |
|             - $\nu \leftarrow i_1 \cdot M_r \cdot k + l \cdot M_r + i_0$     |
|             - if $i < m$                                                     |
|                 - $p_\nu \leftarrow x_{i,l}$                                 |
|             - else                                                           |
|                 - $p_\nu \leftarrow 0$                                       |
+------------------------------------------------------------------------------+

--------------------------------------------------------------------------------

Exercise
========
- Implement and *test* the algorithm above for packing blocks of $A$.
- Start with an empty source file!

:navigate: up    -> doc:index
           back  -> doc:session04/page10
           next  -> doc:session04/page12
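
For comparison with your own solution, the packing algorithm `dgepack_A(X, p)` described on this page could be sketched in C roughly as follows.  This is only one possible reading of the algorithm, not a reference implementation.  The function name `pack_A_sketch`, passing $M_r$ as the parameter `Mr`, and addressing $X$ through the row and column increments `incRowX` and `incColX` are all assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch of dgepack_A: pack the m x k block X into p as horizontal
 * col-major panels with Mr rows each.  Rows beyond m are zero-padded.
 *
 * Assumptions (not taken from the text): element x_{i,l} is stored at
 * X[i*incRowX + l*incColX], and Mr is passed as a parameter.
 * The caller must provide room for ceil(m/Mr)*Mr*k elements in p.
 */
void
pack_A_sketch(size_t m, size_t k, size_t Mr,
              const double *X, ptrdiff_t incRowX, ptrdiff_t incColX,
              double *p)
{
    assert(Mr > 0);

    size_t mp = (m + Mr - 1) / Mr;          /* number of panels: ceil(m/Mr) */

    for (size_t l = 0; l < k; ++l) {
        for (size_t i1 = 0; i1 < mp; ++i1) {
            for (size_t i0 = 0; i0 < Mr; ++i0) {
                size_t i  = i1 * Mr + i0;              /* row index in X   */
                size_t nu = i1 * Mr * k + l * Mr + i0; /* position in p    */

                if (i < m) {
                    p[nu] = X[(ptrdiff_t)i * incRowX
                              + (ptrdiff_t)l * incColX];
                } else {
                    p[nu] = 0.0;                       /* zero padding     */
                }
            }
        }
    }
}
```

A small case for testing by hand: for the $3 \times 2$ block with rows $(1, 2)$, $(3, 4)$, $(5, 6)$ and $M_r = 2$, the packed buffer should read $1, 3, 2, 4, 5, 0, 6, 0$.  The first panel holds rows 0 and 1 column by column, the second panel holds row 2 plus one row of zeros.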