===================
Packing Blocks of B
===================

In the cache-optimized GEMM operation $\beta C + \alpha A B \to C$ the matrix
$B$ gets partitioned into blocks of maximal dimension $K_c \times N_c$.  Each
block $B_{l,j}$ of $B$ gets packed into row-major vertical panels with $N_r$
columns.

Exercise
========
Assume that $p$ is a buffer for $K_c \cdot N_c$ elements and $X$ a matrix block
with dimension $k \times n$ where $k \leq K_c$ and $n \leq N_c$.

*Be honest for your own sake and derive the following algorithm (or any
equivalent algorithm) for packing blocks of $B$:*

---- BOX -----------------------------------------------------------------------

`dgepack_B(X, p)`
~~~~~~~~~~~~~~~~~
+------------------------------------------------------------------------------+
|- Input:                                                                      |
|    - matrix $X = \left(x_{l,j}\right)$ with dimension $k \times n$           |
|      (assume $k \leq K_c$, $n \leq N_c$)                                     |
|    - array $p$ with length $K_c \cdot N_c$                                   |
|- On Return:                                                                  |
|    - $p$ contains $X$ packed in vertical row-major panels with $N_r$ columns |
+------------------------------------------------------------------------------+
| - for $j_1$ with $0 \leq j_1 < \left\lceil \frac{n}{N_r} \right\rceil$       |
|     - for $j_0$ with $0 \leq j_0 < N_r$                                      |
|         - for $l$ with $0 \leq l < k$                                        |
|             - $j \leftarrow j_1 \cdot N_r + j_0$                             |
|             - $\nu \leftarrow j_1 \cdot N_r \cdot k + l \cdot N_r + j_0$     |
|             - if $j < n$                                                     |
|                 - $p_\nu \leftarrow x_{l,j}$                                 |
|             - else                                                           |
|                 - $p_\nu \leftarrow 0$                                       |
+------------------------------------------------------------------------------+

--------------------------------------------------------------------------------

Exercise
========
- Implement and *test* the algorithm above for packing blocks of $B$.
- Start with an empty source file!

:navigate: up    -> doc:index
           back  -> doc:session04/page12
           next  -> doc:session04/page14
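
For comparison after you have worked through the exercise yourself, here is a minimal sketch of how the `dgepack_B` pseudocode might look in C. It is one possible transcription, not a reference solution. The panel width `NR = 4` and the stride parameters `incRowX` and `incColX` are assumptions made for illustration; the pseudocode only names $N_r$ and writes $x_{l,j}$ without saying how $X$ is stored.

```c
#include <stddef.h>

/* Assumed panel width; the text only names the symbol N_r. */
#define NR 4

/*
 * Pack the k x n block X into p as vertical row-major panels with NR columns.
 *
 * incRowX and incColX are hypothetical stride parameters (in elements) used
 * to address x_{l,j} as X[l*incRowX + j*incColX].
 * p must provide room for at least ceil(n/NR) * NR * k doubles.
 */
void
dgepack_B(size_t k, size_t n,
          const double *X, ptrdiff_t incRowX, ptrdiff_t incColX,
          double *p)
{
    size_t numPanels = (n + NR - 1) / NR;   /* ceil(n / NR) */

    for (size_t j1 = 0; j1 < numPanels; ++j1) {
        for (size_t j0 = 0; j0 < NR; ++j0) {
            for (size_t l = 0; l < k; ++l) {
                size_t j  = j1 * NR + j0;             /* column in X     */
                size_t nu = j1 * NR * k + l * NR + j0; /* position in p  */

                if (j < n) {
                    p[nu] = X[(ptrdiff_t)l * incRowX + (ptrdiff_t)j * incColX];
                } else {
                    p[nu] = 0.0;   /* zero-pad the last, incomplete panel */
                }
            }
        }
    }
}
```

As a quick check, packing the row-major $2 \times 5$ block with rows $(1,2,3,4,5)$ and $(6,7,8,9,10)$ (so `incRowX = 5`, `incColX = 1`) should fill `p` with `1 2 3 4 6 7 8 9` for the first panel, followed by `5 0 0 0 10 0 0 0` for the zero-padded second panel. The loop order follows the pseudocode literally; swapping the two inner loops would write `p` contiguously and gives the same result.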