=========================================== GEMM: Implementation of the Frame Algorithm [TOC] =========================================== Finally we will put things together in the GEMM frame algorithm that computes ---- LATEX --------------------------------------------------------------------- C \leftarrow \beta \cdot C + \alpha \cdot A \cdot B -------------------------------------------------------------------------------- where $C$ is a $m \times n$ matrix, $A$ is a $m \times k$ matrix and $B$ a $k \times n$ matrix. The following special cases are treated: - If $\beta = 0$ matrix $C$ can contain Nan (not a number) entries. - If $\alpha = 0$ or $k = 0$ matrix $C$ just gets scaled by $\beta$ (or gets initialized with zero values if $\beta=0$). Algorithm ========= The frame algorithm uses the GEMM macro kernel and the packing functions as building blocks. With respect to the constants $M_c$, $N_c$ and $K_c$ matrices $A$, $B$ and $C$ are considered as block matrices: ---- LATEX --------------------------------------------------------------------- A = \left(\begin{array}{c|c|c} & \vdots & \\ \hline \dots & A_{i \cdot M_c,\, \ell \cdot K_c} & \dots \\ \hline & \vdots & \\ \end{array}\right) ,\quad B = \left(\begin{array}{c|c|c} & \vdots & \\ \hline \dots & B_{\ell \cdot K_c,\, j \cdot N_c} & \dots \\ \hline & \vdots & \\ \end{array}\right) ,\quad C = \left(\begin{array}{c|c|c} & \vdots & \\ \hline \dots & C_{i \cdot M_c,\, j \cdot N_c} & \dots \\ \hline & \vdots & \\ \end{array}\right) -------------------------------------------------------------------------------- The complete algorithm then reads as: ---- BOX ----------------------------------------------------------------------- - if $\alpha = 0$ or $k = 0$ - $C \leftarrow \beta C$ - return - Allocate buffers $\overline{A}$ and $\overline{B}$ for packing blocks of $A$ and $B$ respectively. - For $j_b = 0, \dots, \left\lceil \frac{n}{N_c} \right\rceil - 1$ - For $\ell_b = 0, \dots, \left\lceil \frac{k}{K_c} \right\rceil - 1$ - Pack: $\overline{B} \leftarrow B_{\ell_b \cdot K_c,\, j_b \cdot N_c}$ - $\overline{\beta} \leftarrow \begin{cases} 1, & \ell>0 \\ \beta, & \text{else.} \end{cases}$ - For $i_b = 0, \dots, \left\lceil \frac{m}{M_c} \right\rceil - 1$ - Pack: $\overline{A} \leftarrow A_{i_b \cdot M_c,\, \ell_b \cdot K_c}$ - Macro Kernel: $C_{i_b \cdot M_c,\, j_b \cdot N_c} \leftarrow \overline{\beta} \cdot C_{i_b \cdot M_c,\, j_b \cdot N_c} + \alpha \cdot \overline{A} \cdot \overline{B} $ - Release buffers --------------------------------------------------------------------------------