GEMM: Implementation of the Frame Algorithm

Content

Finally we will put things together in the GEMM frame algorithm that computes

\[C \leftarrow \beta \cdot C + \alpha \cdot A \cdot B\]

where \(C\) is a \(m \times n\) matrix, \(A\) is a \(m \times k\) matrix and \(B\) a \(k \times n\) matrix. The following special cases are treated:

Algorithm

The frame algorithm uses the GEMM macro kernel and the packing functions as building blocks. With respect to the constants \(M_c\), \(N_c\) and \(K_c\) matrices \(A\), \(B\) and \(C\) are considered as block matrices:

\[A = \left(\begin{array}{c|c|c} & \vdots & \\ \hline \dots & A_{i \cdot M_c,\, \ell \cdot K_c} & \dots \\ \hline & \vdots & \\ \end{array}\right),\quad B = \left(\begin{array}{c|c|c} & \vdots & \\ \hline \dots & B_{\ell \cdot K_c,\, j \cdot N_c} & \dots \\ \hline & \vdots & \\ \end{array}\right),\quad C = \left(\begin{array}{c|c|c} & \vdots & \\ \hline \dots & C_{i \cdot M_c,\, j \cdot N_c} & \dots \\ \hline & \vdots & \\ \end{array}\right)\]

The complete algorithm then reads as:

  • if \(\alpha = 0\) or \(k = 0\)

    • \(C \leftarrow \beta C\)

    • return

  • Allocate buffers \(\overline{A}\) and \(\overline{B}\) for packing blocks of \(A\) and \(B\) respectively.

  • For \(j_b = 0, \dots, \left\lceil \frac{n}{N_c} \right\rceil - 1\)

    • For \(\ell_b = 0, \dots, \left\lceil \frac{k}{K_c} \right\rceil - 1\)

      • Pack: \(\overline{B} \leftarrow B_{\ell_b \cdot K_c,\, j_b \cdot N_c}\)

      • \(\overline{\beta} \leftarrow \begin{cases} 1, & \ell>0 \\ \beta, & \text{else.} \end{cases}\)

      • For \(i_b = 0, \dots, \left\lceil \frac{m}{M_c} \right\rceil - 1\)

        • Pack: \(\overline{A} \leftarrow A_{i_b \cdot M_c,\, \ell_b \cdot K_c}\)

        • Macro Kernel: \(C_{i_b \cdot M_c,\, j_b \cdot N_c} \leftarrow \overline{\beta} \cdot C_{i_b \cdot M_c,\, j_b \cdot N_c} + \alpha \cdot \overline{A} \cdot \overline{B} \)

  • Release buffers