Packing Blocks of A
In the cache-optimized GEMM operation \(\beta C + \alpha A B \to C\), the matrix \(A\) is partitioned into blocks of maximal dimension \(M_c \times K_c\). Each block \(A_{i,l}\) of \(A\) is packed into column-major horizontal panels with \(M_r\) rows each.
Assume that \(p\) is a buffer for \(M_c \cdot K_c\) elements and \(X\) is a matrix block of dimension \(m \times k\) with \(m \leq M_c\) and \(k \leq K_c\). Then the following algorithm (which uses zero-based indices) can be used for packing:
dgepack_A(X, p)
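One possible way to realize such a packing routine in C is sketched below. It is only an illustration under stated assumptions: the panel height `M_R = 4`, the stride parameters `incRowX` and `incColX` for addressing \(X\), and the zero padding of an incomplete last panel are choices made for this sketch and are not taken from the text above. With zero padding the packed block occupies \(\lceil m / M_r \rceil \cdot M_r \cdot k\) elements, which fits into \(p\) provided that \(M_r\) divides \(M_c\).

```c
#include <stddef.h>

/* Assumed panel height M_r (hypothetical value, chosen for illustration). */
enum { M_R = 4 };

/*
 * Pack the m x k block X into the buffer p as horizontal panels of M_R rows.
 * Each panel is stored column-major, panels are stored one after another.
 * X(i,j) is addressed as X[i*incRowX + j*incColX] (assumed interface).
 * Rows beyond m in the last panel are filled with zeros (assumption).
 */
void
dgepack_A(size_t m, size_t k,
          const double *X, ptrdiff_t incRowX, ptrdiff_t incColX,
          double *p)
{
    size_t mp = (m + M_R - 1) / M_R;    /* number of panels */

    for (size_t j = 0; j < k; ++j) {            /* column of X        */
        for (size_t l = 0; l < mp; ++l) {       /* panel index        */
            for (size_t i0 = 0; i0 < M_R; ++i0) {   /* row in panel   */
                size_t i  = l * M_R + i0;           /* row of X       */
                size_t nu = l * M_R * k + j * M_R + i0; /* index in p */

                p[nu] = (i < m)
                      ? X[(ptrdiff_t)i * incRowX + (ptrdiff_t)j * incColX]
                      : 0.0;
            }
        }
    }
}
```

For a row-major \(3 \times 2\) block with rows \((1,2)\), \((3,4)\), \((5,6)\), the call `dgepack_A(3, 2, X, 2, 1, p)` fills `p` with a single panel: first the column \(1, 3, 5\) followed by one padding zero, then the column \(2, 4, 6\) followed by one padding zero.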
Exercise
- Implement and test the algorithm above for packing blocks of \(A\).
- Start with an empty source file!