Packing Blocks of B
In the cache-optimized GEMM operation \(\beta C + \alpha A B \to C\), the matrix \(B\) is partitioned into blocks of maximal dimension \(K_c \times N_c\). Each block \(B_{l,j}\) of \(B\) is packed into row-major vertical panels with \(N_r\) columns.
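As a small illustration of this panel layout, consider a \(2 \times 3\) block. The value \(N_r = 2\) is chosen purely for the example, and padding a partial last panel with zeros is an assumption that the text above does not state:

\[
\begin{pmatrix} b_{0,0} & b_{0,1} & b_{0,2} \\ b_{1,0} & b_{1,1} & b_{1,2} \end{pmatrix}
\;\longrightarrow\;
\underbrace{\begin{pmatrix} b_{0,0} & b_{0,1} \\ b_{1,0} & b_{1,1} \end{pmatrix}}_{\text{panel } 0}
\quad
\underbrace{\begin{pmatrix} b_{0,2} & 0 \\ b_{1,2} & 0 \end{pmatrix}}_{\text{panel } 1}
\]

Storing each panel row by row and the panels one after another, the buffer would then hold \(b_{0,0}, b_{0,1}, b_{1,0}, b_{1,1}, b_{0,2}, 0, b_{1,2}, 0\).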
Exercise
Assume that \(p\) is a buffer for \(K_c \cdot N_c\) elements and \(X\) a matrix block with dimension \(k \times n\) where \(k \leq K_c\) and \(n \leq N_c\).
Be honest for your own sake and derive the following algorithm (or any equivalent algorithm) for packing blocks of \(B\):
dgepack_B(X, p)
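For comparison with your own derivation, here is a minimal C sketch of one possible packing routine. It is not necessarily the algorithm intended by the exercise. The parameter list (explicit dimensions `k`, `n` and strides `incRowX`, `incColX`), the constant `N_R = 4`, and the zero padding of a partial last panel are all assumptions made for illustration. The sketch also assumes that \(N_c\) is a multiple of \(N_r\), so that the padded result still fits into a buffer of \(K_c \cdot N_c\) elements.

```c
#include <stddef.h>

/* Assumed panel width; the text only names it N_r. */
enum { N_R = 4 };

/*
 * Pack the k x n block X into p as consecutive row-major panels,
 * each k rows high and N_R columns wide.
 * Columns beyond n in the last panel are filled with zeros (assumption).
 */
void
dgepack_B(size_t k, size_t n,
          const double *X, ptrdiff_t incRowX, ptrdiff_t incColX,
          double *p)
{
    /* Number of panels; the last one may be only partially filled. */
    size_t np = (n + N_R - 1) / N_R;

    for (size_t J = 0; J < np; ++J) {           /* panel index        */
        for (size_t l = 0; l < k; ++l) {        /* row within panel   */
            for (size_t j = 0; j < N_R; ++j) {  /* column within panel */
                size_t col = J * N_R + j;       /* column within X    */
                size_t dst = J * k * N_R + l * N_R + j;

                p[dst] = (col < n)
                       ? X[(ptrdiff_t)l * incRowX + (ptrdiff_t)col * incColX]
                       : 0.0;
            }
        }
    }
}
```

Passing the strides explicitly lets the same routine handle row-major and column-major storage of \(X\); for a row-major \(k \times n\) block one would call it with `incRowX = n` and `incColX = 1`.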
Exercise
- Implement and test the algorithm from the previous exercise for packing blocks of \(B\).
- Start with an empty source file!