# Packing blocks of B

## Partitioning of matrix $$B$$

Matrix $$B$$ has dimensions $$k \times n$$ and will be partitioned with respect to some constants $$K_c$$ and $$N_c$$. This means that $$B$$ is seen as a $$k_b \times n_b$$ block-matrix:

$B = \left(\begin{array}{c|c|c} B_{0,0} & \dots & B_{0,n_b-1} \\ \hline \vdots & & \vdots \\ \hline B_{k_b-1,0} & \dots & B_{k_b-1,n_b-1} \\ \end{array}\right),\quad k_b = \left\lceil \frac{k}{K_c} \right\rceil,\; n_b = \left\lceil \frac{n}{N_c} \right\rceil.$

Each block has maximal dimensions $$K_c \times N_c$$. However, blocks at the bottom or on the right border can be smaller. More precisely: For a block

$B_{i,j} \in \mathbb{M}^{K\times N} \quad (0 \leq i < k_b, \; 0 \leq j < n_b)$

dimensions are given by

$K = \begin{cases} K_c, & i < k_b-1 \;\; \lor \;\; K_c \,|\, k, \\ k \bmod K_c, & \text{else}, \end{cases}$

and

$N = \begin{cases} N_c, & j < n_b-1 \;\; \lor \;\;N_c \,|\, n, \\ n \bmod N_c, & \text{else}. \end{cases}$

## Packing blocks of $$B$$

Blocks of $$B$$ are partitioned into vertical panels with width $$N_r$$. Besides the different blocking factor $$N_c$$ (and different matrix dimension) packing of $$B$$ can be done with the previous packing operation applied to $$B^T$$.

## Exercise

Extend the code to make it work:

#include <cassert>
#include <cstddef>
#include <printf.hpp>

namespace tools {

void
initMatrix(std::size_t m, std::size_t n,
double *A, std::ptrdiff_t incRowA, std::ptrdiff_t incColA)
{
for (std::size_t j=0; j<n; ++j) {
for (std::size_t i=0; i<m; ++i) {
A[i*incRowA+j*incColA] = i*n + j + 1;
}
}
}

void
printMatrix(std::size_t m, std::size_t n,
const double *A, std::ptrdiff_t incRowA, std::ptrdiff_t incColA)
{
for (std::size_t i=0; i<m; ++i) {
for (std::size_t j=0; j<n; ++j) {
fmt::printf("%6.2lf ", A[i*incRowA+j*incColA]);
}
fmt::printf("\n");
}
fmt::printf("\n");
}

struct DoubleArray
{
DoubleArray(std::size_t n)
: ptr(new double[n])
{
assert(ptr);
}

~DoubleArray()
{
delete [] ptr;
}

operator double *() const
{
return ptr;
}

double * const ptr;

};

} // namespace tools
//------------------------------------------------------------------------------

namespace ulmblas {

void
pack_A(std::size_t M, std::size_t K,
const double *A, std::ptrdiff_t incRowA, std::ptrdiff_t incColA,
std::size_t M_R,
double *p)
{
}

void
pack_B(std::size_t K, std::size_t N,
const double *B, std::ptrdiff_t incRowB, std::ptrdiff_t incColB,
std::size_t N_R,
double *p)
{
}

} // namespace ulmblas

//------------------------------------------------------------------------------

#ifndef DIM_K
#define DIM_K 11
#endif

#ifndef DIM_N
#define DIM_N 10
#endif

#ifndef COLMAJOR_B
#define COLMAJOR_B 1
#endif

std::size_t K_C = 4;
std::size_t N_C = 8;
std::size_t N_R = 4;

//------------------------------------------------------------------------------

int
main()
{
// For the moment we will not use the matrix class from session 7
// 1) Allocate a k x n matrix B
std::size_t         k = DIM_K;
std::size_t         n = DIM_N;
tools::DoubleArray  B(k*n);
std::ptrdiff_t      incRowB = (COLMAJOR_B) ? 1 : n;
std::ptrdiff_t      incColB = (COLMAJOR_B) ? k : 1;

// 2) Initialize matrix B
tools::initMatrix(k, n, B, incRowB, incColB);

// 3) Print dimensions of B and content of B
fmt::printf("k = %zu, n = %zu\n", k, n);
fmt::printf("B = \n");
tools::printMatrix(k, n, B, incRowB, incColB);

fmt::printf("K_C = %zu, N_C = %zu, N_R = %zu\n", K_C, N_C, N_R);

// 4) Allocate a buffer p of size K_C * N_C
tools::DoubleArray  p(K_C*N_C);

std::size_t k_b = (k+K_C-1) / K_C;
std::size_t n_b = (n+N_C-1) / N_C;

fmt::printf("B is partitioned into a %zu x %zu block matrix\n", k_b, n_b);

for (std::size_t i=0; i<k_b; ++i) {
std::size_t K = /* TODO: your code */

for (std::size_t j=0; j<n_b; ++j) {
std::size_t N = /* TODO: your code */

fmt::printf("B_{%zu,%zu} is a %zu x %zu matrix\n", i, j, K, N);

// 5) Print the content of the matrix block B_{i,j}
fmt::printf("B_{%zu,%zu} = \n", i, j);