# GEMV: With fused-vector operations

#### Content

We now consider matrix partitions that contain more than one row or column:

• Row-wise partition with fuse factor $$f = f_\text{gemv_dotf}$$

$A = \begin{pmatrix} \mathbf{A}_0 \\ \hline \mathbf{A}_1 \\ \hline \vdots \\ \hline \mathbf{A}_{m_b-1} \\ \hline \mathbf{A}_{m_b} \end{pmatrix},\quad m_b = \left\lfloor \frac{m}{f} \right\rfloor$

For the dimension of the partition block we then have

$\mathbf{A}_i\in\mathbb{M}^{\,f \,\times\, n}, \quad i=0,\dots,m_b-1$

and

$\mathbf{A}_{m_b}\in\mathbb{M}^{\,\left(m - f\cdot m_b\right) \,\times\, n}$

Note: In case $$m$$ is a multiple of $$f$$ the last block $$A_{m_b}$$ has zero rows.

Accordingly we partition the vector $$y$$ as

$y = \begin{pmatrix} \mathbf{y}_0 \\ \hline \mathbf{y}_1 \\ \hline \vdots \\ \hline \mathbf{y}_{m_b-1} \\ \hline \mathbf{y}_{m_b} \end{pmatrix}.$

For $$y \leftarrow Ax$$ we get

$y \leftarrow \begin{pmatrix} \mathbf{y}_1 \\ \hline \mathbf{y}_2 \\ \hline \vdots \\ \hline \mathbf{y}_{m_b-1} \\ \hline \mathbf{y}_{m_b} \end{pmatrix}= \begin{pmatrix} \mathbf{A}_1 x\\ \hline \mathbf{A}_2 x\\ \hline \vdots \\ \hline \mathbf{A}_{m_b-1} x \\ \hline \mathbf{A}_{m_b} x \end{pmatrix}= Ax$
• Column-wise partition with fuse factor $$f = f_\text{gemv_axpyf}$$:

$A = \left(\begin{array}{c|c|c|c|c} \mathbf{A}_0 & \mathbf{A}_1 & \dots & \mathbf{A}_{n_b-1} & \mathbf{A}_{n_b} \end{array}\right),\quad n_b = \left\lfloor \frac{n}{f} \right\rfloor$

For the dimension of the partition block we then have

$\mathbf{A}_j\in\mathbb{M}^{\,m \,\times\, f}, \quad j=0,\dots,n_b-1$

and

$\mathbf{A}_{n_b}\in\mathbb{M}^{\, m\,\times\, \left(n - f\cdot n_b\right)}.$

Note: In case $$n$$ is a multiple of $$f$$ the last block has zero columns.

Accordingly, we partition the vector $$x$$ as

$x = \begin{pmatrix} \mathbf{x}_0 \\ \hline \mathbf{x}_1 \\ \hline \vdots \\ \hline \mathbf{x}_{n_b-1} \\ \hline \mathbf{x}_{n_b} \end{pmatrix}.$

For $$y \leftarrow Ax$$ this leads to

$y \leftarrow \mathbf{A}_0 \mathbf{x}_0 + \mathbf{A}_1 \mathbf{x}_1 + \dots + \mathbf{A}_{n_b-1} \mathbf{x}_{n_b-1} + \mathbf{A}_{n_b} \mathbf{x}_{n_b}= Ax$

## Algorithm for GEMV

Considering the two kinds of partitions, we get

• $$y \leftarrow \beta\, y$$

• $$\mathbf{if}\; \text{incRow}_\text{A} > \text{incCol}_\text{A}$$

• $$\text{Consider row-wise quasi-equidistant partitions with respect to}\;f:$$

• $A = \begin{pmatrix}\mathbf{A}_0 \\\hline\mathbf{A}_1 \\\hline\vdots \\\hline\mathbf{A}_{m_b-1} \\\hline\mathbf{A}_{m_b}\end{pmatrix}, \quad y = \begin{pmatrix}\mathbf{y}_0 \\\hline\mathbf{y}_1 \\\hline\vdots \\\hline\mathbf{y}_{m_b-1} \\\hline\mathbf{y}_{m_b}\end{pmatrix}, \quad m_b = \left\lfloor \frac{m}{f} \right\rfloor$
• $$\mathbf{for}\; i=0,\dots,m_b-1$$

• $$\mathbf{y}_i \leftarrow \mathbf{y}_i + \alpha\, \mathbf{A}_i x$$

• Consider $$A_{m_b}$$ row wise

• $$\mathbf{for}\; i=m_b \cdot f,\dots,m$$

• $$y_i \leftarrow y_i + \alpha\, a_i^T x$$

• $$\mathbf{else}$$

• $$\text{Consider column-wise quasi-equidistant partitions with respect to}\;f:$$

• $A = \left(\begin{array}{c|c|c|c}\mathbf{A}_0 &\mathbf{A}_1 &\dots &\mathbf{A}_{n_b}\end{array}\right), \quad x = \left(\begin{array}{c}\mathbf{x}_1 \\\mathbf{x}_2 \\\vdots \\\mathbf{x}_{n_b}\end{array}\right), \quad n_b = \left\lfloor \frac{n}{f} \right\rfloor$
• $$\mathbf{for}\; j=0,\dots,n_b-1$$

• $$y \leftarrow y + \alpha\, \mathbf{A}_{j} \mathbf{x}_j$$

• Consider $$A_{n_b}$$ col wise

• $$\mathbf{for}\; j=n_b \cdot f,\dots,n$$

• $$y \leftarrow y + \alpha\, a_{j} x_j$$

## Exercise

• Develop an algorithm for the dotf operation (dot fused):

$\text{dotf}:\; \mathbf{y} \leftarrow \mathbf{y} + \alpha \mathbf{A} \mathbf{x},\quad A \in \mathbb{M}^{f \times n},\; y \in \mathbb{M}^f,\;\mathbf{\alpha} \in \mathbb{M}, \mathbf{x} \in \mathbb{M}^n$

Elements should be accessed column-wise.

• Develop an algorithm for the axpyf operation (axpy fused):

$\text{axpyf}:\; y \leftarrow y + \alpha \, A x\quad A \in \mathbb{M}^{m \times f},x \in \mathbb{M}^f,y \in \mathbb{M}^m$

Elements should be accessed row-wise.

• Implement the above GEMV algorithm with fused vector operations. The fuse factors should be defined through macros:

#ifndef DGEMV_DOTF_FUSE
#define DGEMV_DOTF_FUSE  2
#endif

#ifndef DGEMV_AXPYF_FUSE
#define DGEMV_AXPYF_FUSE  2
#endif

• Use the following signatures for the fused operations:

void
ddotf_ulm(size_t n,
double alpha,
const double *A, ptrdiff_t incRowA, ptrdiff_t incColA,
const double *x, ptrdiff_t incX,
double *y, ptrdiff_t incY)
{