GEMV: Blockweise Berechnung mit Fused-Vector-Operations

In der Vorlesung haben wir uns überlegt, dass es vorteilhaft sein könnte bei der GEMV Operatione mehr als nur eine Zeile oder Spalte von \(A\) in den Cache zu laden und mit \(x\) oder Komponenten von \(x\) zu multiplizieren. Dabei betrachten wir für die Operation folgende Partitionierungen von \(A\):

Partitionierung der Zeilen bezüglich dem Fuse-Faktor \(f = f_\text{gemv_dotf}\)
\[A = \begin{pmatrix} \mathbf{A}_1 \\ \hline \mathbf{A}_2 \\ \hline \vdots \\ \hline \mathbf{A}_{m_b} \end{pmatrix},\quad m_b = \left\lceil \frac{m}{f} \right\rceil\]
Für die Dimensionen der Blöcke gilt dann
\[\mathbf{A}_i\in\mathbb{M}^{\,f \,\times\, n}, \quad i=1,\dots,m_b-1\]
und
\[\mathbf{A}_{m_b}\in\mathbb{M}^{\,\left(m - f\cdot m_b\right) \,\times\, n}\]
Damit das Matrix-Vektor Produkt bezüglich dieser Partitionierung definiert ist, muss der Vektor \(y\) analog partitioniert werden:
\[y = \begin{pmatrix} \mathbf{y}_1 \\ \hline \mathbf{y}_2 \\ \hline \vdots \\ \hline \mathbf{y}_{m_b} \end{pmatrix}.\]
Für die Gleichung \(y = Ax\) gilt dann
\[y = \begin{pmatrix} \mathbf{y}_1 \\ \hline \mathbf{y}_2 \\ \hline \vdots \\ \hline \mathbf{y}_{m_b} \end{pmatrix}= \begin{pmatrix} \mathbf{A}_1 x\\ \hline \mathbf{A}_2 x\\ \hline \vdots \\ \hline \mathbf{A}_{m_b} x \end{pmatrix}= Ax\]
Partitionierung der Spalten bezüglich dem Fuse-Faktor \(f = f_\text{gemv_axpyf}\)
\[A = \left(\begin{array}{c|c|c|c} \mathbf{A}_1 & \mathbf{A}_2 & \dots & \mathbf{A}_{n_b} \end{array}\right),\quad n_b = \left\lceil \frac{n}{f} \right\rceil\]
Für die Dimensionen der Blöcke gilt dann
\[\mathbf{A}_j\in\mathbb{M}^{\,m \,\times\, f}, \quad j=1,\dots,n_b-1\]
und
\[\mathbf{A}_{n_b}\in\mathbb{M}^{\, m\,\times\, \left(m - f\cdot m_b\right)}\]
Damit das Matrix-Vektor Produkt bezüglich dieser Partitionierung definiert ist, muss in diesem Fall der Vektor \(x\) analog partitioniert werden:
\[x = \begin{pmatrix} \mathbf{x}_1 \\ \hline \mathbf{x}_2 \\ \hline \vdots \\ \hline \mathbf{x}_{n_b} \end{pmatrix}.\]
Für die Gleichung \(y = Ax\) gilt dann
\[y= \mathbf{A}_1 \mathbf{x}_1 + \mathbf{A}_2 \mathbf{x}_2 + \dots + \mathbf{A}_{n_b} \mathbf{x}_{n_b}= Ax\]

Algorithmus für GEMV

Mit den obigen Überlegungen leitet man sich folgenden Algorithmus her:

\(y \leftarrow \beta\, y\)
\(\text{If}\; \text{incRow}_\text{A} > \text{incCol}_\text{A}\)
- \(\text{Consider row-wise quasi-equidistant partitions with respect to}\;f:\)
  - \[A = \begin{pmatrix}\mathbf{A}_1 \\\hline\mathbf{A}_2 \\\hline\vdots \\\hline\mathbf{A}_{m_b}\end{pmatrix}, \quad y = \begin{pmatrix}\mathbf{y}_1 \\\hline\mathbf{y}_2 \\\hline\vdots \\\hline\mathbf{y}_{m_b}\end{pmatrix}, \quad m_b = \left\lceil \frac{m}{f} \right\rceil\]
- \(\text{For}\; i=1,\dots,m_b\)
  - \( \mathbf{y}_i \leftarrow \mathbf{y}_i + \alpha\, \mathbf{A}_i x \)
\(\text{Else}\)
- \(\text{Consider col-wise quasi-equidistant partitions with respect to}\;f:\)
  - \[A = \left(\begin{array}{c|c|c|c}\mathbf{A}_1 &\mathbf{A}_2 &\dots &\mathbf{A}_{n_b}\end{array}\right), \quad x = \left(\begin{array}{c}\mathbf{x}_1 \\\mathbf{x}_2 \\\vdots \\\mathbf{x}_{n_b}\end{array}\right), \quad n_b = \left\lceil \frac{n}{f} \right\rceil\]
- \(\text{For}\; j=1,\dots,n_b\)
  - \( y \leftarrow y + \alpha\, \mathbf{A}_{j} \mathbf{x}_j \)

Aufgaben

Entwickelt für die dotf Operation
\[\text{dotf}:\; \mathbf{\alpha} \leftarrow X\, y,\quad X \in \mathbb{M}^{m \times n},\; y \in \mathbb{M}^m,\;\mathbf{\alpha} \in \mathbb{M}^n\]
einen Algorithmus, der für den Fall einer zeilenweise gespeicherten Matrix \(X\) Cache-freundlich ist. Dabei soll davon ausgegangen werden, dass \(m\) klein ist. Und zwar wird \(m\) durch einen konstanten Fuse-Faktor beschränkt sein.
Entwickelt für die axpyf Operation
\[\text{axpyf}:\; y \leftarrow y + \alpha \, X z\quad X \in \mathbb{M}^{m \times n},z \in \mathbb{M}^ny \in \mathbb{M}^m\]
einen Algorithmus, der für den Fall einer spaltenweise gespeicherten Matrix \(X\) Cache-freundlich ist. Dabei soll davon ausgegangen werden, dass \(n\) klein ist. Und zwar wird \(n\) durch einen konstanten Fuse-Faktor beschränkt sein.

Implementiert den obigen GEMV Algorithmus mit Fused-Vector-Operations. Der Fuse-Factor soll durch Macros definiert werden:

        #ifndef DGEMV_DOTF_FUSE
        #define DGEMV_DOTF_FUSE  2
        #endif
        
        #ifndef DGEMV_AXPYF_FUSE
        #define DGEMV_AXPYF_FUSE  2
        #endif