Content |
Fused-Operations
Ist \(\text{BS}\) eine zur Compile-Zeit bekannte Konstante, dann bezeichen wir eine \(m \times \text{BS}\) Matrix als vertikale Matrix-Paneel und eine \(\text{BS} \times n\) Matrix als horizontale Matrix-Paneel.
Doch zuerst: Ungeblockte GEMV-Prozeduren aufräumen
Pack die Implementierungen von dgemv_col und dgemv_row in eine Prozedur dgemv_unblk. Durch eine Fallunterscheidung soll die günstigere Implementierung ausgewählt werden:
void dgemv_unblk(int m, int n, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double beta, double *y, int incY) { if (incRowA<incColA) { // code von dgemv_col } else { // code von dgemv_row } }
GEMV für eine vertikale Matrix-Paneel
Für eine \(m \times \text{BS}\) Matrix \(A\) soll die spezielle GEMV-Operation
\[y \leftarrow y + \alpha A x,\quad x \in \mathbb{M}^{\text{BS}},\; y \in \mathbb{M}^{m}\quad\text{und}\quad \alpha \in \mathbb{M}\]realisiert werden. Hier ist also stets \(\beta = 1\). Dabei soll die Matrix \(A\) zeilenweise betrachtet werden als
\[A = \left(\begin{array}{c} \vec{a}_1^T \\ \vdots \\ \vec{a}_m^T \\ \end{array}\right)\]Die Operation somit ausgeführt werden durch
\[\begin{array}{l}\text{for}\;i=1,\dots,m\\\qquad y_i \leftarrow y_i + \alpha \, \vec{a}_i^T x\end{array}\]Hier soll bei der Berechnung von \(\vec{a}_i^T x\) aber nicht die ddot Operationen verwendet werden. Statt dessen soll ausgenutzt werden, dass die Länge der Vektoren konstant ist (im Sinne von zur Compile-Zeit bekannt). Diese führt zum Algorithmus
\[\begin{array}{l}\text{for}\;i=1,\dots,m\\\qquad \text{dot} \leftarrow 0 \\\qquad \text{for}\;j=1,\dots,\text{BS}\\\qquad \qquad \text{dot} \leftarrow \text{dot} + a_{i,j} \cdot x_j \\\qquad y_i \leftarrow y_i + \alpha \, \text{dot} \\\end{array}\]Für die Implementierung ist folgende Signatur vorgegeben:
#ifndef DGEMV_MxBS #define DGEMV_MxBS 4 #endif void dgemv_mxBS(int m, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double *y, int incY);
Die Konstante \(\text{BS}\) wird durch das Makro DGEMV_MxBS fetsgelegt.
GEMV für eine horizontale Matrix-Paneel
Für eine \(\text{BS} \times n\) Matrix \(A\) soll die GEMV-Operation
\[y \leftarrow \beta\,y + \alpha A x,\quad x \in \mathbb{M}^{\text{n}},\; y \in \mathbb{M}^{\text{BS}}\quad\text{und}\quad \alpha, \beta \in \mathbb{M}\]realisiert werden. Dabei soll die Matrix \(A\) spaltenweise betrachtet werden als
\[A = \left(\vec{a}_1, \dots, \vec{a}_n \right)\]Die Operation kann somit ausgeführt werden durch
\[\begin{array}{l}y \leftarrow \beta\,y \\\text{for}\;j=1,\dots,n\\\qquad y \leftarrow y + \alpha \, \vec{a}_j x_j\end{array}\]Auch hier soll bei der Berechnung ausgenutzt werden, dass die Länge der Vektoren \(y, \vec{a}_1, \dots, \vec{a}_n\) konstant sind. Diese führt zum Algorithmus
\[\begin{array}{l}\text{for}\;i=1,\dots,\text{BS}\\\qquad y_i \leftarrow \beta\, y_i \\\text{for}\;j=1,\dots,n\\\qquad \text{for}\;i=1,\dots,\text{BS}\\\qquad \qquad y_i \leftarrow y_i + \alpha \, a_{i,\,j} \, x_j\end{array}\]Für die Implementierung ist folgende Signatur vorgegeben:
#ifndef DGEMV_BSxN #define DGEMV_BSxN 4 #endif void dgemv_BSxn(int n, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double beta, double *y, int incY);
Die Konstante \(\text{BS}\) wird hier durch das Makro DGEMV_BSxN fetsgelegt.
Geblockte Col-Major-Variante für GEMV
Sei \(A\) wieder eine beliebige \(m \times n\) Matrix. Für die allgemeine GEMV-Operation \(y \leftarrow \beta y + \alpha A x\) soll \(A\) bezüglich \(\text{BS}\) quasi-äquidistant in vertikale Matrix-Paneele aufgeteilt werden, das heißt
\[A = \left(A_1, \dots, A_{n_b} \right),\quad n_b = \left\lceil \frac{n}{\text{BS}} \right\rceil\]Erste Möglichkeit
Für die Dimension der Matrix-Paneele gilt dann
\[A_j \in \mathbb{M}^{m \times bs}\quad\text{mit}\quad\text{bs} =\begin{cases}BS, & 1 \leq j < n_b \quad\text{oder}\quad n \bmod \text{BS} = 0,\\[0.5cm]n \bmod \text{BS}, &\text{sonst}.\end{cases}\]Damit kann folgender Algorithmus zur Berechnung hergeleitet werden:
-
Berechne \(\vec{y} \leftarrow \beta \vec{y}\)
-
Für \(j=1, \dots, \left\lceil \frac{n}{\text{BS}} \right\rceil\)
-
Berechne \(\text{bs}\) wie oben
-
Falls \(\text{bs} = \text{BS}\) gilt
-
Berechne \(\vec{y} \leftarrow \vec{y} + A_j \vec{x}_j\) mit dgemv_mxBS
-
-
sonst
-
Berechne \(\vec{y} \leftarrow \vec{y} + A_j \vec{x}_j\) mit dgemv_unblk
-
-
Zweite Möglichkeit
So ganz blöd ist die Denkweise bei der vorigen Herleitung nicht. Aber in diesem einfachen Fall fähr man leichter, wenn man folgende Indizierung benutzt:
\[A = \left(A_0, A_{\text{BS}}, \dots, A_{\text{BS}k} \right),\quad k = \left\lfloor \frac{n}{\text{BS}} \right\rfloor\]Außerdem werden wir nur eine einzige Paneele \(A_{j\text{BS}}\) gesondert betrachten und dies durch
\[A = \left(\begin{array}{l|l|l} A_0 & A_{j\cdot\text{BS}} & A_{j\cdot\text{BS} + \text{bs}} \end{array}\right),\]notieren. Hier ist:
-
\(A_0\) eine \(m\times j\text{BS}\) Matrix,
-
\(A_{j\cdot\text{BS}}\) eine \(m\times \text{bs}\) Matrix und
-
\(A_{j\cdot\text{BS} + \text{bs}}\) eine \(m \times (n-j\cdot\text{BS} - \text{bs})\) Matrix.
Dabei gilt insbesondere:
\[A_{j\cdot\text{BS}} \in \mathbb{M}^{m \times bs}\quad\text{mit}\quad\text{bs} =\begin{cases}BS, & j\cdot\text{BS} + \text{BS} \leq n\\n - j\text{BS}, &\text{sonst}.\end{cases}\]-
Berechne \(\vec{y} \leftarrow \beta \vec{y}\)
-
Setze \(j \leftarrow 0\)
-
Solange \(j+\text{BS} \leq n\)
-
Berechne \(\vec{y} \leftarrow \vec{y} + A_j \vec{x}_{j}\) mit dgemv_mxBS
-
Setze \(j \leftarrow j + \text{BS}\)
-
-
Berechne \(\vec{y} \leftarrow \vec{y} + A_{j\cdot\text{BS}} \vec{x}_{j\cdot\text{BS}}\) mit dgemv_unblk
Signatur
Für die Implementierung ist folgende Signatur vorgegeben:
void dgemv_blk_col(int m, int n, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double beta, double *y, int incY)
Geblockte Row-Major-Variante für GEMV
Die \(m \times n\) Matrix \(A\) soll bezüglich \(\text{BS}\) quasi-äquidistant in horizontale Matrix-Paneele aufgeteilt werden, das heißt
\[A = \left(\begin{array}{c} A_1\\ \vdots \\ A_{m_b} \end{array}\right),\quad m_b = \left\lceil \frac{m}{\text{BS}} \right\rceil\\\]und die Operation unter Verwendung der speziellen GEMV-Operation für horizontale Matrix-Paneele durchgeführt werden.
Für die Implementierung ist folgende Signatur vorgegeben:
void dgemv_blk_row(int m, int n, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double beta, double *y, int incY)
Grundgerüst für Test
#include <stdio.h> #include <math.h> //-- setup and print matrices -------------------------------------------------- void initGeMatrix(int m, int n, double *A, int incRowA, int incColA) { int i, j; for (i=0; i<m; ++i) { for (j=0; j<n; ++j) { A[i*incRowA+j*incColA] = i*n + j + 1; } } } void printGeMatrix(int m, int n, const double *A, int incRowA, int incColA) { int i, j; for (i=0; i<m; ++i) { for (j=0; j<n; ++j) { printf("%10.4lf ", A[i*incRowA+j*incColA]); } printf("\n"); } printf("\n\n"); } //-- some BLAS Level 1 procedures and functions -------------------------------- double ddot(int n, const double *x, int incX, const double *y, int incY) { int i; double alpha = 0; for (i=0; i<n; ++i) { alpha += x[i*incX]*y[i*incY]; } return alpha; } void daxpy(int n, double alpha, const double *x, int incX, double *y, int incY) { int i; if (alpha==0) { return; } for (i=0; i<n; ++i) { y[i*incY] += alpha*x[i*incX]; } } void dscal(int n, double alpha, double *x, int incX) { int i; if (alpha==1) { return; } for (i=0; i<n; ++i) { x[i*incX] *= alpha; } } void dcopy(int n, const double *x, int incX, double *y, int incY) { int i; for (i=0; i<n; ++i) { y[i*incY] = x[i*incX]; } } void dswap(int n, double *x, int incX, double *y, int incY) { int i; for (i=0; i<n; ++i) { double tmp; tmp = x[i*incX]; x[i*incX] = y[i*incY]; y[i*incY] = tmp; } } double damax(int n, const double *x, int incX) { int i; double result = 0; for (i=0; i<n; ++i) { if (fabs(x[i*incX])>result) { result = fabs(x[i*incX]); } } return result; } double dgenrm1(int m, int n, const double *A, int incRowA, int incColA) { int i, j; double result = 0; for (j=0; j<n; ++j) { double sum = 0; for (i=0; i<m; ++i) { sum += fabs(A[i*incRowA+j*incColA]); } if (sum>result) { result = sum; } } return result; } //-- Fused BLAS Level 1 procedures and functions ------------------------------- #ifndef DGEMV_MxBS #define DGEMV_MxBS 4 #endif void dgemv_mxBS(int m, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double *y, int incY) { // your code here } #ifndef DGEMV_BSxN #define DGEMV_BSxN 4 #endif void dgemv_BSxn(int n, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double beta, double *y, int incY) { // your code here } //-- GEMV implementations ------------------------------------------------------ void dgemv_unblk(int m, int n, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double beta, double *y, int incY) { if (incRowA<incColA) { // your code here } else { // your code here } } void dgemv_blk_col(int m, int n, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double beta, double *y, int incY) { // your code here } void dgemv_blk_row(int m, int n, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double beta, double *y, int incY) { // your code here } //------------------------------------------------------------------------------ #ifndef DIM_M #define DIM_M 5 #endif #ifndef DIM_N #define DIM_N 7 #endif #ifndef ROWMAJOR #define ROWMAJOR 0 #endif #if (ROWMAJOR==1) # define INCROW_A DIM_N # define INCCOL_A 1 #else # define INCROW_A 1 # define INCCOL_A DIM_M #endif #ifndef INC_X #define INC_X 1 #endif #ifndef INC_Y #define INC_Y 1 #endif #ifndef ALPHA #define ALPHA 2 #endif #ifndef BETA #define BETA 3 #endif // // A, x, y_ will get initialized with initGeMatrix // double A[DIM_M*DIM_N]; double x[INC_X*DIM_N]; double y_[DIM_M]; // // If BETA is not zero the result of BETA*y + ALPHA * A *x depends on the // initial value of y. We initialize y with y_ before each call of a gemv // implementation. So the results are always the same. // double y[INC_Y*DIM_M]; int main() { initGeMatrix(DIM_M, DIM_N, A, INCROW_A, INCCOL_A); initGeMatrix(DIM_N, 1, x, INC_X, 0); initGeMatrix(DIM_M, 1, y_, 1, 0); printf("A =\n"); printGeMatrix(DIM_M, DIM_N, A, INCROW_A, INCCOL_A); printf("x =\n"); printGeMatrix(DIM_N, 1, x, INC_X, 0); printf("y =\n"); printGeMatrix(DIM_M, 1, y_, INC_Y, 0); // // Aufruf von gemv_unblk // printf("gemv_unblk: %10.4lf * y + %10.4lf * A * x =\n\n", (double)BETA, (double)ALPHA); dcopy(DIM_M, y_, 1, y, INC_Y); dgemv_unblk(DIM_M, DIM_N, ALPHA, A, INCROW_A, INCCOL_A, x, INC_X, BETA, y, INC_Y); printGeMatrix(DIM_M, 1, y, INC_Y, 0); // // Aufruf von gemv_blk_col // printf("gemv_blk_col: %10.4lf * y + %10.4lf * A * x =\n\n", (double)BETA, (double)ALPHA); dcopy(DIM_M, y_, 1, y, INC_Y); dgemv_blk_col(DIM_M, DIM_N, ALPHA, A, INCROW_A, INCCOL_A, x, INC_X, BETA, y, INC_Y); printGeMatrix(DIM_M, 1, y, INC_Y, 0); // // Aufruf von gemv_blk_row // printf("gemv_blk_row: %10.4lf * y + %10.4lf * A * x =\n\n", (double)BETA, (double)ALPHA); dcopy(DIM_M, y_, 1, y, INC_Y); dgemv_blk_row(DIM_M, DIM_N, ALPHA, A, INCROW_A, INCCOL_A, x, INC_X, BETA, y, INC_Y); printGeMatrix(DIM_M, 1, y, INC_Y, 0); return 0; }
Grundgerüst für Benchmark
#include <stdio.h> #include <math.h> #include <stdlib.h> #include <sys/times.h> #include <unistd.h> //-- timer for benchmarks ------------------------------------------------------ double walltime() { struct tms ts; static double ClockTick=0.0; if (ClockTick==0.0) { ClockTick = 1.0 / ((double) sysconf(_SC_CLK_TCK)); } return ((double) times(&ts)) * ClockTick; } //-- setup and print matrices -------------------------------------------------- void initGeMatrix(int m, int n, double *A, int incRowA, int incColA) { int i, j; for (i=0; i<m; ++i) { for (j=0; j<n; ++j) { A[i*incRowA+j*incColA] = i*n + j + 1; } } } void randGeMatrix(int m, int n, double *A, int incRowA, int incColA) { int i, j; for (j=0; j<n; ++j) { for (i=0; i<m; ++i) { A[i*incRowA+j*incColA] = ((double)rand()-RAND_MAX/2)*200/RAND_MAX; } } } void printGeMatrix(int m, int n, const double *A, int incRowA, int incColA) { int i, j; for (i=0; i<m; ++i) { for (j=0; j<n; ++j) { printf("%10.4lf ", A[i*incRowA+j*incColA]); } printf("\n"); } printf("\n\n"); } //-- some BLAS Level 1 procedures and functions -------------------------------- double ddot(int n, const double *x, int incX, const double *y, int incY) { int i; double alpha = 0; for (i=0; i<n; ++i) { alpha += x[i*incX]*y[i*incY]; } return alpha; } void daxpy(int n, double alpha, const double *x, int incX, double *y, int incY) { int i; if (alpha==0) { return; } for (i=0; i<n; ++i) { y[i*incY] += alpha*x[i*incX]; } } void dscal(int n, double alpha, double *x, int incX) { int i; if (alpha==1) { return; } for (i=0; i<n; ++i) { x[i*incX] *= alpha; } } void dcopy(int n, const double *x, int incX, double *y, int incY) { int i; for (i=0; i<n; ++i) { y[i*incY] = x[i*incX]; } } void dswap(int n, double *x, int incX, double *y, int incY) { int i; for (i=0; i<n; ++i) { double tmp; tmp = x[i*incX]; x[i*incX] = y[i*incY]; y[i*incY] = tmp; } } double damax(int n, const double *x, int incX) { int i; double result = 0; for (i=0; i<n; ++i) { if (fabs(x[i*incX])>result) { result = fabs(x[i*incX]); } } return result; } double dgenrm1(int m, int n, const double *A, int incRowA, int incColA) { int i, j; double result = 0; for (j=0; j<n; ++j) { double sum = 0; for (i=0; i<m; ++i) { sum += fabs(A[i*incRowA+j*incColA]); } if (sum>result) { result = sum; } } return result; } //-- error bound for gemv ------------------------------------------------------ // // Vector y0 is the trusted result and vector y1 the vector to be tested. // double err_dgemv(int m, int n, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double beta, const double *y0, int incY0, double *y1, int incY1) { int max_mn = (m>n) ? m : n; double normA = fabs(alpha)*dgenrm1(m, n, A, incRowA, incColA); double normX = damax(n, x, incX); double normY0 = fabs(beta)*damax(m, y0, incY0); double normD; daxpy(m, -1.0, y0, incY0, y1, incY1); normD = damax(n, y1, incY1); return normD/(normA*normX*normY0*max_mn); } //-- Fused BLAS Level 1 procedures and functions ------------------------------- #ifndef DGEMV_MxBS #define DGEMV_MxBS 4 #endif void dgemv_mxBS(int m, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double *y, int incY) { // your code here } #ifndef DGEMV_BSxN #define DGEMV_BSxN 4 #endif void dgemv_BSxn(int n, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double beta, double *y, int incY) { // your code here } //-- GEMV implementations ------------------------------------------------------ void dgemv_ref(int m, int n, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double beta, double *y, int incY) { int i, j; for (i=0; i<m; ++i) { if (beta==0) { y[i*incY] = 0; } else { y[i*incY] *= beta; } for (j=0; j<n; ++j) { y[i*incY] += alpha*A[i*incRowA+j*incColA]*x[j*incX]; } } } void dgemv_col(int m, int n, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double beta, double *y, int incY) { int j; dscal(m, beta, y, incY); for (j=0; j<n; ++j) { daxpy(m, alpha*x[j*incX], &A[j*incColA], incRowA, y, incY); } } void dgemv_row(int m, int n, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double beta, double *y, int incY) { int i; for (i=0; i<m; ++i) { y[i*incY] *= beta; y[i*incY] += alpha*ddot(n, &A[i*incRowA], incColA, x, incX); } } void dgemv_blk_col(int m, int n, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double beta, double *y, int incY) { // your code here } void dgemv_blk_row(int m, int n, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double beta, double *y, int incY) { // your code here } //------------------------------------------------------------------------------ #ifndef MIN_M #define MIN_M 100 #endif #ifndef MIN_N #define MIN_N 100 #endif #ifndef MAX_M #define MAX_M 8000 #endif #ifndef MAX_N #define MAX_N 8000 #endif #ifndef INC_M #define INC_M 100 #endif #ifndef INC_N #define INC_N 100 #endif #ifndef ROWMAJOR #define ROWMAJOR 0 #endif #if (ROWMAJOR==1) # define INCROW_A MAX_N # define INCCOL_A 1 #else # define INCROW_A 1 # define INCCOL_A MAX_M #endif #ifndef INC_X #define INC_X 1 #endif #ifndef INC_Y #define INC_Y 1 #endif #ifndef ALPHA #define ALPHA 2 #endif #ifndef BETA #define BETA 3 #endif // // A, x, y_ will get initialized with initGeMatrix // double A[MAX_M*MAX_N]; double x[INC_X*MAX_N]; double y_[MAX_M]; // // Each implementation gets its own vector for y: // double y_0[INC_Y*MAX_M]; double y_1[INC_Y*MAX_M]; double y_2[INC_Y*MAX_M]; int main() { int m, n; randGeMatrix(MAX_M, MAX_N, A, INCROW_A, INCCOL_A); randGeMatrix(MAX_N, 1, x, INC_X, 0); randGeMatrix(MAX_M, 1, y_, 1, 0); printf("#%9s %9s %9s %9s", "m", "n", "INCROW_A", "INCCOL_A"); printf(" %9s %9s", "INC_X", "INC_Y"); printf(" %9s %9s", "ALPHA", "BETA"); printf(" %12s %12s", "t", "MFLOPS"); printf(" %12s %12s %12s", "t", "MFLOPS", "err"); printf(" %12s %12s %12s", "t", "MFLOPS", "err"); printf(" %12s %12s %12s", "t", "MFLOPS", "err"); printf(" %12s %12s %12s", "t", "MFLOPS", "err"); printf("\n"); for (m=MIN_M, n=MIN_N; (m<=MAX_M) && (n<=MAX_N); m+=INC_M, n+=INC_N) { int runs = 1; int ops = m*n + m*(n+1); double t0, t1, dt, err; printf(" %9d %9d %9d %9d", m, n, INCROW_A, INCCOL_A); printf(" %9d %9d", INC_X, INC_Y); printf(" %9.2lf %9.2lf", (double)ALPHA, (double)BETA); t0 = 0; runs = 0; do { dcopy(n, y_, 1, y_0, INC_Y); dt = walltime(); dgemv_ref(m, n, ALPHA, A, INCROW_A, INCCOL_A, x, INC_X, BETA, y_0, INC_Y); dt = walltime() - dt; t0 += dt; ++runs; } while (t0<0.3); t0 /= runs; printf(" %12.2e %12.2lf", t0, ops/(1000*1000*t0)); t1 = 0; runs = 0; do { dcopy(n, y_, 1, y_1, INC_Y); dt = walltime(); dgemv_col(m, n, ALPHA, A, INCROW_A, INCCOL_A, x, INC_X, BETA, y_1, INC_Y); dt = walltime() - dt; t1 += dt; ++runs; } while (t1<0.3); t1 /= runs; err = err_dgemv(m, n, ALPHA, A, INCROW_A, INCCOL_A, x, INC_X, BETA, y_0, INC_Y, y_1, INC_Y); printf(" %12.2e %12.2lf %12.2e", t1, ops/(1000*1000*t1), err); t1 = 0; runs = 0; do { dcopy(n, y_, 1, y_2, INC_Y); dt = walltime(); dgemv_row(m, n, ALPHA, A, INCROW_A, INCCOL_A, x, INC_X, BETA, y_2, INC_Y); dt = walltime() - dt; t1 += dt; ++runs; } while (t1<0.3); t1 /= runs; err = err_dgemv(m, n, ALPHA, A, INCROW_A, INCCOL_A, x, INC_X, BETA, y_0, INC_Y, y_2, INC_Y); printf(" %12.2e %12.2lf %12.2e", t1, ops/(1000*1000*t1), err); t1 = 0; runs = 0; do { dcopy(n, y_, 1, y_2, INC_Y); dt = walltime(); dgemv_blk_col(m, n, ALPHA, A, INCROW_A, INCCOL_A, x, INC_X, BETA, y_2, INC_Y); dt = walltime() - dt; t1 += dt; ++runs; } while (t1<0.3); t1 /= runs; err = err_dgemv(m, n, ALPHA, A, INCROW_A, INCCOL_A, x, INC_X, BETA, y_0, INC_Y, y_2, INC_Y); printf(" %12.2e %12.2lf %12.2e", t1, ops/(1000*1000*t1), err); t1 = 0; runs = 0; do { dcopy(n, y_, 1, y_2, INC_Y); dt = walltime(); dgemv_blk_row(m, n, ALPHA, A, INCROW_A, INCCOL_A, x, INC_X, BETA, y_2, INC_Y); dt = walltime() - dt; t1 += dt; ++runs; } while (t1<0.3); t1 /= runs; err = err_dgemv(m, n, ALPHA, A, INCROW_A, INCCOL_A, x, INC_X, BETA, y_0, INC_Y, y_2, INC_Y); printf(" %12.2e %12.2lf %12.2e", t1, ops/(1000*1000*t1), err); printf("\n"); } return 0; }