Dreieckslöser
Für eine untere \(n\times n\) Dreiecksmatrix \(L\) soll mit der Funktion dtrlsv (double triangular lower solve vector) die Operation
\[x \leftarrow L^{-1} x\]durchgeführt werden. Der Vektor \(x\) soll also mit der Lösung \(L^{-1} x\) überschrieben werden. Bei der Herleitung unterscheiden wir aber formal zwischen dem Vektor vor und nach der Operation und definieren
\[z := L^{-1} x\]als Lösung des Problems. Dann muss nach entsprechender Partitionierung von \(x\) und \(z\) gelten:
\[\left(\begin{array}{c} x_0 \\ \hline x_k \end{array}\right)=\left(\begin{array}{c|c} L_{0,0} & \\ \hline L_{k,0} & L_{k,k} \end{array}\right)\left(\begin{array}{c} z_0 \\ \hline z_k \end{array}\right)=\left(\begin{array}{c} L_{0,0} z_0 \\ \hline L_{k,0} z_0 + L_{k,k} z_k \end{array}\right)\quad\Longleftrightarrow\quad\left\{\begin{array}{lcl} z_0 &=& L_{0,0}^{-1} \; x_0 \\ L_{k,k} z_k &=& x_k - L_{k,0}\, z_0 \end{array}\right.\]Wir leiten daraus zwei Varianten her, mit denen die Operation iterativ mit den Zwischenergebnissen
\[x = x^{(0)}\leadsto x^{(1)}\leadsto\dots x^{(n)} = z\]berechnet wird:
-
In der ersten Variante (Row-Major) soll für \(x^{(k)}\) gelten:
\[x^{(k)} =\left(\begin{array}{c} x_0^{(k)} \\ \hline x_k^{(k)} \end{array}\right)=\left(\begin{array}{c} L_{0,0}^{-1} \; x_0 \\ \hline x_k\end{array}\right)=\left(\begin{array}{c} z_0 \\ \hline x_k\end{array}\right)\] -
In der zweiten Variante (Col-Major) soll für \(x^{(k)}\) gelten:
\[x^{(k)} =\left(\begin{array}{c} x_0^{(k)} \\ \hline x_k^{(k)} \end{array}\right)=\left(\begin{array}{c} L_{0,0}^{-1} x_0 \\ \hline x_k - L_{k,0}\, z_0 \end{array}\right)=\left(\begin{array}{c} z_0 \\ \hline x_k - L_{k,0}\, z_0\end{array}\right)\]
Dass die Bezeichnungen Row-Major und Col-Major berechtigt sind, wird allerdings erst im Nachhinein deutlich.
Row-Major (Unblocked)
Wir betrachten die gemeinsamen Partitionierungen
\[ x^{(k)} = \left(\begin{array}{c} x_0^{(k)} \\ \hline x_k^{(k)} \\ \hdashline x_{k+1}^{(k)} \end{array}\right) \qquad\text{und}\qquad x^{(k+1)} = \left(\begin{array}{c} x_0^{(k+1)} \\ \hdashline x_k^{(k+1)} \\ \hline x_{k+1}^{(k+1)} \end{array}\right)\]sowie
\[L = \left(\begin{array}{c|c|c} L_{0,0} & & \\ \hline \ell_{k,0}^T & \ell_{k,k} & \\ \hline L_{k+1,0} & \ell_{k+1,k} & L_{k+1,k+1} \\ \end{array}\right)\]Für die Durchführung einer Iteration rechnen wir nun nach:
-
Laut Voraussetzung gilt für \(x^{(k)}\):
\[L_{0,0} x_0^{(k)} = x_0\qquad\Longleftrightarrow\qquad x_0^{(k)} = z_0\] -
Für \(x^{(k+1)}\) soll nach der Iteration
\[\begin{array}{lclcl}L_{0,0} x_0^{(k+1)} & & &=& x_0 \\\ell_{k,0}^T x_0^{(k+1)} &+& \ell_{k,k} x_k^{(k+1)} &=& x_k\end{array}\]gelten.
Offensichtlich ist also \(x_0^{(k+1)} = x_0^{(k)} = z_0\) und
\[x_k^{(k+1)} = \frac{x_k - \ell_{k,0}^T x_0^{(k+1)}}{\ell_{k,k}}.\]Um \(x\) mit \(z\) zu überschreiben, erhalten wir als Verfahren
-
For \(k=0, \dots, n-1\):
-
\(x_k \leftarrow x_k - \ell_{k,0}^T x_0\) (DOT)
-
\(x_k \leftarrow \frac{x_k}{\ell_{k,k}}\)
-
Wie es die Prophezeiung besagte, wird hier nur auf Zeilen von \(L\) zugegriffen.
Col-Major (Unblocked)
-
Laut Voraussetzung gilt für \(x^{(k)}\):
\[L_{0,0} x_0^{(k)} = x_0\qquad\Longleftrightarrow\qquad x_0^{(k)} = z_0\]und
\[\left(\begin{array}{c} x_k^{(k)} \\ \hdashline x_{k+1}^{(k)} \end{array}\right)=\left(\begin{array}{c} x_k \\ \hdashline x_{k+1} \end{array}\right)-\left(\begin{array}{c} \ell_{k,0}^T \\ \hdashline L_{k+1,0} \end{array}\right) z_0\qquad\Longleftrightarrow\qquad\left\{\begin{array}{lclcl} x_k^{(k)} &=& x_k &-& \ell_{k,0}^T z_0 \\ x_{k+1}^{(k)} &=& x_{k+1} &-& L_{k+1,0} z_0 \end{array}\right.\] -
Für \(x^{(k+1)}\) soll nach der Iteration
\[\left(\begin{array}{c:c} L_{0,0} & \\ \hdashline \ell_{k,0}^T & \ell_{k,k} \\ \end{array}\right)\left(\begin{array}{c} x_0^{(k+1)} \\ \hdashline x_k^{(k+1)} \end{array}\right)=\left(\begin{array}{c} x_0 \\ \hdashline x_k \end{array}\right)\qquad\Longleftrightarrow\qquad\left\{\begin{array}{lclcl} L_{0,0} x_0^{(k+1)} & & &=& x_0 \\ \ell_{k,0}^T x_0^{(k+1)} &+& \ell_{k,k} x_k^{(k+1)} &=& x_k \end{array}\right.\qquad\Longleftrightarrow\qquad\left\{\begin{array}{lcl} x_0^{(k+1)} &=& z_0 \\ \ell_{k,k} x_k^{(k+1)} &=& x_k - \ell_{k,0}^T z_0 \end{array}\right.\]und
\[\begin{array}{lcl}x_{k+1}^{(k+1)}&=&x_{k+1}-\left(\begin{array}{c:c} L_{k+1,0} & \ell_{k+1,k} \end{array}\right)\left(\begin{array}{c} x_0^{(k+1)}\\ \hdashline x_k^{(k+1)} \end{array}\right)\\&=& x_{k+1} - L_{k+1,0} x_0^{(k+1)} - \ell_{k+1,k} x_k^{(k+1)}\\&=& x_{k+1} - L_{k+1,0} z_0 - \ell_{k+1,k} x_k^{(k+1)}\\&=& x_{k+1}^{(k)} - \ell_{k+1,k} x_k^{(k+1)}\\\end{array}\]gelten.
Wegen \(x_0^{(k+1)} = x_0^{(k)} = z_0\) muss also nur
\[\begin{array}{lcl}x_k^{(k+1)} &=& \frac{1}{\ell_{k,k}}\left(x_k - \ell_{k,0}^T z_0\right) = \frac{1}{\ell_{k,k}} x_k^{(k)} \\x_{k+1}^{(k+1)} &=& x_{k+1}^{(k)} - \ell_{k+1,k} x_k^{(k+1)}\end{array}\]berechnet werden. Als Algorithmus erhalten wir somit:
-
For \(k=0, \dots, n-1\):
-
\(x_k \leftarrow \frac{1}{\ell_{k,k}} x_k\)
-
\(x_{k+1} \leftarrow x_{k+1} - \ell_{k+1,k} x_k\) (AXPY)
-
Aufgabe
Implementiert die Prozeduren dtrlsv_row und dtrlsv_col in der untenstehenden Vorlage.
-
Testet eure Implementierung jeweils für zeilen- und spaltenweise gespeicherte Matrizen. Übersetzt dazu wiederum das Programm jeweils mit -DROWMAJOR=1 und -DROWMAJOR=0.
-
Verwendet verschiedene Optimierungen: -O1, -O3 und -Ofast.
#include <stdio.h> #include <math.h> #include <stdlib.h> #include <sys/times.h> #include <unistd.h> //-- timer for benchmarks ------------------------------------------------------ double walltime() { struct tms ts; static double ClockTick=0.0; if (ClockTick==0.0) { ClockTick = 1.0 / ((double) sysconf(_SC_CLK_TCK)); } return ((double) times(&ts)) * ClockTick; } //-- setup and print matrices -------------------------------------------------- void initGeMatrix(int m, int n, double *A, int incRowA, int incColA) { int i, j; for (i=0; i<m; ++i) { for (j=0; j<n; ++j) { A[i*incRowA+j*incColA] = i*n + j + 1; } } } void randGeMatrix(int m, int n, double *A, int incRowA, int incColA) { int i, j; for (j=0; j<n; ++j) { for (i=0; i<m; ++i) { A[i*incRowA+j*incColA] = ((double)rand()-RAND_MAX/2)*200/RAND_MAX; } } } void makeTrlDiagDom(int n, int unitDiag, double *A, int incRowA, int incColA) { int i, j; for (j=0; j<n; ++j) { double asum = 0; double A_jj = (unitDiag) ? 1 : A[j*(incRowA+incColA)]; for (i=j+1; i<n; ++i) { asum += fabs(A[i*incRowA+j*incColA]); } for (i=j+1; i<n; ++i) { A[i*incRowA+j*incColA] *= A_jj/(asum*1.1); } } } void printGeMatrix(int m, int n, const double *A, int incRowA, int incColA) { int i, j; for (i=0; i<m; ++i) { for (j=0; j<n; ++j) { printf("%10.4lf ", A[i*incRowA+j*incColA]); } printf("\n"); } printf("\n\n"); } void printTrMatrix(int m, int n, int unitDiag, int lower, const double *A, int incRowA, int incColA) { int i, j; for (i=0; i<m; ++i) { for (j=0; j<n; ++j) { if (unitDiag && (i==j)) { printf("%10.4lf ", 1.0); } else if ((lower && (i>=j)) || (!lower && (i<=j))) { printf("%10.4lf ", A[i*incRowA+j*incColA]); } else { printf("%10.4lf ", 0.0); } } printf("\n"); } printf("\n\n"); } //-- some BLAS Level 1 procedures and functions -------------------------------- double ddot(int n, const double *x, int incX, const double *y, int incY) { int i; double alpha = 0; for (i=0; i<n; ++i) { alpha += x[i*incX]*y[i*incY]; } return alpha; } void daxpy(int n, double alpha, const double *x, int incX, double *y, int incY) { int i; if (alpha==0) { return; } for (i=0; i<n; ++i) { y[i*incY] += alpha*x[i*incX]; } } void dscal(int n, double alpha, double *x, int incX) { int i; if (alpha==1) { return; } for (i=0; i<n; ++i) { x[i*incX] *= alpha; } } void dcopy(int n, const double *x, int incX, double *y, int incY) { int i; for (i=0; i<n; ++i) { y[i*incY] = x[i*incX]; } } void dswap(int n, double *x, int incX, double *y, int incY) { int i; for (i=0; i<n; ++i) { double tmp; tmp = x[i*incX]; x[i*incX] = y[i*incY]; y[i*incY] = tmp; } } double damax(int n, const double *x, int incX) { int i; double result = 0; for (i=0; i<n; ++i) { if (fabs(x[i*incX])>result) { result = fabs(x[i*incX]); } } return result; } double dgenrm1(int m, int n, const double *A, int incRowA, int incColA) { int i, j; double result = 0; for (j=0; j<n; ++j) { double sum = 0; for (i=0; i<m; ++i) { sum += fabs(A[i*incRowA+j*incColA]); } if (sum>result) { result = sum; } } return result; } double dtrnrm1(int m, int n, int unitDiag, int lower, const double *A, int incRowA, int incColA) { int i, j; double result = 0; for (j=0; j<n; ++j) { double sum = 0; for (i=0; i<m; ++i) { if (unitDiag && (i==j)) { sum += 1.0; } else if ((lower && (i>=j)) || (!lower && (i<=j))) { sum += fabs(A[i*incRowA+j*incColA]); } } if (sum>result) { result = sum; } } return result; } //-- error bounds -------------------------------------------------------------- double err_dgemv(int m, int n, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double beta, const double *y0, int incY0, double *y1, int incY1) { int max_mn = (m>n) ? m : n; double normA = fabs(alpha)*dgenrm1(m, n, A, incRowA, incColA); double normX = damax(n, x, incX); double normY0 = fabs(beta)*damax(m, y0, incY0); double normD; daxpy(m, -1.0, y0, incY0, y1, incY1); normD = damax(n, y1, incY1); return normD/(normA*normX*normY0*max_mn); } double err_dtrmv(int n, int unitDiag, int lower, const double *A, int incRowA, int incColA, const double *x0, int incX0, double *x1, int incX1) { double normA = dtrnrm1(n, n, unitDiag, lower, A, incRowA, incColA); double normX0 = damax(n, x0, incX0); double normD; daxpy(n, -1.0, x0, incX0, x1, incX1); normD = damax(n, x1, incX1); return normD/(n*normA*normX0); } double err_dtrsv(int n, int unitDiag, int lower, const double *A, int incRowA, int incColA, const double *x0, int incX0, double *x1, int incX1) { double normA = dtrnrm1(n, n, unitDiag, lower, A, incRowA, incColA); double normX0 = damax(n, x0, incX0); double normD; daxpy(n, -1.0, x0, incX0, x1, incX1); normD = damax(n, x1, incX1); return normD/(n*normA*normX0); } //-- Fused BLAS Level 1 procedures and functions ------------------------------- #ifndef DGEMV_MxBS #define DGEMV_MxBS 4 #endif void dgemv_mxBS(int m, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double *y, int incY) { int i, j; if (alpha==0) { return; } for (i=0; i<m; ++i) { double dot = 0; for (j=0; j<DGEMV_MxBS; ++j) { dot += A[i*incRowA+j*incColA]*x[j*incX]; } y[i*incY] += alpha*dot; } } #ifndef DGEMV_BSxN #define DGEMV_BSxN 4 #endif void dgemv_BSxn(int n, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double beta, double *y, int incY) { int i, j; if (alpha==0) { return; } if (beta!=1) { for (i=0; i<DGEMV_BSxN; ++i) { y[i*incY] *= beta; } } for (j=0; j<n; ++j) { for (i=0; i<DGEMV_BSxN; ++i) { y[i*incY] += alpha*A[i*incRowA+j*incColA]*x[j*incX]; } } } //-- GEMV implementations ------------------------------------------------------ void dgemv_ref(int m, int n, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double beta, double *y, int incY) { int i, j; for (i=0; i<m; ++i) { if (beta==0) { y[i*incY] = 0; } else { y[i*incY] *= beta; } for (j=0; j<n; ++j) { y[i*incY] += alpha*A[i*incRowA+j*incColA]*x[j*incX]; } } } void dgemv_unblk(int m, int n, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double beta, double *y, int incY) { if (incRowA<incColA) { int j; dscal(m, beta, y, incY); for (j=0; j<n; ++j) { daxpy(m, alpha*x[j*incX], &A[j*incColA], incRowA, y, incY); } } else { int i; for (i=0; i<m; ++i) { y[i*incY] *= beta; y[i*incY] += alpha*ddot(n, &A[i*incRowA], incColA, x, incX); } } } void dgemv_ulm(int m, int n, double alpha, const double *A, int incRowA, int incColA, const double *x, int incX, double beta, double *y, int incY) { if (incRowA<incColA) { int bs = DGEMV_MxBS; int j; dscal(m, beta, y, incY); for (j=0; j+bs<=n; j+=bs) { dgemv_mxBS(m, alpha, &A[j*incColA], incRowA, incColA, &x[j*incX], incX, y, incY); } dgemv_unblk(m, n-j, alpha, &A[j*incColA], incRowA, incColA, &x[j*incX], incX, 1.0, y, incY); } else { int bs = DGEMV_BSxN; int i; for (i=0; i+bs<=m; i+=bs) { dgemv_BSxn(n, alpha, &A[i*incRowA], incRowA, incColA, x, incX, beta, &y[i*incY], incY); } dgemv_unblk(m-i, n, alpha, &A[i*incRowA], incRowA, incColA, x, incX, beta, &y[i*incY], incY); } } //-- TRMV implementations ------------------------------------------------------ void dtrlmv_ref(int n, int unitDiag, const double *A, int incRowA, int incColA, double *x, int incX) { int i, j; for (j=n-1; j>=0; --j) { for (i=j+1; i<n; ++i) { x[i*incX] += A[i*incRowA+j*incColA]*x[j*incX]; } if (!unitDiag) { x[j*incX] *= A[j*(incRowA+incColA)]; } } } void dtrlmv_col(int n, int unitDiag, const double *A, int incRowA, int incColA, double *x, int incX) { int k; if (unitDiag) { for (k=n-1; k>=0; --k) { daxpy(n-k-1, x[k*incX], &A[(k+1)*incRowA+k*incColA], incRowA, &x[(k+1)*incX], incX); } } else { for (k=n-1; k>=0; --k) { daxpy(n-k-1, x[k*incX], &A[(k+1)*incRowA+k*incColA], incRowA, &x[(k+1)*incX], incX); x[k*incX] *= A[k*(incRowA+incColA)]; } } } void dtrlmv_row(int n, int unitDiag, const double *A, int incRowA, int incColA, double *x, int incX) { int k; if (unitDiag) { for (k=n-1; k>=0; --k) { x[k*incX] += ddot(k, &A[k*incRowA], incColA, x, incX); } } else { for (k=n-1; k>=0; --k) { x[k*incX] = ddot(k+1, &A[k*incRowA], incColA, x, incX); } } } void dtrlmv_ulm(int n, int unitDiag, const double *A, int incRowA, int incColA, double *x, int incX) { if (incRowA<incColA) { int bs = DGEMV_MxBS; int k = (n/bs)*bs; dtrlmv_col(n-k, unitDiag, &A[k*(incRowA+incColA)], incRowA, incColA, &x[k*incX], incX); for (k-=bs; k>=0; k-=bs) { dgemv_mxBS(n-k-bs, 1.0, &A[(k+bs)*incRowA+k*incColA], incRowA, incColA, &x[k*incX], incX, &x[(k+bs)*incX], incX); dtrlmv_col(bs, unitDiag, &A[k*(incRowA+incColA)], incRowA, incColA, &x[k*incX], incX); } } else { double buffer[DGEMV_BSxN]; int bs = DGEMV_BSxN; int k = (n/bs)*bs; dgemv_ulm(n-k, k, 1.0, &A[k*incRowA], incRowA, incColA, x, incX, 0.0, buffer,1); dtrlmv_row(n-k, unitDiag, &A[k*(incRowA+incColA)], incRowA, incColA, &x[k*incX], incX); daxpy(n-k, 1.0, buffer, 1, &x[k*incX], incX); for (k-=bs; k>=0; k-=bs) { dgemv_BSxn(k, 1.0, &A[k*incRowA], incRowA, incColA, x, incX, 0.0, buffer, 1); dtrlmv_col(bs, unitDiag, &A[k*(incRowA+incColA)], incRowA, incColA, &x[k*incX], incX); daxpy(bs, 1.0, buffer, 1, &x[k*incX], incX); } } } //-- TRSV implementations ------------------------------------------------------ void dtrlsv_ref(int n, int unitDiag, const double *A, int incRowA, int incColA, double *x, int incX) { int i, j; for (i=0; i<n; ++i) { for (j=0; j<i; ++j) { x[i*incX] -= A[i*incRowA+j*incColA]*x[j*incX]; } x[i*incX] /= (unitDiag) ? 1.0 : A[i*(incRowA+incColA)]; } } void dtrlsv_row(int n, int unitDiag, const double *A, int incRowA, int incColA, double *x, int incX) { // ... your code here .... } void dtrlsv_col(int n, int unitDiag, const double *A, int incRowA, int incColA, double *x, int incX) { // ... your code here .... } //-- Wrapper for the Intel MKL ------------------------------------------------- #ifdef USE_MKL #include <mkl_types.h> // pre declaration so we don't need to include the complete mkl_blas header. void dgemv(const char *trans, const MKL_INT *m, const MKL_INT *n, const double *alpha, const double *A, const MKL_INT *ldA, const double *x, const MKL_INT *incX, const double *beta, double *y, const MKL_INT *incY); void dgemv_mkl(MKL_INT m, MKL_INT n, double alpha, const double *A, MKL_INT incRowA, MKL_INT incColA, const double *x, MKL_INT incX, double beta, double *y, MKL_INT incY) { MKL_INT ldA = (incRowA==1) ? incColA : incRowA; char trans = (incRowA==1) ? 'N' : 'T'; MKL_INT M = (incRowA==1) ? m : n; MKL_INT N = (incRowA==1) ? n : m; dgemv(&trans, &M, &N, &alpha, A, &ldA, x, &incX, &beta, y, &incY); } void dtrmv(const char *uplo, const char *transa, const char *diag, const MKL_INT *n, const double *a, const MKL_INT *lda, double *b, const MKL_INT *incx); void dtrmv_mkl(int lower, int unitDiag, MKL_INT n, const double *A, int incRowA, int incColA, double *x, int incX) { MKL_INT ldA = (incRowA==1) ? incColA : incRowA; dtrmv(&N, A, &ldA, x, &incX); } #endif // USE_MKL //------------------------------------------------------------------------------ #ifndef MIN_N #define MIN_N 100 #endif #ifndef MAX_N #define MAX_N 8000 #endif #ifndef INC_N #define INC_N 100 #endif #ifndef ROWMAJOR #define ROWMAJOR 0 #endif #ifndef UNITDIAG #define UNITDIAG 0 #endif #if (ROWMAJOR==1) # define INCROW_A MAX_N # define INCCOL_A 1 #else # define INCROW_A 1 # define INCCOL_A MAX_N #endif #ifndef INC_X #define INC_X 1 #endif double A[MAX_N*MAX_N]; double x_[MAX_N]; double x_0[INC_X*MAX_N]; double x_1[INC_X*MAX_N]; int main() { int n; randGeMatrix(MAX_N, MAX_N, A, INCROW_A, INCCOL_A); makeTrlDiagDom(MAX_N, UNITDIAG, A, INCROW_A, INCCOL_A); randGeMatrix(MAX_N, 1, x_, 1, 0); printf("#UNITDIAG=%3d\n", UNITDIAG); printf("# %51s %25s %38s\n", "dtrlsv_ref", "dtrlsv_row", "dtrlsv_col"); printf("#%9s %9s %9s", "n", "INCROW_A", "INCCOL_A"); printf(" %9s", "INC_X"); printf(" %12s %12s", "t", "MFLOPS"); printf(" %12s %12s %12s", "t", "MFLOPS", "err"); printf(" %12s %12s %12s", "t", "MFLOPS", "err"); printf("\n"); for (n=MIN_N; n<=MAX_N; n+=INC_N) { int runs = 1; int ops = n*(n+1)/2 + (n-1)*n/2; double t0, t1, dt, err; printf(" %9d %9d %9d", n, INCROW_A, INCCOL_A); printf(" %9d", INC_X); t0 = 0; runs = 0; do { dcopy(n, x_, 1, x_0, INC_X); dt = walltime(); dtrlsv_ref(n, UNITDIAG, A, INCROW_A, INCCOL_A, x_0, INC_X); dt = walltime() - dt; t0 += dt; ++runs; } while (t0<0.3); t0 /= runs; printf(" %12.2e %12.2lf", t0, ops/(1000*1000*t0)); t1 = 0; runs = 0; do { dcopy(n, x_, 1, x_1, INC_X); dt = walltime(); dtrlsv_row(n, UNITDIAG, A, INCROW_A, INCCOL_A, x_1, INC_X); dt = walltime() - dt; t1 += dt; ++runs; } while (t1<0.3); t1 /= runs; err = err_dtrsv(n, UNITDIAG, 1, A, INCROW_A, INCCOL_A, x_0, INC_X, x_1, INC_X); printf(" %12.2e %12.2lf %12.2e", t1, ops/(1000*1000*t1), err); t1 = 0; runs = 0; do { dcopy(n, x_, 1, x_1, INC_X); dt = walltime(); dtrlsv_col(n, UNITDIAG, A, INCROW_A, INCCOL_A, x_1, INC_X); dt = walltime() - dt; t1 += dt; ++runs; } while (t1<0.3); t1 /= runs; err = err_dtrsv(n, UNITDIAG, 1, A, INCROW_A, INCCOL_A, x_0, INC_X, x_1, INC_X); printf(" %12.2e %12.2lf %12.2e", t1, ops/(1000*1000*t1), err); printf("\n"); } return 0; }