Content

Example Code
Comments on Example Code
Compile
Run
Using QD (Double-Double and Quad-Double Package)
Using mpfr::real (C++ Interface for mpfr)
For Performance: Compile with ATLAS

Solving Systems of Linear Equations

Compare this example with flens/examples/lapack-gesv.

Like in flens/examples/lapack-gesv we solve a system of linear equations \(Ax = b.\) But instead of using the driver function lapack::sv we first compute the \(LU\) factorization of \(A\) with lapack::trf and then use the triangular solver blas::sm to finish the job. So basicaly we do manually what lapack::sv does internally.

Example Code

flens/examples/lapack-getrf.cc

#include <iostream>

#include <flens/flens.cxx>

using namespace std;

using namespace flens;

typedef double   T;

int

main()

{

    typedef GeMatrix<FullStorage<T> >           Matrix;

    typedef DenseVector<Array<T> >              Vector;

    typedef Matrix::IndexType                   IndexType;

    typedef DenseVector<Array<IndexType> >      IndexVector;

    const Underscore<IndexType> _;

    const IndexType m = 4,

                    n = 5;

    Matrix            Ab(m, n);

    IndexVector       piv(m);

    Ab =  2,   3,  -1,   0,  20,

         -6,  -5,   0,   2, -33,

          2,  -5,   6,  -6, -43,

          4,   6,   2,  -3,  49;

    cout << "Ab = " << Ab << endl;

    lapack::trf(Ab, piv);

    const auto A = Ab(_,_(1,m));

    auto       B = Ab(_,_(m+1,n));

    blas::sm(Left, NoTrans, T(1), A.upper(), B);

    cout << "X = " << B << endl;

}

Comments on Example Code

#include <iostream>

With header flens.cxx all of FLENS gets included.

#include <flens/flens.cxx>

using namespace std;

using namespace flens;

typedef double   T;

int

main()

{

Define some convenient typedefs for the matrix/vector types of our system of linear equations.

    typedef GeMatrix<FullStorage<T> >           Matrix;

    typedef DenseVector<Array<T> >              Vector;

We also need an extra vector type for the pivots. The type of the pivots is taken for the system matrix.

    typedef Matrix::IndexType                   IndexType;

    typedef DenseVector<Array<IndexType> >      IndexVector;

Define an underscore operator for convenient matrix slicing

const Underscore<IndexType> _;

Set up the baby problem ...

    const IndexType m = 4,

                    n = 5;

    Matrix            Ab(m, n);

    IndexVector       piv(m);

    Ab =  2,   3,  -1,   0,  20,

         -6,  -5,   0,   2, -33,

          2,  -5,   6,  -6, -43,

          4,   6,   2,  -3,  49;

    cout << "Ab = " << Ab << endl;

Compute the \(LU\) factorization with lapack::trf

lapack::trf(Ab, piv);

Solve the system of linear equation \(Ax =B\) using blas::sm

    const auto A = Ab(_,_(1,m));

    auto       B = Ab(_,_(m+1,n));

    blas::sm(Left, NoTrans, T(1), A.upper(), B);

    cout << "X = " << B << endl;

}

Compile

$shell> cd flens/examples                                                      
$shell> clang++ -std=c++0x -Wall -I../.. -o lapack-getrf lapack-getrf.cc

Run

$shell> cd flens/examples                                                      
$shell> ./lapack-getrf                                                         
Ab = 
                   2                    3                   -1                    0                   20 
                  -6                   -5                    0                    2                  -33 
                   2                   -5                    6                   -6                  -43 
                   4                    6                    2                   -3                   49 
X = 
                   1 
                   9 
                   9 
                   9

Using QD (Double-Double and Quad-Double Package)

The following description is taken from the QD Library page:

The QD Library supports both a double-double datatype (approx. 32 decimal digits) and a quad-double datatype (approx. 64 decimal digits). The computational library is written in C++. Both C++ and Fortran-90 high-level language interfaces are provided to permit one to use convert an existing C++ or Fortran-90 program to use the library with only minor changes to the source code. In most cases only a few type statements and (for Fortran-90 programs) read/write statements need to be changed. PSLQ and numerical quadrature programs are included.

Modified Example Code

flens/examples/qd-lapack-getrf.cc

#include <iostream>

#include <qd/qd_real.h>

#include <qd/fpu.h>

#include <flens/flens.cxx>

using namespace std;

using namespace flens;

typedef qd_real   T;

int

main()

{

    unsigned int old_cw;

    fpu_fix_start(&old_cw);

    typedef GeMatrix<FullStorage<T> >           Matrix;

    typedef DenseVector<Array<T> >              Vector;

    typedef Matrix::IndexType                   IndexType;

    typedef DenseVector<Array<IndexType> >      IndexVector;

    const Underscore<IndexType> _;

    const IndexType m = 4,

                    n = 5;

    Matrix            Ab(m, n);

    IndexVector       piv(m);

    Ab =  2,   3,  -1,   0,  20,

         -6,  -5,   0,   2, -33,

          2,  -5,   6,  -6, -43,

          4,   6,   2,  -3,  49;

    cout << "Ab = " << Ab << endl;

    lapack::trf(Ab, piv);

    const auto A = Ab(_,_(1,m));

    auto       B = Ab(_,_(m+1,n));

    blas::sm(Left, NoTrans, T(1), A.upper(), B);

    cout << "X = " << B << endl;

    fpu_fix_end(&old_cw);

}

Comments on Modifications

Only a few modifications were made which will be pointed out in the following:

Include the qd-headers

#include <qd/qd_real.h>

#include <qd/fpu.h>

For using quad-double precision in this example change the typedef to

typedef qd_real T;

For understanding the next code lines we first take a look at the QD Library documentation:

The algorithms in the QD library assume IEEE double precision floating point arithmetic. Since Intel x86 processors have extended (80-bit) floating point registers, the round-to-double flag must be enabled in the control word of the FPU for this library to function properly under x86 processors. The function fpu_fix_start turns on the round-to-double bit in the FPU control word, while fpu_fix_end will restore the original state.

So the first thing we do in main is turning on the correct rounding ...

int

main()

{

    unsigned int old_cw;

    fpu_fix_start(&old_cw);

... and at the end restore FPU rounding behavior as mentioned above.

    fpu_fix_end(&old_cw);

}

Compile and Run

$shell> cd flens/examples                                                            
$shell> clang++ -Wall -std=c++0x -I../.. -I/opt/local/include/ -c qd-lapack-getrf.cc 
$shell> clang++ -o qd-lapack-getrf qd-lapack-getrf.o /opt/local/lib/libqd.a          
$shell> ./qd-lapack-getrf                                                            
Ab = 
        2.000000e+00         3.000000e+00        -1.000000e+00         0.000000e+00         2.000000e+01 
       -6.000000e+00        -5.000000e+00         0.000000e+00         2.000000e+00        -3.300000e+01 
        2.000000e+00        -5.000000e+00         6.000000e+00        -6.000000e+00        -4.300000e+01 
        4.000000e+00         6.000000e+00         2.000000e+00        -3.000000e+00         4.900000e+01 
X = 
        1.000000e+00 
        9.000000e+00 
        9.000000e+00 
        9.000000e+00

Using mpfr::real (C++ Interface for mpfr)

The following description is taken from the mpfr::real page:

The class mpfr::real is a high-level C++ wrapper for the GNU MPFR library, a C library for multiple-precision floating-point computations with correct rounding.

The objects of mpfr::real can (almost) be used like doubles or other fundamental floating-point types, thus avoiding the use of C functions to manipulate MPFR's low-level data type directly. In addition to all arithmetic and comparison operators available for fundamental floating-point types, mpfr::real also supports the set of mathematical functions for double from math.h/cmath. Finally, std::istream operator >> and std::ostream operator << are implemented for mpfr::reals. This allows to substitute double with mpfr::real with no further modifications of the code in many cases.

Modified Example Code

flens/examples/mpfr-real-lapack-getrf.cc

#include <iostream>

#define REAL_ENABLE_CONVERSION_OPERATORS

#include <external/real.hpp>

#include <flens/flens.cxx>

using namespace std;

using namespace flens;

typedef mpfr::real<53>   T;

int

main()

{

    typedef GeMatrix<FullStorage<T> >           Matrix;

    typedef DenseVector<Array<T> >              Vector;

    typedef Matrix::IndexType                   IndexType;

    typedef DenseVector<Array<IndexType> >      IndexVector;

    const Underscore<IndexType> _;

    const IndexType m = 4,

                    n = 5;

    Matrix            Ab(m, n);

    IndexVector       piv(m);

    Ab =  2,   3,  -1,   0,  20,

         -6,  -5,   0,   2, -33,

          2,  -5,   6,  -6, -43,

          4,   6,   2,  -3,  49;

    cout << "Ab = " << Ab << endl;

    lapack::trf(Ab, piv);

    const auto A = Ab(_,_(1,m));

    auto       B = Ab(_,_(m+1,n));

    blas::sm(Left, NoTrans, T(1), A.upper(), B);

    cout << "X = " << B << endl;

}

Comments on Modifications

Include header file for mpfr::real before flens.cxx and make sure that conversion operators are enabled

#define REAL_ENABLE_CONVERSION_OPERATORS

#include <external/real.hpp>

#include <flens/flens.cxx>

Make typedef for using mpfr::real

typedef mpfr::real<53> T;

Compile and Run

$shell> cd flens/examples                                                      
$shell> clang++ -Wall -std=c++0x -I../.. -I/opt/local/include -o mpfr-real-lapack-getrf mpfr-real-lapack-getrf.cc -L /opt/local/lib -lmpfr -lgmp                                    
$shell> ./mpfr-real-lapack-getrf                                               
Ab = 
        2.000000e+00         3.000000e+00        -1.000000e+00         0.000000e+00         2.000000e+01 
       -6.000000e+00        -5.000000e+00         0.000000e+00         2.000000e+00        -3.300000e+01 
        2.000000e+00        -5.000000e+00         6.000000e+00        -6.000000e+00        -4.300000e+01 
        4.000000e+00         6.000000e+00         2.000000e+00        -3.000000e+00         4.900000e+01 
X = 
        1.000000e+00 
        9.000000e+00 
        9.000000e+00 
        9.000000e+00

For Performance: Compile with ATLAS

For high performance you have to link with an optimized BLAS implementation like GotoBLAS or ATLAS. For ATLAS pass -DWITH_ATLAS to the compiler:

$shell> cd flens/examples                                                      
$shell> clang++ -DWITH_ATLAS -Wall -std=c++0x -I../.. -I/opt/local/include -o atlas-lapack-getrf lapack-getrf.cc -latlas -lcblas                                                
$shell> ./atlas-lapack-getrf                                                   
Ab = 
                   2                    3                   -1                    0                   20 
                  -6                   -5                    0                    2                  -33 
                   2                   -5                    6                   -6                  -43 
                   4                    6                    2                   -3                   49 
X = 
                   1 
                   9 
                   9 
                   9

If you want to check which BLAS function gets called also pass -DDEBUG_CXXBLAS to the compiler and grep and sort for CXXBLAS (note that the problem size in this example is so small that no LEVEL 3 BLAS function gets used):

$shell> cd flens/examples                                                      
$shell> clang++ -DWITH_ATLAS -DCXXBLAS_DEBUG -Wall -std=c++0x -I../.. -I/opt/local/include -o atlas-lapack-getrf lapack-getrf.cc -latlas -lcblas                                                
$shell> ./atlas-lapack-getrf 2>&1 | grep CXXBLAS | sort -u                     
CXXBLAS: [ATLAS] cblas_dger
CXXBLAS: [ATLAS] cblas_dscal
CXXBLAS: [ATLAS] cblas_dswap
CXXBLAS: [ATLAS] cblas_dtrsm
CXXBLAS: [ATLAS] cblas_idamax

For optimal performance also pass -DNDEBUG and -O3.