Content

Solving Systems of Linear Equations

Compare this example with flens/examples/lapack-gesv.

Like in flens/examples/lapack-gesv we solve a system of linear equations \(Ax = b.\) But instead of using the driver function lapack::sv we first compute the \(LU\) factorization of \(A\) with lapack::trf and then use the triangular solver blas::sm to finish the job. So basicaly we do manually what lapack::sv does internally.

Example Code

#include <iostream>
#include <flens/flens.cxx>

using namespace std;
using namespace flens;

typedef double   T;

int
main()
{
    typedef GeMatrix<FullStorage<T> >           Matrix;
    typedef DenseVector<Array<T> >              Vector;

    typedef Matrix::IndexType                   IndexType;
    typedef DenseVector<Array<IndexType> >      IndexVector;

    const Underscore<IndexType> _;

    const IndexType m = 4,
                    n = 5;

    Matrix            Ab(m, n);
    IndexVector       piv(m);

    Ab =  2,   3,  -1,   0,  20,
         -6,  -5,   0,   2, -33,
          2,  -5,   6,  -6, -43,
          4,   6,   2,  -3,  49;

    cout << "Ab = " << Ab << endl;

    lapack::trf(Ab, piv);

    const auto A = Ab(_,_(1,m));
    auto       B = Ab(_,_(m+1,n));

    blas::sm(Left, NoTrans, T(1), A.upper(), B);

    cout << "X = " << B << endl;
}

Comments on Example Code

#include <iostream>

With header flens.cxx all of FLENS gets included.

#include <flens/flens.cxx>

using namespace std;
using namespace flens;

typedef double   T;

int
main()
{

Define some convenient typedefs for the matrix/vector types of our system of linear equations.

    typedef GeMatrix<FullStorage<T> >           Matrix;
    typedef DenseVector<Array<T> >              Vector;

We also need an extra vector type for the pivots. The type of the pivots is taken for the system matrix.

    typedef Matrix::IndexType                   IndexType;
    typedef DenseVector<Array<IndexType> >      IndexVector;

Define an underscore operator for convenient matrix slicing

    const Underscore<IndexType> _;

Set up the baby problem ...

    const IndexType m = 4,
                    n = 5;

    Matrix            Ab(m, n);
    IndexVector       piv(m);

    Ab =  2,   3,  -1,   0,  20,
         -6,  -5,   0,   2, -33,
          2,  -5,   6,  -6, -43,
          4,   6,   2,  -3,  49;

    cout << "Ab = " << Ab << endl;

Compute the \(LU\) factorization with lapack::trf

    lapack::trf(Ab, piv);

Solve the system of linear equation \(Ax =B\) using blas::sm

    const auto A = Ab(_,_(1,m));
    auto       B = Ab(_,_(m+1,n));

    blas::sm(Left, NoTrans, T(1), A.upper(), B);

    cout << "X = " << B << endl;
}

Compile

$shell> cd flens/examples                                                      
$shell> clang++ -std=c++0x -Wall -I../.. -o lapack-getrf lapack-getrf.cc       

Run

$shell> cd flens/examples                                                      
$shell> ./lapack-getrf                                                         
Ab = 
                   2                    3                   -1                    0                   20 
                  -6                   -5                    0                    2                  -33 
                   2                   -5                    6                   -6                  -43 
                   4                    6                    2                   -3                   49 
X = 
                   1 
                   9 
                   9 
                   9 

Using QD (Double-Double and Quad-Double Package)

The following description is taken from the QD Library page:

The QD Library supports both a double-double datatype (approx. 32 decimal digits) and a quad-double datatype (approx. 64 decimal digits). The computational library is written in C++. Both C++ and Fortran-90 high-level language interfaces are provided to permit one to use convert an existing C++ or Fortran-90 program to use the library with only minor changes to the source code. In most cases only a few type statements and (for Fortran-90 programs) read/write statements need to be changed. PSLQ and numerical quadrature programs are included.

Modified Example Code

#include <iostream>

#include <qd/qd_real.h>
#include <qd/fpu.h>

#include <flens/flens.cxx>

using namespace std;
using namespace flens;

typedef qd_real   T;

int
main()
{
    unsigned int old_cw;
    fpu_fix_start(&old_cw);

    typedef GeMatrix<FullStorage<T> >           Matrix;
    typedef DenseVector<Array<T> >              Vector;
    typedef Matrix::IndexType                   IndexType;
    typedef DenseVector<Array<IndexType> >      IndexVector;

    const Underscore<IndexType> _;

    const IndexType m = 4,
                    n = 5;

    Matrix            Ab(m, n);
    IndexVector       piv(m);

    Ab =  2,   3,  -1,   0,  20,
         -6,  -5,   0,   2, -33,
          2,  -5,   6,  -6, -43,
          4,   6,   2,  -3,  49;

    cout << "Ab = " << Ab << endl;

    lapack::trf(Ab, piv);

    const auto A = Ab(_,_(1,m));
    auto       B = Ab(_,_(m+1,n));

    blas::sm(Left, NoTrans, T(1), A.upper(), B);

    cout << "X = " << B << endl;

    fpu_fix_end(&old_cw);
}

Comments on Modifications

Only a few modifications were made which will be pointed out in the following:

Include the qd-headers

#include <qd/qd_real.h>
#include <qd/fpu.h>

For using quad-double precision in this example change the typedef to

typedef qd_real   T;

For understanding the next code lines we first take a look at the QD Library documentation:

The algorithms in the QD library assume IEEE double precision floating point arithmetic. Since Intel x86 processors have extended (80-bit) floating point registers, the round-to-double flag must be enabled in the control word of the FPU for this library to function properly under x86 processors. The function fpu_fix_start turns on the round-to-double bit in the FPU control word, while fpu_fix_end will restore the original state.

So the first thing we do in main is turning on the correct rounding ...

int
main()
{
    unsigned int old_cw;
    fpu_fix_start(&old_cw);

... and at the end restore FPU rounding behavior as mentioned above.

    fpu_fix_end(&old_cw);
}

Compile and Run

$shell> cd flens/examples                                                            
$shell> clang++ -Wall -std=c++0x -I../.. -I/opt/local/include/ -c qd-lapack-getrf.cc 
$shell> clang++ -o qd-lapack-getrf qd-lapack-getrf.o /opt/local/lib/libqd.a          
$shell> ./qd-lapack-getrf                                                            
Ab = 
        2.000000e+00         3.000000e+00        -1.000000e+00         0.000000e+00         2.000000e+01 
       -6.000000e+00        -5.000000e+00         0.000000e+00         2.000000e+00        -3.300000e+01 
        2.000000e+00        -5.000000e+00         6.000000e+00        -6.000000e+00        -4.300000e+01 
        4.000000e+00         6.000000e+00         2.000000e+00        -3.000000e+00         4.900000e+01 
X = 
        1.000000e+00 
        9.000000e+00 
        9.000000e+00 
        9.000000e+00 

Using mpfr::real (C++ Interface for mpfr)

The following description is taken from the mpfr::real page:

The class mpfr::real is a high-level C++ wrapper for the GNU MPFR library, a C library for multiple-precision floating-point computations with correct rounding.

The objects of mpfr::real can (almost) be used like doubles or other fundamental floating-point types, thus avoiding the use of C functions to manipulate MPFR's low-level data type directly. In addition to all arithmetic and comparison operators available for fundamental floating-point types, mpfr::real also supports the set of mathematical functions for double from math.h/cmath. Finally, std::istream operator >> and std::ostream operator << are implemented for mpfr::reals. This allows to substitute double with mpfr::real with no further modifications of the code in many cases.

Modified Example Code

#include <iostream>

#define REAL_ENABLE_CONVERSION_OPERATORS
#include <external/real.hpp>
#include <flens/flens.cxx>

using namespace std;
using namespace flens;

typedef mpfr::real<53>   T;

int
main()
{
    typedef GeMatrix<FullStorage<T> >           Matrix;
    typedef DenseVector<Array<T> >              Vector;
    typedef Matrix::IndexType                   IndexType;
    typedef DenseVector<Array<IndexType> >      IndexVector;

    const Underscore<IndexType> _;

    const IndexType m = 4,
                    n = 5;

    Matrix            Ab(m, n);
    IndexVector       piv(m);

    Ab =  2,   3,  -1,   0,  20,
         -6,  -5,   0,   2, -33,
          2,  -5,   6,  -6, -43,
          4,   6,   2,  -3,  49;

    cout << "Ab = " << Ab << endl;

    lapack::trf(Ab, piv);

    const auto A = Ab(_,_(1,m));
    auto       B = Ab(_,_(m+1,n));

    blas::sm(Left, NoTrans, T(1), A.upper(), B);

    cout << "X = " << B << endl;
}

Comments on Modifications

Include header file for mpfr::real before flens.cxx and make sure that conversion operators are enabled

#define REAL_ENABLE_CONVERSION_OPERATORS
#include <external/real.hpp>
#include <flens/flens.cxx>

Make typedef for using mpfr::real

typedef mpfr::real<53>   T;

Compile and Run

$shell> cd flens/examples                                                      
$shell> clang++ -Wall -std=c++0x -I../.. -I/opt/local/include -o mpfr-real-lapack-getrf mpfr-real-lapack-getrf.cc -L /opt/local/lib -lmpfr -lgmp                                    
$shell> ./mpfr-real-lapack-getrf                                               
Ab = 
        2.000000e+00         3.000000e+00        -1.000000e+00         0.000000e+00         2.000000e+01 
       -6.000000e+00        -5.000000e+00         0.000000e+00         2.000000e+00        -3.300000e+01 
        2.000000e+00        -5.000000e+00         6.000000e+00        -6.000000e+00        -4.300000e+01 
        4.000000e+00         6.000000e+00         2.000000e+00        -3.000000e+00         4.900000e+01 
X = 
        1.000000e+00 
        9.000000e+00 
        9.000000e+00 
        9.000000e+00 

For Performance: Compile with ATLAS

For high performance you have to link with an optimized BLAS implementation like GotoBLAS or ATLAS. For ATLAS pass -DWITH_ATLAS to the compiler:

$shell> cd flens/examples                                                      
$shell> clang++ -DWITH_ATLAS -Wall -std=c++0x -I../.. -I/opt/local/include -o atlas-lapack-getrf lapack-getrf.cc -latlas -lcblas                                                
$shell> ./atlas-lapack-getrf                                                   
Ab = 
                   2                    3                   -1                    0                   20 
                  -6                   -5                    0                    2                  -33 
                   2                   -5                    6                   -6                  -43 
                   4                    6                    2                   -3                   49 
X = 
                   1 
                   9 
                   9 
                   9 

If you want to check which BLAS function gets called also pass -DDEBUG_CXXBLAS to the compiler and grep and sort for CXXBLAS (note that the problem size in this example is so small that no LEVEL 3 BLAS function gets used):

$shell> cd flens/examples                                                      
$shell> clang++ -DWITH_ATLAS -DCXXBLAS_DEBUG -Wall -std=c++0x -I../.. -I/opt/local/include -o atlas-lapack-getrf lapack-getrf.cc -latlas -lcblas                                                
$shell> ./atlas-lapack-getrf 2>&1 | grep CXXBLAS | sort -u                     
CXXBLAS: [ATLAS] cblas_dger
CXXBLAS: [ATLAS] cblas_dscal
CXXBLAS: [ATLAS] cblas_dswap
CXXBLAS: [ATLAS] cblas_dtrsm
CXXBLAS: [ATLAS] cblas_idamax

For optimal performance also pass -DNDEBUG and -O3.