Content |
Solving Systems of Linear Equations
In this example we will solve a system of linear equations \(Ax=b\). For this purpose we first compute the \(LU\) factorization of \(A\) with lapack::trf and then use the triangular solver blas::sm to finish the job. In a later example we show how to call the LAPACK driver function lapack::sv which does internnally what you see here.
Example Code
#include <flens/flens.cxx>
using namespace std;
using namespace flens;
typedef double T;
int
main()
{
typedef GeMatrix<FullStorage<T> > Matrix;
typedef DenseVector<Array<T> > Vector;
typedef Matrix::IndexType IndexType;
typedef DenseVector<Array<IndexType> > IndexVector;
const Underscore<IndexType> _;
const IndexType m = 4,
n = 5;
Matrix Ab(m, n);
IndexVector piv(m);
Ab = 2, 3, -1, 0, 20,
-6, -5, 0, 2, -33,
2, -5, 6, -6, -43,
4, 6, 2, -3, 49;
cout << "Ab = " << Ab << endl;
lapack::trf(Ab, piv);
const auto A = Ab(_,_(1,m));
auto B = Ab(_,_(m+1,n));
blas::sm(Left, NoTrans, T(1), A.upper(), B);
cout << "X = " << B << endl;
}
Comments on Example Code
With header flens.cxx all of FLENS gets included.
using namespace std;
using namespace flens;
typedef double T;
int
main()
{
Define some convenient typedefs for the matrix/vector types of our system of linear equations.
typedef DenseVector<Array<T> > Vector;
We also need an extra vector type for the pivots. The type of the pivots is taken for the system matrix.
typedef DenseVector<Array<IndexType> > IndexVector;
Define an underscore operator for convenient matrix slicing
Set up the baby problem ...
n = 5;
Matrix Ab(m, n);
IndexVector piv(m);
Ab = 2, 3, -1, 0, 20,
-6, -5, 0, 2, -33,
2, -5, 6, -6, -43,
4, 6, 2, -3, 49;
cout << "Ab = " << Ab << endl;
Compute the \(LU\) factorization with lapack::trf
Solve the system of linear equation \(Ax =B\) using blas::sm
auto B = Ab(_,_(m+1,n));
blas::sm(Left, NoTrans, T(1), A.upper(), B);
cout << "X = " << B << endl;
}
Compile
$shell> cd flens/examples $shell> g++ -std=c++11 -Wall -I../.. -o lapack-getrf lapack-getrf.cc
Run
$shell> cd flens/examples $shell> ./lapack-getrf Ab = 2 3 -1 0 20 -6 -5 0 2 -33 2 -5 6 -6 -43 4 6 2 -3 49 X = 1 9 9 9
Using QD (Double-Double and Quad-Double Package)
The following description is taken from the QD Library page:
The QD Library supports both a double-double datatype (approx. 32 decimal digits) and a quad-double datatype (approx. 64 decimal digits). The computational library is written in C++. Both C++ and Fortran-90 high-level language interfaces are provided to permit one to use convert an existing C++ or Fortran-90 program to use the library with only minor changes to the source code. In most cases only a few type statements and (for Fortran-90 programs) read/write statements need to be changed. PSLQ and numerical quadrature programs are included.
Modified Example Code
#include <qd/qd_real.h>
#include <qd/fpu.h>
#include <flens/flens.cxx>
using namespace std;
using namespace flens;
typedef qd_real T;
int
main()
{
unsigned int old_cw;
fpu_fix_start(&old_cw);
typedef GeMatrix<FullStorage<T> > Matrix;
typedef DenseVector<Array<T> > Vector;
typedef Matrix::IndexType IndexType;
typedef DenseVector<Array<IndexType> > IndexVector;
const Underscore<IndexType> _;
const IndexType m = 4,
n = 5;
Matrix Ab(m, n);
IndexVector piv(m);
Ab = 2, 3, -1, 0, 20,
-6, -5, 0, 2, -33,
2, -5, 6, -6, -43,
4, 6, 2, -3, 49;
cout << "Ab = " << Ab << endl;
lapack::trf(Ab, piv);
const auto A = Ab(_,_(1,m));
auto B = Ab(_,_(m+1,n));
blas::sm(Left, NoTrans, T(1), A.upper(), B);
cout << "X = " << B << endl;
fpu_fix_end(&old_cw);
}
Comments on Modifications
Only a few modifications were made which will be pointed out in the following:
Include the qd-headers
#include <qd/fpu.h>
For using quad-double precision in this example change the typedef to
For understanding the next code lines we first take a look at the QD Library documentation:
The algorithms in the QD library assume IEEE double precision floating point arithmetic. Since Intel x86 processors have extended (80-bit) floating point registers, the round-to-double flag must be enabled in the control word of the FPU for this library to function properly under x86 processors. The function fpu_fix_start turns on the round-to-double bit in the FPU control word, while fpu_fix_end will restore the original state.
So the first thing we do in main is turning on the correct rounding ...
main()
{
unsigned int old_cw;
fpu_fix_start(&old_cw);
... and at the end restore FPU rounding behavior as mentioned above.
}
Compile and Run
$shell> cd flens/examples $shell> g++ -Wall -std=c++11 -I../.. -I/opt/local/include/ -c qd-lapack-getrf.cc $shell> g++ -o qd-lapack-getrf qd-lapack-getrf.o /opt/local/lib/libqd.a $shell> ./qd-lapack-getrf Ab = 2.000000e+00 3.000000e+00 -1.000000e+00 0.000000e+00 2.000000e+01 -6.000000e+00 -5.000000e+00 0.000000e+00 2.000000e+00 -3.300000e+01 2.000000e+00 -5.000000e+00 6.000000e+00 -6.000000e+00 -4.300000e+01 4.000000e+00 6.000000e+00 2.000000e+00 -3.000000e+00 4.900000e+01 X = 1.000000e+00 9.000000e+00 9.000000e+00 9.000000e+00
Using mpfr::real (C++ Interface for mpfr)
The following description is taken from the mpfr::real page:
The class mpfr::real is a high-level C++ wrapper for the GNU MPFR library, a C library for multiple-precision floating-point computations with correct rounding.
The objects of mpfr::real can (almost) be used like doubles or other fundamental floating-point types, thus avoiding the use of C functions to manipulate MPFR's low-level data type directly. In addition to all arithmetic and comparison operators available for fundamental floating-point types, mpfr::real also supports the set of mathematical functions for double from math.h/cmath. Finally, std::istream operator >> and std::ostream operator << are implemented for mpfr::reals. This allows to substitute double with mpfr::real with no further modifications of the code in many cases.
Modified Example Code
#define REAL_ENABLE_CONVERSION_OPERATORS
#include <external/real.hpp>
#include <flens/flens.cxx>
using namespace std;
using namespace flens;
typedef mpfr::real<53> T;
int
main()
{
typedef GeMatrix<FullStorage<T> > Matrix;
typedef DenseVector<Array<T> > Vector;
typedef Matrix::IndexType IndexType;
typedef DenseVector<Array<IndexType> > IndexVector;
const Underscore<IndexType> _;
const IndexType m = 4,
n = 5;
Matrix Ab(m, n);
IndexVector piv(m);
Ab = 2, 3, -1, 0, 20,
-6, -5, 0, 2, -33,
2, -5, 6, -6, -43,
4, 6, 2, -3, 49;
cout << "Ab = " << Ab << endl;
lapack::trf(Ab, piv);
const auto A = Ab(_,_(1,m));
auto B = Ab(_,_(m+1,n));
blas::sm(Left, NoTrans, T(1), A.upper(), B);
cout << "X = " << B << endl;
}
Comments on Modifications
Include header file for mpfr::real before flens.cxx and make sure that conversion operators are enabled
#include <external/real.hpp>
#include <flens/flens.cxx>
Make typedef for using mpfr::real
Compile and Run
$shell> cd flens/examples $shell> g++ -Wall -std=c++11 -I../.. -I/opt/local/include -o mpfr-real-lapack-getrf mpfr-real-lapack-getrf.cc -L /opt/local/lib -lmpfr -lgmp $shell> ./mpfr-real-lapack-getrf Ab = 2.000000e+00 3.000000e+00 -1.000000e+00 0.000000e+00 2.000000e+01 -6.000000e+00 -5.000000e+00 0.000000e+00 2.000000e+00 -3.300000e+01 2.000000e+00 -5.000000e+00 6.000000e+00 -6.000000e+00 -4.300000e+01 4.000000e+00 6.000000e+00 2.000000e+00 -3.000000e+00 4.900000e+01 X = 1.000000e+00 9.000000e+00 9.000000e+00 9.000000e+00
For Performance: Link with ATLAS
For high performance you have to link with an optimized BLAS implementation like GotoBLAS or ATLAS. For ATLAS pass -DWITH_ATLAS to the compiler:
$shell> cd flens/examples $shell> g++ -DWITH_ATLAS -Wall -std=c++11 -I../.. -I/opt/local/include -o atlas-lapack-getrf lapack-getrf.cc -latlas -lcblas $shell> ./atlas-lapack-getrf Ab = 2 3 -1 0 20 -6 -5 0 2 -33 2 -5 6 -6 -43 4 6 2 -3 49 X = 1 9 9 9
If you want to check which BLAS function gets called also pass -DDEBUG_CXXBLAS to the compiler and grep and sort for CXXBLAS (note that the problem size in this example is so small that no LEVEL 3 BLAS function gets used):
$shell> cd flens/examples $shell> g++ -DWITH_ATLAS -DCXXBLAS_DEBUG -Wall -std=c++11 -I../.. -I/opt/local/include -o atlas-lapack-getrf lapack-getrf.cc -latlas -lcblas $shell> ./atlas-lapack-getrf 2>&1 | grep CXXBLAS | sort -u CXXBLAS: [ATLAS] cblas_dger CXXBLAS: [ATLAS] cblas_dscal CXXBLAS: [ATLAS] cblas_dswap CXXBLAS: [ATLAS] cblas_dtrsm CXXBLAS: [ATLAS] cblas_idamax
For optimal performance also pass -DNDEBUG and -O3.