Content |
Solving Systems of Linear Equations
In this example we will solve a system of linear equations \(Ax=b\). For this purpose we first compute the \(LU\) factorization of \(A\) with lapack::trf and then use the triangular solver blas::sm to finish the job. In a later example we show how to call the LAPACK driver function lapack::sv which does internnally what you see here.
Example Code
#include <flens/flens.cxx>
using namespace std;
using namespace flens;
typedef double T;
int
main()
{
typedef GeMatrix<FullStorage<T> > Matrix;
typedef Matrix::IndexType IndexType;
typedef DenseVector<Array<IndexType> > IndexVector;
const Underscore<IndexType> _;
const IndexType m = 4,
n = 5;
Matrix Ab(m, n);
IndexVector piv(m);
Ab = 2, 3, -1, 0, 20,
-6, -5, 0, 2, -33,
2, -5, 6, -6, -43,
4, 6, 2, -3, 49;
cout << "Ab = " << Ab << endl;
lapack::trf(Ab, piv);
const auto A = Ab(_,_(1,m));
auto B = Ab(_,_(m+1,n));
blas::sm(Left, NoTrans, T(1), A.upper(), B);
cout << "X = " << B << endl;
cout << "piv = " << piv << endl;
}
Comments on Example Code
With header flens.cxx all of FLENS gets included.
Define some convenient typedef for the matrix types of our system of linear equations.
We also need an extra vector type for the pivots. The type of the pivots is taken for the system matrix.
typedef DenseVector<Array<IndexType> > IndexVector;
Define an underscore operator for convenient matrix slicing
Set up the baby problem ...
n = 5;
Compute the \(LU\) factorization with lapack::trf
Solve the system of linear equation \(Ax =B\) using blas::sm
auto B = Ab(_,_(m+1,n));
Compile
$shell> cd flens/examples $shell> g++ -std=c++11 -Wall -I../.. -o lapack-getrf lapack-getrf.cc
Run
$shell> cd flens/examples $shell> ./lapack-getrf Ab = 2 3 -1 0 20 -6 -5 0 2 -33 2 -5 6 -6 -43 4 6 2 -3 49 X = 1 9 9 9 piv = 2 3 4 4
Example with Complex Numbers
We slightly modify the above examples for using complex numbers.
Example Code
#include <flens/flens.cxx>
using namespace std;
using namespace flens;
typedef complex<double> T;
int
main()
{
typedef GeMatrix<FullStorage<T> > Matrix;
typedef Matrix::IndexType IndexType;
typedef DenseVector<Array<IndexType> > IndexVector;
const Underscore<IndexType> _;
const IndexType m = 4,
n = 5;
Matrix Ab(m, n);
IndexVector piv(m);
Ab = T(1, 0), T( 1,-1), T( 2,20), T( 1, 0),
T(0, 1), T( 1, 2), T(-10, 8), T(-1, 1),
T(0,-1), T(-1, 1), T( 40, 6), T(-2,-1);
cout << "Ab = " << Ab << endl;
lapack::trf(Ab, piv);
const auto A = Ab(_,_(1,m));
auto B = Ab(_,_(m+1,n));
blas::sm(Left, NoTrans, T(1), A.upper(), B);
cout << "X = " << B << endl;
cout << "piv = " << piv << endl;
}
Comments on Example Code
Set up the baby problem ...
n = 5;
Compute the \(LU\) factorization with lapack::trf
Solve the system of linear equation \(Ax =B\) using blas::sm
auto B = Ab(_,_(m+1,n));
Compile
$shell> cd flens/examples $shell> g++ -std=c++11 -Wall -I../.. -o lapack-complex-getrf lapack-complex-getrf.cc
Run
$shell> cd flens/examples $shell> ./lapack-complex-getrf Ab = (1,0) (1,-1) (2,20) (1,0) (0,1) (1,2) (-10,8) (-1,1) (0,-1) (-1,1) (40,6) (-2,-1) (1,0) (1,0) (1,0) (1,0) (1,0) (1,0) (1,0) (1,0) X = (0.012115,-0.00299148) (0.142732,-0.0622367) (0.0546556,0.0496209) (0.790497,0.0156072) piv = 3 2 3 4
Using QD (Double-Double and Quad-Double Package)
The following description is taken from the QD Library page:
The QD Library supports both a double-double datatype (approx. 32 decimal digits) and a quad-double datatype (approx. 64 decimal digits). The computational library is written in C++. Both C++ and Fortran-90 high-level language interfaces are provided to permit one to use convert an existing C++ or Fortran-90 program to use the library with only minor changes to the source code. In most cases only a few type statements and (for Fortran-90 programs) read/write statements need to be changed. PSLQ and numerical quadrature programs are included.
Modified Example Code
#include <qd/qd_real.h>
#include <qd/fpu.h>
#include <flens/flens.cxx>
using namespace std;
using namespace flens;
typedef qd_real T;
int
main()
{
unsigned int old_cw;
fpu_fix_start(&old_cw);
typedef GeMatrix<FullStorage<T> > Matrix;
typedef Matrix::IndexType IndexType;
typedef DenseVector<Array<IndexType> > IndexVector;
const Underscore<IndexType> _;
const IndexType m = 4,
n = 5;
Matrix Ab(m, n);
IndexVector piv(m);
Ab = 2, 3, -1, 0, 20,
-6, -5, 0, 2, -33,
2, -5, 6, -6, -43,
4, 6, 2, -3, 49;
cout << "Ab = " << Ab << endl;
lapack::trf(Ab, piv);
const auto A = Ab(_,_(1,m));
auto B = Ab(_,_(m+1,n));
blas::sm(Left, NoTrans, T(1), A.upper(), B);
cout << "X = " << B << endl;
fpu_fix_end(&old_cw);
}
Comments on Modifications
Only a few modifications were made which will be pointed out in the following:
Include the qd-headers
#include <qd/fpu.h>
For using quad-double precision in this example change the typedef to
For understanding the next code lines we first take a look at the QD Library documentation:
The algorithms in the QD library assume IEEE double precision floating point arithmetic. Since Intel x86 processors have extended (80-bit) floating point registers, the round-to-double flag must be enabled in the control word of the FPU for this library to function properly under x86 processors. The function fpu_fix_start turns on the round-to-double bit in the FPU control word, while fpu_fix_end will restore the original state.
So the first thing we do in main is turning on the correct rounding ...
main()
{
unsigned int old_cw;
fpu_fix_start(&old_cw);
... and at the end restore FPU rounding behavior as mentioned above.
}
Compile and Run
$shell> cd flens/examples $shell> g++ -Wall -std=c++11 -I../.. -I /usr/local/include -c qd-lapack-getrf.cc $shell> g++ -o qd-lapack-getrf qd-lapack-getrf.o /usr/local/lib/libqd.a $shell> ./qd-lapack-getrf Ab = 2.000000e+00 3.000000e+00 -1.000000e+00 0.000000e+00 2.000000e+01 -6.000000e+00 -5.000000e+00 0.000000e+00 2.000000e+00 -3.300000e+01 2.000000e+00 -5.000000e+00 6.000000e+00 -6.000000e+00 -4.300000e+01 4.000000e+00 6.000000e+00 2.000000e+00 -3.000000e+00 4.900000e+01 X = 1.000000e+00 9.000000e+00 9.000000e+00 9.000000e+00
Using mpfr::real (C++ Interface for mpfr)
The following description is taken from the mpfr::real page:
The class mpfr::real is a high-level C++ wrapper for the GNU MPFR library, a C library for multiple-precision floating-point computations with correct rounding.
The objects of mpfr::real can (almost) be used like doubles or other fundamental floating-point types, thus avoiding the use of C functions to manipulate MPFR's low-level data type directly. In addition to all arithmetic and comparison operators available for fundamental floating-point types, mpfr::real also supports the set of mathematical functions for double from math.h/cmath. Finally, std::istream operator >> and std::ostream operator << are implemented for mpfr::reals. This allows to substitute double with mpfr::real with no further modifications of the code in many cases.
Modified Example Code
#define REAL_ENABLE_CONVERSION_OPERATORS
#include <external/real.hpp>
#include <flens/flens.cxx>
using namespace std;
using namespace flens;
typedef mpfr::real<53> T;
int
main()
{
typedef GeMatrix<FullStorage<T> > Matrix;
typedef DenseVector<Array<T> > Vector;
typedef Matrix::IndexType IndexType;
typedef DenseVector<Array<IndexType> > IndexVector;
const Underscore<IndexType> _;
const IndexType m = 4,
n = 5;
Matrix Ab(m, n);
IndexVector piv(m);
Ab = 2, 3, -1, 0, 20,
-6, -5, 0, 2, -33,
2, -5, 6, -6, -43,
4, 6, 2, -3, 49;
cout << "Ab = " << Ab << endl;
lapack::trf(Ab, piv);
const auto A = Ab(_,_(1,m));
auto B = Ab(_,_(m+1,n));
blas::sm(Left, NoTrans, T(1), A.upper(), B);
cout << "X = " << B << endl;
}
Comments on Modifications
Include header file for mpfr::real before flens.cxx and make sure that conversion operators are enabled
#include <external/real.hpp>
#include <flens/flens.cxx>
Make typedef for using mpfr::real
Compile and Run
$shell> cd flens/examples $shell> g++ -Wall -DWITH_MPFR -std=c++11 -I../.. -o mpfr-real-lapack-getrf mpfr-real-lapack-getrf.cc -lmpfr -lgmp $shell> ./mpfr-real-lapack-getrf Ab = 2.000000e+00 3.000000e+00 -1.000000e+00 0.000000e+00 2.000000e+01 -6.000000e+00 -5.000000e+00 0.000000e+00 2.000000e+00 -3.300000e+01 2.000000e+00 -5.000000e+00 6.000000e+00 -6.000000e+00 -4.300000e+01 4.000000e+00 6.000000e+00 2.000000e+00 -3.000000e+00 4.900000e+01 X = 1.000000e+00 9.000000e+00 9.000000e+00 9.000000e+00
For Performance: Link with ATLAS
For high performance you have to link with an optimized BLAS implementation like GotoBLAS or ATLAS. For ATLAS pass -DWITH_ATLAS to the compiler:
$shell> cd flens/examples $shell> g++ -DWITH_ATLAS -Wall -std=c++11 -I../.. -L/usr/local/atlas/lib -o atlas-lapack-getrf lapack-getrf.cc -latlas -lcblas $shell> ./atlas-lapack-getrf Ab = 2 3 -1 0 20 -6 -5 0 2 -33 2 -5 6 -6 -43 4 6 2 -3 49 X = 1 9 9 9 piv = 2 3 4 4
If you want to check which BLAS function gets called also pass -DDEBUG_CXXBLAS to the compiler and grep and sort for CXXBLAS (note that the problem size in this example is so small that no LEVEL 3 BLAS function gets used):
$shell> cd flens/examples $shell> g++ -DWITH_ATLAS -DCXXBLAS_DEBUG -Wall -std=c++11 -I../.. -o atlas-lapack-getrf lapack-getrf.cc -L/usr/local/atlas/lib -latlas -lcblas $shell> ./atlas-lapack-getrf 2>&1 | grep CXXBLAS | sort -u
For optimal performance also pass -DNDEBUG and -O3.