Content

Accessing Raw Data Pointers and Strides

In some object-oriented circles the idea of providing access to raw data is considered as a bad idea in general. If you belong to these people and still want high performance you have to use libraries coded by people who are dealing with raw pointers. If you want to implement highly efficient numerical code you have to know what you do and deal in some places with raw data.

Assuming you know what you are doing FLENS gives you direct access to the raw data of its matrix/vector types. One application is interfacing with other libraries. For example BLAS, LAPACK, etc.

But also consider cases where an efficient numerical code requires local buffers. In this case you often do not want to allocate them on the heap. That's because allocation data on the heap can cause a considerable runtime overhead in some cases. So if you depend on dynamic memory allocation you are interested that it happens as rarely as possible. Like only once when your application gets started.

In some cases you can do even better. Using memory from the stack or data segment is free, i.e. has no runtime overhead. In some case you can take advantage of this. Common examples are:

Always have an example in mind were you implement a function that gets called by a function that gets called by a function, etc. And each function call happens within a loop.

Accessing Raw Data form Vectors and Matrices

Example Code

#include <cxxstd/iostream.h>
#include <flens/flens.cxx>

using namespace flens;
using namespace std;


typedef GeMatrix<FullStorage<double> >  DGeMatrix;
typedef DenseVector<Array<double> >     DDenseVector;

int
main()
{
    DGeMatrix     A(3,3);

    fillRandom(A);

    cout << "A = " << A << endl;

    cout << "A.strideRow()          = " << A.strideRow() << endl;
    cout << "A.strideCol()          = " << A.strideCol() << endl << endl;

    cout << "A.data()               = " << A.data() << endl;
    cout << "&A(1,1)                = " << &A(1,1) << endl << endl;

    cout << "A.data()+A.strideRow() = " << A.data()+A.strideRow() << endl;
    cout << "&A(2,1)                = " << &A(2,1) << endl << endl;

    cout << "A.data()+A.strideCol() = " << A.data()+A.strideCol() << endl;
    cout << "&A(1,2)                = " << &A(1,2) << endl << endl;
}

Compile and Rn

$shell> cd flens/examples                                                       
$shell> g++ -Wall -std=c++11 -I../.. tut01-page03-example01.cc                  
$shell> ./a.out                                                                 
A = 
    -0.999984    -0.0826997     -0.905911 
    -0.736924     0.0655345      0.357729 
     0.511211     -0.562082      0.358593 
A.strideRow()          = 1
A.strideCol()          = 3
A.data()               = 0x7fe7f3c039b0
&A(1,1)                = 0x7fe7f3c039b0
A.data()+A.strideRow() = 0x7fe7f3c039b8
&A(2,1)                = 0x7fe7f3c039b8
A.data()+A.strideCol() = 0x7fe7f3c039c8
&A(1,2)                = 0x7fe7f3c039c8

Naive creation of matrices/vectors in Functions

Consider a small core routine of a numerical application that is used to operate on small matrix blocks. The function itself does only a few computations but gets called many times. For intermediate results a small buffer is needed. In this case dynamic memory allocation can have a considerable impact on runtime.

So you DO NOT want an implementation of this type:

#include <cxxstd/iostream.h>
#include <flens/flens.cxx>

using namespace flens;
using namespace std;


typedef GeMatrix<FullStorage<double> >  DGeMatrix;
typedef DenseVector<Array<double> >     DDenseVector;


// Each function call causes a malloc for a 4x4 matrix and a free at the end of
// scope.
void
foo()
{
    DGeMatrix A(4,4);
    A =  1,  2,  3,  4,
         5,  6,  7,  8,
         9101112,
        13141516;

    cout << "A = " << A << endl;
}

int
main()
{
    // Always have in mind that the function is an essential inner core routine
    // doing only little computation but gets called many times.
    for (int i=0; i<100; ++i) {
        foo();
    }
}

Creating Matrices/Vector from Raw Data

You can not directly create matrices/vectors from raw data. But you can create storage schemes from raw data and construct from these matrices/vectors.

Recall that for GeMatrix<..> the typedefs GeMatrix<..>::EngineView and GeMatrix<..>::EngineConstView give you the types FullStorageView<..> and ConstFullStorageView with the correct template parameters.

Creating a matrix view that uses externally managed raw data follows this pattern:

typedef GeMatrix<FullStorage<double> >  DGeMatrix;

double buffer_[BUFFERSIZE];

// You are responsible that BUFFERSIZE >= NUMROWS*NUMCOLS*STRIDEROW*STRIDECOL
DGeMatrix::View  A = DGeMatrix::EngineView(NUMROWS, NUMCOLS,
                                           buffer_,
                                           STRIDEROW, STRIDECOL);

const DGeMatrix::ConstView B = DGeMatrix::EngineConstView(NUMROWS, NUMCOLS,
                                                          buffer_,
                                                          STRIDEROW, STRIDECOL);

// Ok, you can create a const-view from a EngineView (if buffer_ is not const)
const DGeMatrix::ConstView B = DGeMatrix::EngineView(NUMROWS, NUMCOLS,
                                                     buffer_,
                                                     STRIDEROW, STRIDECOL);

Analogously you can create a vector view from raw data:

typedef DenseVector<Array<double> >     DDenseVector;

double buffer_[BUFFERSIZE];

// Parameter STRIDE defaults to 1
DDenseVector::View x = DDenseVector::EngineView(LENGTH, buffer_);
DDenseVector::View y = DDenseVector::EngineView(LENGTH, buffer_, STRIDE);

const DDenseVector::ConstView v = DDenseVector::EngineConstView(LENGTH, buffer_);
const DDenseVector::ConstView w = DDenseVector::EngineConstView(LENGTH, buffer_, STRIDE);

Using a Stack Buffer for Matrix/Vector Views

If the buffer size is known at compile time and small enough you want to use a buffer on the stack. Because allocating the buffer is free in this case.

You than can use matrix/vector views as an interface for working on the buffer. In many cases you even want to use the same buffer as both, a matrix and a vector. This is no problem, just use the same buffer for a matrix view and a vector view. Then each view is just used as a different interface for the same data:

#include <cxxstd/iostream.h>
#include <flens/flens.cxx>

using namespace flens;
using namespace std;

typedef GeMatrix<FullStorage<double> >  DGeMatrix;
typedef DenseVector<Array<double> >     DDenseVector;

void
foo()
{
    double buffer_[4*4] = { 1,  2,  3,  4,
                            5,  6,  7,  8,
                            9101112,
                           13141516};

    DGeMatrix::View    A = DGeMatrix::EngineView(44, buffer_, 14);
    DGeMatrix::View    B = DGeMatrix::EngineView(44, buffer_, 41);
    DGeMatrix::View    C = DGeMatrix::EngineView(24, buffer_, 41);
    DDenseVector::View x = DDenseVector::EngineView(4*4, buffer_);
    DDenseVector::View y = DDenseVector::EngineView(2*4, buffer_, 2);

    cout << "Matrix/vector views created from stack buffer:" << endl;
    cout << "  ColMajor Matrix View" << endl;
    cout << "  A = " << A << endl;
    cout << "  RowMajor Matrix View" << endl;
    cout << "  B = " << B << endl;
    cout << "  RowMajor Matrix View referencing 2 rows" << endl;
    cout << "  C = " << C << endl;
    cout << "  Vector View" << endl;
    cout << "  x = " << x << endl;
    cout << "  Vector View referencing every second element" << endl;
    cout << "  y = " << y << endl;
}

int
main()
{
    foo();
}
$shell> cd flens/examples                                                       
$shell> g++ -Wall -std=c++11 -I../.. tut01-page03-example03.cc                  
$shell> ./a.out                                                                 
Matrix/vector views created from stack buffer:
  ColMajor Matrix View
  A = 
            1             5             9            13 
            2             6            10            14 
            3             7            11            15 
            4             8            12            16 
  RowMajor Matrix View
  B = 
            1             2             3             4 
            5             6             7             8 
            9            10            11            12 
           13            14            15            16 
  RowMajor Matrix View referencing 2 rows
  C = 
            1             2             3             4 
            5             6             7             8 
  Vector View
  x = 
            1              2              3              4              5              6              7              8              9             10             11             12             13             14             15             16 
  Vector View referencing every second element
  y = 
            1              3              5              7              9             11             13             15 

If Buffers are to big for the Stack

In many cases the required workspace depends on the dimension of matrix or vector arguments. In this case you obviously can not use a stack buffer. In this case it should be possible that the buffer can be created before the function gets called.

Consider a function for computing the eigenvalues of a matrix. An efficient implementation requires a workspace. For the size of the workspace we have two cases:

So a common practise is to implement two functions:

If you don't care (or if you know that it does not matter) in some cases you can provide a third function:

#include <cassert>
#include <iostream>
#include <utility>
#include <flens/flens.cxx>

using namespace flens;
using namespace std;

template <typename MA>
std::pair<longlong>
eigenvalues_wsq(const GeMatrix<MA> &A)
{
    // Compute and return minimal and optimal workspace sizes
    return std::pair<long,long>(A.numRows()*4, A.numRows()*256);
}

template <typename MA, typename VV, typename VWS>
void
eigenvalues(const GeMatrix<MA> &A, DenseVector<VV> &v, DenseVector<VWS> &ws)
{
    auto wsq = eigenvalues_wsq(A);
    if (ws.length()==0) {
        ws.resize(wsq.second);
    }
    assert(ws.length()>=wsq.first);

    //
    // ... compute eigenvalues v of A ... here the actual work happens!!!
    //
}

template <typename MA, typename VV>
void
eigenvalues(const GeMatrix<MA> &A, DenseVector<VV> &v)
{
    DenseVector<VV> ws;

    eigenvalues(A, v, ws);
}

int
main()
{
    typedef GeMatrix<FullStorage<double> >  DGeMatrix;
    typedef DenseVector<Array<double> >     DDenseVector;

    // Use case 1: Use optimal workspace (performance for memory).  We have
    //             a typical case where the same problem size is involved.  So
    //             the workspace can be reused.
    {
        DGeMatrix     A(100100);
        DDenseVector  v(100);

        auto workspace = eigenvalues_wsq(A);
        DDenseVector  ws(workspace.second);

        for (int i=0; i<666; ++i) {
            fillRandom(A);
            eigenvalues(A, v, ws);
        }
    }

    // Use case 2: Use minimal workspace (memory for performance)
    {
        DGeMatrix     A(100100);
        DDenseVector  v(100);

        auto workspace = eigenvalues_wsq(A);
        DDenseVector  ws(workspace.first);

        for (int i=0; i<666; ++i) {
            fillRandom(A);
            eigenvalues(A, v, ws);
        }
    }

    // Use case 3: Allocate (optimal) workspace on first call.
    {
        DGeMatrix     A(100100);
        DDenseVector  v(100);
        DDenseVector  ws;

        for (int i=0; i<666; ++i) {
            fillRandom(A);
            eigenvalues(A, v, ws);
        }
    }

    // NO-Use case 4: Workspace gets allocated for each call!!!
    {
        DGeMatrix     A(100100);
        DDenseVector  v(100);

        for (int i=0; i<666; ++i) {
            fillRandom(A);
            eigenvalues(A, v);
        }
    }

    // Use case 5: Ok, workspace is needed only once.
    {
        DGeMatrix     A(100100);
        DDenseVector  v(100);

        fillRandom(A);
        eigenvalues(A, v);
    }
}

Using a Global Buffer for Matrix/Vector Views

In Fortran 77 you don't have the possibility to allocate memory dynamically, i.e. at runtime. And some hardcore circles in scientific computing claim it is a waste of CPU time. The following example shows you how to do things right in this circles:

You can realize this using the preprocessor. The macro N defines the problem size and WORKSPACE the size of the workspace. If WORKSPACE equals zero a workspace query is done.

Example Code

#include <cassert>
#include <iostream>
#include <utility>
#include <flens/flens.cxx>

using namespace flens;
using namespace std;

template <typename MA>
std::pair<longlong>
eigenvalues_wsq(const GeMatrix<MA> &A)
{
    // Compute and return minimal and optimal workspace sizes
    return std::pair<long,long>(A.numRows()*4, A.numRows()*256);
}

template <typename MA, typename VV, typename VWS>
void
eigenvalues(const GeMatrix<MA> &A, DenseVector<VV> &v, DenseVector<VWS> &ws)
{
    auto wsq = eigenvalues_wsq(A);
    if (ws.length()==0) {
        ws.resize(wsq.second);
    }
    assert(ws.length()>=wsq.first);

    //
    // ... compute eigenvalues v of A ... here the actual work happens!!!
    //
    cout << "Computing eigenvalue "
         << "(Using a workspace of size " << ws.length() << ")"
         << endl;
}

template <typename MA, typename VV>
void
eigenvalues(const GeMatrix<MA> &A, DenseVector<VV> &v)
{
    DenseVector<VV> ws;

    eigenvalues(A, v, ws);
}


double buffer_[DIM*DIM + DIM + WORKSPACE];

int
main()
{
    typedef GeMatrix<FullStorage<double> >  DGeMatrix;
    typedef DenseVector<Array<double> >     DDenseVector;

    DGeMatrix::View    A  = DGeMatrix::EngineView(DIM, DIM,
                                                  &buffer_[0],
                                                  1, DIM);

    DDenseVector::View v  = DDenseVector::EngineView(DIM,
                                                     &buffer_[DIM*DIM]);

    DDenseVector::View ws = DDenseVector::EngineView(WORKSPACE,
                                                     &buffer_[DIM*DIM+DIM]);

#   if WORKSPACE==0
    // Perform workspace query and return

    auto workspace = eigenvalues_wsq(A);
    cout << workspace.first << " " << workspace.second << endl;

#   else
    // Solve the problem with given workspace

    for (int i=0; i<6; ++i) {
        fillRandom(A);
        eigenvalues(A, v, ws);
    }
#   endif
}

Compile and Run

$shell> cd flens/examples                                                       
$shell> # compile and run for a worksize query                                  
$shell> g++ -Wall -DDIM=100 -DWORKSPACE=0 -std=c++11 -I../.. tut01-page03-example05.cc                                     
$shell> ./a.out                                                                 
400 25600
$shell> # We store the output of a.out  in a bash array                         
$shell> WS=(`./a.out`)                                                          
./a.out
$shell> # Compile with minimal workspace and run                                
$shell> g++ -Wall -DDIM=100 -DWORKSPACE=${WS[0]} -std=c++11 -I../.. tut01-page03-example05.cc                                     
$shell> ./a.out                                                                 
Computing eigenvalue (Using a workspace of size 400)
Computing eigenvalue (Using a workspace of size 400)
Computing eigenvalue (Using a workspace of size 400)
Computing eigenvalue (Using a workspace of size 400)
Computing eigenvalue (Using a workspace of size 400)
Computing eigenvalue (Using a workspace of size 400)
$shell> # Compile with optimal workspace and run                                
$shell> g++ -Wall -DDIM=100 -DWORKSPACE=${WS[1]} -std=c++11 -I../.. tut01-page03-example05.cc                                     
$shell> ./a.out                                                                 
Computing eigenvalue (Using a workspace of size 25600)
Computing eigenvalue (Using a workspace of size 25600)
Computing eigenvalue (Using a workspace of size 25600)
Computing eigenvalue (Using a workspace of size 25600)
Computing eigenvalue (Using a workspace of size 25600)
Computing eigenvalue (Using a workspace of size 25600)