=======================================
Accessing Raw Data Pointers and Strides                                  [TOC]
=======================================
In some object-oriented circles, providing access to raw data is considered a
bad idea in general.  If you belong to these circles and still want high
performance, you have to use libraries coded by people who do deal with raw
pointers.  If you want to implement highly efficient numerical code yourself,
you have to know what you are doing and, in some places, work with raw data.

Assuming you know what you are doing, FLENS gives you direct access to the
raw data of its matrix/vector types.  One application is interfacing with
other libraries, e.g. BLAS, LAPACK, etc.

But also consider cases where an efficient numerical code requires local
buffers.  In this case you often do not want to allocate them on the heap,
because allocating data on the heap can cause a considerable runtime overhead.
So if you depend on dynamic memory allocation, you want it to happen as
rarely as possible, e.g. only once when your application gets started.

In some cases you can do even better: using memory from the stack or the data
segment is free, i.e. it has no runtime overhead.  Common examples are:

- If you only need a small buffer inside a function, just use a local buffer
  on the stack.
- If you know the total amount of workspace needed at compile time, use a
  global buffer on the data segment.  And yes, we assume you know how to deal
  with it in a multi-threaded environment.

*Always have an example in mind where you implement a function that gets
called by a function that gets called by a function, etc.
And each function call happens within a loop.*

Accessing Raw Data from Vectors and Matrices
============================================
- For matrices of type `GeMatrix` the following methods provide the relevant
  information:

  +-[CODE]-------------------------------+--------------------------------------------+
  |                                      |                                            |
  |  const ElementType *                 | Pointer to the first element of the matrix.|
  |  data() const;                       | So this is the address of `A(1,1)` (or more|
  |                                      | generally `A(A.firstRow(),A.firstCol())`). |
  +--------------------------------------+                                            |
  |                                      |                                            |
  |  ElementType *                       |                                            |
  |  data();                             |                                            |
  |                                      |                                            |
  +--------------------------------------+--------------------------------------------+
  |  StorageOrder                        | Returns `RowMajor`, `ColMajor` or `Grid`.  |
  |  order() const;                      |                                            |
  |                                      | A storage order `Grid` occurs if you have, |
  |                                      | e.g., a matrix view that references every  |
  |                                      | second row and every third column of a     |
  |                                      | matrix.                                    |
  +--------------------------------------+--------------------------------------------+
  |  IndexType                           | Stride in memory between elements in the   |
  |  strideRow() const;                  | same column. So `A.data() + A.strideRow()` |
  |                                      | is the address of `A(2,1)`.                |
  |                                      |                                            |
  |                                      | If your matrix is `ColMajor` this is `1`.  |
  +--------------------------------------+--------------------------------------------+
  |  IndexType                           | Stride in memory between elements in the   |
  |  strideCol() const;                  | same row. So `A.data() + A.strideCol()`    |
  |                                      | is the address of `A(1,2)`.                |
  |                                      |                                            |
  |                                      | If your matrix is `RowMajor` this is `1`.  |
  +--------------------------------------+--------------------------------------------+
  |  IndexType                           | - For `ColMajor` matrices returns          |
  |  leadingDimension() const;           |   `strideCol()`.                           |
  |                                      | - For `RowMajor` matrices returns          |
  |                                      |   `strideRow()`.                           |
  |                                      | - For `Grid` matrices this method triggers |
  |                                      |   a runtime assertion.                     |
  +--------------------------------------+--------------------------------------------+

- For vectors of type `DenseVector` we provide

  +-[CODE]-------------------------------+--------------------------------------------+
  |                                      |                                            |
  |  const ElementType *                 | Pointer to the first element of the vector.|
  |  data() const;                       | So this is the address of `x(1)` (or more  |
  |                                      | generally `x(x.firstIndex())`).            |
  +--------------------------------------+                                            |
  |                                      |                                            |
  |  ElementType *                       |                                            |
  |  data();                             |                                            |
  |                                      |                                            |
  +--------------------------------------+--------------------------------------------+
  |  IndexType                           | Stride in memory between elements. So      |
  |  stride() const;                     | `x.data() + x.stride()` is the address of  |
  |                                      | `x(2)`.                                    |
  +--------------------------------------+--------------------------------------------+

Example Code
------------
:import: flens/examples/tut01-page03-example01.cc [stripped, downloadable]

Compile and Run
---------------

 *--[SHELL]-----------------------------------------------------------------*
 |                                                                          |
 |  cd flens/examples                                                       |
 |  g++ -Wall -std=c++11 -I../.. tut01-page03-example01.cc                  |
 |  ./a.out                                                                 |
 |                                                                          |
 *--------------------------------------------------------------------------*

Naive Creation of Matrices/Vectors in Functions
===============================================
Consider a small core routine of a numerical application that operates on
small matrix blocks.  The function itself does only a few computations but
gets called many times.  For intermediate results a small buffer is needed.
In this case dynamic memory allocation can have a considerable impact on
runtime.  So you *DO NOT* want an implementation of this type:

:import: flens/examples/tut01-page03-example02.cc [stripped, downloadable]

Creating Matrices/Vectors from Raw Data
=======================================
You cannot directly create matrices/vectors from raw data.  But you can
create storage schemes from raw data and then construct matrices/vectors
from these.
Recall that for `GeMatrix<..>` the typedefs `GeMatrix<..>::EngineView` and
`GeMatrix<..>::EngineConstView` give you the types `FullStorageView<..>` and
`ConstFullStorageView<..>` with the correct template parameters.

Creating a matrix view that uses externally managed raw data follows this
pattern:

---- CODE(type=cc)--------------------------------------------------------------
typedef GeMatrix<FullStorage<double> >   DGeMatrix;

double buffer_[BUFFERSIZE];

// You are responsible that BUFFERSIZE >= NUMROWS*NUMCOLS*STRIDEROW*STRIDECOL

DGeMatrix::View  A = DGeMatrix::EngineView(NUMROWS, NUMCOLS, buffer_,
                                           STRIDEROW, STRIDECOL);

const DGeMatrix::ConstView  B = DGeMatrix::EngineConstView(NUMROWS, NUMCOLS,
                                                           buffer_,
                                                           STRIDEROW,
                                                           STRIDECOL);

// Ok, you can also create a const-view from an EngineView (if buffer_ is not
// const)
const DGeMatrix::ConstView  C = DGeMatrix::EngineView(NUMROWS, NUMCOLS,
                                                      buffer_,
                                                      STRIDEROW, STRIDECOL);
--------------------------------------------------------------------------------

Analogously you can create a vector view from raw data:

---- CODE(type=cc)--------------------------------------------------------------
typedef DenseVector<Array<double> >   DDenseVector;

double buffer_[BUFFERSIZE];

// Parameter STRIDE defaults to 1
DDenseVector::View  x = DDenseVector::EngineView(LENGTH, buffer_);
DDenseVector::View  y = DDenseVector::EngineView(LENGTH, buffer_, STRIDE);

const DDenseVector::ConstView  v = DDenseVector::EngineConstView(LENGTH,
                                                                 buffer_);
const DDenseVector::ConstView  w = DDenseVector::EngineConstView(LENGTH,
                                                                 buffer_,
                                                                 STRIDE);
--------------------------------------------------------------------------------

Using a Stack Buffer for Matrix/Vector Views
============================================
If the buffer size is known at compile time and small enough, you want to use
a buffer on the stack, because allocating the buffer is free in this case.
You can then use matrix/vector views as an interface for working on the
buffer.
In many cases you even want to use the same buffer as both a matrix and a
vector.  This is no problem: just use the same buffer for a matrix view and a
vector view.  Then each view is just a different interface to the same data:

:import: flens/examples/tut01-page03-example03.cc [stripped, downloadable]

 *--[SHELL]-----------------------------------------------------------------*
 |                                                                          |
 |  cd flens/examples                                                       |
 |  g++ -Wall -std=c++11 -I../.. tut01-page03-example03.cc                  |
 |  ./a.out                                                                 |
 |                                                                          |
 *--------------------------------------------------------------------------*

If Buffers are too Big for the Stack
====================================
In many cases the required workspace depends on the dimensions of the matrix
or vector arguments.  Then you obviously cannot use a stack buffer.  Instead,
it should be possible to create the buffer before the function gets called.

Consider a function for computing the eigenvalues of a matrix.  An efficient
implementation requires a workspace.  For the size of the workspace we have
two cases:

- *Minimal workspace* that is required to do the computation.
- *Optimal workspace* for best performance.

So you can decide how much memory you want to invest for performance.  A
common practice is therefore to implement two functions:

- `eigenvalues_wsq(A)` does a *workspace query* to determine the minimal and
  optimal workspace for computing the eigenvalues of `A`.  The function just
  returns a pair of integers.
- `eigenvalues(A, v, workspace)` does the *actual computation*.  It receives
  `workspace`, for example, as a dense vector.  An assertion should check
  that the length of `workspace` is at least the required minimum.  As you
  saw above, you can also create a matrix view from the underlying memory of
  the workspace vector if this is more appropriate internally.

Often you want to do the same numerical operation on different matrices of
the same size.  In this case the workspace can have the same size each time.
For convenience you can consider the following: if `workspace` has length
zero, the function internally calls the workspace query and resizes
`workspace` to the optimal size.  In this case the workspace does get
allocated on the heap.  But if you call the function for the next problem (of
the same or smaller size) the workspace gets reused without being allocated
again.

If you don't care (or if you know that it does not matter) you can in some
cases provide a third function:

- `eigenvalues(A, v)` just calls `eigenvalues(A, v, workspace)` with an empty
  `workspace` vector.  So in this case each function call will trigger a
  dynamic memory allocation and deallocation.

:import: flens/examples/tut01-page03-example04.cc [stripped, downloadable]

Using a Global Buffer for Matrix/Vector Views
=============================================
In Fortran 77 you do not have the possibility to allocate memory dynamically,
i.e. at runtime.  And some hardcore circles in scientific computing claim
that dynamic allocation is a waste of CPU time.  The following example shows
you how to do things right in these circles:

- You first make a *dry run for a workspace query*.  You merely compute the
  minimal and optimal workspace required for your problem.  So the program
  just outputs two numbers:
  - *Minimal workspace* required,
  - *Optimal workspace*.
- If your hardware has enough memory to satisfy at least the minimal
  workspace, you compile again.  This time your application uses a workspace
  on the data segment, i.e. a global workspace buffer.  The size of the
  workspace is some number between the minimal and optimal size.  *This newly
  compiled program then does the actual computation.*

You can realize this using the preprocessor.  The macro `DIM` defines the
problem size and `WORKSPACE` the size of the workspace.  If `WORKSPACE`
equals zero, a workspace query is done.
Example Code
------------
:import: flens/examples/tut01-page03-example05.cc [stripped, downloadable]

Compile and Run
---------------

 *--[SHELL]-----------------------------------------------------------------*
 |                                                                          |
 |  cd flens/examples                                                       |
 |  # Compile and run for a workspace query                                 |
 |  g++ -Wall -DDIM=100 -DWORKSPACE=0 -std=c++11 -I../..                 +++|
 |      tut01-page03-example05.cc                                           |
 |  ./a.out                                                                 |
 |                                                                          |
 |  # We store the output of a.out in a bash array                          |
 |  WS=(`./a.out`)                                                          |
 |                                                                          |
 |  # Compile with minimal workspace and run                                |
 |  g++ -Wall -DDIM=100 -DWORKSPACE=${WS[0]} -std=c++11 -I../..          +++|
 |      tut01-page03-example05.cc                                           |
 |  ./a.out                                                                 |
 |                                                                          |
 |  # Compile with optimal workspace and run                                |
 |  g++ -Wall -DDIM=100 -DWORKSPACE=${WS[1]} -std=c++11 -I../..          +++|
 |      tut01-page03-example05.cc                                           |
 |  ./a.out                                                                 |
 |                                                                          |
 *--------------------------------------------------------------------------*

:navigate: __up__    -> doc:flens/examples/tutorial
           __back__  -> doc:flens/examples/tut01-page02
           __next__  -> doc:flens/examples/tut01-page04