================================
RAII storage objects for the GPU
[TOC]
================================

When we work with vectors and matrices on the GPU we are in the unusual
situation that these objects are allocated and released on the host but can
be accessed only on the device. Hence, the associated RAII classes for
maintaining device storage are divided into host and device parts.

The following shows an excerpt of `` which provides `DeviceBuffer`, a
host-based RAII class for maintaining arrays on the device:

---- CODE (type=cpp) ----------------------------------------------------------
template<typename T>
struct DeviceBuffer {
   void* const devptr;
   T* const aligned_devptr;

   DeviceBuffer(std::size_t length, std::size_t alignment = alignof(T)) :
         devptr(cuda_malloc(compute_aligned_size<T>(length, alignment))),
         aligned_devptr(align_ptr<T>(devptr, alignment)) {
   }
   ~DeviceBuffer() {
      CHECK_CUDA(cudaFree, devptr);
   }
   T* data() const {
      return aligned_devptr;
   }
   DeviceBuffer(DeviceBuffer&&) = default;
   DeviceBuffer(const DeviceBuffer&) = delete;
   DeviceBuffer& operator=(const DeviceBuffer&) = delete;
   DeviceBuffer& operator=(DeviceBuffer&&) = delete;
};
-------------------------------------------------------------------------------

Based on `DeviceBuffer`, the lecture library provides the class
`DeviceGeMatrix` in ``. There are also corresponding _copy_ functions in ``
which, however, are based on `cudaMemcpy` and must therefore insist that the
storage to be copied is a contiguous block with an identical layout on both
sides.

This leads to the following simple steps: create a matrix on the host,
initialize it, copy it to the device, operate on it within a kernel
function, and copy it back to examine the result:

---- CODE (type=cpp) ----------------------------------------------------------
GeMatrix<double> A(M, N, Order::RowMajor);
// fill A ...
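// (a hypothetical way to fill A, assuming the library's element access
//  A(i, j) -- any other fill pattern would do just as well:)
//    for (std::size_t i = 0; i < A.numRows(); ++i) {
//       for (std::size_t j = 0; j < A.numCols(); ++j) {
//          A(i, j) = i * A.numCols() + j;
//       }
//    }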
DeviceGeMatrix<double> devA(A.numRows(), A.numCols(), Order::RowMajor);
copy(A, devA); // copy A to devA
// work on devA using a kernel function ...
copy(devA, A); // copy devA to A
-------------------------------------------------------------------------------

But how do we pass `devA` to the kernel function and how is the kernel
function to be declared? Previously, on the host, we had constructs like the
following where we passed the matrices (or vectors) by reference:

---- CODE (type=cpp) ----------------------------------------------------------
template<template<typename> class Matrix, typename T,
   Require< Ge<Matrix<T>> > = true>
void f(Matrix<T>& A) {
   // ...
}
-------------------------------------------------------------------------------

Think about how this can be done when we pass vector or matrix parameters
from host to device:

 * By reference as before on the host?
 * By value?
 * By ...?

Write it down on a piece of paper together with the program text where the
kernel function is called and where the kernel function is declared (which
can be templated as well). Within the `Require` clause, device matrices can
be recognized by `DeviceGe` instead of `Ge` (using ``). If you are not sure
how this can be done using our library, just write down how you would like
it to work, provided it is a feasible solution.

:navigate: up -> doc:index
           next -> doc:session09/page02
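As food for thought, here is one direction such a solution can take (this is
a sketch under assumptions, not the lecture library's actual interface):
since a host reference cannot be dereferenced on the device, a common
approach is to pass a small non-owning _view_ object *by value*. The view
carries only a raw pointer and the layout parameters, so copying it into the
kernel's parameter space is cheap, while element accesses through the copy
still reach the original storage. The following plain C++ sketch (no CUDA
required, all names hypothetical) illustrates why a by-value view behaves
like a reference to the underlying elements:

---- CODE (type=cpp) ----------------------------------------------------------
#include <cassert>
#include <cstddef>

// hypothetical non-owning view: a pointer plus layout parameters;
// copying the view copies the handle, not the elements
template<typename T>
struct GeMatrixView {
   std::size_t numRows, numCols, incRow, incCol;
   T* data;

   T& operator()(std::size_t i, std::size_t j) const {
      return data[i*incRow + j*incCol];
   }
};

// the view is received *by value* (as a CUDA kernel would receive it),
// yet updates through it modify the original storage
template<typename T>
void scale(GeMatrixView<T> A, T alpha) {
   for (std::size_t i = 0; i < A.numRows; ++i) {
      for (std::size_t j = 0; j < A.numCols; ++j) {
         A(i, j) *= alpha;
      }
   }
}

int main() {
   double storage[6] = {1, 2, 3, 4, 5, 6};
   GeMatrixView<double> A{2, 3, 3, 1, storage}; // 2x3, row-major
   scale(A, 2.0);                               // by value, yet effective
   assert(storage[0] == 2 && storage[5] == 12);
}
-------------------------------------------------------------------------------

In CUDA the function would be declared as a `__global__` kernel taking the
view by value and launched with the usual `<<<grid, block>>>` syntax;
whether and how the lecture library hands out such views (e.g. a `view()`
member of `DeviceGeMatrix`) is an assumption here, not a given.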