==============================================
Data types for matrices and vectors on the GPU [TOC]
==============================================

Following the RAII principle, it is easiest to let an object manage the
allocation and the release of GPU memory. In principle, the same class can
be used on both sides. If you want to do this, you need to specify for each
method, including the constructors and destructors, from which side it may
be invoked. This is supported by CUDA: by default, all methods can be
invoked from the CPU only. Methods with a `__device__` specifier in front
of the declaration can be called on the GPU side. If a method is to be
usable from both sides, the keywords `__device__` and `__host__` have to
be combined.

The following approach appears straightforward:

* Constructors and destructors of classes that require dynamic storage
  remain CPU methods, as the allocation and release can be done by the
  CPU side only.

  Example:

  ---- CODE (type=cpp) -------------------------------------------------------
  DenseVector(std::size_t length)
     : length(length), inc(1) {
     CHECK_CUDA(cudaMalloc, (void**)&data, length * sizeof(T));
  }
  ~DenseVector() {
     CHECK_CUDA(cudaFree, data);
  }
  ----------------------------------------------------------------------------

* View objects that do not need dynamic memory can be constructed on both
  sides.

  Example:

  ---- CODE (type=cpp) -------------------------------------------------------
  __device__ __host__
  DenseVectorView(std::size_t length, T *data, std::ptrdiff_t inc)
     : length(length), inc(inc), data(data) {
  }
  ----------------------------------------------------------------------------

* The access methods for dynamic GPU storage are restricted to the GPU side.
  Example:

  ---- CODE (type=cpp) -------------------------------------------------------
  __device__ const T& operator()(std::size_t i) const {
     return data[i];
  }
  ----------------------------------------------------------------------------

To support the transfer between GPU and CPU objects, it is advisable to
offer corresponding `copy` operations. However, as we have no efficient
means of reorganizing content during a transfer, both sides shall share
the same organization on a contiguous stretch of memory. Then the transfer
can be done with one invocation of `cudaMemcpy`:

---- CODE (type=cpp) ----------------------------------------------------------
template <typename T>
void copy(const hpc::matvec::DenseVector<T>& x, DenseVector<T>& y) {
   assert(x.length() == y.length());
   CHECK_CUDA(cudaMemcpy, y.data(), x.data(),
      x.length() * sizeof(T), cudaMemcpyHostToDevice);
}

template <typename T>
void copy(const DenseVector<T>& x, hpc::matvec::DenseVector<T>& y) {
   assert(x.length() == y.length());
   CHECK_CUDA(cudaMemcpy, y.data(), x.data(),
      x.length() * sizeof(T), cudaMemcpyDeviceToHost);
}
-------------------------------------------------------------------------------

In consequence, we do not support copy operations for views.

Exercise
========

Adapt your solution for the scalar product such that the vector class out
of `` and the copy operations from `` are used. Your solution shall no
longer use `cudaMalloc`, `cudaMemcpy`, or `cudaFree` directly. The kernel
function can be left untouched. However, you need to reconsider how
pointers are to be passed. Until now we did it using the access function
as, for example, in `&x(0)`. Why does this no longer work?

:navigate: up -> doc:index
           back -> doc:session08/page06
           next -> doc:session08/page08
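The interplay of the pieces above can be sketched in a host-only analogue:
a minimal RAII vector class whose constructor acquires the buffer and
whose destructor releases it, plus a bulk `copy` between two vectors of
identical layout. Here `new[]`/`delete[]` and `std::copy` stand in for
`cudaMalloc`, `cudaFree`, and `cudaMemcpy`; the member names and the
`data()`/`length()` accessors are assumptions modeled on the copy
operations shown above, not the course's actual class.

---- CODE (type=cpp) ----------------------------------------------------------
#include <algorithm>
#include <cassert>
#include <cstddef>

// Host-only RAII sketch: the object owns its buffer for its whole
// lifetime. In the CUDA variant, new[]/delete[] would be replaced by
// cudaMalloc/cudaFree and operator() would carry __device__.
template <typename T>
class DenseVector {
public:
   explicit DenseVector(std::size_t length)
      : length_(length), inc_(1), data_(new T[length]) {
   }
   ~DenseVector() {
      delete[] data_;
   }
   // copying is disabled so the buffer is released exactly once
   DenseVector(const DenseVector&) = delete;
   DenseVector& operator=(const DenseVector&) = delete;

   std::size_t length() const { return length_; }
   T* data() { return data_; }
   const T* data() const { return data_; }

   T& operator()(std::size_t i) { return data_[i * inc_]; }
   const T& operator()(std::size_t i) const { return data_[i * inc_]; }

private:
   std::size_t    length_;
   std::ptrdiff_t inc_;
   T*             data_;
};

// analogue of the copy operations: one bulk transfer between two
// vectors sharing the same contiguous layout (std::copy instead of
// a single cudaMemcpy invocation)
template <typename T>
void copy(const DenseVector<T>& x, DenseVector<T>& y) {
   assert(x.length() == y.length());
   std::copy(x.data(), x.data() + x.length(), y.data());
}
-------------------------------------------------------------------------------

Note that deleting the copy constructor and copy assignment mirrors the
restriction above that views, not full vectors, are the objects passed
around freely: an accidental copy of the owning class would otherwise
release the same buffer twice.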