Invoking multiple kernel functions on the GPU


Multiple kernel functions can be invoked one after another, and each can work with the results that previous invocations left in GPU memory. Hence, it is not necessary to copy data to and from the GPU between two calls.
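As a minimal sketch of this idea, the following program runs two kernels back to back on the same device buffer and copies the data only once in each direction. The kernels add_one and times_two and the helper add_then_double are hypothetical names chosen for illustration, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel 1: adds 1 to every element.
__global__ void add_one(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical kernel 2: doubles every element.
__global__ void times_two(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Copies the input to the GPU once, runs both kernels on the same
// device buffer, and copies the final result back once.
std::vector<float> add_then_double(const std::vector<float>& input) {
    int n = static_cast<int>(input.size());
    std::vector<float> result(n);
    if (n == 0) return result;

    size_t bytes = n * sizeof(float);
    float* d_data = nullptr;
    cudaMalloc(&d_data, bytes);
    cudaMemcpy(d_data, input.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add_one<<<blocks, threads>>>(d_data, n);
    // No copy in between: times_two reads what add_one left in GPU memory.
    times_two<<<blocks, threads>>>(d_data, n);

    cudaMemcpy(result.data(), d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return result;
}

int main() {
    std::vector<float> out = add_then_double({1.0f, 2.0f, 3.0f});
    for (float v : out) std::printf("%g\n", v);
    return 0;
}
```

On a machine with a working CUDA device, each element x should come back as (x + 1) * 2, so the inputs 1, 2, 3 yield 4, 6, 8.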

Invocations of kernel functions are serialized within a so-called stream. When you configure a kernel launch, you are free to specify a stream. If none is given, the default stream is used, which causes all kernel function invocations to be processed sequentially.
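The stream is the fourth argument of the launch configuration, after the grid size, the block size, and the dynamic shared memory size. The sketch below creates one stream and issues two launches into it, so the second launch runs only after the first has finished. The kernel scale and the helper scale_twice_in_stream are hypothetical names, the copies use cudaMemcpyAsync so that they are queued in the same stream, and error checking is again left out.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies every element by a factor.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Queues a copy, two kernel launches, and a copy back in one stream.
// Work within a single stream is executed in the order it was issued.
std::vector<float> scale_twice_in_stream(const std::vector<float>& input) {
    int n = static_cast<int>(input.size());
    std::vector<float> result(n);
    if (n == 0) return result;

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    size_t bytes = n * sizeof(float);
    float* d_data = nullptr;
    cudaMalloc(&d_data, bytes);
    cudaMemcpyAsync(d_data, input.data(), bytes,
                    cudaMemcpyHostToDevice, stream);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    // <<<grid, block, shared memory bytes, stream>>>
    // Omitting the last two arguments selects the default stream.
    scale<<<blocks, threads, 0, stream>>>(d_data, n, 2.0f);
    scale<<<blocks, threads, 0, stream>>>(d_data, n, 5.0f);

    cudaMemcpyAsync(result.data(), d_data, bytes,
                    cudaMemcpyDeviceToHost, stream);
    // Wait until everything queued in this stream has completed.
    cudaStreamSynchronize(stream);

    cudaFree(d_data);
    cudaStreamDestroy(stream);
    return result;
}

int main() {
    std::vector<float> out = scale_twice_in_stream({1.0f, 2.0f});
    for (float v : out) std::printf("%g\n", v);
    return 0;
}
```

Because both launches share one stream, every element is first doubled and then multiplied by five, so the inputs 1 and 2 should come back as 10 and 20.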

Note that the invocation of a kernel function returns immediately, usually long before the kernel finishes. Copying functions like cudaMemcpy are likewise associated with a stream. Hence, it is the invocation of cudaMemcpy that, by default, blocks until the preceding kernel invocation has completed. We will later see how this can be parallelized by working with multiple streams.
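One way to observe this behavior is to time the launch and the copy separately on the host. The following sketch assumes a deliberately slow, hypothetical kernel slow_write that spins for a number of clock cycles before storing a value; the helper launch_then_copy and the cycle count are likewise illustrative choices. The measured times depend on the device and driver, so no particular numbers should be expected.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: spins for roughly spin_cycles clock cycles,
// then writes 42 so the host can verify that the kernel has finished.
__global__ void slow_write(int* out, long long spin_cycles) {
    long long start = clock64();
    while (clock64() - start < spin_cycles) { }
    *out = 42;
}

// Launches the slow kernel in the default stream and copies the result
// back. Reports the host time spent in the launch and in the copy.
int launch_then_copy(double* launch_ms, double* copy_ms) {
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;

    int* d_out = nullptr;
    cudaMalloc(&d_out, sizeof(int));
    cudaMemset(d_out, 0, sizeof(int));

    auto t0 = clock::now();
    slow_write<<<1, 1>>>(d_out, 100000000LL);  // returns without waiting
    auto t1 = clock::now();

    int h_out = 0;
    // Blocks until the kernel has finished, then copies the value.
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    auto t2 = clock::now();

    *launch_ms = ms(t1 - t0).count();
    *copy_ms = ms(t2 - t1).count();
    cudaFree(d_out);
    return h_out;
}

int main() {
    double launch_ms = 0.0, copy_ms = 0.0;
    int value = launch_then_copy(&launch_ms, &copy_ms);
    std::printf("value = %d\n", value);
    std::printf("launch took %.3f ms, copy took %.3f ms\n",
                launch_ms, copy_ms);
    return 0;
}
```

The copied value is 42 even though the host never waits for the kernel explicitly, because cudaMemcpy does not start copying until the kernel is done. The time attributed to the copy is therefore typically dominated by the wait for the kernel, not by the transfer of four bytes.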