======================
Global synchronization [TOC]
======================

As soon as we move from a single block to multiple blocks, we are no longer able to synchronize globally among all threads within the GPU. However, it is possible to synchronize at the host because, by default, a sequence of kernel function calls is serialized. Hence, we can move the loop over the Jacobi solver steps to the host to obtain a global synchronization.

We are no longer able to do this on just one matrix $A$, as that trick was based on per-block synchronization. Instead, we now work with two matrices $A$ and $B$. In the first step we move from $A$ to $B$, then from $B$ to $A$, and so on. We must take care to initialize not just $A$ but also at least the border of $B$ (the interior of $B$ gets initialized during the first Jacobi step).

Exercise
========

 * Develop a kernel function `init_matrix_border` that works like `init_matrix` but initializes the border only. Think about how this kernel function is to be configured. As we only have to initialize the border, we need far fewer threads than for initializing the entire matrix. Test this separately by copying $B$ back to the host and generating a graphic for it.

 * Develop a kernel function `jacobi_iteration` that performs a single Jacobi step. Make sure that it does not change the fixed border. Invoke the kernel function within a loop on the host with at least 1941 Jacobi steps.

Sources
=======

Program text from the last session:

:import:session09/jacobi3.cu [fold]

Generic _Makefile_ for this session:

:import:session10/Makefile [fold]

:navigate: up -> doc:index
           next -> doc:session10/page02
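
The host-side loop described under "Global synchronization" can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference solution of the exercise: the names `jacobi_step` and `run_jacobi`, the sizes `N` and `BLOCK_DIM`, the row-major layout, and the boundary values (top row 1, everything else 0) are all hypothetical. To keep the sketch short, both matrices are initialized by copying the same host data instead of using a separate border-initialization kernel.

```cuda
#include <cstdio>
#include <cstdlib>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical sizes, chosen for illustration only.
constexpr unsigned int N = 128;         // matrix is N x N, stored row-major
constexpr unsigned int BLOCK_DIM = 16;  // threads per block in each dimension

// Abort with a message if a CUDA runtime call fails.
#define CHECK_CUDA(call) \
   do { \
      cudaError_t err_ = (call); \
      if (err_ != cudaSuccess) { \
         std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
         std::exit(1); \
      } \
   } while (0)

// One Jacobi step from src to dst; border elements of dst are never written.
__global__ void jacobi_step(const double* src, double* dst, unsigned int n) {
   unsigned int i = blockIdx.y * blockDim.y + threadIdx.y;
   unsigned int j = blockIdx.x * blockDim.x + threadIdx.x;
   if (i >= 1 && i + 1 < n && j >= 1 && j + 1 < n) {
      dst[i * n + j] = 0.25 * (src[(i - 1) * n + j] + src[(i + 1) * n + j] +
                               src[i * n + (j - 1)] + src[i * n + (j + 1)]);
   }
}

// Runs the given number of steps and returns the value at the matrix center.
double run_jacobi(unsigned int steps) {
   // Assumed boundary condition: top row 1.0, everything else 0.0.
   std::vector<double> host(N * N, 0.0);
   for (unsigned int j = 0; j < N; ++j) host[j] = 1.0;

   size_t bytes = sizeof(double) * N * N;
   double* A = nullptr;
   double* B = nullptr;
   CHECK_CUDA(cudaMalloc(&A, bytes));
   CHECK_CUDA(cudaMalloc(&B, bytes));
   // Copying the same data into both matrices also initializes the border of B.
   CHECK_CUDA(cudaMemcpy(A, host.data(), bytes, cudaMemcpyHostToDevice));
   CHECK_CUDA(cudaMemcpy(B, host.data(), bytes, cudaMemcpyHostToDevice));

   dim3 block(BLOCK_DIM, BLOCK_DIM);
   dim3 grid((N + BLOCK_DIM - 1) / BLOCK_DIM, (N + BLOCK_DIM - 1) / BLOCK_DIM);

   double* src = A;
   double* dst = B;
   for (unsigned int step = 0; step < steps; ++step) {
      // Launches on the default stream run one after another, so each
      // launch sees the complete result of the previous one.
      jacobi_step<<<grid, block>>>(src, dst, N);
      std::swap(src, dst);  // A -> B, then B -> A, and so on
   }
   CHECK_CUDA(cudaGetLastError());

   // After the final swap, src points to the most recently written matrix.
   CHECK_CUDA(cudaMemcpy(host.data(), src, bytes, cudaMemcpyDeviceToHost));
   CHECK_CUDA(cudaFree(A));
   CHECK_CUDA(cudaFree(B));
   return host[(N / 2) * N + N / 2];
}

int main() {
   std::printf("center value: %g\n", run_jacobi(2000));
}
```

Swapping the two pointers after every launch keeps the loop body identical for even and odd steps, and the result is always read from `src` regardless of the step count. No explicit `cudaDeviceSynchronize` is needed between the launches in this sketch because they are all issued to the same default stream.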