High Performance Computing I
First steps with vectors in C
First steps with matrices in C
Simple cache optimizations
Simple cache optimizations for GEMM
Cache optimizations for GEMV
First steps with C++
Packing matrix blocks for an efficient GEMM (matrix product) implementation
Generic classes, template functions, and static polymorphism
Function objects and lambda expressions
Unblocked LU factorization
More on vector and matrix classes
First steps with threads in C++
Mutex and condition variables
Thread pools (part one)
Thread pools (part two)
GEMM with AVX-optimized micro kernels
Introduction to OpenMP
Introduction to MPI
Transfer of vectors and matrices using MPI
Scatter and gather operations, asynchronous communication, and two-dimensional grids
Distributed matrices (with scatter and gather operations)
Distributed GEMM
Introduction to CUDA
Matrices and block-wise operations on GPUs
Global synchronization and two-dimensional aggregation
Multigrid solver (Part 1)
Multigrid solver (Part 2)