High Performance Computing I
First steps with vectors in C
First steps with matrices in C
Benchmarks and gnuplot
Simple cache optimizations
GEMV with fused vector operations
GEMM: Simple block algorithm
First steps with C++
Cache-optimized GEMM
Cache-optimized GEMM: More in C++ style
Generic classes, template functions, and static polymorphism
Function objects and lambda expressions
Blocked GEMM for mixed element types
First steps with threads in C++
More on matrix classes
Mutex and condition variables
Thread pools (part one)
LU factorization (part one)
Thread pools (part two)
LU factorization (part two)
First steps with OpenMP
First steps with MPI
Transfer of vectors and matrices using MPI
Scatter and gather operations for matrices
Distributed GEMM operation
First steps with CUDA
GEMM with CUDA |