High Performance Computing I

Session 1	First steps with vectors in C
Session 2	First steps with matrices in C
Session 3	Benchmarks and gnuplot
Session 4	Simple cache optimizations
Session 5	GEMV with fused vector operations
Session 6	GEMM: Simple block algorithm
Session 7	First steps with C++
Session 8	Cache-optimized GEMM
Session 9	Cache-optimized GEMM in a more idiomatic C++ style
Session 10	Generic classes, template functions, and static polymorphism
Session 11	Function objects and lambda expressions
Session 12	Blocked GEMM for mixed element types
Session 13	First steps with threads in C++
Session 14	More on matrix classes
Session 15	Mutex and condition variables
Session 16	Thread pools (part one)
Session 17	LU factorization (part one)
Session 18	Thread pools (part two)
Session 19	LU factorization (part two)
Session 20	First steps with OpenMP
Session 21	First steps with MPI
Session 22	Transfer of vectors and matrices using MPI
Session 23	Scatter and gather operations for matrices
Session 24	Distributed GEMM operation
Session 25	First steps with CUDA
Session 26	GEMM with CUDA