High Performance Computing I

Session 1

First steps with vectors in C

Session 2

Matrices in full storage format

Session 3

Benchmarking and Gnuplot

Session 4

Simple cache optimizations

Session 5

GEMV with fused vector operations

Session 6

GEMM: Error estimator and simple block algorithm

Session 7

First steps with C++

Session 8

  • C++ tool for buffers

  • Packing of matrix blocks

Session 9

  • GEMM micro kernel

  • GEMM macro kernel

  • GEMM frame operation

Session 10

Generic classes, template functions, and static polymorphism

Session 11

Function objects and lambda expressions

Session 12

  • Fine-tuning the GEMM performance

  • Boosting the GEMM performance with optimized micro kernels

  • Dynamic configuration of GEMM

  • Supporting different element types

Session 13

First steps with threads in C++

Session 14

More on matrix classes

Session 15

Mutexes and condition variables

Session 16

Thread pools (part one)

Session 17

Thread pools (part two)

Session 18

  • Unblocked LU factorization

  • Blocked LU factorization

Session 19

First steps with OpenMP

Session 20

First steps with MPI

Session 21

Transfer of vectors and matrices using MPI

Session 22

Scatter and gather operations for matrices

Session 23

  • Wrapper for scatter and gather operations for matrices

  • (Simple) GEMM with MPI

Session 24

First steps with CUDA

Session 25

Second steps with CUDA

Session 26

A smooth introduction to the multigrid method