High Performance Computing I

Session 1: First steps with vectors in C
Session 2: First steps with matrices in C
Session 3: Benchmarks and gnuplot
Session 4: Simple cache optimizations
Session 5: GEMV with fused vector operations
Session 6: GEMM: Simple block algorithm
Session 7: First steps with C++
Session 8: Cache-optimized GEMM
Session 9: Cache-optimized GEMM: More in C++ style
Session 10: Generic classes, template functions, and static polymorphism
Session 11: Function objects and lambda expressions
Session 12: Blocked GEMM for mixed element types
Session 13: First steps with threads in C++
Session 14: More on matrix classes
Session 15: Mutex and condition variables
Session 16: Thread pools (part one)
Session 17: LU factorization (part one)
Session 18: Thread pools (part two)
Session 19: LU factorization (part two)
Session 20: First steps with OpenMP
Session 21: First steps with MPI
Session 22: Transfer of vectors and matrices using MPI
Session 23: Scatter and gather operations for matrices
Session 24: Distributed GEMM operation
Session 25: First steps with CUDA
Session 26: GEMM with CUDA