==================
Prefetching Panels
==================

[TOC]

First Attempt
=============

Check out the `demo-sse-all-asm-try-prefetching` branch:

*--[SHELL(path=ulmBLAS)]--------------------------------------------*
|                                                                   |
|  git branch -a                                                    |
|  git checkout -B demo-sse-all-asm-try-prefetching              +++|
|      remotes/origin/demo-sse-all-asm-try-prefetching              |
|                                                                   |
*-------------------------------------------------------------------*

Then we compile the project:

*--[SHELL(path=ulmBLAS,height=15)]----------------------------------*
|                                                                   |
|  make                                                             |
|                                                                   |
*-------------------------------------------------------------------*

Code Modifications
------------------

- The macro kernel now also passes pointers to the next panels of A and B to
  the micro kernel.
- In the micro kernel we add prefetch instructions.
- TODO: More details ...
- TODO: Add link to course material ...

:import: ulmBLAS/src/level3/dgemm_nn.c [linenumbers]

Benchmark Results
-----------------

We run the benchmarks

*--[SHELL(path=ulmBLAS)]--------------------------------------------*
|                                                                   |
|  cd bench                                                         |
|  ./xdl3blastst -N 100 2000 100 > report                           |
|                                                                   |
*-------------------------------------------------------------------*

and filter out the results for the `demo-sse-all-asm-try-prefetching` branch:

*--[SHELL(path=ulmBLAS/bench)]--------------------------------------*
|                                                                   |
|  grep PASS report > demo-sse-all-asm-try-prefetching              |
|  cat demo-sse-all-asm-try-prefetching                             |
|                                                                   |
*-------------------------------------------------------------------*

With the gnuplot script

:import: ulmBLAS/bench/bench14.gps

we feed gnuplot

*--[SHELL(path=ulmBLAS/bench)]--------------------------------------*
|                                                                   |
|  gnuplot bench14.gps                                              |
|                                                                   |
*-------------------------------------------------------------------*

and get

---- IMAGE --------------
ulmBLAS/bench/bench14.svg
-------------------------

Second Attempt: Making the Code Size of the kb-Loop Body a Few Bytes Smaller
============================================================================

Check out the
`demo-sse-all-asm-try-prefetching-v2` branch:

*--[SHELL(path=ulmBLAS)]--------------------------------------------*
|                                                                   |
|  git branch -a                                                    |
|  git checkout -B demo-sse-all-asm-try-prefetching-v2           +++|
|      remotes/origin/demo-sse-all-asm-try-prefetching-v2           |
|                                                                   |
*-------------------------------------------------------------------*

Then we compile the project:

*--[SHELL(path=ulmBLAS,height=15)]----------------------------------*
|                                                                   |
|  make                                                             |
|                                                                   |
*-------------------------------------------------------------------*

Code Modifications
------------------

- We replace the assembler instruction `movapd` with `movaps`.  The
  instruction does essentially the same thing, but its encoding is one byte
  shorter.
- We also replace `addq` with `subq`.  Instead of adding a positive constant
  we subtract its negative value.  This saves another byte.
- TODO: More details ...
- TODO: Add link to course material ...

:import: ulmBLAS/src/level3/dgemm_nn.c [linenumbers]

Benchmark Results
-----------------

We run the benchmarks

*--[SHELL(path=ulmBLAS)]--------------------------------------------*
|                                                                   |
|  cd bench                                                         |
|  ./xdl3blastst -N 100 2000 100 > report                           |
|                                                                   |
*-------------------------------------------------------------------*

and filter out the results for the `demo-sse-all-asm-try-prefetching-v2`
branch:

*--[SHELL(path=ulmBLAS/bench)]--------------------------------------*
|                                                                   |
|  grep PASS report > demo-sse-all-asm-try-prefetching-v2           |
|  cat demo-sse-all-asm-try-prefetching-v2                          |
|                                                                   |
*-------------------------------------------------------------------*

With the gnuplot script

:import: ulmBLAS/bench/bench15.gps

we feed gnuplot

*--[SHELL(path=ulmBLAS/bench)]--------------------------------------*
|                                                                   |
|  gnuplot bench15.gps                                              |
|                                                                   |
*-------------------------------------------------------------------*

and get

---- IMAGE --------------
ulmBLAS/bench/bench15.svg
-------------------------

Third Attempt: Further Improvements
====================================

Check out the `demo-sse-all-asm-with-prefetching` branch:
*--[SHELL(path=ulmBLAS)]--------------------------------------------*
|                                                                   |
|  git branch -a                                                    |
|  git checkout -B demo-sse-all-asm-with-prefetching             +++|
|      remotes/origin/demo-sse-all-asm-with-prefetching             |
|                                                                   |
*-------------------------------------------------------------------*

Then we compile the project:

*--[SHELL(path=ulmBLAS,height=15)]----------------------------------*
|                                                                   |
|  make                                                             |
|                                                                   |
*-------------------------------------------------------------------*

Code Modifications
------------------

- We replace the assembler instruction `movapd` with `movaps`.  The
  instruction does essentially the same thing, but its encoding is one byte
  shorter.
- We also replace `addq` with `subq`.  Instead of adding a positive constant
  we subtract its negative value.  This saves another byte.
- TODO: More details ...
- TODO: Add link to course material ...

:import: ulmBLAS/src/level3/dgemm_nn.c [linenumbers]

Final Benchmark
===============

We run the benchmarks

*--[SHELL(path=ulmBLAS)]--------------------------------------------*
|                                                                   |
|  cd bench                                                         |
|  ./xdl3blastst -N 100 2000 100 > report                           |
|                                                                   |
*-------------------------------------------------------------------*

and filter out the results for the `demo-sse-all-asm-with-prefetching` branch:

*--[SHELL(path=ulmBLAS/bench)]--------------------------------------*
|                                                                   |
|  grep PASS report > demo-sse-all-asm-with-prefetching             |
|  cat demo-sse-all-asm-with-prefetching                            |
|                                                                   |
*-------------------------------------------------------------------*

With the gnuplot script

:import: ulmBLAS/bench/bench16.gps

we feed gnuplot

*--[SHELL(path=ulmBLAS/bench)]--------------------------------------*
|                                                                   |
|  gnuplot bench16.gps                                              |
|                                                                   |
*-------------------------------------------------------------------*

and get

---- IMAGE --------------
ulmBLAS/bench/bench16.svg
-------------------------

:navigate: __up__    -> doc:index
           __back__  -> doc:page12/index
           __next__  -> doc:page14/index