==============
Loop Unrolling                                                         [TOC]
==============

We optimize our previous implementation by manual loop unrolling.  Again, this
is the performance I would have expected a smart compiler to achieve for the
`demo-pure-c` branch, at least if we provide enough hints through compiler
flags and attributes.

Select the demo-naive-sse-with-intrinsics-unrolled Branch
=========================================================
Check out the `demo-naive-sse-with-intrinsics-unrolled` branch:

  *--[SHELL(path=ulmBLAS)]--------------------------------------------*
  |                                                                   |
  |  git branch -a                                                    |
  |  git checkout -B demo-naive-sse-with-intrinsics-unrolled       +++|
  |      remotes/origin/demo-naive-sse-with-intrinsics-unrolled       |
  |                                                                   |
  *-------------------------------------------------------------------*

Then we compile the project:

  *--[SHELL(path=ulmBLAS,height=15)]----------------------------------*
  |                                                                   |
  |  make                                                             |
  |                                                                   |
  *-------------------------------------------------------------------*

Unrolling
=========
TODO:

- Add some of the lecture notes about pipelining, branch prediction and
  prefetching.
- Add pictures on how we realize this in the implementation.
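
Until the lecture notes are added, the basic idea of manual unrolling can be sketched on a much simpler kernel than `dgemm_nn`.  The following is only an illustration in plain C and is not taken from the ulmBLAS sources; the function names `dot_simple` and `dot_unrolled4` and the unroll factor of four are assumptions made for this example.  The unrolled variant handles four elements per iteration and keeps four independent partial sums, so that consecutive multiply-adds do not depend on each other, and a short remainder loop deals with lengths that are not a multiple of four.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: one multiply-add and one loop test per element. */
double
dot_simple(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    size_t i;

    assert(n == 0 || (x != NULL && y != NULL));

    for (i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

/*
 * Hypothetical unrolled variant (unroll factor 4 is an assumption).
 * Four independent accumulators avoid a single dependency chain on
 * one sum variable; the loop test is executed once per four elements.
 */
double
dot_unrolled4(size_t n, const double *x, const double *y)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    assert(n == 0 || (x != NULL && y != NULL));

    /* Main loop: processes elements i, i+1, i+2, i+3 per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += x[i + 0] * y[i + 0];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }

    /* Remainder loop: at most three leftover elements. */
    for (; i < n; ++i) {
        s0 += x[i] * y[i];
    }

    return (s0 + s1) + (s2 + s3);
}
```

Note that the unrolled variant adds the products in a different order than the simple loop.  For general floating point data the two results can therefore differ in the last bits, which is one reason why a compiler will usually not do this transformation on its own unless flags explicitly allow it.
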
The dgemm_nn Code
=================

:import: ulmBLAS/src/level3/dgemm_nn.c [linenumbers]

Benchmark Results
=================
We run the benchmarks

  *--[SHELL(path=ulmBLAS)]--------------------------------------------*
  |                                                                   |
  |  cd bench                                                         |
  |  ./xdl3blastst > report                                           |
  |  cat report                                                       |
  |                                                                   |
  *-------------------------------------------------------------------*

and filter out the results for the `demo-naive-sse-with-intrinsics-unrolled`
branch:

  *--[SHELL(path=ulmBLAS/bench)]--------------------------------------*
  |                                                                   |
  |  grep PASS report > demo-naive-sse-with-intrinsics-unrolled       |
  |                                                                   |
  *-------------------------------------------------------------------*

With the gnuplot script

:import: ulmBLAS/bench/bench4.gps

we feed gnuplot

  *--[SHELL(path=ulmBLAS/bench)]--------------------------------------*
  |                                                                   |
  |  gnuplot bench4.gps                                               |
  |                                                                   |
  *-------------------------------------------------------------------*

and get

---- IMAGE -------------
ulmBLAS/bench/bench4.svg
------------------------

:navigate: __up__    -> doc:index
           __back__  -> doc:page03/index
           __next__  -> doc:page05/index