=================================================
More fine-tuning of the Unrolled Assembler Kernel [TOC]
=================================================

In the previous implementation, the two pointer increments were placed directly
one after the other:

---- CODE(type=c) --------------------------------------------------------------
"addq       $4*32,      %%rax           \n\t"   // A += 16;
"addq       $4*32,      %%rbx           \n\t"   // B += 16;
--------------------------------------------------------------------------------

These instructions compete for the same execution unit of the CPU.  Pulling
them apart further improves performance.

Select the demo-sse-asm-unrolled-v3 Branch
==========================================

Again, we run `make clean` before switching branches:

*--[SHELL(height=6)]------------------------------------------------*
|                                                                   |
| cd ulmBLAS                                                        |
| make clean                                                        |
|                                                                   |
*-------------------------------------------------------------------*

Then we check out the `demo-sse-asm-unrolled-v3` branch:

*--[SHELL(path=ulmBLAS)]--------------------------------------------*
|                                                                   |
| git branch -a                                                     |
| git checkout -B demo-sse-asm-unrolled-v3                       +++|
|     remotes/origin/demo-sse-asm-unrolled-v3                       |
|                                                                   |
*-------------------------------------------------------------------*

Then we compile the project:

*--[SHELL(path=ulmBLAS,height=15)]----------------------------------*
|                                                                   |
| make                                                              |
|                                                                   |
*-------------------------------------------------------------------*

The dgemm_nn Code
=================

:import: ulmBLAS/src/level3/dgemm_nn.c [linenumbers]

Benchmark Results
=================

We run the benchmarks

*--[SHELL(path=ulmBLAS)]--------------------------------------------*
|                                                                   |
| cd bench                                                          |
| ./xdl3blastst > report                                            |
|                                                                   |
*-------------------------------------------------------------------*

and extract the results for the `demo-sse-asm-unrolled-v3` branch:

*--[SHELL(path=ulmBLAS/bench)]--------------------------------------*
|                                                                   |
| grep PASS report > demo-sse-asm-unrolled-v3                       |
|                                                                   |
*-------------------------------------------------------------------*

With the gnuplot script

:import: ulmBLAS/bench/bench12.gps

we run gnuplot

*--[SHELL(path=ulmBLAS/bench)]--------------------------------------*
|                                                                   |
| gnuplot bench12.gps                                               |
|                                                                   |
*-------------------------------------------------------------------*

and obtain the following plot:

---- IMAGE --------------
ulmBLAS/bench/bench12.svg
-------------------------

:navigate: __up__    -> doc:index
           __back__  -> doc:page10/index
           __next__  -> doc:page12/index