======================================
Unrolled Inline Assembler Micro Kernel                                     [TOC]
======================================

We unroll the assembler micro kernel manually.  The pattern for unrolling the
loop is that

---- CODE(type=txt) ------------------------------------------------------------
for l = 0 to kc-1
    [code for l]
--------------------------------------------------------------------------------

gets replaced by

---- CODE(type=txt) ------------------------------------------------------------
for i = 0 to kc/4-1
    [code for l=4i]
    [code for l=4i+1]
    [code for l=4i+2]
    [code for l=4i+3]
for l = 4*(kc/4) to kc-1
    [code for l]
--------------------------------------------------------------------------------

Here `kc/4` is an integer division, so the second loop handles the at most
three iterations left over when `kc` is not a multiple of four.  The
`[code for ..]` block is essentially just copied and pasted.

Select the demo-sse-asm-unrolled Branch
=======================================

Check out the `demo-sse-asm-unrolled` branch:

*--[SHELL(path=ulmBLAS)]--------------------------------------------*
|                                                                   |
|  git branch -a                                                    |
|  git checkout -B demo-sse-asm-unrolled                         +++|
|       remotes/origin/demo-sse-asm-unrolled                        |
|                                                                   |
*-------------------------------------------------------------------*

Then we compile the project:

*--[SHELL(path=ulmBLAS,height=15)]----------------------------------*
|                                                                   |
|  make                                                             |
|                                                                   |
*-------------------------------------------------------------------*

The dgemm_nn Code
=================

:import: ulmBLAS/src/level3/dgemm_nn.c [linenumbers]

Benchmark Results
=================

We run the benchmarks

*--[SHELL(path=ulmBLAS)]--------------------------------------------*
|                                                                   |
|  cd bench                                                         |
|  ./xdl3blastst > report                                           |
|  cat report                                                       |
|                                                                   |
*-------------------------------------------------------------------*

and filter out the results for the `demo-sse-asm-unrolled` branch:

*--[SHELL(path=ulmBLAS/bench)]--------------------------------------*
|                                                                   |
|  grep PASS report > demo-sse-asm-unrolled                         |
|                                                                   |
*-------------------------------------------------------------------*

With the gnuplot script

:import: ulmBLAS/bench/bench10.gps

we feed gnuplot

*--[SHELL(path=ulmBLAS/bench)]--------------------------------------*
|                                                                   |
|  gnuplot bench10.gps                                              |
|                                                                   |
*-------------------------------------------------------------------*

and get

---- IMAGE --------------
ulmBLAS/bench/bench10.svg
-------------------------

:navigate:
__up__    -> doc:index
__back__  -> doc:page08/index
__next__  -> doc:page10/index