=============================
Inline Assembler Micro Kernel                                         [TOC]
=============================

We use the __Intel SSE intrinsics guide__ and translate the micro kernel from
the previous page into assembler code ourselves.  The compiler emits the
instructions of an `asm` block exactly in the order we write them, and
declaring the block as `volatile` keeps the optimizer from removing the block
or hoisting it out of a loop.  This way we are in full control of the
instruction order and can optimize pipelining!

Select the demo-sse-asm Branch
==============================

Check out the `demo-sse-asm` branch:

*--[SHELL(path=ulmBLAS)]--------------------------------------------*
|                                                                   |
|  git branch -a                                                    |
|  git checkout -B demo-sse-asm                                  +++|
|      remotes/origin/demo-sse-asm                                  |
|                                                                   |
*-------------------------------------------------------------------*

Then we compile the project:

*--[SHELL(path=ulmBLAS,height=15)]----------------------------------*
|                                                                   |
|  make                                                             |
|                                                                   |
*-------------------------------------------------------------------*

The Micro Kernel
================

We basically just translate the intrinsics one by one into assembler
instructions.  Using the __Intel SSE intrinsics guide__ this is fairly easy.
We have left the original intrinsics in the comments.

Note that we declare the `asm` block as `volatile`.  Otherwise the optimizer
is free to move or even drop the block, and we are back to the problem of a
compiler that thinks it knows better than we do.

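To illustrate the one-by-one translation, here is a minimal sketch that is
*not* taken from ulmBLAS.  The function names `update2_intrin` and
`update2_asm` are made up for this example, and so is the tiny operation they
perform (`c[i] += a[i]*b[i]` for two doubles).  It assumes an x86-64 target,
a compiler with GCC-style extended `asm` (gcc or clang) and AT&T operand
order.  It uses the unaligned moves (`movupd`) so that it works with any
pointer.  Each assembler line carries the intrinsic it replaces as a comment.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Reference version written with intrinsics: c[0..1] += a[0..1]*b[0..1] */
static void
update2_intrin(const double *a, const double *b, double *c)
{
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    __m128d vc = _mm_loadu_pd(c);

    va = _mm_mul_pd(va, vb);
    vc = _mm_add_pd(vc, va);

    _mm_storeu_pd(c, vc);
}

/* The same operation, translated intrinsic by intrinsic (hypothetical example) */
static void
update2_asm(const double *a, const double *b, double *c)
{
    __asm__ volatile
    (
    "movupd  (%0),   %%xmm0   \n\t"  // va = _mm_loadu_pd(a)
    "movupd  (%1),   %%xmm1   \n\t"  // vb = _mm_loadu_pd(b)
    "movupd  (%2),   %%xmm2   \n\t"  // vc = _mm_loadu_pd(c)
    "mulpd   %%xmm1, %%xmm0   \n\t"  // va = _mm_mul_pd(va, vb)
    "addpd   %%xmm0, %%xmm2   \n\t"  // vc = _mm_add_pd(vc, va)
    "movupd  %%xmm2, (%2)     \n\t"  // _mm_storeu_pd(c, vc)
    : /* no output operands: the result is written through the pointer */
    : "r" (a),                       // %0
      "r" (b),                       // %1
      "r" (c)                        // %2
    : "xmm0", "xmm1", "xmm2",        // registers the block overwrites
      "memory"                       // the block reads and writes memory
    );
}
```

Calling both functions with `a = {1, 2}`, `b = {3, 4}` and `c = {10, 20}`
should leave `{13, 28}` in `c` in either case.  The clobber list matters as
much as `volatile` here: without `"memory"` the compiler may assume that `c`
is unchanged by the block.
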
The dgemm_nn Code
=================

:import: ulmBLAS/src/level3/dgemm_nn.c [linenumbers]

Benchmark Results
=================

We run the benchmarks

*--[SHELL(path=ulmBLAS)]--------------------------------------------*
|                                                                   |
|  cd bench                                                         |
|  ./xdl3blastst > report                                           |
|  cat report                                                       |
|                                                                   |
*-------------------------------------------------------------------*

and filter out the results for the `demo-sse-asm` branch:

*--[SHELL(path=ulmBLAS/bench)]--------------------------------------*
|                                                                   |
|  grep PASS report > demo-sse-asm                                  |
|                                                                   |
*-------------------------------------------------------------------*

With the gnuplot script

:import: ulmBLAS/bench/bench9.gps

we feed gnuplot

*--[SHELL(path=ulmBLAS/bench)]--------------------------------------*
|                                                                   |
|  gnuplot bench9.gps                                               |
|                                                                   |
*-------------------------------------------------------------------*

and get

---- IMAGE -------------
ulmBLAS/bench/bench9.svg
------------------------

:links:
   Intel SSE intrinsics guide -> https://software.intel.com/sites/landingpage/IntrinsicsGuide/

:navigate:
   __up__   -> doc:index
   __back__ -> doc:page07/index
   __next__ -> doc:page09/index