===========================
Naive Use of SSE Intrinsics                                                [TOC]
===========================

The implementation presented here uses __SSE intrinsics__.  In my naive way of
thinking I expected a compiler to produce this implementation at the assembly
level when optimizing the `demo-pure-c` micro kernel.  However, no matter which
attributes, optimization flags and tricks I used, the compiler could never
optimize the `demo-pure-c` micro kernel to the performance level of this micro
kernel.

Select the demo-naive-sse-with-intrinsics Branch
================================================
Check out the `demo-naive-sse-with-intrinsics` branch:

  *--[SHELL(path=ulmBLAS)]--------------------------------------------*
  |                                                                   |
  |  git branch -a                                                    |
  |  git checkout -B demo-naive-sse-with-intrinsics                +++|
  |      remotes/origin/demo-naive-sse-with-intrinsics                |
  |                                                                   |
  *-------------------------------------------------------------------*

Then we compile the project:

  *--[SHELL(path=ulmBLAS,height=15)]----------------------------------*
  |                                                                   |
  |  make                                                             |
  |                                                                   |
  *-------------------------------------------------------------------*

The Micro Kernel Algorithm
==========================
We merely optimize the update step

---- LATEX ---------------------------------------------------------------------
\mathbf{AB} \leftarrow \mathbf{AB} + \begin{pmatrix} a_{4l} \\ a_{4l+1} \\ a_{4l+2} \\ a_{4l+3}\end{pmatrix}
                                     \begin{pmatrix} b_{4l}, & b_{4l+1}, & b_{4l+2}, & b_{4l+3}\end{pmatrix}
--------------------------------------------------------------------------------

by using SSE intrinsics.
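To make the update step concrete, here is a minimal sketch of how it could be written with intrinsics.  It is not the code from the `demo-naive-sse-with-intrinsics` branch: the function name `rank1_updates_sse`, the constants `MR` and `NR`, double precision elements (which need the SSE2 header `emmintrin.h`) and a column-wise 4x4 buffer `AB` with `AB[i+4*j]` are all assumptions made for illustration.

```c
#include <emmintrin.h>   /* SSE2: __m128d and the _mm_*_pd intrinsics */

enum { MR = 4, NR = 4 }; /* assumed micro kernel block size */

/*
 * Hypothetical helper: applies kc rank-1 updates
 *     AB <- AB + a_l * b_l^T
 * where a_l = A[4l..4l+3] and b_l = B[4l..4l+3].
 * AB is assumed to be stored column-wise: AB[i + MR*j].
 */
void
rank1_updates_sse(long kc, const double *A, const double *B, double *AB)
{
    __m128d ab[NR][2];   /* each column of AB occupies two SSE registers */
    long    l;
    int     j;

    /* Load the 4x4 block; unaligned loads avoid alignment assumptions. */
    for (j = 0; j < NR; ++j) {
        ab[j][0] = _mm_loadu_pd(&AB[j*MR]);
        ab[j][1] = _mm_loadu_pd(&AB[j*MR + 2]);
    }

    for (l = 0; l < kc; ++l) {
        /* (a_{4l}, a_{4l+1}) and (a_{4l+2}, a_{4l+3}) */
        __m128d a01 = _mm_loadu_pd(&A[0]);
        __m128d a23 = _mm_loadu_pd(&A[2]);

        for (j = 0; j < NR; ++j) {
            /* Broadcast b_{4l+j} into both lanes. */
            __m128d bj = _mm_set1_pd(B[j]);

            /* Column j of AB += a_l * b_{4l+j} */
            ab[j][0] = _mm_add_pd(ab[j][0], _mm_mul_pd(a01, bj));
            ab[j][1] = _mm_add_pd(ab[j][1], _mm_mul_pd(a23, bj));
        }
        A += MR;
        B += NR;
    }

    /* Write the accumulated block back. */
    for (j = 0; j < NR; ++j) {
        _mm_storeu_pd(&AB[j*MR],     ab[j][0]);
        _mm_storeu_pd(&AB[j*MR + 2], ab[j][1]);
    }
}
```

Called with `kc` packed columns of A and `kc` packed rows of B, the sketch accumulates all `kc` outer products into `AB`.  Keeping the 4x4 block in eight `__m128d` variables for the whole loop is the point of the exercise: on x86-64 this should allow the block to stay in registers, although whether a given compiler actually does so is not guaranteed.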
Looking at the original C code

---- CODE(type=c) ------------------------
for (l=0; l [...]
------------------------------------------

[...]

  *--[SHELL(path=ulmBLAS/bench)]--------------------------------------*
  |                                                                   |
  |  [...] > report                                                   |
  |  cat report                                                       |
  |                                                                   |
  *-------------------------------------------------------------------*

and filter out the results for the `demo-naive-sse-with-intrinsics` branch:

  *--[SHELL(path=ulmBLAS/bench)]--------------------------------------*
  |                                                                   |
  |  grep PASS report > demo-naive-sse-with-intrinsics                |
  |                                                                   |
  *-------------------------------------------------------------------*

With the gnuplot script

:import: ulmBLAS/bench/bench3.gps

we feed gnuplot

  *--[SHELL(path=ulmBLAS/bench)]--------------------------------------*
  |                                                                   |
  |  gnuplot bench3.gps                                               |
  |                                                                   |
  *-------------------------------------------------------------------*

and get

---- IMAGE -------------
ulmBLAS/bench/bench3.svg
------------------------

:links:    SSE intrinsics -> https://software.intel.com/sites/landingpage/IntrinsicsGuide/

:navigate: __up__    -> doc:index
           __back__  -> doc:page02/index
           __next__  -> doc:page04/index