GEMM (AVX Micro Kernel)

After studying the SSE micro kernel of BLIS, we apply some of its concepts to the AVX instruction set. Again, you can consult the Intel intrinsics guide, which covers the AVX intrinsics as well as the SSE ones.

Note that all benchmarks were generated while doctool transformed the doc files to HTML. This was done on my iMac, which has a 2.7 GHz Intel i5. The theoretical peak performance of one core is 21.6 GFLOPS.
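
The 21.6 GFLOPS figure can be reproduced with a small calculation. The sketch below is only an illustration: the helper name peak_gflops and its parameters are hypothetical, and the breakdown is an assumption not stated above, namely that one core can issue one 4-wide double-precision AVX add and one 4-wide AVX multiply per cycle (8 flops per cycle, no FMA).

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of BLIS or ulmBLAS): estimates the
 * theoretical peak of a single core in GFLOPS.
 *
 *   clock_ghz        clock frequency in GHz
 *   simd_lanes       doubles per vector register (4 for 256-bit AVX)
 *   ops_per_cycle    vector floating point instructions issued per cycle
 *                    (assumed 2 here: one add and one multiply)
 */
double
peak_gflops(double clock_ghz, int simd_lanes, int ops_per_cycle)
{
    return clock_ghz * simd_lanes * ops_per_cycle;
}

/* Prints the estimate for the machine described in the text. */
void
print_peak(void)
{
    printf("peak: %.1f GFLOPS\n", peak_gflops(2.7, 4, 2));
}
```

Under these assumptions peak_gflops(2.7, 4, 2) evaluates to 2.7 * 4 * 2, which matches the peak quoted above. The benchmark results in this document should be read relative to that bound.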