=============================
AVX Kernel with Loop Unrolling [TOC]
=============================

Loop unrolling improves the efficiency a little further. The assembly code is
listed further below.

---- SHELL(path=session08/example03/hpc_project/ulmblas/,hostname=heim,hide) ---
make clean
--------------------------------------------------------------------------------

---- SHELL(path=session08/example03/hpc_project/bench/,hostname=heim,hide) -----
make clean
cp ../../../example02/hpc_project/bench/report* .
--------------------------------------------------------------------------------

`ulmBLAS` has to be rebuilt:

---- SHELL(path=session08/example03/hpc_project/ulmblas,hostname=heim) ---------
pwd
make
--------------------------------------------------------------------------------

Afterwards the benchmark has to be recompiled:

---- SHELL(path=session08/example03/hpc_project/bench,hostname=heim) -----------
pwd
make
./bench_dgemm_4_8 > report.gemm_avx_unrolled
gnuplot plot.gemm
--------------------------------------------------------------------------------

---- IMAGE ---------------------------------------------
session08/example03/hpc_project/bench/bench.gemm_ccc.svg
--------------------------------------------------------

Assembly Code
=============

We replace `ulmblas/ugemm_4_8.s` with `ulmblas/ugemm_4_8_unrolled.s`:

:import: session08/example03/hpc_project/ulmblas/ugemm_4_8_unrolled.s

:navigate: up -> doc:index
           back -> doc:session08/page02