=================================== What we can learn from the Exercise [TOC] =================================== A compiler like `gcc` is an important tool for us. And in short, we can learn: - how to use such a tool, - structural understanding of what such a tool will do, can do and can not do when we use it. About using the tools ===================== - We can see the result of the *C Preprocessor (CPP)* after processing processing the source file `vec_sum.c` with `gcc -E vec_sum.c` It will print the result on the terminal (on `stdout`) by default. If you want to have the result in a file you can use the `-o` flag, e.g. `gcc -E -o ver_sum.after_cpp vec_sum.c` or redirect the output with `>`, e.g. `gcc -E vec_sum.c > vec_sum.after_cpp` By the way, other flags like `-Wall`, `-O1` etc. are in this case not relevant. These flags are only relevant for the actual C compiler, but we stop the translation process right after the preprocessing. - The result of the *C Compiler* is assembly code. Using the `-S` flag like in `gcc -S vec_sum.c` will create a file `vec_sum.s` with the produced assembly code. So by default a new file gets created. Using the `-o` flag we could specify a different name for the output, e.g. with `gcc -S -o vec_sum.asm vec_sum.c` the result gets stored in `vec_sum.asm`. In this case flags like `-Wall` or the optimization flags `-O1`, `-O2`, `-O3` will matter. In particular, using different optimization flags produces different About `vec_sum.c` ================= Looking at the C code in :import: session06/vec_sum.c [linenumbers] we see a for loop in Line 8. We know that the compiler actually sees the code that was first processed by the preprocessor, that is ---- SHELL(path=session06,hostname=heim) --------------------------------------- gcc -E vec_sum.c -------------------------------------------------------------------------------- How a C compiler for the ULM could translate the C Code ------------------------------------------------------- It is important that we consider the result of the preprocessor when we think about how the C code could be translated into assembly code for the ULM. The result could be something like this: :import: session06/ulm/vec_sum_ulm.s Structure of assembly code produced by gcc ------------------------------------------ So the loop the compiler sees is `for (register unsigned long long i=0; i<5; ++i) { ... }`. Hence, if the C compiler would produce assembly code for the ULM we would expect that it had a structure like this: ---- CODE(type=s) -------------------------------------------------------------- ... .text ... # init a result-register res with 0 # init a counter-register i with 0 jmp check loop: # add somehow x[i] to the result-register res # increment the counter-register i check: # subtract 5 from the counter-register i (i.e. compute i-5) jb loop ... # halt with result -------------------------------------------------------------------------------- Now we generate the assembly code with ---- SHELL(path=session06,hostname=heim) --------------------------------------- gcc -Wall -std=c11 -S -fno-asynchronous-unwind-tables -fomit-frame-pointer +++ vec_sum.c -------------------------------------------------------------------------------- and have a closer look: :import: session06/vec_sum.s [fold] We guessed a few things right. In the following extract the observed structure is pointed out in the comments: ---- CODE(type=s) -------------------------------------------------------------- .text ... main: movl $0, %ebp # init a result-register res with 0 movl $0, %ebx # init a counter-register i with 0 jmp .L2 # jmp check .L3: # loop: movzbl x(%rbx), %eax movzbl %al, %eax addq %rax, %rbpa # add x[i] to the result-register addq $1, %rbx # increment the counter-register i .L2: # check: cmpq $4, %rbx # subtract 4 from counter-regsiter jbe .L3 # jbe loop movl %ebp, %eax popq %rbx popq %rbp ret -------------------------------------------------------------------------------- However, the compiler treats the loop as the equivalent loop `for (register unsigned long long i=0; i<=4; ++i)`. Of course we also observe that there are some details that we can not understand at the moment as they are Intel64 specific. For example the counter register is sometimes denoted as `%ebx` and sometimes as `%rbx`. Influence of optimization flag `-O1` ------------------------------------ ---- SHELL(path=session06,hostname=heim) --------------------------------------- gcc -Wall -std=c11 -S -fno-asynchronous-unwind-tables -fomit-frame-pointer +++ -O1 -o vec_sum_opt1.s vec_sum.c -------------------------------------------------------------------------------- :import: session06/vec_sum_opt1.s [fold] With the `-O1` optimization the loop was realized different. The assembly code has no longer the unconditional jump. Instead of the unconditional `jbe` is uses a `jne` (which is equivalent to `jnz`). The code has following structure: ---- CODE(type=s) -------------------------------------------------------------- .text .globl main .type main, @function main: movl $0, %edx # init a result-register res with 0 movl $0, %eax # init a counter-register i with 0 .L2: # loop: movzbl x(%rdx), %ecx addq %rcx, %rax # add x[i] to the result-register addq $1, %rdx # increment the counter-register i cmpq $5, %rdx # subtract 5 from counter-regsiter jne .L2 # jne loop ... -------------------------------------------------------------------------------- In C we would express the structure of the loop as ---- CODE(type=c) -------------------------------------------------------------- res = 0; do { res += x[i]; i = i+1; } while (i!=5); -------------------------------------------------------------------------------- Even if you don't now the do-while loop of C, the difference to for loops and while loops is that the loop body is always executed at least once. And this property can be seen from the assembly code. Influence of optimization flag `-O2` ------------------------------------ ---- SHELL(path=session06,hostname=heim) --------------------------------------- gcc -Wall -std=c11 -S -fno-asynchronous-unwind-tables -fomit-frame-pointer +++ -O2 -o vec_sum_opt2.s vec_sum.c -------------------------------------------------------------------------------- Execept for some minor details (that in this case really don't matter) the `-O2` optimization gives the same code: :import: session06/vec_sum_opt2.s [fold] Influence of optimization flag `-O3` (and your discovery of loop unrolling) --------------------------------------------------------------------------- ---- SHELL(path=session06,hostname=heim) --------------------------------------- gcc -Wall -std=c11 -S -fno-asynchronous-unwind-tables -fomit-frame-pointer +++ -O3 -o vec_sum_opt3.s vec_sum.c -------------------------------------------------------------------------------- Now this is interesting! There are no jump instructions at all: :import: session06/vec_sum_opt3.s [fold] The code just fetches data from the memory and adds it: ---- CODE(type=s) -------------------------------------------------------------- movzbl x+1(%rip), %eax movzbl x(%rip), %edx addq %rax, %rdx movzbl x+2(%rip), %eax addq %rax, %rdx movzbl x+3(%rip), %eax addq %rax, %rdx movzbl x+4(%rip), %eax addq %rdx, %rax -------------------------------------------------------------------------------- We would expect such a code structure from C code like this: ---- CODE(type=c) -------------------------------------------------------------- res = x[0]; res = res + x[1]; res = res + x[2]; res = res + x[3]; res = res + x[4]; return res; -------------------------------------------------------------------------------- And that is how the compiler treated the loop. This is called *loop unrolling*. About `for_loop.c` ================= :import: session06/for_loop.c [linenumbers] ---- SHELL(path=session06,hostname=heim) --------------------------------------- gcc -E for_loop.c -------------------------------------------------------------------------------- How a C compiler for the ULM could translate the C Code ------------------------------------------------------- :import: session06/ulm/for_loop_ulm.s Structure of the produced assembly code --------------------------------------- ---- SHELL(path=session06,hostname=heim) --------------------------------------- gcc -Wall -std=c11 -S -fno-asynchronous-unwind-tables -fomit-frame-pointer +++ for_loop.c -------------------------------------------------------------------------------- You see again the structure of the for loop: :import: session06/for_loop.s Influence of optimization flag `-O1` (and your discovery of Compile time function execution) -------------------------------------------------------------------------------------------- ---- SHELL(path=session06,hostname=heim) --------------------------------------- gcc -Wall -std=c11 -S -fno-asynchronous-unwind-tables -fomit-frame-pointer +++ -O1 -o for_loop_opt1.s for_loop.c -------------------------------------------------------------------------------- In this example we already achieve the best possible optimization with `-O1` (but see for your self that `-O2` and `-O3` will produce the same result). The compiler already computed the factorial *at compile time* as you can see for yourself. The assembly code contains the result $5! = 120$: :import: session06/for_loop_opt1.s About `while_loop.c` (and your discovery of how the preprocessor can be confusing) ================================================================================== This is an example for a not uncommon problem for beginners in C. The programmer uses `N` as if it was a variable. However, as `N` is expanded by the preprocessor it is actually a literal integer constant: ---- SHELL(path=session06,hostname=heim) --------------------------------------- gcc -E while_loop.c -------------------------------------------------------------------------------- So it is understandable that the compiler complains _I can not decrement the literal 5_. However, the compiler uses the term `lvalue`. This is a term denotes in programming an expression that can stand on the *left-hand side* of an assignment (and therefore it value can be changed): ---- SHELL(path=session06,hostname=heim) --------------------------------------- gcc -Wall -std=c11 while_loop.c -------------------------------------------------------------------------------- A simple fix could be to change `N` into a variable: ---- CODE(type=c) -------------------------------------------------------------- unsigned long long N = 5; int main() { register unsigned long long res = 1; while (N>1) { res *= N; --N; } return res; } --------------------------------------------------------------------------------