======================================
Programs with functions and procedures
======================================

From now on we no longer have to use C code as a kind of pseudocode description
for our assembler programs. Instead we can use the C code as an equivalent, more
convenient formulation of our programs. Following some rules the C code can be
rewritten (or compiled) into assembly code. Later this will be done by a tool, a
C compiler; at the moment you have to do it yourself. By doing so you learn the
C programming language by example and you get an idea of how a C compiler works.

Function `main` and the subprogram `_start`
===========================================

By convention every program has a function `main` that returns an integer. When
a program gets executed it behaves as if function `main` is the first function
that gets executed. The return value of `main` defines the exit code of the
program. If `main` has no explicit return statement this function returns 0 by
default. Note that such a default return value is only guaranteed for function
`main`.

For example, the following program only defines a function `main` which
explicitly returns 42:

---- CODE (file=session10/func/main.c) -----------------------------------------
int main()
{
    return 42;
}
--------------------------------------------------------------------------------

Compile this and check the exit code:

---- SHELL (path=session10/func) -----------------------------------------------
gcc -o main main.c
main; echo $?
--------------------------------------------------------------------------------

In this even more minimalistic program function `main` has an empty function
body, i.e.
---- CODE (file=session10/func/main_no_return.c) -------------------------------
int main()
{
}
--------------------------------------------------------------------------------

By convention function `main` implicitly returns 0:

---- SHELL (path=session10/func) -----------------------------------------------
gcc -o main_no_return main_no_return.c
main_no_return; echo $?
--------------------------------------------------------------------------------

The subprogram `_start`
~~~~~~~~~~~~~~~~~~~~~~~

In C you are not allowed to use the identifier `_start` for a function; this
identifier is reserved. The gory details are that in general a program has to
communicate with the operating system for receiving arguments and returning an
exit code. This communication happens through so-called __system calls__, and
different operating systems have different system calls even if they run on the
same hardware. So `_start` is reserved for being the name of a function that
actually gets called first and itself calls function `main`. Platform-dependent
system calls can be done in `_start` before and after `main` gets called.

In the above example the implementation of function `_start` was added by the
linker. With the command `nm` you can display the entries of a program's symbol
table. You can use `nm` to see that the compiled programs have the symbol
`_start`:

---- SHELL (path=session10/func) -----------------------------------------------
nm main | grep start
--------------------------------------------------------------------------------

The ULM does not have an operating system but we use `_start` to guarantee that
the stack is initialized before function `main` gets called.
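
The role of `_start` described above can be modelled in plain C. The following
is only a rough sketch under stated assumptions: the names `program_main`,
`startup_model` and `check_startup_model` are hypothetical and do not occur in
the session material, and a real `_start` is not written like this (it ends with
a system call or, on the ULM, with halting the machine). The sketch only shows
the order of events: set things up, call `main`, pass on its return value.

```c
#include <assert.h>

/* Hypothetical stand-in for the program's `main`. It is renamed so that
   this sketch can be compiled next to a real `main`. */
static int program_main(void)
{
    return 42;
}

/* Hypothetical model of `_start` (an assumption for illustration only). */
static int startup_model(void)
{
    /* 1. Platform dependent setup would happen here,
          e.g. initializing the stack on the ULM. */

    /* 2. Call main and remember its return value. */
    int exit_code = program_main();

    /* 3. A real `_start` would now terminate the program with this
          exit code. The model simply returns it. */
    return exit_code;
}

/* Small self check: the model passes on the return value of main. */
static void check_startup_model(void)
{
    assert(startup_model() == 42);
}
```

Calling `check_startup_model()` from a test program should pass silently,
because the model hands back exactly what `program_main` returned.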
---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
    \input{flowchart.tex}
    \SetMargin{5}{10}{0}{3}
    \renewcommand\BoxWidth {8}
    \PutStatement{0}{init stack}
    \PutLabel{0}{\_start}
    \PutCallStatement[red!50]{1}{ call main }
    \PutStatement{2}{halt with return value of main}
    \AddPath{0}{1}
    \AddPath{1}{2}
    \PutAnnotation{0}{\text{Entry point}}

    % next flow chart column
    \renewcommand\FlowCol{1}
    \PutStatement{0}{\dots}
    \PutLabel{0}{main}
    \PutStatement{1}{\dots}
    \PutStatement{2}{return exit code}
    \AddPath{0}{1}
    \AddPath{1}{2}
    \DrawCallPointer[red!50]{0}{1}{1}{0}
    \DrawReturnPointer[red!50]{0}{1}{1}{2}
\end{tikzpicture}
--------------------------------------------------------------------------------

Equivalent assembly programs
----------------------------

The following assembly program is equivalent to the above C program in `main.c`
(which returns 42 from function `main`). It also contains the implementation of
the `_start` function. Until the linker gets covered in the next session we
write everything into the same file, but in a way that lets us later split this
single source file into separate compile units. Hence for each function the
directives for arguments etc. are repeated:

:import: session10/func/main.s [fold]

Use the debugger to see what is going on behind the scenes when you run the
program:

---- SHELL (path=session10/func) -----------------------------------------------
ulmas -o main main.s
ulm main; echo $?
--------------------------------------------------------------------------------

Example with a procedure
========================

This C program implements its own `puts` function for printing a string and
uses a function `putchar` for printing a single character:

---- CODE (file=session10/func/example_puts.c) ---------------------------------
void puts(char *str)
{
    while (*str) {
        putchar(*str++);
    }
}

int main()
{
    puts("hello, world!\n");
}
--------------------------------------------------------------------------------

The function body of procedure `puts` can be described by a flow chart and it
can accordingly be rewritten with __spaghetti code__ (i.e. it contains `goto`
statements) for reflecting more closely how you can implement it in assembly:

+ ---- CODE (type=c) -----------------------------------------------------------
  while (*str) {
      putchar(*str++);
  }
  ------------------------------------------------------------------------------
+ ---- TIKZ --------------------------------------------------------------------
  \begin{tikzpicture}
      \input{flowchart.tex}
      \renewcommand\BoxWidth {5}
      \renewcommand\BoxHeight {1.2}
      \renewcommand\BoxDistance {1.8}
      %\SetMargin{5}{10}{0}{4}
      \PutJump{0}{*str == 0}
      \PutStatement{1}{putchar(*str);}
      \PutStatement{2}{str = str + 1;}
      \PutJump{3}{}
      \PutStatement{4}{/* empty statement*/ ;}
      \PutLabel{0}{puts\_while}
      \PutLabel{4}{puts\_done}
      \AddPath{0}{1}
      \AddPath{1}{2}
      \AddPath{2}{3}
      \AddPath{3}{4}
      \AddCondJumpPath{0}{4}
      \AddJumpPathLeft{3}{0}
  \end{tikzpicture}
  ------------------------------------------------------------------------------
+ ---- CODE (type=c) -----------------------------------------------------------
  puts_while:
      if (*str == 0) goto puts_done;
      putchar(*str);
      str = str + 1;
      goto puts_while;
  puts_done:
      ;
  ------------------------------------------------------------------------------

For deriving an assembly implementation we actually consider this
__spaghetti code__ C program:

---- CODE (file=session10/func/example_puts_archaic.c) -------------------------
void puts(char *str)
{
puts_while:
    if (*str == 0) goto puts_done;
    putchar(*str);
    str = str + 1;
    goto puts_while;
puts_done:
    ;
}

int main()
{
    puts("hello, world!\n");
}
--------------------------------------------------------------------------------

Before going into the syntactical meaning of this code let's compile both and
check that you get the same result:

---- SHELL (path=session10/func) -----------------------------------------------
gcc -w -o example_puts example_puts.c
example_puts
gcc -w -o example_puts_archaic example_puts_archaic.c
example_puts_archaic
--------------------------------------------------------------------------------

Step by step explanation what this C code describes
---------------------------------------------------

Let's begin with what you see in function `main`:

---- CODE (type=c) -------------------------------------------------------------
int main()
{
    puts("hello, world!\n");
}
--------------------------------------------------------------------------------

Function `main` has no local variables and implicitly returns zero. So applying
the standard recipe we already have the following skeleton:

+ ---- CODE (type=c) -----------------------------------------------------------
  int main()
  {
      /* statements */
      return 0;
  }
  ------------------------------------------------------------------------------
+ ---- CODE (type=s) -----------------------------------------------------------
  // ... directives for arguments, locals, etc.

      .text
  main:
      // ... function prologue ...
      movq %RET, ret(%SP)
      movq %FP, fp(%SP)
      addq 0, %SP, %FP

      // reserve space for local variables.
      subq 0*8, %SP, %SP

      /* statements */

      /* return 0; */
      ldzwq 0, %4
      movq %4, rval(%FP)

      // ... function epilogue ...
      addq 0, %FP, %SP
      movq fp(%SP), %FP
      movq ret(%SP), %RET
      jmp %RET, %0
  ------------------------------------------------------------------------------

Whenever you see a string literal in a C program it means the compiler will
generate a corresponding `.string` directive with a unique label in the data
segment. So before we consider how to call `puts` we generate the string with
some unique label:

+ ---- CODE (type=c) -----------------------------------------------------------
  "hello, world!\n"
  ------------------------------------------------------------------------------
+ ---- CODE (type=s) -----------------------------------------------------------
      .data
  .main.L0:   // some unique label
      .string "hello, world!\n"
  ------------------------------------------------------------------------------

When you pass a string to a function you only pass a pointer to the string as
argument. Function `puts` does not return a value. Hence, applying the recipe
for calling a procedure with one argument (which is the pointer to the string
literal), we have

+ ---- CODE (type=c) -----------------------------------------------------------
  puts("hello, world!\n");
  ------------------------------------------------------------------------------
+ ---- CODE (type=s) -----------------------------------------------------------
  /* puts("hello, world!\n"); */
  subq 24, %SP, %SP
  ldzwq .main.L0, %4
  movq %4, proc_arg0(%SP)
  ldzwq puts, %4
  jmp %4, %RET
  addq 24, %SP, %SP
  ------------------------------------------------------------------------------

So in total we have

+ ---- CODE (type=c) -----------------------------------------------------------
  int main()
  {
      puts("hello, world!\n");
      return 0;
  }
  ------------------------------------------------------------------------------
+ ---- CODE (type=s) -----------------------------------------------------------
  // ... directives for arguments, locals, etc.

      .data
  .main.L0:   // some unique label
      .string "hello, world!\n"

      .text
  main:
      // ... function prologue ...
      movq %RET, ret(%SP)
      movq %FP, fp(%SP)
      addq 0, %SP, %FP

      // reserve space for local variables.
      subq 0*8, %SP, %SP

      /* puts("hello, world!\n"); */
      subq 24, %SP, %SP
      ldzwq .main.L0, %4
      movq %4, proc_arg0(%SP)
      ldzwq puts, %4
      jmp %4, %RET
      addq 24, %SP, %SP

      /* return 0; */
      ldzwq 0, %4
      movq %4, rval(%FP)

      // ... function epilogue ...
      addq 0, %FP, %SP
      movq fp(%SP), %FP
      movq ret(%SP), %RET
      jmp %RET, %0
  ------------------------------------------------------------------------------

Next comes the implementation of `puts` which is described by:

---- CODE (type=c) -------------------------------------------------------------
void puts(char *str)
{
    while (*str) {
        putchar(*str++);
    }
}
--------------------------------------------------------------------------------

This procedure expects one parameter `str` which should be a pointer to a
character, i.e. `str` should be the address of the string's first character. In
C terminology that is expressed by declaring `str` as a variable of type
`char *`, and you see this declaration (bookkeeping information) in the line

---- CODE (type=c) -------------------------------------------------------------
void puts(char *str)
--------------------------------------------------------------------------------

When you use the variable in a statement, `*str` denotes the value at the end of
the pointer, in this case a character.

We can now dig into the details of implementing function `puts`. It does not
return a value so we use the recipe for implementing a procedure:

+ ---- CODE (type=c) -----------------------------------------------------------
  void puts(char *str)
  {
      /* .... */
  }
  ------------------------------------------------------------------------------
+ ---- CODE (type=s) -----------------------------------------------------------
  // ... directives for arguments, locals, etc.
      .equ str, proc_arg0

      .text
  puts:
      // function prologue
      movq %RET, ret(%SP)
      movq %FP, fp(%SP)
      addq 0, %SP, %FP

      // reserve space for local variables.
      subq 0*8, %SP, %SP

      // begin of the function body

      /* Implementation of the function or procedure */

      // end of the function body

      // function epilogue
      addq 0, %FP, %SP
      movq fp(%SP), %FP
      movq ret(%SP), %RET
      jmp %RET, %0
  ------------------------------------------------------------------------------

The implementation of the body was described by the flow chart above, and we
implement this chart node by node.

Conditional jump `if (*str == 0)`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

One thing I dislike in C, and many other programming languages, is that the
comparison for equality is expressed with "`==`" and the assignment with "`=`".
I would prefer a single equal sign for comparison and something like "`:=`" for
assignments. But once you know the meaning of "`==`", the node

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
    \input{flowchart.tex}
    \renewcommand\BoxWidth {5}
    \renewcommand\BoxHeight {1.4}
    \SetMargin{5}{10}{0}{0}
    \PutJump{0}{*str == 0}
    \PutLabel{0}{puts\_while}
\end{tikzpicture}
--------------------------------------------------------------------------------

says that `*str` (the character at the end of pointer `str`) is compared with
zero. So you first have to load the pointer `str`, i.e. the address stored in
the argument `str`, into a register

---- CODE (type=s) -------------------------------------------------------------
movq str(%FP), %4
--------------------------------------------------------------------------------

and then the character at the end of the pointer:

---- CODE (type=s) -------------------------------------------------------------
movzbq (%4), %4
--------------------------------------------------------------------------------

After that you can check if `%4` contains zero and jump in that case.
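
The same sequence of steps can also be spelled out in C. This is only a sketch
for illustration: the helper names `points_to_zero_byte` and
`check_points_to_zero_byte` are hypothetical and not part of the example
program. The comments relate each C statement to the instruction discussed
above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (an assumption for illustration): returns 1 if
   `str` points to a zero byte and 0 otherwise. */
static int points_to_zero_byte(const char *str)
{
    /* movq str(%FP), %4 : fetch the address stored in `str` */
    const unsigned char *address = (const unsigned char *)str;

    /* movzbq (%4), %4 : fetch one byte and zero-extend it to 64 bits */
    uint64_t value = *address;

    /* compare with zero; in assembly a conditional jump follows */
    return value == 0;
}

/* Small self check with an empty and a non-empty string. */
static void check_points_to_zero_byte(void)
{
    assert(points_to_zero_byte("") == 1);
    assert(points_to_zero_byte("hello") == 0);
}
```

Reading the byte through an `unsigned char` pointer mirrors the zero extension
of `movzbq`: the upper bits of the 64 bit value are always zero, so the
comparison only depends on the byte that was fetched.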
The code for this node is

+ ---- TIKZ --------------------------------------------------------------------
  \begin{tikzpicture}
      \input{flowchart.tex}
      \renewcommand\BoxWidth {5}
      \renewcommand\BoxHeight {1.4}
      %\SetMargin{5}{10}{0}{0}
      \PutJump{0}{*str == 0}
      \PutLabel{0}{puts\_while}
  \end{tikzpicture}
  ------------------------------------------------------------------------------
+ ---- CODE (type=c) -----------------------------------------------------------
  puts_while:
      if (*str == 0) goto puts_done;
  ------------------------------------------------------------------------------
+ ---- CODE (type=s) -----------------------------------------------------------
  puts_while:
      movq str(%FP), %4
      movzbq (%4), %4
      subq 0, %4, %0
      jz puts_done
  ------------------------------------------------------------------------------

The label `puts_while` is needed so that we can later jump back to it, and
`puts_done` is a label to jump to the end of the function body.

Printing the character `*str`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It would be a shame to call a function `putchar` here just to print a single
character. So we just load `*str` and use the `putc` instruction:

+ ---- TIKZ --------------------------------------------------------------------
  \begin{tikzpicture}
      \input{flowchart.tex}
      \renewcommand\BoxWidth {5}
      \renewcommand\BoxHeight {1.4}
      %\SetMargin{5}{10}{0}{0}
      \PutStatement{0}{putchar(*str);}
  \end{tikzpicture}
  ------------------------------------------------------------------------------
+ ---- CODE (type=c) -----------------------------------------------------------
  putchar(*str);
  ------------------------------------------------------------------------------
+ ---- CODE (type=s) -----------------------------------------------------------
  movq str(%FP), %4
  movzbq (%4), %4
  putc %4
  ------------------------------------------------------------------------------

Of course it is also a shame that we reload `*str` into `%4`, because `*str` was
already fetched into `%4` before.
But at the moment we just blindly implement the flow chart without much
thinking. We can care about optimizations another time.

Incrementing the pointer `str`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now we increment `str` so that it points to the next character:

+ ---- TIKZ --------------------------------------------------------------------
  \begin{tikzpicture}
      \input{flowchart.tex}
      \renewcommand\BoxWidth {5}
      \renewcommand\BoxHeight {1.4}
      %\SetMargin{5}{10}{0}{0}
      \PutStatement{0}{str = str + 1;}
  \end{tikzpicture}
  ------------------------------------------------------------------------------
+ ---- CODE (type=c) -----------------------------------------------------------
  str = str + 1;
  ------------------------------------------------------------------------------
+ ---- CODE (type=s) -----------------------------------------------------------
  movq str(%FP), %4
  addq 1, %4, %4
  movq %4, str(%FP)
  ------------------------------------------------------------------------------

Unconditional jump and label to break the loop
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The unconditional jump is a single instruction

+ ---- TIKZ --------------------------------------------------------------------
  \begin{tikzpicture}
      \input{flowchart.tex}
      \renewcommand\BoxWidth {5}
      \renewcommand\BoxHeight {1.4}
      %\SetMargin{5}{10}{0}{0}
      \PutJump{0}{}
  \end{tikzpicture}
  ------------------------------------------------------------------------------
+ ---- CODE (type=c) -----------------------------------------------------------
  goto puts_while;
  ------------------------------------------------------------------------------
+ ---- CODE (type=s) -----------------------------------------------------------
  jmp puts_while
  ------------------------------------------------------------------------------

and the empty statement is just a label after the unconditional jump

+ ---- TIKZ --------------------------------------------------------------------
  \begin{tikzpicture}
      \input{flowchart.tex}
      \renewcommand\BoxWidth {5}
      \renewcommand\BoxHeight {1.4}
      %\SetMargin{5}{10}{0}{0}
      \PutStatement{0}{/* empty statement*/ ;}
      \PutLabel{0}{puts\_done}
  \end{tikzpicture}
  ------------------------------------------------------------------------------
+ ---- CODE (type=c) -----------------------------------------------------------
  puts_done:
      ;
  ------------------------------------------------------------------------------
+ ---- CODE (type=s) -----------------------------------------------------------
  puts_done:
  ------------------------------------------------------------------------------

Hence in total we have

+ ---- CODE (type=c) -----------------------------------------------------------
  void puts(char *str)
  {
      while (*str) {
          putchar(*str++);
      }
  }
  ------------------------------------------------------------------------------
+ ---- CODE (type=s) -----------------------------------------------------------
      .equ str, proc_arg0

      .text
  puts:
      // function prologue
      movq %RET, ret(%SP)
      movq %FP, fp(%SP)
      addq 0, %SP, %FP

      // begin of the function body

  puts_while:
      /* if (*str == 0) goto puts_done; */
      movq str(%FP), %4
      movzbq (%4), %4
      subq 0, %4, %0
      jz puts_done

      /* putchar(*str); */
      movq str(%FP), %4
      movzbq (%4), %4
      putc %4

      /* str = str + 1; */
      movq str(%FP), %4
      addq 1, %4, %4
      movq %4, str(%FP)

      /* goto puts_while; */
      jmp puts_while

  puts_done:
      // end of the function body

      // function epilogue
      addq 0, %FP, %SP
      movq fp(%SP), %FP
      movq ret(%SP), %RET
      jmp %RET, %0
  ------------------------------------------------------------------------------

Complete assembly implementation for this C code
------------------------------------------------

:import: session10/func/example_puts.s [fold]

---- SHELL (path=session10/func) -----------------------------------------------
ulmas -o example_puts example_puts.s
ulm example_puts
--------------------------------------------------------------------------------

Example with a function
=======================

This C program implements its own `strlen` function for determining the
length of a string. For the sake of simplicity the program just returns the
string length from `main`:

---- CODE (file=session10/func/example_strlen.c) -------------------------------
typedef unsigned long uint64_t; // unsigned long is 64 bit wide on theon

uint64_t strlen(char *str)
{
    char *ch = str;
    while (*ch) {
        ++ch;
    }
    return ch - str;
}

int main()
{
    return strlen("hello, world!\n");
}
--------------------------------------------------------------------------------

---- SHELL (path=session10/func) -----------------------------------------------
gcc -o example_strlen example_strlen.c
example_strlen; echo $?
--------------------------------------------------------------------------------

Step by step explanation what this C code describes
---------------------------------------------------

We again begin with function `main`:

---- CODE (type=c) -------------------------------------------------------------
int main()
{
    return strlen("hello, world!\n");
}
--------------------------------------------------------------------------------

As the code contains a string literal we choose some unique label and generate
the string in the data segment. For function `main` we use the skeleton for a
function:

+ ---- CODE (type=c) -----------------------------------------------------------
  int main()
  {
      /* statements */
  }
  ------------------------------------------------------------------------------
+ ---- CODE (type=s) -----------------------------------------------------------
  // ... directives for arguments, locals, etc.

      .text
  main:
      // ... function prologue ...
      movq %RET, ret(%SP)
      movq %FP, fp(%SP)
      addq 0, %SP, %FP

      // reserve space for local variables.
      subq 0*8, %SP, %SP

      /* statements */

      // ... function epilogue ...
      addq 0, %FP, %SP
      movq fp(%SP), %FP
      movq ret(%SP), %RET
      jmp %RET, %0
  ------------------------------------------------------------------------------

Function `main` only contains a `return` statement.
The return value is an expression which in turn is defined as the return value
of a function call. So we call function `strlen`, store the return value on the
stack and jump to the epilogue of `main`:

+ ---- CODE (type=c) -----------------------------------------------------------
  return strlen("hello, world!\n");
  ------------------------------------------------------------------------------
+ ---- CODE (type=s) -----------------------------------------------------------
  /* return strlen("hello, world!\n"); */
  subq 32, %SP, %SP
  ldzwq .main.L0, %4
  movq %4, func_arg0(%SP)
  ldzwq strlen, %4
  jmp %4, %RET
  movq rval(%SP), %4
  movq %4, rval(%FP)
  addq 32, %SP, %SP

  // define a label before the epilogue
  jmp .main.leave
  ------------------------------------------------------------------------------

Note that in this case the jump to the epilogue is an unnecessary instruction as
the epilogue follows immediately. You see that when we put things together:

+ ---- CODE (type=c) -----------------------------------------------------------
  int main()
  {
      return strlen("hello, world!\n");
  }
  ------------------------------------------------------------------------------
+ ---- CODE (type=s) -----------------------------------------------------------
  // ... directives for arguments, locals, etc.

      .text
  main:
      // ... function prologue ...
      movq %RET, ret(%SP)
      movq %FP, fp(%SP)
      addq 0, %SP, %FP

      // reserve space for local variables.
      subq 0*8, %SP, %SP

      /* return strlen("hello, world!\n"); */
      subq 32, %SP, %SP
      ldzwq .main.L0, %4
      movq %4, func_arg0(%SP)
      ldzwq strlen, %4
      jmp %4, %RET
      movq rval(%SP), %4
      movq %4, rval(%FP)
      addq 32, %SP, %SP

      // define a label before the epilogue
      jmp .main.leave

  .main.leave:
      // ... function epilogue ...
      addq 0, %FP, %SP
      movq fp(%SP), %FP
      movq ret(%SP), %RET
      jmp %RET, %0
  ------------------------------------------------------------------------------

In general a `return` statement can occur in the middle of a compound statement.
Using this "jump to the epilogue pattern" allows us to implement `return`
statements in a mindless way that always works. For example in cases like this:

---- CODE (type=c) -------------------------------------------------------------
if (condition) {
    return a;
} else {
    return b;
}
--------------------------------------------------------------------------------

Again, optimizing the assembly code is something we can do afterwards. First of
all we need something that just does the job, and a method to derive such a
working solution.

But let's not digress. The next thing where we need a working solution is
function `strlen`. In a first step we just care about the coarse structure, i.e.
which arguments and local variables the function has. The initialization of the
local variable is uninteresting in this case so we rewrite:

+ ---- CODE (type=c) -----------------------------------------------------------
  uint64_t strlen(char *str)
  {
      char *ch = str;
      while (*ch) {
          ++ch;
      }
      return ch - str;
  }
  ------------------------------------------------------------------------------
+ ---- CODE (type=c) -----------------------------------------------------------
  uint64_t strlen(char *str)
  {
      char *ch;

      ch = str;
      while (*ch) {
          ++ch;
      }
      return ch - str;
  }
  ------------------------------------------------------------------------------

Now it is more obvious that the function has one argument (named `str`) and one
local variable (named `ch`):

+ ---- CODE (type=c) -----------------------------------------------------------
  uint64_t strlen(char *str)
  {
      char *ch;

      /* statements */
  }
  ------------------------------------------------------------------------------
+ ---- CODE (type=s) -----------------------------------------------------------
  // ... directives for arguments, locals, etc.
      .equ str, func_arg0
      .equ ch, local0

      .text
  strlen:
      // function prologue
      movq %RET, ret(%SP)
      movq %FP, fp(%SP)
      addq 0, %SP, %FP

      // reserve space for local variables.
      subq 1*8, %SP, %SP // for ch

      // begin of the function body

      /* Implementation of the function or procedure */

      // end of the function body

      // function epilogue
      addq 0, %FP, %SP
      movq fp(%SP), %FP
      movq ret(%SP), %RET
      jmp %RET, %0
  ------------------------------------------------------------------------------

In your bookkeeping you have to note that variables `str` and `ch` are supposed
to be pointers to a character. That means each of these variables stores a 64
bit address of a character, and `*str` and `*ch` in the C code refer to the
character at the end of the pointer.

Initialization of the local variable
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We simply fetch the value of variable `str` and store it at the memory location
for variable `ch`. The bookkeeping note tells us that both variables have a size
of 8 bytes, so we use `movq` for fetching and storing:

+ ---- CODE (type=c) -----------------------------------------------------------
  ch = str;
  ------------------------------------------------------------------------------
+ ---- CODE (type=s) -----------------------------------------------------------
  /* ch = str */
  movq str(%FP), %4
  movq %4, ch(%FP)
  ------------------------------------------------------------------------------

While loop
~~~~~~~~~~

We rewrite the `while` loop with spaghetti code and also resolve the meaning of
the pre-increment:

+ ---- CODE (type=c) -----------------------------------------------------------
  while (*ch) {
      ++ch;
  }
  ------------------------------------------------------------------------------
+ ---- TIKZ --------------------------------------------------------------------
  \begin{tikzpicture}
      \input{flowchart.tex}
      \renewcommand\BoxWidth {5}
      \renewcommand\BoxHeight {1.2}
      \renewcommand\BoxDistance {1.8}
      %\SetMargin{5}{10}{0}{4}
      \PutJump{0}{*ch == 0}
      \PutStatement{1}{ch = ch + 1;}
      \PutJump{2}{}
      \PutStatement{3}{/* empty statement*/ ;}
      \PutLabel{0}{strlen\_while}
      \PutLabel{3}{strlen\_done}
      \AddPath{0}{1}
      \AddPath{1}{2}
      \AddPath{2}{3}
      \AddCondJumpPath{0}{3}
      \AddJumpPathLeft{2}{0}
  \end{tikzpicture}
  ------------------------------------------------------------------------------
+ ---- CODE (type=c) -----------------------------------------------------------
  strlen_while:
      if (*ch == 0) goto strlen_done;
      ch = ch + 1;
      goto strlen_while;
  strlen_done:
      ;
  ------------------------------------------------------------------------------

Looking at the bookkeeping notes we recall that `*ch` refers to the byte with
the address stored in the local variable `ch`. This byte gets compared against
zero in the conditional jump:

+ ---- TIKZ --------------------------------------------------------------------
  \begin{tikzpicture}
      \input{flowchart.tex}
      \renewcommand\BoxWidth {5}
      \renewcommand\BoxHeight {1.2}
      \renewcommand\BoxDistance {1.8}
      %\SetMargin{5}{10}{0}{4}
      \PutJump{0}{*ch == 0}
  \end{tikzpicture}
  ------------------------------------------------------------------------------
+ ---- CODE (type=c) -----------------------------------------------------------
  if (*ch == 0) goto strlen_done;
  ------------------------------------------------------------------------------
+ ---- CODE (type=s) -----------------------------------------------------------
  /* if (*ch == 0) goto strlen_done; */
  movq ch(%FP), %4
  movzbq (%4), %4
  subq 0, %4, %0
  jz strlen_done
  ------------------------------------------------------------------------------

The next statement just increments the pointer to the next byte:

+ ---- TIKZ --------------------------------------------------------------------
  \begin{tikzpicture}
      \input{flowchart.tex}
      \renewcommand\BoxWidth {5}
      \renewcommand\BoxHeight {1.2}
      \renewcommand\BoxDistance {1.8}
      %\SetMargin{5}{10}{0}{4}
      \PutStatement{0}{ch = ch + 1;}
  \end{tikzpicture}
  ------------------------------------------------------------------------------
+ ---- CODE (type=c) -----------------------------------------------------------
  ch = ch + 1;
  ------------------------------------------------------------------------------
+ ---- CODE (type=s) -----------------------------------------------------------
  /* ch = ch + 1; */
  movq ch(%FP), %4
  addq 1, %4, %4
  movq %4, ch(%FP)
  ------------------------------------------------------------------------------

The `goto` statement is just an unconditional jump:

+ ---- TIKZ --------------------------------------------------------------------
  \begin{tikzpicture}
      \input{flowchart.tex}
      \renewcommand\BoxWidth {5}
      \renewcommand\BoxHeight {1.2}
      \renewcommand\BoxDistance {1.8}
      \PutJump{0}{}
  \end{tikzpicture}
  ------------------------------------------------------------------------------
+ ---- CODE (type=c) -----------------------------------------------------------
  goto strlen_while;
  ------------------------------------------------------------------------------
+ ---- CODE (type=s) -----------------------------------------------------------
  /* goto strlen_while; */
  jmp strlen_while
  ------------------------------------------------------------------------------

For leaving the loop we formally need the empty statement in C which becomes
just a label in the assembly code:

+ ---- TIKZ --------------------------------------------------------------------
  \begin{tikzpicture}
      \input{flowchart.tex}
      \renewcommand\BoxWidth {5}
      \renewcommand\BoxHeight {1.2}
      \renewcommand\BoxDistance {1.8}
      %\SetMargin{5}{10}{0}{4}
      \PutStatement{0}{/* empty statement*/ ;}
      \PutLabel{0}{strlen\_done}
  \end{tikzpicture}
  ------------------------------------------------------------------------------
+ ---- CODE (type=c) -----------------------------------------------------------
  strlen_done:
      ;
  ------------------------------------------------------------------------------
+ ---- CODE (type=s) -----------------------------------------------------------
  strlen_done:
  ------------------------------------------------------------------------------

Return statement
~~~~~~~~~~~~~~~~

For the return value we simply subtract the value of `str` from the value of
`ch` (i.e. we compute `ch - str`) and store it on the proper stack
location. Then we jump to the epilogue:

+ ---- CODE (type=c) -----------------------------------------------------------
  return ch - str;
  ------------------------------------------------------------------------------
+ ---- CODE (type=s) -----------------------------------------------------------
  /* return ch - str */
  movq str(%FP), %4
  movq ch(%FP), %5
  subq %4, %5, %5
  movq %5, rval(%FP)

  // define a label before the epilogue
  jmp .strlen.leave
  ------------------------------------------------------------------------------

Hence in total the implementation of function `strlen` is given by

+ ---- CODE (type=c) -----------------------------------------------------------
  uint64_t strlen(char *str)
  {
      char *ch = str;
      while (*ch) {
          ++ch;
      }
      return ch - str;
  }
  ------------------------------------------------------------------------------
+ ---- CODE (type=s) -----------------------------------------------------------
  /* uint64_t strlen(char *str) { */
      .equ str, func_arg0
      .equ ch, local0

      .text
  strlen:
      // function prologue
      movq %RET, ret(%SP)
      movq %FP, fp(%SP)
      addq 0, %SP, %FP
      subq 1*8, %SP, %SP

      // begin of the function body

      /* ch = str */
      movq str(%FP), %4
      movq %4, ch(%FP)

  strlen_while:
      /* if (*ch == 0) goto strlen_done; */
      movq ch(%FP), %4
      movzbq (%4), %4
      subq 0, %4, %0
      jz strlen_done

      /* ++ch */
      movq ch(%FP), %4
      addq 1, %4, %4
      movq %4, ch(%FP)

      /* goto strlen_while; */
      jmp strlen_while

  strlen_done:
      /* return ch - str */
      movq str(%FP), %4
      movq ch(%FP), %5
      subq %4, %5, %5
      movq %5, rval(%FP)
      jmp .strlen.leave

      // end of the function body

  .strlen.leave:
      // function epilogue
      addq 0, %FP, %SP
      movq fp(%SP), %FP
      movq ret(%SP), %RET
      jmp %RET, %0
  /* } */
  ------------------------------------------------------------------------------

Complete assembly implementation for this C code
------------------------------------------------

:import: session10/func/example_strlen.s [fold]

---- SHELL (path=session10/func) -----------------------------------------------
ulmas -o example_strlen example_strlen.s
ulm example_strlen; echo $?
--------------------------------------------------------------------------------

:links: system calls   -> https://en.wikipedia.org/wiki/System_call
        spaghetti code -> https://en.wikipedia.org/wiki/Spaghetti_code