================================================== Proof of concept: Using a stack for function calls [TOC] ================================================== We now investigate how a stack can be used _in principal_ for functions calls with a (potentially) arbitrary deep call tree. That means a function can call a function which calls a functions etc. The only limitation will be the size of the memory. For this proof of concept we define a calling convention that not really allows to implement functions that can do anything useful. But this minimalistic convention will be extended, so what you learn here will not just be some theoretical nonsense. Working example =============== The flow chart below illustrates some of the basic ideas that will be realized by the calling convention in this session. It also outlines the general structure that all programs will have from now on: - Each program consists of functions, and you can have as many as you want (they just have to fit into memory), - each program has at least one function `main`, - when a program starts some code block initializes the stack, calls function `main` and then halts the ULM. Using the ULM C compiler later allows us to write the function is a more abstract way. But the compiler just generates the assembly code that we have to write here by ourself. The code block for initializing the stack will be the only part that needs to be hand coded in assembly. And it has to be ensured that the entry point of the program is the first instruction of this code block. ---- TIKZ ---------------------------------------------------------------------- \begin{tikzpicture} \input{flowchart.tex} \SetMargin{5}{10}{0}{3} \renewcommand\BoxWidth {5.5} \PutStatement{0}{initialize stack} \PutCallStatement[red!50]{1}{ call main } \PutStatement{2}{halt} \AddPath{0}{1} \AddPath{1}{2} \PutAnnotation{0}{\text{Entry point}} % next flow chart column \renewcommand\FlowCol{1} \PutStatement{0}{do something (e.g putc 'A')} \PutLabel[above=-0.1]{0}{main} \PutCallStatement[orange!50]{1}{ call func } \PutStatement{2}{do something (e.g. putc 'B')} \PutStatement{3}{return} \AddPath{0}{1} \AddPath{1}{2} \AddPath{2}{3} % next flow chart column \renewcommand\FlowCol{2} \PutStatement{0}{do something (e.g putc 'C')} \PutLabel[above=-0.1]{0}{func} \PutStatement{1}{return} \AddPath{0}{1} \DrawCallPointer[red!50]{0}{1}{1}{0} \DrawReturnPointer[red!50]{0}{1}{1}{3} \renewcommand{\CallPointerPadToY}{0} \DrawCallPointer[orange!50]{1}{1}{2}{0} \DrawReturnPointer[orange!50]{1}{1}{2}{1} \end{tikzpicture} -------------------------------------------------------------------------------- Calling convention without the gory details =========================================== The roles of caller and callee are no longer mutually exclusive. When a function calls another function the role switches from callee to caller. And every function switches back to the role of a callee before it returns. So when we talk here about a caller we mean the situation right before a function call. And when we describe the role of the callee we describe the situation right before a function return. Therefore we just have to specify two things here: - _How to call a function:_ Push the return address onto the stack and jump to the function. - _How to return:_ Pop the return address from the stack and jump to the return address. And on the Intel architecture (which is a CISC architecture) this is directly supported by instructions: - `call foo` would push the return address on the stack and jump to address `foo`, and - `ret` pops a return address and jumps to it. These instructions make it simple to write and use functions, although you still have to be sure that `call` and `ret` work perfectly together. Because `ret` just pops whatever value is on top of the stack, that means whatever value was pushed last. On RISK architectures (like the ULM or ARM) such things have to be done "more manually", that means we have to do what the CISC architectures is hiding with such instructions. So that is why we have to deal with some gory details. The gory details that we will avoid =================================== It would be insane (i.e inefficient) to imitate the `call` instruction from the Intel architecture on the ULM. Neither the instruction set not the assembler provides you much help for computing the return address _before_ you jump to a function. The best thing for getting the return address _before_ the jump could be achieved by placing a label after the jump instruction: ---- CODE (type=s) ------------------------------------------------------------- // .... jmp %CALL, %0 # jump to function ret # label for return address -------------------------------------------------------------------------------- Then we could actually load the return address _before_ the jump: ---- CODE (type=s) ------------------------------------------------------------- ldzwq func, %CALL # load call address ldzwq ret, %VAL # load return address subq 8, %SP, %SP # push return address movq %VAL, (%SP) jmp %CALL, %0 # jump to function ret -------------------------------------------------------------------------------- But why the hell is it so important to have the return address _before_ the jump? It is not. The ULM provides the `jmp %CALL, %RET` instruction, so you have the return address _after_ the jump _for free_. No need to waste instructions for loading or computing the return address in advance. The gory details that we happily will (learn to) live with ========================================================== Let me first show you how we achieve what matters and then explain why it is basically the same as Intel's `call` and `ret`: Reserved registers ~~~~~~~~~~~~~~~~~~ Two registers are used for the calling convention here: ---- CODE (type=s) ------------------------------------------------------------- .equ SP, 2 .equ RET, 3 -------------------------------------------------------------------------------- Register `%SP` for the stack and register `%RET` for the return address. Mechanism for calling a function ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The essential pattern for calling a function is this: ---- CODE (type=s) ------------------------------------------------------------- subq 8, %SP, %SP # provide space on stack for callee /* Load the address of the function in a register %CALL. */ jmp %CALL, %RET addq 8, %SP, %SP # restore old stack state -------------------------------------------------------------------------------- Obviously the stack pointer gets manipulated but nothing gets copied to memory. You will see that we basically do _half of a push operation_. Also note that you can use any register (except `%SP`, `%RET` and obviously `%0`) for storing the address of the function address. How to implement a function ~~~~~~~~~~~~~~~~~~~~~~~~~~~ The essential pattern for writing a function is ---- CODE (type=s) ------------------------------------------------------------- function_name: movq %RET, (%SP) # function prologue: save %RET // begin of the function body /* Your implementation of the function */ // end of the function body movq (%SP), %RET # function epilogue: restore %RET jmp %RET, %0 -------------------------------------------------------------------------------- Important here are the two lines with the comments __function prologue__ and __function epilogue__. Every function has to begin with the prologue `movq %RET, (%SP)` and has to end with the epilogue `movq (%SP), %RET` followed by the jump to the return address. Applying the calling convention to the working example ------------------------------------------------------ Usually we will blend out the details of the calling conventions when we describe a program. But for now, let's blend in the convention for the working example from above. Just to get a feeling what needs to be done in a mechanical, boring way that actually something like a compiler could do for us: ---- TIKZ ---------------------------------------------------------------------- \begin{tikzpicture} \input{flowchart.tex} \SetMargin{5}{10}{0}{3} \renewcommand\BoxWidth {5.5} \PutStatement{0}{initialize stack} \PutCallStatement[red!50]{1}{ mechanism for calling main} \PutStatement{2}{halt} \AddPath{0}{1} \AddPath{1}{2} \PutAnnotation{0}{\text{Entry point}} % next flow chart column \renewcommand\FlowCol{1} \PutStatement{0}{\textit{function prologue}} \PutLabel{0}{main} \PutStatement{1}{do something (e.g putc 'A')} \PutCallStatement[orange!50]{2}{ mechanism for calling func } \PutStatement{3}{do something (e.g. putc 'B')} \PutStatement{4}{\textit{function epilogue }} \PutStatement{5}{return} \AddPath{0}{1} \AddPath{1}{2} \AddPath{2}{3} \AddPath{3}{4} \AddPath{4}{5} % next flow chart column \renewcommand\FlowCol{2} \PutStatement{0}{\textit{function prologue}} \PutLabel[above=-0.1]{0}{func} \PutStatement{1}{do something (e.g putc 'C')} \PutStatement{2}{\textit{function epilogue }} \PutStatement{3}{return} \AddPath{0}{1} \AddPath{1}{2} \AddPath{2}{3} \DrawCallPointer[red!50]{0}{1}{1}{0} \DrawReturnPointer[red!50]{0}{1}{1}{5} \renewcommand{\CallPointerPadToY}{0} \DrawCallPointer[orange!50]{1}{2}{2}{0} \DrawReturnPointer[orange!50]{1}{2}{2}{3} \end{tikzpicture} -------------------------------------------------------------------------------- How does our calling convention work? ------------------------------------- We could describe the convention as follows: - Pushing the return address is not an atomic operation. The first half of the operation is done before the jump and the second half by the function prologue. - Popping the return address is not an atomic operation. The first half of the operation is done by the function epilogue and the second half by the caller after the function returned. Before this "mechanism for calling a function" gets triggered the stack can be described by: ---- TIKZ ---------------------------------------------------------------------- \begin{tikzpicture} \input{memory.tex} \renewcommand\MemCellWidth { 0.6} \DrawMemArrayOpen{-40}{-1} \DrawQuadVariable[red!40]{-8}{Used} \DrawMemVariable[gray!10]{-40}{-8}{Unused but can contain "trash" } \DrawPointer{-8}{\%SP} \end{tikzpicture} -------------------------------------------------------------------------------- The concrete address of the stack does not matter. What matters is that at higher addresses (memory cells at the right-hand side of `%SP`) are used, i.e. contain data that is saved on the stack. Caller does "first half of pushing the return address" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The relevant part done by the caller for pushing the return address is done by ---- CODE (type=s) ------------------------------------------------------------- subq 8, %SP, %SP -------------------------------------------------------------------------------- This provides the callee space on the stack to save its return address. Now the stack can be described by ---- TIKZ ---------------------------------------------------------------------- \begin{tikzpicture} \input{memory.tex} \renewcommand\MemCellWidth { 0.6} \DrawMemArrayOpen{-40}{-1} \DrawQuadVariable[red!40]{-8}{Used} \DrawMemVariable[cyan!40]{-16}{-8}{reserved for return address} \DrawMemVariable[gray!10]{-40}{-16}{Unused but can contain "trash" } \DrawPointer{-16}{\%SP} \end{tikzpicture} -------------------------------------------------------------------------------- The next two instructions have nothing to do with the stack. The caller just loads the functions address and jumps to the function: ---- CODE (type=s) ------------------------------------------------------------- ldzwq foo, %CALL # you can use any register for CALL except SP jmp %CALL, %RET -------------------------------------------------------------------------------- But after the jump the return address is stored in `%RET` and the callee takes over. Callee prologue does "second half of pushing the return address" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ As the callee knows that the return address is in register `%RET`, it can complete the push operation with ---- CODE (type=s) ------------------------------------------------------------- movq %RET, (%SP) -------------------------------------------------------------------------------- So here we go, the return address is at as safe place, at the right-hand side of the stack pointer: ---- TIKZ ---------------------------------------------------------------------- \begin{tikzpicture} \input{memory.tex} \renewcommand\MemCellWidth { 0.6} \DrawMemArrayOpen{-40}{-1} \DrawQuadVariable[red!40]{-8}{Used} \DrawMemVariable[red!40]{-16}{-8}{return address} \DrawMemVariable[gray!10]{-40}{-16}{Unused but can contain "trash" } \DrawPointer{-16}{\%SP} \end{tikzpicture} -------------------------------------------------------------------------------- Now the callee can do something useful ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ After the prologue the callee can start to do it's job. Even calling a function! Callee epilogue does "first half of popping the return address" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When the callee is done with its job the epilogue ---- CODE (type=s) ------------------------------------------------------------- movq (%SP), %RET -------------------------------------------------------------------------------- does the first half of the pop operation, i.e. copying the return address into `%RET`. ---- TIKZ ---------------------------------------------------------------------- \begin{tikzpicture} \input{memory.tex} \renewcommand\MemCellWidth { 0.6} \DrawMemArrayOpen{-40}{-1} \DrawQuadVariable[red!40]{-8}{Used} \DrawMemVariable[red!40]{-16}{-8}{return address} \DrawMemVariable[gray!10]{-40}{-16}{Unused but can contain "trash" } \DrawPointer{-16}{\%SP} \end{tikzpicture} -------------------------------------------------------------------------------- Then the callee can return with ---- CODE (type=s) ------------------------------------------------------------- jmp %RET, %0 -------------------------------------------------------------------------------- Note that neither the prologue nor epilogue did modify the stack pointer. Both were just storing or fetching data from the top of the stack. Caller does "seconf half of popping the return address" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ After the function returned the caller restores the stack with ---- CODE (type=s) ------------------------------------------------------------- subq 8, %SP, %SP -------------------------------------------------------------------------------- so we have the same stack as before (with trash on the left-hand-side of `%SP`): ---- TIKZ ---------------------------------------------------------------------- \begin{tikzpicture} \input{memory.tex} \renewcommand\MemCellWidth { 0.6} \DrawMemArrayOpen{-40}{-1} \DrawQuadVariable[red!40]{-8}{Used} \DrawMemVariable[gray!10]{-16}{-8}{"trash" } \DrawMemVariable[gray!10]{-40}{-16}{Unused but can contain "trash" } \DrawPointer{-8}{\%SP} \end{tikzpicture} -------------------------------------------------------------------------------- Example Code with the bloody details ==================================== This code implements the working example according to the above pattern: :import: session10/subprog/subprog_without_fp.s [fold] Translate and run it as usual: ---- SHELL (path=session10/subprog/) ------------------------------------------- ulmas -o subprog_without_fp subprog_without_fp.s ulm subprog_without_fp -------------------------------------------------------------------------------- Notes about using symbols for registers --------------------------------------- Note that symbols are used in the code only for those registers that are relevant for the calling convention, i.e. for `%SP` and `%RET`. For other registers plain literals are used, e.g. `%4`. Our new function call convention will allow it that registers `%4`, ..., `%255` can be used freely in every function. They no longer have a special meaning. So in one function you can use `%4` for calling a function and in another `%5` etc. Btw: What does "... without fp" mean? ===================================== "fp" is short for _frame pointer_. So "subprogram without frame ppointer" indicates that some extension is coming up ;-) Quiz08 ====== For this quiz you have to hand in two files: `call_stack.s` and `call_stack.txt`. On `theon` submit your program with the command `submit hpc quiz08 call_stack.s call_stack.txt` Your answers will not be evaluated automatically but you should get a confirmation email after your submit. About `call_stack.s` ~~~~~~~~~~~~~~~~~~~~ Write a program `call_stack.s` with functions `main`, `funcA`, `funcB` and `funcC`. These functions are supposed to do the following: - Function `main` prints the character `M`, calls function `funcA`, prints `m`, prints a newline and then returns. - Function `funcA` prints the character `A`, calls function `funcB`, prints `,`, calls function `funcC`, prints `a` and then returns. - Function `funcB` prints the character `B` and then returns. - Function `funcC` prints the character `C`, calls function `funcB`, prints `c`, calls function `funcB` and then returns. --- CODE (file=session10/call_stack.s) ---------------------------------------- #NOTE: relevant for the function call convention .equ SP, 2 .equ RET, 3 .text /* Entry point */ _start: #NOTE: relevant for the convention ldzwq 0, %SP subq 8, %SP, %SP ldzwq main, %4 jmp %4, %RET addq 8, %SP, %SP /* (only) Exit point */ halt 0 //------------------------------------------------------------------------------ // PROGRAM main //------------------------------------------------------------------------------ -------------------------------------------------------------------------------- About `call_stack.txt` ~~~~~~~~~~~~~~~~~~~~~~ It is important that your program "blindly follows the rules". You are supposed to work here like a compiler! But if you have an idea how the implementation of some of these functions could be simplified then write down your thought in `call_stack.txt`. Maybe there is even a simple rule in which case your idea always works. :links: function prologue -> https://en.wikipedia.org/wiki/Function_prologue function epilogue -> https://en.wikipedia.org/wiki/Function_prologue