=====================================================
Calling convention, frame pointer and local variables			[TOC]
=====================================================

The calling convention for subprograms (this includes procedures and
functions) allows a subprogram to freely use the registers `%4`, ...,
`%255`.  So no more "perfect roomer": when you call a function you have to
expect that these registers were modified.

Hence, if values stored in registers are still needed after a function call you
have to _save_ them before the call and _restore_ them after the call, and you
have to use the memory for that. Except for storing the return address on the
stack, so far memory was only used for global variables (stored in either the
data segment or the BSS segment). However, it would not be feasible to use
global variables for saving registers. Each global variable requires a unique
label, so as soon as the number of subprograms grows you would end up in name
conflicts and unmanageable pieces of software. The rule of thumb is to use
global variables only when you have a good reason for it. For example, we will
use global variables to communicate with subprograms until we have functions
that can receive arguments and can return a value.

This problem can be avoided by using the stack to store variables. Such
variables are called _local variables_. When a function needs local
variables it reserves sufficient space on the stack by decrementing the stack
pointer and releases the memory before the return. This gets done in the
prologue and epilogue of the function. The advantage of this is that the memory
region used for local variables is bound to the life span of a function call.
After a function has done its job (i.e. the function returned) the memory can
be reused.

To give you an idea of how the concepts of global and local variables are
expressed (and how the technical details are hidden) in C, the following code
fragment has a global variable `global` and a subprogram `foo` with a local
variable `local`:

---- CODE (type=c) -------------------------------------------------------------
int64_t	global;

void
foo(void)
{
    int64_t local;

    // implementation of foo
}
--------------------------------------------------------------------------------

Using C code as pseudo code makes it possible to show how subprograms can be
used to do something actually useful. For example, the following subprogram
`factorial` can be used to compute the factorial of a non-negative integer
recursively:

---- CODE (type=c) -------------------------------------------------------------
int64_t	arg;

void
factorial(void)
{
    int64_t n;

    n = arg;
    if (n==0) {
	arg = 1;
    } else {
	arg = arg - 1;
	factorial();
	arg = n* arg;
    }
}
--------------------------------------------------------------------------------

In this code a global variable `arg` is used to pass an argument to the
subprogram and to receive the result from the subprogram.


Basic idea (without the gory details) for local variables on the stack
======================================================================
Again we first leave out some of the gory details. Let's assume that the
return address has already been pushed on the stack. So when the function gets
called the stack looks like this:

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 0.48}

\DrawMemArrayOpen{-48}{-1}

\DrawMemVariable[red!40]{-24}{0}{Used}
\DrawMemVariable[white]{-48}{-24}{Not used}

\DrawPointer{-24}{\%SP}

\end{tikzpicture}
--------------------------------------------------------------------------------

Further assume that the function has two local variables `a` and `b` (both with
a size of a quad word). Then, in its prologue, the function decrements the stack
pointer by 16 to reserve 16 bytes at the top of the stack:

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 0.48}

\DrawMemArrayOpen{-48}{-1}

\DrawMemVariable[red!40]{-24}{0}{Used}
\DrawMemVariable[gray!40]{-32}{-24}{Local variable a}
\DrawMemVariable[gray!40]{-40}{-32}{Local variable b}
\DrawMemVariable[white]{-48}{-40}{Not used}

%\DrawPointer{-24}{\%FP}
\DrawPointer{-40}{\%SP}

\end{tikzpicture}
--------------------------------------------------------------------------------

When this function calls another function these local variables are protected,
because they are on the stack. In the epilogue of the function the stack pointer
gets incremented by 16, and hence the memory for these variables can be reused
afterwards. So after the return the stack looks like this:

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 0.48}

\DrawMemArrayOpen{-48}{-1}

\DrawMemVariable[red!40]{-24}{0}{Used}
\DrawMemVariable[gray!40]{-32}{-24}{Not used}
\DrawMemVariable[gray!40]{-40}{-32}{Not used}
\DrawMemVariable[white]{-48}{-40}{Not used}

%\DrawPointer{-24}{\%FP}
\DrawPointer{-24}{\%SP}

\end{tikzpicture}
--------------------------------------------------------------------------------

Again it is worth mentioning that removing elements from our stack does not
"cleanup" memory in the sense of zeroing out bytes. We just move the stack
pointer, which indicates that the memory region is now free to use.

Why we want a frame pointer
===========================
Reserving space for local variables is done in the function's prologue, and
releasing it in the epilogue. Both have to match, i.e. when you reserve 16
bytes you have to release 16 bytes. It is possible to program that correctly,
but assume you change the implementation of the function because you want
another local variable. In such a case you have to change both the prologue and
the epilogue, and the need to change two things that have to match can be an
annoying source of careless errors.

Ideally you can always use the same prologue and always the same epilogue for
functions. Then you can use both by copy and paste (or hide them behind some
macro that gets expanded by some preprocessor that gets called before the
assembler sees the code).  The next best thing is that only the prologue needs
to be adapted for each function but the epilogue is always the same. And this
can be achieved by using a _frame pointer_. For that, another register `%FP`
will be reserved in the calling convention.  When a function gets called the
original stack pointer becomes the frame pointer, and then the stack pointer
gets decremented if local variables are needed. So during the function call the
memory region used for local variables is "framed" by the stack pointer `%SP`
and frame pointer `%FP`:

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 0.48}

\DrawMemArrayOpen{-48}{-1}

\DrawMemVariable[red!40]{-24}{0}{Used}
\DrawMemVariable[gray!40]{-32}{-24}{local variable a}
\DrawMemVariable[gray!40]{-40}{-32}{local variable b}
\DrawMemVariable[white]{-48}{-40}{Not used}

\DrawPointer{-24}{\%FP}
\DrawPointer{-40}{\%SP}

\end{tikzpicture}
--------------------------------------------------------------------------------

The prologue and epilogue get adapted such that after a function returns the
stack and frame pointer are as before, so for the caller it seems that nothing
changed in that respect. This is achieved as follows:

- The prologue consists of 3 instructions if no local variables are needed, and
  otherwise 4 instructions:

  - save the return address `%RET` on the stack (as before),
  - save the original frame pointer `%FP` on the stack,
  - the frame pointer `%FP` saves the original stack pointer and
  - if local variables are needed decrement stack pointer `%SP`.

- The epilogue before the return instruction always consists of 3 instructions
  (independent of how many local variables are used):

  - restore the original stack pointer,
  - restore the original frame pointer and
  - load the saved return address back into `%RET`.

Now the gory details are about where on the stack the return address and frame
pointer are stored. Our protocol will specify that the caller has to reserve
space on the stack so that the callee can save two registers, the return
register and the frame pointer register.

Calling convention: The gory details
====================================
Again I will first write down the details of the calling convention and then
show with an example how things work out.

Reserved registers
------------------

Three registers are used for the calling convention, and there won't be any
further changes to support procedures and functions:

---- CODE (type=s) -------------------------------------------------------------
    .equ    FP,	    1
    .equ    SP,	    2
    .equ    RET,    3
--------------------------------------------------------------------------------

Register `%FP` holds the frame pointer, register `%SP` the stack pointer, and
register `%RET` the return address.

Calling a function
------------------

The essential pattern for calling a subprogram is this:

---- CODE (type=s) -------------------------------------------------------------
    subq    16,	    %SP,    %SP		    # provide space on stack for callee
    /*
	Load the address of the function in a register %CALL.
    */
    jmp	    %CALL,  %RET
    addq    16,	    %SP,    %SP		    # restore old stack state
--------------------------------------------------------------------------------

This means the caller reserves 16 bytes for the callee on the stack for storing
the return address (as before) and the frame pointer (that is new). The format
of the provided space is

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 0.48}

\DrawMemArrayOpen{-48}{-1}

\DrawMemVariable[red!40]{-8}{0}{Used}
\DrawQuadVariable[cyan!40]{-24}{reserved for callee}
\DrawQuadVariable[cyan!40]{-16}{reserved for callee}
\DrawMemVariable[gray!20]{-48}{-24}{Can be used for local variables}

\DrawPointer{-24}{\%SP}

\end{tikzpicture}
--------------------------------------------------------------------------------


For the more general case of procedures and functions this will be adapted (the
stack will also be used to pass arguments and for receiving results).

Implementing a function: Prologue and Epilogue
----------------------------------------------
Every function has the following structure 

---- CODE (type=s) -------------------------------------------------------------
function_name:
    /*
	Function prologue
    */

    /*
	Implementation of the function
    */

    /*
	Function epilogue
    */
    jmp	%RET,	%0
--------------------------------------------------------------------------------

This will be general enough to also support procedures and functions.

The 16 bytes reserved at the top of the stack, i.e. at address `%SP`, are
used in the _prologue_ and _epilogue_ for saving the return address and the
frame pointer. The format for these 16 bytes can be described by

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 0.48}

\DrawMemArrayOpen{-48}{-1}

\DrawMemVariable[red!40]{-8}{0}{Used}
\DrawQuadVariable[cyan!40]{-24}{reserved for \%RET}
\DrawQuadVariable[cyan!40]{-16}{reserved for \%FP}
\DrawMemVariable[gray!20]{-40}{-24}{Will be used for local variables}

\DrawPointer{-24}{\%SP}

\end{tikzpicture}
--------------------------------------------------------------------------------


Prologue
~~~~~~~~
As described above, when a function gets called the return address and original
frame pointer get saved first. Then the frame pointer marks the original top of
the stack and, if needed, the stack pointer gets decremented for local
variables:

---- CODE(type=s) --------------------------------------------------------------
        movq %RET,  (%SP)	    // save the return address 
        movq %FP,   8(%SP)	    // save original frame pointer
        addq 0,	    %SP,    %FP	    // frame pointer is original stack pointer 
	/*
	    One more instruction here if local variables are needed:
	    - decrement %SP for local variables
	    - note that %SP needs to be aligned to 8 bytes!
	    So decrement %SP by the needed size rounded up to the next multiple
	    of 8.
	*/
--------------------------------------------------------------------------------

At the moment you can ignore the details about the alignment requirement of the
stack pointer.  In all examples the stack pointer will be decremented by a
multiple of 8. As the stack pointer is initialized with zero for the empty
stack, it will therefore always be aligned to 8 bytes.

After the prologue (and for the actual function implementation) the stack can be
described by

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 0.48}

\DrawMemArrayOpen{-48}{-1}

\DrawMemVariable[red!40]{-8}{0}{Used}
\DrawQuadVariable[cyan!40]{-24}{saved \%RET}
\DrawQuadVariable[cyan!40]{-16}{saved \%FP}
\DrawMemVariable[gray!40]{-40}{-24}{Used for local variables}

\DrawPointer{-40}{\%SP}
\DrawPointer{-24}{\%FP}

\end{tikzpicture}
--------------------------------------------------------------------------------

The comment "note that `%SP` needs to ..." and what follows it are for future
reference: In general the number of bytes needed for local data has to be
rounded up to the next multiple of 8, and the stack pointer has to be
decremented by this amount. Otherwise a called function cannot use the stack
for data that needs alignment.


Epilogue
~~~~~~~~
The epilogue guarantees that after the return the caller has the same stack as
before, i.e. it restores the original stack and frame pointer. The last
instruction loads the return address into the return register.

---- CODE(type=s) --------------------------------------------------------------
        addq 0,	    %FP,    %SP	    // restore original stack pointer
        movq 8(%SP),%FP		    // restore original frame pointer
        movq 0(%SP),%RET	    // restore the return address
--------------------------------------------------------------------------------


A working example
=================

- As before the code block with label `_start` gets called first when the
  program gets started. Here the stack gets initialized and then the `main`
  subprogram gets called. After `main` returns the program is halted.

- Subprogram `main` saves some register in a local variable and calls function
  `funcA`. After the function has returned the register gets restored.

- Subprogram `funcA` just modifies some register and returns.


---- SHELL (path=session10/subprog) --------------------------------------------
ulmas -o subprog_with_fp subprog_with_fp.s
ulm subprog_with_fp
--------------------------------------------------------------------------------

:import: session10/subprog/subprog_with_fp.s