==================================================
Proof of concept: Using a stack for function calls			[TOC]
==================================================

We now investigate how a stack can be used _in principal_ for functions calls
with a (potentially) arbitrary deep call tree. That means a function can call a
function which calls a functions etc. The only limitation will be the size of
the memory.

For this proof of concept we define a calling convention that not really allows
to implement functions that can do anything useful. But this minimalistic
convention will be extended, so what you learn here will not just be some
theoretical nonsense.


Working example
===============
The flow chart below illustrates some of the basic ideas that will be realized
by the calling convention in this session.  It also outlines the general
structure that all programs will have from now on:

- Each program consists of functions, and you can have as many as you want (they
  just have to fit into memory),
- each program has at least one function `main`,
- when a program starts some code block initializes the stack, calls function
  `main` and then halts the ULM. 

Using the ULM C compiler later allows us to write the function is a more
abstract way. But the compiler just generates the assembly code that we have to
write here by ourself.

The code block for initializing the stack will be the only part that needs to be
hand coded in assembly. And it has to be ensured that the entry point of the
program is the first instruction of this code block.

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{flowchart.tex}

\SetMargin{5}{10}{0}{3}
\renewcommand\BoxWidth {5.5}
\PutStatement{0}{initialize stack}
\PutCallStatement[red!50]{1}{ call main }
\PutStatement{2}{halt}
\AddPath{0}{1}
\AddPath{1}{2}
\PutAnnotation{0}{\text{Entry point}}

% next flow chart column
\renewcommand\FlowCol{1}

\PutStatement{0}{do something (e.g putc 'A')}
\PutLabel[above=-0.1]{0}{main}
\PutCallStatement[orange!50]{1}{ call func }
\PutStatement{2}{do something (e.g. putc 'B')}
\PutStatement{3}{return}

\AddPath{0}{1}
\AddPath{1}{2}
\AddPath{2}{3}

% next flow chart column
\renewcommand\FlowCol{2}

\PutStatement{0}{do something (e.g putc 'C')}
\PutLabel[above=-0.1]{0}{func}
\PutStatement{1}{return}
\AddPath{0}{1}

\DrawCallPointer[red!50]{0}{1}{1}{0}
\DrawReturnPointer[red!50]{0}{1}{1}{3}

\renewcommand{\CallPointerPadToY}{0}
\DrawCallPointer[orange!50]{1}{1}{2}{0}
\DrawReturnPointer[orange!50]{1}{1}{2}{1}

\end{tikzpicture}
--------------------------------------------------------------------------------

Calling convention without the gory details
===========================================
The roles of caller and callee are no longer mutually exclusive. When a function
calls another function the role switches from callee to caller. And every
function switches back to the role of a callee before it returns.

So when we talk here about a caller we mean the situation right before a
function call. And when we describe the role of the callee we describe the
situation right before a function return. Therefore we just have to specify two
things here:

- _How to call a function:_ Push the return address onto the stack and jump to
  the function.
- _How to return:_ Pop the return address from the stack and jump to the return
  address.

And on the Intel architecture (which is a CISC architecture) this is directly
supported by instructions:

- `call foo` would push the return address on the stack and jump to address
  `foo`, and
- `ret` pops a return address and jumps to it.

These instructions make it simple to write and use functions, although you still
have to be sure that `call` and `ret` work perfectly together. Because `ret`
just pops whatever value is on top of the stack, that means whatever value was
pushed last.

On RISK architectures (like the ULM or ARM) such things have to be done "more
manually", that means we have to do what the CISC architectures is hiding with
such instructions.  So that is why we have to deal with some gory details.

The gory details that we will avoid
===================================
It would be insane (i.e inefficient) to imitate the `call` instruction from the
Intel architecture on the ULM. Neither the instruction set not the assembler
provides you much help for computing the return address _before_ you jump to a
function. The best thing for getting the return address _before_ the jump could
be achieved by placing a label after the jump instruction:

---- CODE (type=s) -------------------------------------------------------------
    // ....
    jmp	    %CALL,	%0		# jump to function
ret					# label for return address
--------------------------------------------------------------------------------

Then we could actually load the return address _before_ the jump:

---- CODE (type=s) -------------------------------------------------------------
    ldzwq   func,	%CALL		# load call address
    ldzwq   ret,	%VAL		# load return address
    subq    8,		%SP,	%SP	# push return address
    movq    %VAL,	(%SP)
    jmp	    %CALL,	%0		# jump to function
ret
--------------------------------------------------------------------------------

But why the hell is it so important to have the return address _before_ the
jump? It is not. The ULM provides the `jmp %CALL, %RET` instruction, so you have
the return address _after_ the jump _for free_. No need to waste instructions
for loading or computing the return address in advance.


The gory details that we happily will (learn to) live with
==========================================================
Let me first show you how we achieve what matters and then explain why it is
basically the same as Intel's `call` and `ret`:

Reserved registers
~~~~~~~~~~~~~~~~~~
Two registers are used for the calling convention here:

---- CODE (type=s) -------------------------------------------------------------
    .equ    SP,	    2
    .equ    RET,    3
--------------------------------------------------------------------------------

Register `%SP` for the stack and register `%RET` for the return address.

Mechanism for calling a function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The essential pattern for calling a function is this:

---- CODE (type=s) -------------------------------------------------------------
    subq    8,	    %SP,    %SP		    # provide space on stack for callee
    /*
	Load the address of the function in a register %CALL.
    */
    jmp	    %CALL,  %RET
    addq    8,	    %SP,    %SP		    # restore old stack state
--------------------------------------------------------------------------------

Obviously the stack pointer gets manipulated but nothing gets copied to memory.
You will see that we basically do _half of a push operation_. Also note that you
can use any register (except `%SP`, `%RET` and obviously `%0`) for storing the
address of the function address.


How to implement a function
~~~~~~~~~~~~~~~~~~~~~~~~~~~
The essential pattern for writing a function is

---- CODE (type=s) -------------------------------------------------------------
function_name:
      movq	%RET,		(%SP)	    # function prologue: save %RET
      // begin of the function body

      /* Your implementation of the function */

      // end of the function body
      movq	(%SP),	%RET		    # function epilogue: restore %RET
      jmp	%RET,	%0
--------------------------------------------------------------------------------

Important here are the two lines with the comments __function prologue__ and
__function epilogue__. Every function has to begin with the prologue `movq %RET,
(%SP)` and has to end with the epilogue `movq  (%SP), %RET` followed by the jump
to the return address.

Applying the calling convention to the working example
------------------------------------------------------
Usually we will blend out the details of the calling conventions when we
describe a program. But for now, let's blend in the convention for the working
example from above. Just to get a feeling what needs to be done in a mechanical,
boring way that actually something like a compiler could do for us:

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{flowchart.tex}

\SetMargin{5}{10}{0}{3}
\renewcommand\BoxWidth {5.5}
\PutStatement{0}{initialize stack}
\PutCallStatement[red!50]{1}{ mechanism for calling main}
\PutStatement{2}{halt}
\AddPath{0}{1}
\AddPath{1}{2}
\PutAnnotation{0}{\text{Entry point}}

% next flow chart column
\renewcommand\FlowCol{1}

\PutStatement{0}{\textit{function prologue}}
\PutLabel{0}{main}
\PutStatement{1}{do something (e.g putc 'A')}
\PutCallStatement[orange!50]{2}{ mechanism for calling func }
\PutStatement{3}{do something (e.g. putc 'B')}
\PutStatement{4}{\textit{function epilogue }}
\PutStatement{5}{return}

\AddPath{0}{1}
\AddPath{1}{2}
\AddPath{2}{3}
\AddPath{3}{4}
\AddPath{4}{5}

% next flow chart column
\renewcommand\FlowCol{2}

\PutStatement{0}{\textit{function prologue}}
\PutLabel[above=-0.1]{0}{func}
\PutStatement{1}{do something (e.g putc 'C')}
\PutStatement{2}{\textit{function epilogue }}
\PutStatement{3}{return}

\AddPath{0}{1}
\AddPath{1}{2}
\AddPath{2}{3}

\DrawCallPointer[red!50]{0}{1}{1}{0}
\DrawReturnPointer[red!50]{0}{1}{1}{5}

\renewcommand{\CallPointerPadToY}{0}
\DrawCallPointer[orange!50]{1}{2}{2}{0}
\DrawReturnPointer[orange!50]{1}{2}{2}{3}

\end{tikzpicture}
--------------------------------------------------------------------------------

How does our calling convention work?
-------------------------------------
We could describe the convention as follows:

- Pushing the return address is not an atomic operation. The first half of the
  operation is done before the jump and the second half by the function
  prologue.
- Popping the return address is not an atomic operation. The first half of the
  operation is done by the function epilogue and the second half by the caller
  after the function returned.

Before this "mechanism for calling a function" gets triggered the stack can be
described by:

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 0.6}

\DrawMemArrayOpen{-40}{-1}

\DrawQuadVariable[red!40]{-8}{Used}
\DrawMemVariable[gray!10]{-40}{-8}{Unused but can contain "trash" }

\DrawPointer{-8}{\%SP}

\end{tikzpicture}
--------------------------------------------------------------------------------

The concrete address of the stack does not matter. What matters is that at
higher addresses (memory cells at the right-hand side  of `%SP`) are used, i.e.
contain data that is saved on the stack.

Caller does "first half of pushing the return address"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The relevant part done by the caller for pushing the return address is done by

---- CODE (type=s) -------------------------------------------------------------
    subq    8,      %SP,    %SP
--------------------------------------------------------------------------------

This provides the callee space on the stack to save its return address. Now the
stack can be described by

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 0.6}

\DrawMemArrayOpen{-40}{-1}

\DrawQuadVariable[red!40]{-8}{Used}
\DrawMemVariable[cyan!40]{-16}{-8}{reserved for return address}
\DrawMemVariable[gray!10]{-40}{-16}{Unused but can contain "trash" }

\DrawPointer{-16}{\%SP}

\end{tikzpicture}
--------------------------------------------------------------------------------

The next two instructions have nothing to do with the stack. The caller just
loads the functions address and jumps to the function:

---- CODE (type=s) -------------------------------------------------------------
    ldzwq   foo,    %CALL    # you can use any register for CALL except SP
    jmp	    %CALL,  %RET
--------------------------------------------------------------------------------

But after the jump the return address is stored in `%RET` and the callee takes
over.


Callee prologue does "second half of pushing the return address"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As the callee knows that the return address is in register `%RET`, it can
complete the push operation with

---- CODE (type=s) -------------------------------------------------------------
    movq    %RET,   (%SP)
--------------------------------------------------------------------------------

So here we go, the return address is at as safe place, at the right-hand side
of the stack pointer:

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 0.6}

\DrawMemArrayOpen{-40}{-1}

\DrawQuadVariable[red!40]{-8}{Used}
\DrawMemVariable[red!40]{-16}{-8}{return address}
\DrawMemVariable[gray!10]{-40}{-16}{Unused but can contain "trash" }

\DrawPointer{-16}{\%SP}

\end{tikzpicture}
--------------------------------------------------------------------------------

Now the callee can do something useful
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
After the prologue the callee can start to do it's job. Even calling a function!


Callee epilogue does "first half of popping the return address"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When the callee is done with its job the epilogue

---- CODE (type=s) -------------------------------------------------------------
    movq    (%SP),  %RET
--------------------------------------------------------------------------------

does the first half of the pop operation, i.e. copying the return address into
`%RET`.

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 0.6}

\DrawMemArrayOpen{-40}{-1}

\DrawQuadVariable[red!40]{-8}{Used}
\DrawMemVariable[red!40]{-16}{-8}{return address}
\DrawMemVariable[gray!10]{-40}{-16}{Unused but can contain "trash" }

\DrawPointer{-16}{\%SP}

\end{tikzpicture}
--------------------------------------------------------------------------------

Then the callee can return with

---- CODE (type=s) -------------------------------------------------------------
    jmp	    %RET,   %0
--------------------------------------------------------------------------------

Note that neither the prologue nor epilogue did modify the stack pointer.  Both
were just storing or fetching data from the top of the stack.  

Caller does "seconf half of popping the return address"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
After the function returned the caller restores the stack with

---- CODE (type=s) -------------------------------------------------------------
    subq    8,      %SP,    %SP
--------------------------------------------------------------------------------

so we have the same stack as before (with trash on the left-hand-side of `%SP`):

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 0.6}

\DrawMemArrayOpen{-40}{-1}

\DrawQuadVariable[red!40]{-8}{Used}
\DrawMemVariable[gray!10]{-16}{-8}{"trash" }
\DrawMemVariable[gray!10]{-40}{-16}{Unused but can contain "trash" }

\DrawPointer{-8}{\%SP}

\end{tikzpicture}
--------------------------------------------------------------------------------


Example Code with the bloody details
====================================
This code implements the working example according to the above pattern:

:import: session10/subprog/subprog_without_fp.s [fold]

Translate and run it as usual:

---- SHELL (path=session10/subprog/) -------------------------------------------
ulmas -o subprog_without_fp  subprog_without_fp.s
ulm subprog_without_fp
--------------------------------------------------------------------------------

Notes about using symbols for registers
---------------------------------------
Note that symbols are used in the code only for those registers that are
relevant for the calling convention, i.e. for `%SP` and `%RET`. For other
registers plain literals are used, e.g. `%4`. Our new function call convention
will allow it that registers `%4`, ..., `%255` can be used freely in every
function. They no longer have a special meaning. So in one function you can use
`%4` for calling a function and in another `%5` etc.


Btw: What does "... without fp" mean?
=====================================
"fp" is short for _frame pointer_. So "subprogram without frame ppointer"
indicates that some extension is coming up ;-)


Quiz08
======
For this quiz you have to hand in two files: `call_stack.s` and
`call_stack.txt`.

On `theon` submit your program with the command

`submit hpc quiz08 call_stack.s call_stack.txt`

Your answers will not be evaluated automatically but you should get a
confirmation email after your submit.

About `call_stack.s`
~~~~~~~~~~~~~~~~~~~~
Write a program `call_stack.s` with functions `main`, `funcA`, `funcB` and
`funcC`. These functions are supposed to do the following:

- Function `main` prints the character `M`, calls function `funcA`, prints `m`,
  prints a newline and then returns.
- Function `funcA` prints the character `A`, calls function `funcB`, prints `,`,
  calls function `funcC`, prints `a` and then returns.
- Function `funcB` prints the character `B` and then returns.
- Function `funcC` prints the character `C`, calls function `funcB`, prints `c`,
  calls function `funcB` and then returns.

--- CODE (file=session10/call_stack.s) ----------------------------------------
	#NOTE: relevant for the function call convention
	.equ	SP,	2
	.equ	RET,	3

	.text
/*
	Entry point
*/
_start:
	#NOTE: relevant for the convention
	ldzwq	0,		%SP

	subq	8,	%SP,	%SP
	ldzwq	main,		%4
	jmp	%4,		%RET
	addq	8,	%SP,	%SP

/*
	(only) Exit point
*/
	halt	0

//------------------------------------------------------------------------------
// PROGRAM main
//------------------------------------------------------------------------------

--------------------------------------------------------------------------------


About `call_stack.txt`
~~~~~~~~~~~~~~~~~~~~~~
It is important that your program "blindly follows the rules".  You are supposed
to work here like a compiler! But if you have an idea how the implementation of
some of these functions could be simplified then write down your thought in
`call_stack.txt`. Maybe there is even a simple rule in which case your idea
always works.


:links: function prologue -> https://en.wikipedia.org/wiki/Function_prologue
	function epilogue -> https://en.wikipedia.org/wiki/Function_prologue