=======================================
Leaf Functions: Realization in Assembly					[TOC]
=======================================

This page shows a simple and efficient way of realizing a __function
call convention__.  However, there is a serious catch: it only works for
so-called _leaf functions_, i.e. functions that do not call other functions.
In particular, recursive function calls will not work with it.

:links: function call convention -> https://en.wikipedia.org/wiki/Calling_convention


---- VIDEO ------------------------------
https://www.youtube.com/embed/Jza4VpklYxA
-----------------------------------------

Provided Material
=================

This is the assembly program that was developed in the video:

---- BOX -----------------------------------------------------------------------
  The line with the halt instruction

  ---- CODE (type=s) -----------
  halt	%RET_VAL
  ------------------------------

  was changed to 

  ---- CODE (type=s) -----------
  halt	0
  ------------------------------

  so that the program has an exit code of 0.
  
--------------------------------------------------------------------------------

:import: session12/func/test_func.s [fold]

And here is the __ULM Instruction Set__ together with its `isa.txt` source code:

:import: session12/func/0_ulm_variants/func/isa.txt [fold]

---- SHELL (path=session12/func/, hide) ----------------------------------------
make
make refman
mkdir -p /home/www/htdocs/numerik/hpc/ss22/hpc0/session12/func/
cp 1_ulm_build/func/refman.pdf /home/www/htdocs/numerik/hpc/ss22/hpc0/session12/func/
--------------------------------------------------------------------------------

:links: ULM Instruction Set -> https://www.mathematik.uni-ulm.de/numerik/hpc/ss22/hpc0/session12/func/refman.pdf


Function-Call Jump Instruction
==============================

The key to supporting function calls on the ULM is the `jmp` instruction with
opcode 0x14. It has two register operands, which we refer to as `%FUNC_ADDR` and
`%RET_ADDR` respectively.  When executed, it does two things:

- It stores the return address in `%RET_ADDR` and then
- it jumps to the address in `%FUNC_ADDR`.

So what does _return address_ mean? It is the address of the instruction that
follows the jump instruction in memory. Every instruction occupies 4 bytes, so
if, for example, the jump instruction is at address 0x20, the return address
is 0x24.

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}
\renewcommand\MemCellWidth { 1 }

\DrawMemArrayOpen{0}{15}

\DrawLongVariable{4}{ {\small ldzwq func, \%CALL} }
\DrawLongVariable[blue!20]{8}{ {\small jmp \%CALL, \%RET} }
\DrawMemAddress{8}{0x24}
\DrawMemAddress{12}{0x28}


\end{tikzpicture}
--------------------------------------------------------------------------------

In general the control flow cannot be visualized with a flow chart. But in
simple cases we can use, for example, colors to add the missing information to
such a flow chart. If a program calls a leaf function twice, the different
colors used for the paths make clear to which return address the function
returns in each case:

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}

\input{flowchart.tex}

\SetMargin{5}{10}{0}{5}

\renewcommand\BoxWidth {5}


\PutStatement{0}{A}

\PutStatement{1}{ldzwq func, \%CALL}

\PutCallStatement[red]{2}{ jmp \%CALL, \%RET}
\PutStatement{3}{B}
\PutStatement{4}{ldzwq func, \%CALL}
\PutCallStatement[blue]{5}{jmp \%CALL, \%RET}
\PutStatement{6}{C}

\AddPath{0}{1}
\AddPath{1}{2}
\AddPath{3}{4}
\AddPath{4}{5}

\PutAnnotation{0}{\text{Entry point}}

% next flow chart column
\renewcommand\FlowCol{1}

\PutStatement{0}{x}
\PutLabel{0}{func}

\PutStatement{1}{jmp \%RET}

\AddPath{0}{1}

\DrawCallPointer[red]{0}{2}{1}{0}
\DrawReturnPointer[red]{0}{3}{1}{1}

\renewcommand{\CallPointerPadToY}{1}
\DrawCallPointer[blue]{0}{5}{1}{0}
\DrawReturnPointer[blue]{0}{6}{1}{1}

\end{tikzpicture}
--------------------------------------------------------------------------------
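The flow chart above can be replayed with a small simulation. The following is only a sketch in Python, not ULM code: the addresses, the `nop` placeholder (standing for the blocks A, B, C and x) and the register names in the dictionary are assumptions made for illustration. It records the addresses that get executed when the same leaf function is called twice.

```python
INSTR_SIZE = 4  # each instruction occupies 4 bytes

def run(program, entry):
    """Execute a toy program and return the list of visited addresses."""
    regs = {"CALL": 0, "RET": 0}
    ip, trace = entry, []
    while True:
        op, *args = program[ip]
        trace.append(ip)
        if op == "halt":
            return trace
        if op == "load":                # load an address into a register
            regs[args[1]] = args[0]
            ip += INSTR_SIZE
        elif op == "jmp":               # jmp %target, %link
            target = regs[args[0]]
            if args[1] is not None:     # None: the return address is discarded
                regs[args[1]] = ip + INSTR_SIZE
            ip = target
        else:                           # "nop": placeholder for A, B, C, x
            ip += INSTR_SIZE

FUNC = 0x40  # hypothetical address of the leaf function
program = {
    0x00: ("nop",),                     # A
    0x04: ("load", FUNC, "CALL"),
    0x08: ("jmp", "CALL", "RET"),       # first call (red path)
    0x0C: ("nop",),                     # B
    0x10: ("load", FUNC, "CALL"),
    0x14: ("jmp", "CALL", "RET"),       # second call (blue path)
    0x18: ("nop",),                     # C
    0x1C: ("halt",),
    FUNC: ("nop",),                     # x
    0x44: ("jmp", "RET", None),         # return to whoever called
}

trace = run(program, 0x00)
print([hex(a) for a in trace])
# ['0x0', '0x4', '0x8', '0x40', '0x44', '0xc', '0x10', '0x14', '0x40', '0x44', '0x18', '0x1c']
```

The function body at 0x40 and 0x44 appears twice in the trace, once followed by 0x0c (B) and once by 0x18 (C). That is exactly what the red and blue return arrows express.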

Here is another example, in which two different leaf functions are called:

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{flowchart.tex}

\SetMargin{5}{10}{0}{5}

\renewcommand\BoxWidth {5}


\PutStatement{0}{A}

\PutStatement{1}{ldzwq funcA, \%CALL}

\PutCallStatement[red]{2}{ jmp \%CALL, \%RET}
\PutStatement{3}{B}
\PutStatement{4}{ldzwq funcB, \%CALL}
\PutCallStatement[blue]{5}{jmp \%CALL, \%RET}
\PutStatement{6}{C}

\AddPath{0}{1}
\AddPath{1}{2}
\AddPath{3}{4}
\AddPath{4}{5}

\PutAnnotation{0}{\text{Entry point}}

% next flow chart column
\renewcommand\FlowCol{1}

\PutStatement{0}{x}
\PutLabel{0}{funcA}

\PutStatement{1}{jmp \%RET}

\AddPath{0}{1}

% next flow chart column
\renewcommand\FlowCol{2}

\PutStatement{0}{y}
\PutLabel{0}{funcB}

\PutStatement{1}{jmp \%RET}

\AddPath{0}{1}

\DrawCallPointer[red]{0}{2}{1}{0}
\DrawReturnPointer[red]{0}{3}{1}{1}

\renewcommand{\CallPointerPadToY}{1}
\DrawCallPointer[blue]{0}{5}{2}{0}
\DrawReturnPointer[blue]{0}{6}{2}{1}
\end{tikzpicture}
--------------------------------------------------------------------------------


How the Function-Call Jump Instruction is Realized
--------------------------------------------------

Recall that the ULM executes a program with a simple von Neumann cycle. So when
an instruction gets executed, it has been loaded from address `%IP`
(instruction pointer) into `%IR` (instruction register). With the two register
operands named `%CALL` and `%RET` as in the pictures above, what the
instruction does when it gets executed is

---- LATEX ---------------------------------------------------------------------
\left( u(\%\text{IP}) + 4 \right) \bmod 2^{64} \to u(\%\text{RET})
\quad\text{and}
\quad u(\%\text{CALL}) \to u(\%\text{IP})
--------------------------------------------------------------------------------
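The two assignments can be modeled in a few lines of Python. This is a sketch of the semantics only, under some assumptions that are not stated above: the register numbers chosen for `CALL` and `RET` are arbitrary, the register file size is made up, the jump target is read before the return address is written, and register 0 is treated as a zero register whose writes are discarded.

```python
MASK64 = (1 << 64) - 1   # registers are 64 bits wide
INSTR_SIZE = 4           # the "+ 4" in the formula

def jmp(regs, ip, func_reg, ret_reg):
    """Model of `jmp %func_reg, %ret_reg`; returns the new instruction pointer."""
    target = regs[func_reg]         # assumption: the target is read first
    if ret_reg != 0:                # assumption: writes to register 0 are discarded
        regs[ret_reg] = (ip + INSTR_SIZE) & MASK64
    return target

regs = [0] * 256         # assumption: size chosen for this sketch
CALL, RET = 1, 2         # hypothetical register numbers

regs[CALL] = 0x100       # address of the function
ip = jmp(regs, 0x20, CALL, RET)
print(hex(ip), hex(regs[RET]))   # 0x100 0x24
```

With the jump instruction at address 0x20 the return address is 0x24, as in the example given earlier. The `& MASK64` implements the reduction modulo 2^64 and only matters if the jump instruction sits at the very end of the address space.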

Returning from a function is then possible with a `jmp %RET_ADDR, %0`
instruction. Of course this requires that `%RET_ADDR` by then still (or again)
contains the original return address. So when you write a program you have to
be aware that you cannot use register `%RET_ADDR` carelessly.  You see that we
need some rules for function calls: a protocol, or call it a convention.
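The restriction to leaf functions follows directly from this requirement. The following Python sketch (a model, not ULM code; all addresses and the names `outer` and `inner` are made up for illustration) shows what happens if a function that was called performs a call itself without saving `%RET_ADDR` first.

```python
INSTR_SIZE = 4

def jmp(regs, ip, target_reg, link_reg):
    """Model of the function-call jump; link_reg "0" discards the return address."""
    target = regs[target_reg]
    if link_reg != "0":
        regs[link_reg] = ip + INSTR_SIZE
    return target

regs = {"FUNC_ADDR": 0, "RET_ADDR": 0}

# The main program calls `outer` (at 0x100) with a jump located at 0x08.
regs["FUNC_ADDR"] = 0x100
ip = jmp(regs, 0x08, "FUNC_ADDR", "RET_ADDR")
return_to_main = regs["RET_ADDR"]               # 0x0c

# `outer` is not a leaf: at 0x104 it calls `inner` (at 0x200).
regs["FUNC_ADDR"] = 0x200
ip = jmp(regs, 0x104, "FUNC_ADDR", "RET_ADDR")  # overwrites RET_ADDR with 0x108

# `inner` returns correctly to `outer`.
ip = jmp(regs, 0x200, "RET_ADDR", "0")          # ip is now 0x108

# `outer` tries to return to the main program with a jump at 0x10c.
ip = jmp(regs, 0x10C, "RET_ADDR", "0")
print(hex(return_to_main), hex(ip))             # 0xc 0x108
```

The last jump should have continued at 0x0c in the main program, but it lands at 0x108 inside `outer` again, because the nested call replaced the original return address.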

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}
\renewcommand\MemCellWidth { 1 }

\DrawMemArrayOpenRight{0}{20}
\DrawMemLabel{0}{Entry Point}

\DrawMemVariable[green!20]{0}{8}{ Main program (Caller) }
\DrawMemVariable[orange!20]{8}{13}{ Leaf function A (Callee) }
\DrawMemVariable[orange!20]{13}{18}{ Leaf function B (Callee) }
\end{tikzpicture}
--------------------------------------------------------------------------------

Descriptions of such conventions usually use the terms _caller_ and _callee_.  As
long as we only deal with leaf functions these terms have a simple meaning.
The program text can be divided into a main program and functions. The main
program only contains code that may call a function, i.e. caller code.
The functions only contain code that gets executed when a function was
called, i.e. callee code. Now think of two persons (caller and callee) writing
a program together.  The caller writes the main program, the callee all the
functions. The function call convention is a contract between caller and
callee that allows them to work almost independently of each other. In
particular, the contract specifies what information each has to provide to the
other.


Caller
------
The caller has to know the address of the function, how to pass arguments and,
if there is one, how to get a result back from the function. The caller also
has to know which registers the function might modify.

In the simplest form the function does not expect any arguments and does not
return a result. In this case the caller just loads the address of the function
into `%CALL` and does the (function) jump:

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}
\renewcommand\MemCellWidth { 1 }

\DrawMemArrayOpenRight{0}{20}

\begingroup
\renewcommand\PaddingMemVariable {0.05}
\DrawMemVariable[green!20]{0}{20}
\par\endgroup

\DrawLongVariable{4}{ {\small ldzwq func, \%CALL} }
\DrawLongVariable[blue!20]{8}{ {\small jmp \%CALL, \%RET} }
\DrawMemLabel{0}{Entry point}
\end{tikzpicture}
--------------------------------------------------------------------------------

Also shown in this picture is the _entry point_ of the
program. The entry point is the address of the instruction that gets executed
first when execution of the program starts. Of course a correct program should
also have at least one exit point, i.e. some halt instruction.  As a programmer
you have to make sure that at least one of these exit points gets reached.


Callee
------
Accordingly, the callee needs to know how to receive arguments, how to return
and, if there is one, how to give back a result. And of course, which registers
can be used and modified for computations.

Again we first consider a function that does not receive any arguments and
does not return any result. Then we basically have this (which registers the
function implementation is allowed to modify is discussed below):

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}
\renewcommand\MemCellWidth { 1 }

\DrawMemArrayOpen{0}{20}

\begingroup
\renewcommand\PaddingMemVariable {0.05}
\DrawMemVariable[orange!20]{0}{10}{}
\DrawMemVariable[orange!20]{0}{5}{Leaf function A }

\renewcommand\PaddingMemVariable {0.05}
\DrawMemVariable[orange!20]{10}{20}{}
\DrawMemVariable[orange!20]{10}{14}{Leaf function B }
\par\endgroup

\DrawLongVariable[blue!20]{6}{ {\small jmp \%RET, \%0} }
\DrawLongVariable[blue!20]{16}{ {\small jmp \%RET, \%0} }

\DrawMemLabel{0}{A}
\DrawMemLabel{10}{B}

\end{tikzpicture}
--------------------------------------------------------------------------------


How the ARM Architecture Supports Function Calls
================================================

The __ARM architecture__ provides the "branch and link" instruction `BL` for
calling a function:

---- CODE (type=s) -------------------------------------------------------------
    BL	function_label
--------------------------------------------------------------------------------

This instruction stores the return address in a register called `lr` (for _link
register_) and then jumps to the address referred to by the label
`function_label` (sound familiar?).  Returning from the function is done on ARM
with a move instruction: you simply overwrite the instruction pointer with the
link register. (What we call `IP` for instruction pointer is called `PC` for
program counter on ARM.)

---- CODE (type=s) -------------------------------------------------------------
    MOV     pc, lr            /* Return from subroutine.
				 Note: The MOV copies the right-hand-side to the
				 left-hand-side
			      */
--------------------------------------------------------------------------------


Note that on the ULM you cannot use `%IP` explicitly in an instruction, but
`jmp %RET, %0` is effectively equivalent to that move.


Conventions for Caller and Callee
=================================
Each register belongs either to the caller or to the callee. However, to keep
it simple, only a subset of the registers actually has an owner here. If this
subset turns out to be too small, we can still assign more registers.

Registers `%CALLER0`, ..., `%CALLER3` belong to the caller, and registers
`%CALLEE0`, ..., `%CALLEE3` to the callee. Using `.equ` directives we make sure
that these identifiers are mapped to distinct registers (and we also make it
easy to extend each set):

---- CODE (type=s) -------------------------------------------------------------
    .equ    CALLER0,    1
    .equ    CALLER1,    CALLER0 + 1
    .equ    CALLER2,    CALLER1 + 1
    .equ    CALLER3,    CALLER2 + 1
    .equ    CALLER_LAST,CALLER3

    .equ    CALLEE0,    CALLER_LAST + 1
    .equ    CALLEE1,    CALLEE0 + 1
    .equ    CALLEE2,    CALLEE1 + 1
    .equ    CALLEE3,    CALLEE2 + 1
    .equ    CALLEE_LAST,CALLEE3
--------------------------------------------------------------------------------
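The arithmetic behind this `.equ` chain can be checked by mirroring it in Python. This is only a sketch that repeats the definitions above and evaluates them; it shows which register numbers each identifier ends up with and that the two sets are disjoint.

```python
# Mirror of the .equ chain above
CALLER0 = 1
CALLER1 = CALLER0 + 1
CALLER2 = CALLER1 + 1
CALLER3 = CALLER2 + 1
CALLER_LAST = CALLER3

CALLEE0 = CALLER_LAST + 1
CALLEE1 = CALLEE0 + 1
CALLEE2 = CALLEE1 + 1
CALLEE3 = CALLEE2 + 1
CALLEE_LAST = CALLEE3

caller_regs = list(range(CALLER0, CALLER_LAST + 1))
callee_regs = list(range(CALLEE0, CALLEE_LAST + 1))

# No register has two owners.
assert not set(caller_regs) & set(callee_regs)

print(caller_regs, callee_regs)   # [1, 2, 3, 4] [5, 6, 7, 8]
```

Because `CALLEE0` is defined relative to `CALLER_LAST`, adding a `CALLER4` and updating `CALLER_LAST` shifts the callee registers automatically, so the sets stay disjoint.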

Return Address
--------------
The callee needs to know where the caller stored the return address. For that
we specify with

---- CODE (type=s) --------------------
    .equ    RET_ADDR,	    CALLER1
---------------------------------------

that it will be in `%RET_ADDR`.


Passing Arguments
-----------------
If the callee expects parameters, the caller needs to know _where_ the
callee expects them. With the list


---- CODE (type=s) --------------------
    .equ    PARAM0,	    CALLEE0
    .equ    PARAM1,	    PARAM0 + 1
---------------------------------------

we specify that the first parameter is always expected in `%PARAM0`, the second
always in `%PARAM1`, etc. The list can easily be extended in the same way.


Returning Results
-----------------

If a callee returns a value it needs to know where the caller wants to have the
result. With

---- CODE (type=s) --------------------
    .equ    RET_VAL,	    CALLER2
---------------------------------------

we specify that the callee should store its result in `%RET_VAL`.
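
Taken together, the three conventions describe a complete call. The following Python sketch models the register traffic only; it is not ULM code. The function `leaf_add`, the parameter values, the call-site address and the size of the register list are hypothetical, and the wrap-around of 64-bit arithmetic is ignored.

```python
# Register numbers as defined by the .equ directives
CALLER0, CALLEE0 = 1, 5
RET_ADDR = CALLER0 + 1      # CALLER1
RET_VAL = CALLER0 + 2       # CALLER2
PARAM0 = CALLEE0
PARAM1 = PARAM0 + 1

def leaf_add(regs):
    """Hypothetical callee: adds its two parameters."""
    regs[RET_VAL] = regs[PARAM0] + regs[PARAM1]   # result goes where the caller expects it
    return regs[RET_ADDR]                         # models `jmp %RET_ADDR, %0`

def caller(regs, call_site):
    """Caller code: pass arguments, call, pick up the result."""
    regs[PARAM0] = 30
    regs[PARAM1] = 12
    regs[RET_ADDR] = call_site + 4                # models the function-call jump
    next_ip = leaf_add(regs)
    return next_ip, regs[RET_VAL]

regs = [0] * 16             # assumption: size chosen for this sketch
next_ip, result = caller(regs, 0x08)
print(hex(next_ip), result)   # 0xc 42
```

Caller and callee never refer to each other's code here. They only agree on `%PARAM0`, `%PARAM1`, `%RET_ADDR` and `%RET_VAL`, which is the contract described above.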


:links: ARM architecture -> https://en.wikipedia.org/wiki/ARM_architecture