====================
Stack data structure                     				[TOC]
====================

For function calls we will use a _stack_ for storing return addresses, arguments
and  local variables.  In general a __stack__ is an abstract data type that
provides two operations

- _push_ for adding an element, and
- _pop_ for removing the most recently added element.

What we need is a stack data structure, i.e. an implementation of this abstract
concept that can be used later for function calls.  On the Intel architecture
instructions like `popq` and `pushq` are provided for this operations, and
registers like `%rsp` and `%rbp` are dedicated for the stack data structure.
Per se the ULM architecture does not provide such dedicated instructions for
stack operations and neither dedicated registers.

So it is up us, the programmers, to implement these operations, agree upon which
register are dedicated to the stack implementation, and we even have to decide
(or freely choose) which memory region gets used for the stack.

---- VIDEO ------------------------------
https://www.youtube.com/embed/S2qMnDQpTwk
-----------------------------------------

Error handling
==============
Error handling for the stack is a luxury that we do not provide. Our stack
implementation has to be efficient and therefore it has to be as simple as
possible. If something goes wrong it is either the fault of the programmer or we
ran into a scenario that we considered for efficiency concern as "sufficiently
unlikely" (and can not be handled if they happen after all).

Programmers fault: Popping data from an empty stack
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Things will go south if more data gets removed from the stack than was added.
This easily can be avoided by making sure that push and pop operations only
come in pairs.  That's a simple rule, hence not our fault if a programmer does
not follow it.

Programmers fault: Bugs in using the push and pop operation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As mentioned above, unlike the Intel instruction set the ULM does not provide
builtin push and pop instruction. But there will be a recipe that in a "don't
think just copy-and-paste these lines for push and these for pop" style.  That's
a simple rule, hence not our fault if a programmer does not follow it.


The "sufficiently unlikely" case
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When we push something onto the stack we have to find memory space that can be
overwritten. We should not overwrite memory that is used by the loaded program
(i.e. the text, data and BSS segment are taboo). We will use a simple that "most
likely always" finds some free memory for a push and releases it after a pop.
In the unlikely event of running out of memory we will have a _stack overflow_.
And in that case our program will not crash, and that is a problem, it will
continue run on with undefined behavior.

Fetch and store instructions for the ULM
========================================
The ULM instruction set provides _fetch instructions_ for copying data from the
memory into a register, and _store instruction_ to copy data from a register to
memory. What we use here is `movq (%Y), %Z` with __Opcode 0x18__ for fetching,
and `movq %X, (%Z)` with __Opcode 0x28__ for storing.  As you see in the
description of these instructions there is a so called alignment restriction,
the address has to be a multiple of eight. In this stack example this will
always be the case. So you can ignore it for now, I will dedicate an extra page
in this session to this topic.


Conventions for the stack
=========================
By convention register 2 is dedicated to the stack data structure and for
convenience we assume an directive like 

---- CODE (type=s) -----------------------------------------------------------
          .equ    SP,     2
------------------------------------------------------------------------------

is used, but it is not required. The reason for choosing register 2 is
arbitrary, as we have to choose something, and therefore does not require any
justification. But I can give some explanation: choosing register 0 would be
unwise and register 1 will have some special meaning in the next stack
convention which is also used by the ULM C compiler.

Initializing the stack
----------------------
The stack gets initialized by setting `%SP` to zero.  Because the memory has
$2^{64}$ bytes and registers $64$ bits, zero is the address that points
one-past-the-last memory cell:

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 0.6}

\DrawMemArrayOpenLeft{-40}{-1}
\DrawHack{0}{$0$}
\DrawMemAddressAbove{0}{$2^{64}$}
\DrawMemAddressAbove{-8}{$2^{64}-8$}
\DrawMemAddressAbove{-16}{$2^{64}-16$}
\DrawMemAddressAbove{-24}{$2^{64}-24$}
\DrawMemAddressAbove{-32}{$2^{64}-32$}

\DrawPointer{0}{\%SP}

\end{tikzpicture}
--------------------------------------------------------------------------------

Meaning of the stack pointer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If `%SP` is zero it means the stack is empty. That's because by convention
only memory cells with an address greater or equal to `%SP` contain data that is
stored on the stack. That means "memory on the right-hand-side of `%SP`" is used
by the stack:

---- TIKZ --------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}
\renewcommand\MemCellWidth { 0.6}

\DrawMemArrayOpen{-40}{-1}

\DrawHack{0}{$0$}
\DrawQuadVariable[red!40]{-8}{Used by stack}
\DrawMemVariable[gray!10]{-40}{-8}{Free to use}

\DrawPointer{-8}{\%SP}

\end{tikzpicture}
------------------------------------------------------------------------------

If the stack pointer is not initialized then pop and push operations will have
an unpredictable behaviour (for teaching purposes the ULM initializes registers
with random values before it executes a program). So we have to be sure that an
instruction like

---- CODE (type=s) -------------------------------------------------------------
	ldzwq	0,	%SP	    # initialize empty stack
--------------------------------------------------------------------------------

gets executed before any push or pop operation is done.


Push operation
--------------
The push operation is implemented by decrementing the stack pointer and then
copying data to the stack pointer. That means the stack grows downwards after
each push:

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 0.6}

\DrawHack{0}{$0$}
\DrawMemArrayOpen{-40}{-1}

\DrawQuadVariable[red!40]{-8}{Used}
\DrawMemVariable[cyan!40]{-16}{-8}{Pushed data}
\DrawMemVariable[gray!10]{-40}{-16}{Unused}

\DrawPointer{-16}{\%SP}

\end{tikzpicture}
--------------------------------------------------------------------------------

Pushing a quad word from a register `%VAL` onto the stack can be implemented by

---- CODE (type=s) -------------------------------------------------------------
        # push %VAL
        subq    8,      %SP,    %SP
        movq    %VAL,   (%SP)
--------------------------------------------------------------------------------

Here `%VAL` can be an arbitrary register. Note that the ULM provides no
instruction for copying a literal directly into memory. You first have to copy
it into a register. For example

---- CODE (type=s) -------------------------------------------------------------
	# load 42
	ldzwq	42,	%VAL
        # push %VAL
        subq    8,      %SP,    %SP
        movq    %VAL,   (%SP)
--------------------------------------------------------------------------------

would push the value 42 (zero extended to 64 bits) onto the stack.


Pop operation
-------------
The pop operation copies data from the end of the stack pointer into a register
and then increments the stack pointer: 

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 0.6}

\DrawMemArrayOpen{-40}{-1}
\DrawHack{0}{$0$}

\DrawQuadVariable[red!40]{-8}{Used}
\DrawMemVariable[cyan!40]{-16}{-8}{Now unused}
\DrawMemVariable[gray!10]{-40}{-16}{Unused}

\DrawPointer{-8}{\%SP}

\end{tikzpicture}
--------------------------------------------------------------------------------

Popping a quad word into a register `%VAL` can be implemented by

---- CODE (type=s) -------------------------------------------------------------
        # push %VAL
        movq    (%SP),	%VAL
        addq    8,      %SP,    %SP
--------------------------------------------------------------------------------

It would be _insane_ to "clean up" memory after a pop operation, e.g by zeroing
out the freed memory region. Such madness would cost an additional instruction!
So stick to the rules: memory on the left-hand-side of `%SP` is free and memory
on the left-hand-side of `%SP` can contain trash.


Why does our stack grow downwards?
==================================
The ULM provides a vast amount of $2^{64}$ bytes of memory. And so far we made
only little use of it. We only use the memory region with small addresses
(actually beginning with  address $0$) to load the segments of the program that
we run:

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}
\renewcommand\MemCellWidth { 0.3 }


\DrawMemRange{0}{15}
\DrawMemVariable[gray!50]{0}{15}{ Text segment}
\DrawHack{0}{$0$}

\DrawMemRange{15}{25}
\DrawMemVariable[orange]{15}{25}{ Data segment }

\DrawMemRange{25}{35}
\DrawMemVariable[blue!50]{25}{35}{ BSS segment }

\DrawMemArrayOpenRight{35}{50}

\DrawMemAddress{0}{0x00}

\end{tikzpicture}
--------------------------------------------------------------------------------

When using the stack it should be nearly impossible to overwrite the loaded
program. We can achieve that by

- Either starting the stack at the end of the BSS segment and letting it grow
  upwards (i.e. growing to higher addresses) after a push
- or starting the stack at "end of the memory" and letting it grow downwards.

The choice for the later one is simplicity.  Register `%SP` is used as a 64-bit
address, i.e. an unsigned 64-bit integer.  Therefore address $2^{64}$ actually
means $u\left(2^{64}\right) \bmod 2^{64} = 0$ and therefore zero initializing
`%SP` means that it always points to "the end of the memory" or
one-past-the-last memory cell.

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 1}

\DrawMemArrayOpenLeft{-10}{-1}
\DrawHack{0}{$0$}
\DrawMemAddress{0}{$u(\; 2^{64} \;) \bmod 2^{64} = 0$}

\end{tikzpicture}
--------------------------------------------------------------------------------

Think about the alternative: Using the address after the BSS program would mean
each program would have a different start address for the stack. Certainly that
could be realized equally efficient (the assembler has that information) but it
would require some extra thought.


Example for using the stack to store some characters
====================================================
Before using the stack for function calls you should look at a simple example
to observe how using the stack works and what happens after a push and pop
operation.  For this purpose, the following program is supposed to be as simple
as possible. It first pushes the characters 'A', 'B' and 'C' onto the stack.
Then it pops off and prints these values. While being simple in structure the
program has more then just a few lines of code (hey, we are programming in
assembly):

:import: session10/stack/stack.s [fold]

Running the program therefore produces the following output:

---- SHELL (path=session10/stack) ----------------------------------------------
ulmas -o stack stack.s
ulm stack
--------------------------------------------------------------------------------

But of course the intension of providing this example is to understand the
program line by line. In the following I will visualize how the stack gets
affected and you should confirm that by running the program with the ULM
debugger _ulm-qui_. It is in particular to "see the meaning" of what the code
does.


Initializing the stack
~~~~~~~~~~~~~~~~~~~~~~
The first instruction initializes the stack

---- CODE (type=s) -------------------------------------------------------------
        .equ    SP,     2	# actually required for the stack
        .equ    VAL,    5	# just for convenience

	.text
	ldzwq	0,	%SP
--------------------------------------------------------------------------------

Note that actually both `.equ` directives are just for convenience (and a more
expressive code).  In the `ulm-qui` you only see that register 2 is now zero
initialized, but the meaning of that is the following:

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 0.6}

\DrawMemArrayOpenLeft{-40}{-1}
\DrawHack{0}{$0$}
\DrawMemAddressAbove{0}{$2^{64}$}
\DrawMemAddressAbove{-8}{$2^{64}-8$}
\DrawMemAddressAbove{-16}{$2^{64}-16$}
\DrawMemAddressAbove{-24}{$2^{64}-24$}
\DrawMemAddressAbove{-32}{$2^{64}-32$}

\DrawPointer{0}{\%SP}

\end{tikzpicture}
--------------------------------------------------------------------------------


Push character 'A'
~~~~~~~~~~~~~~~~~~
As mentioned above we first have to load the character constant into a register.
Except for register `%SP` (and `%0` or course) any register can be used. Such a
register is hidden behind the symbol `VAL` (in the example code below defined as
5):

---- CODE (type=s) -------------------------------------------------------------
	# load value in %VAL
        ldzwq   'A',    %VAL
--------------------------------------------------------------------------------

Now the two instructions for the push operation

---- CODE (type=s) -------------------------------------------------------------
        # push %VAL
        subq    8,      %SP,    %SP
        movq    %VAL,   (%SP)
--------------------------------------------------------------------------------

will modify the stack pointer and the quad word at the end of the pointer:

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 0.6}

\DrawMemArrayOpenLeft{-40}{-1}
\DrawHack{0}{$0$}
\DrawMemAddressAbove{0}{$2^{64}$}
\DrawMemAddressAbove{-8}{$2^{64}-8$}
\DrawMemAddressAbove{-16}{$2^{64}-16$}
\DrawMemAddressAbove{-24}{$2^{64}-24$}
\DrawMemAddressAbove{-32}{$2^{64}-32$}

\DrawQuadVariable{-8}{'A'}

\DrawPointer{-8}{\%SP}

\end{tikzpicture}
--------------------------------------------------------------------------------

Push character 'B'
~~~~~~~~~~~~~~~~~~
We follows strictly the same pattern

---- CODE (type=s) -------------------------------------------------------------
	# load value in %VAL
        ldzwq   'B',    %VAL
        # push %VAL
        subq    8,      %SP,    %SP
        movq    %VAL,   (%SP)
--------------------------------------------------------------------------------

which results in

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 0.6}

\DrawMemArrayOpenLeft{-40}{-1}
\DrawHack{0}{$0$}
\DrawMemAddressAbove{0}{$2^{64}$}
\DrawMemAddressAbove{-8}{$2^{64}-8$}
\DrawMemAddressAbove{-16}{$2^{64}-16$}
\DrawMemAddressAbove{-24}{$2^{64}-24$}
\DrawMemAddressAbove{-32}{$2^{64}-32$}

\DrawQuadVariable{-8}{'A'}
\DrawQuadVariable{-16}{'B'}

\DrawPointer{-16}{\%SP}

\end{tikzpicture}
--------------------------------------------------------------------------------

Push character 'C'
~~~~~~~~~~~~~~~~~~
And analogously

---- CODE (type=s) -------------------------------------------------------------
	# load value in %VAL
        ldzwq   'C',    %VAL
        # push %VAL
        subq    8,      %SP,    %SP
        movq    %VAL,   (%SP)
--------------------------------------------------------------------------------

results in

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 0.6}

\DrawMemArrayOpenLeft{-40}{-1}
\DrawHack{0}{$0$}
\DrawMemAddressAbove{0}{$2^{64}$}
\DrawMemAddressAbove{-8}{$2^{64}-8$}
\DrawMemAddressAbove{-16}{$2^{64}-16$}
\DrawMemAddressAbove{-24}{$2^{64}-24$}
\DrawMemAddressAbove{-32}{$2^{64}-32$}

\DrawQuadVariable{-8}{'A'}
\DrawQuadVariable{-16}{'B'}
\DrawQuadVariable{-24}{'C'}

\DrawPointer{-24}{\%SP}

\end{tikzpicture}
--------------------------------------------------------------------------------

Pop a character and print it
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We now apply the pattern for the pop operation and afterwards print the popped
value, i.e.

---- CODE (type=s) -------------------------------------------------------------
        # pop %VAL
        movq    (%SP),  %VAL
        addq    8,      %SP,    %SP
        # use popped value
        putc    %VAL
--------------------------------------------------------------------------------

Looking at the memory you will see no difference. Only the register for the
stack pointer changed. And this has the following meaning:

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 0.6}

\DrawMemArrayOpenLeft{-40}{-1}
\DrawHack{0}{$0$}
\DrawMemAddressAbove{0}{$2^{64}$}
\DrawMemAddressAbove{-8}{$2^{64}-8$}
\DrawMemAddressAbove{-16}{$2^{64}-16$}
\DrawMemAddressAbove{-24}{$2^{64}-24$}
\DrawMemAddressAbove{-32}{$2^{64}-32$}

\DrawQuadVariable{-8}{'A'}
\DrawQuadVariable{-16}{'B'}
\DrawQuadVariable{-24}{'C'}

\DrawPointer{-16}{\%SP}

\end{tikzpicture}
--------------------------------------------------------------------------------

Pop a character and print it
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Again the same pattern is applied to remove and print the next character from
the stack:

---- CODE (type=s) -------------------------------------------------------------
        # pop %VAL
        movq    (%SP),  %VAL
        addq    8,      %SP,    %SP
        # use popped value
        putc    %VAL
--------------------------------------------------------------------------------

Again, the memory content is not affected but the stack pointer has changed

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 0.6}

\DrawMemArrayOpenLeft{-40}{-1}
\DrawHack{0}{$0$}
\DrawMemAddressAbove{0}{$2^{64}$}
\DrawMemAddressAbove{-8}{$2^{64}-8$}
\DrawMemAddressAbove{-16}{$2^{64}-16$}
\DrawMemAddressAbove{-24}{$2^{64}-24$}
\DrawMemAddressAbove{-32}{$2^{64}-32$}

\DrawQuadVariable{-8}{'A'}
\DrawQuadVariable{-16}{'B'}
\DrawQuadVariable{-24}{'C'}

\DrawPointer{-8}{\%SP}

\end{tikzpicture}
--------------------------------------------------------------------------------

Pop a character and print it
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I guess you see by now the effect of

---- CODE (type=s) -------------------------------------------------------------
        # pop %VAL
        movq    (%SP),  %VAL
        addq    8,      %SP,    %SP
        # use popped value
        putc    %VAL
--------------------------------------------------------------------------------

and how to interpret things you observe in the debugger:

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth { 0.6}

\DrawMemArrayOpenLeft{-40}{-1}
\DrawHack{0}{$0$}
\DrawMemAddressAbove{0}{$2^{64}$}
\DrawMemAddressAbove{-8}{$2^{64}-8$}
\DrawMemAddressAbove{-16}{$2^{64}-16$}
\DrawMemAddressAbove{-24}{$2^{64}-24$}
\DrawMemAddressAbove{-32}{$2^{64}-32$}

\DrawQuadVariable{-8}{'A'}
\DrawQuadVariable{-16}{'B'}
\DrawQuadVariable{-24}{'C'}

\DrawPointer{0}{\%SP}

\end{tikzpicture}
--------------------------------------------------------------------------------

Quiz 07 (About the stack)
=========================
This quiz consists of two exercise (see below) and you have to hand in 3 files.
On `theon` do that with the command

`submit hpc quiz07 understand_the_bug.s understand_the_bug.txt push_64.s`

Your answers will not be evaluated automatically but you should get a
confirmation email after your submit.

Exercise: `understand_the_bug.s` and `understand_the_bug.txt`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is basically the program form above:

:import: session10/stack/understand_the_bug.s [fold]

With a small modification The program has a comment `TODO: Your bug here`. Remove that comment with and
replace it with

---- CODE ----------------------------------------------------------------------
        # pop %VAL
        movq    (%SP),  %VAL
        addq    8,      %SP,    %SP
        # use popped value
        putc    %VAL
--------------------------------------------------------------------------------

Don't change anything else.  Now translate it and run the program. You should
observe the following:

---- SHELL (path=session10/stack, hide) ----------------------------------------
ulmas -o understand_the_bug understand_the_bug2.s
--------------------------------------------------------------------------------

---- SHELL (path=session10/stack) ----------------------------------------------
ulm understand_the_bug
--------------------------------------------------------------------------------

I think so far this exercise was simple ;-) So now comes the challenge:

- Explain why this result could be predicted.
- Modify at most to characters in this buggy program so that the program prints
  `CBA!` followed by a newline.
  
  To be clear: don't fix the bug, play with the bug. We know that the problem is
  that more elements are removed than previously added to the stack. You will
  have the opportunity to fix many bugs in the future.

Exercise: `push_64.s`
~~~~~~~~~~~~~~~~~~~~~
Write a program that does the following (in that order):

- push the literal `0xFFFFFFFFFFFFFFFF` on the stack,
- push the literal `0x1234567890123456` on the stack.

Hint: The purpose of the that is to "motivate" you to __read about__ how in
general a 64-bit address can be loaded into a register. Because in general

---- CODE (type=s) -------------------------------------------------------------
    ldzwq   function_label, %CALL
--------------------------------------------------------------------------------

is only working if the label refers to an address in the  range from 0 to
$2^{16}-1$. 

:links: stack -> https://en.wikipedia.org/wiki/Stack_(abstract_data_type)
	read about -> doc:session09/page04
	Opcode 0x18 -> http://www.mathematik.uni-ulm.de/numerik/hpc/ss20/hpc0/ulm.pdf#page=31 
	Opcode 0x28 -> http://www.mathematik.uni-ulm.de/numerik/hpc/ss20/hpc0/ulm.pdf#page=31