Stack data structure

For function calls we will use a stack for storing return addresses, arguments and local variables. In general a stack is an abstract data type that provides two operations

  • push for adding an element, and

  • pop for removing the most recently added element.

What we need is a stack data structure, i.e. an implementation of this abstract concept that can be used later for function calls. On the Intel architecture instructions like popq and pushq are provided for this operations, and registers like %rsp and %rbp are dedicated for the stack data structure. Per se the ULM architecture does not provide such dedicated instructions for stack operations and neither dedicated registers.

So it is up us, the programmers, to implement these operations, agree upon which register are dedicated to the stack implementation, and we even have to decide (or freely choose) which memory region gets used for the stack.

Error handling

Error handling for the stack is a luxury that we do not provide. Our stack implementation has to be efficient and therefore it has to be as simple as possible. If something goes wrong it is either the fault of the programmer or we ran into a scenario that we considered for efficiency concern as “sufficiently unlikely” (and can not be handled if they happen after all).

Programmers fault: Popping data from an empty stack

Things will go south if more data gets removed from the stack than was added. This easily can be avoided by making sure that push and pop operations only come in pairs. That's a simple rule, hence not our fault if a programmer does not follow it.

Programmers fault: Bugs in using the push and pop operation

As mentioned above, unlike the Intel instruction set the ULM does not provide builtin push and pop instruction. But there will be a recipe that in a “don't think just copy-and-paste these lines for push and these for pop” style. That's a simple rule, hence not our fault if a programmer does not follow it.

The “sufficiently unlikely” case

When we push something onto the stack we have to find memory space that can be overwritten. We should not overwrite memory that is used by the loaded program (i.e. the text, data and BSS segment are taboo). We will use a simple that “most likely always” finds some free memory for a push and releases it after a pop. In the unlikely event of running out of memory we will have a stack overflow. And in that case our program will not crash, and that is a problem, it will continue run on with undefined behavior.

Fetch and store instructions for the ULM

The ULM instruction set provides fetch instructions for copying data from the memory into a register, and store instruction to copy data from a register to memory. What we use here is movq (%Y), %Z with Opcode 0x18 for fetching, and movq %X, (%Z) with Opcode 0x28 for storing. As you see in the description of these instructions there is a so called alignment restriction, the address has to be a multiple of eight. In this stack example this will always be the case. So you can ignore it for now, I will dedicate an extra page in this session to this topic.

Conventions for the stack

By convention register 2 is dedicated to the stack data structure and for convenience we assume an directive like

1
          .equ    SP,     2

is used, but it is not required. The reason for choosing register 2 is arbitrary, as we have to choose something, and therefore does not require any justification. But I can give some explanation: choosing register 0 would be unwise and register 1 will have some special meaning in the next stack convention which is also used by the ULM C compiler.

Initializing the stack

The stack gets initialized by setting %SP to zero. Because the memory has \(2^{64}\) bytes and registers \(64\) bits, zero is the address that points one-past-the-last memory cell:

Meaning of the stack pointer

If %SP is zero it means the stack is empty. That's because by convention only memory cells with an address greater or equal to %SP contain data that is stored on the stack. That means “memory on the right-hand-side of %SP” is used by the stack:

If the stack pointer is not initialized then pop and push operations will have an unpredictable behaviour (for teaching purposes the ULM initializes registers with random values before it executes a program). So we have to be sure that an instruction like

1
        ldzwq   0,      %SP         # initialize empty stack

gets executed before any push or pop operation is done.

Push operation

The push operation is implemented by decrementing the stack pointer and then copying data to the stack pointer. That means the stack grows downwards after each push:

Pushing a quad word from a register %VAL onto the stack can be implemented by

1
2
3
        # push %VAL
        subq    8,      %SP,    %SP
        movq    %VAL,   (%SP)

Here %VAL can be an arbitrary register. Note that the ULM provides no instruction for copying a literal directly into memory. You first have to copy it into a register. For example

1
2
3
4
5
        # load 42
        ldzwq   42,     %VAL
        # push %VAL
        subq    8,      %SP,    %SP
        movq    %VAL,   (%SP)

would push the value 42 (zero extended to 64 bits) onto the stack.

Pop operation

The pop operation copies data from the end of the stack pointer into a register and then increments the stack pointer:

Popping a quad word into a register %VAL can be implemented by

1
2
3
        # push %VAL
        movq    (%SP),  %VAL
        addq    8,      %SP,    %SP

It would be insane to “clean up” memory after a pop operation, e.g by zeroing out the freed memory region. Such madness would cost an additional instruction! So stick to the rules: memory on the left-hand-side of %SP is free and memory on the left-hand-side of %SP can contain trash.

Why does our stack grow downwards?

The ULM provides a vast amount of \(2^{64}\) bytes of memory. And so far we made only little use of it. We only use the memory region with small addresses (actually beginning with address ) to load the segments of the program that we run:

When using the stack it should be nearly impossible to overwrite the loaded program. We can achieve that by

  • Either starting the stack at the end of the BSS segment and letting it grow upwards (i.e. growing to higher addresses) after a push

  • or starting the stack at “end of the memory” and letting it grow downwards.

The choice for the later one is simplicity. Register %SP is used as a 64-bit address, i.e. an unsigned 64-bit integer. Therefore address \(2^{64}\) actually means \(u\left(2^{64}\right) \bmod 2^{64} = 0\) and therefore zero initializing %SP means that it always points to “the end of the memory” or one-past-the-last memory cell.

Think about the alternative: Using the address after the BSS program would mean each program would have a different start address for the stack. Certainly that could be realized equally efficient (the assembler has that information) but it would require some extra thought.

Example for using the stack to store some characters

Before using the stack for function calls you should look at a simple example to observe how using the stack works and what happens after a push and pop operation. For this purpose, the following program is supposed to be as simple as possible. It first pushes the characters 'A', 'B' and 'C' onto the stack. Then it pops off and prints these values. While being simple in structure the program has more then just a few lines of code (hey, we are programming in assembly):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
        .equ    SP,     2
        .equ    VAL,    68      # just for convenience

        .text

        /*
            initialize stack
        */
        ldzwq   0,      %SP


        /*
            push 'A", push 'B', push 'C'
        */

        # load value in %VAL
        ldzwq   'A',    %VAL
        # push %VAL
        subq    8,      %SP,    %SP
        movq    %VAL,   (%SP)

        # load value in %VAL
        ldzwq   'B',    %VAL
        # push %VAL
        subq    8,      %SP,    %SP
        movq    %VAL,   (%SP)

        # load value in %VAL
        ldzwq   'C',    %VAL
        # push %VAL
        subq    8,      %SP,    %SP
        movq    %VAL,   (%SP)


        /*
            pop and print value (3 times)
        */
        # pop %VAL
        movq    (%SP),  %VAL
        addq    8,      %SP,    %SP
        # use popped value
        putc    %VAL

        # pop %VAL
        movq    (%SP),  %VAL
        addq    8,      %SP,    %SP
        # use popped value
        putc    %VAL

        # pop %VAL
        movq    (%SP),  %VAL
        addq    8,      %SP,    %SP
        # use popped value
        putc    %VAL

        /*
                TODO: Your bug here
        */

        putc    '\n'
        halt    0

Running the program therefore produces the following output:

theon$ ulmas -o stack stack.s
theon$ ulm stack
CBA
theon$ 

But of course the intension of providing this example is to understand the program line by line. In the following I will visualize how the stack gets affected and you should confirm that by running the program with the ULM debugger ulm-qui. It is in particular to “see the meaning” of what the code does.

Initializing the stack

The first instruction initializes the stack

1
2
3
4
5
        .equ    SP,     2       # actually required for the stack
        .equ    VAL,    5       # just for convenience

        .text
        ldzwq   0,      %SP

Note that actually both .equ directives are just for convenience (and a more expressive code). In the ulm-qui you only see that register 2 is now zero initialized, but the meaning of that is the following:

Push character 'A'

As mentioned above we first have to load the character constant into a register. Except for register %SP (and %0 or course) any register can be used. Such a register is hidden behind the symbol VAL (in the example code below defined as 5):

1
2
        # load value in %VAL
        ldzwq   'A',    %VAL

Now the two instructions for the push operation

1
2
3
        # push %VAL
        subq    8,      %SP,    %SP
        movq    %VAL,   (%SP)

will modify the stack pointer and the quad word at the end of the pointer:

Push character 'B'

We follows strictly the same pattern

1
2
3
4
5
        # load value in %VAL
        ldzwq   'B',    %VAL
        # push %VAL
        subq    8,      %SP,    %SP
        movq    %VAL,   (%SP)

which results in

Push character 'C'

And analogously

1
2
3
4
5
        # load value in %VAL
        ldzwq   'C',    %VAL
        # push %VAL
        subq    8,      %SP,    %SP
        movq    %VAL,   (%SP)

results in

Pop a character and print it

We now apply the pattern for the pop operation and afterwards print the popped value, i.e.

1
2
3
4
5
        # pop %VAL
        movq    (%SP),  %VAL
        addq    8,      %SP,    %SP
        # use popped value
        putc    %VAL

Looking at the memory you will see no difference. Only the register for the stack pointer changed. And this has the following meaning:

Pop a character and print it

Again the same pattern is applied to remove and print the next character from the stack:

1
2
3
4
5
        # pop %VAL
        movq    (%SP),  %VAL
        addq    8,      %SP,    %SP
        # use popped value
        putc    %VAL

Again, the memory content is not affected but the stack pointer has changed

Pop a character and print it

I guess you see by now the effect of

1
2
3
4
5
        # pop %VAL
        movq    (%SP),  %VAL
        addq    8,      %SP,    %SP
        # use popped value
        putc    %VAL

and how to interpret things you observe in the debugger:

Quiz 07 (About the stack)

This quiz consists of two exercise (see below) and you have to hand in 3 files. On theon do that with the command

submit hpc quiz07 understand_the_bug.s understand_the_bug.txt push_64.s

Your answers will not be evaluated automatically but you should get a confirmation email after your submit.

Exercise: understand_the_bug.s and understand_the_bug.txt

This is basically the program form above:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
        .equ    SP,     2
        .equ    VAL,    68      # just for convenience

        .text

        /*
            initialize stack
        */
        ldzwq   0,      %SP


        /*
            push 'A", push 'B', push 'C'
        */

        # load value in %VAL
        ldzwq   'A',    %VAL
        # push %VAL
        subq    8,      %SP,    %SP
        movq    %VAL,   (%SP)

        # load value in %VAL
        ldzwq   'B',    %VAL
        # push %VAL
        subq    8,      %SP,    %SP
        movq    %VAL,   (%SP)

        # load value in %VAL
        ldzwq   'C',    %VAL
        # push %VAL
        subq    8,      %SP,    %SP
        movq    %VAL,   (%SP)


        /*
            pop and print value (3 times)
        */
        # pop %VAL
        movq    (%SP),  %VAL
        addq    8,      %SP,    %SP
        # use popped value
        putc    %VAL

        # pop %VAL
        movq    (%SP),  %VAL
        addq    8,      %SP,    %SP
        # use popped value
        putc    %VAL

        # pop %VAL
        movq    (%SP),  %VAL
        addq    8,      %SP,    %SP
        # use popped value
        putc    %VAL

        /*
         *      TODO: Your bug here.
         */
        putc    '\n'
        halt    0

With a small modification The program has a comment TODO: Your bug here. Remove that comment with and replace it with

1
2
3
4
5
        # pop %VAL
        movq    (%SP),  %VAL
        addq    8,      %SP,    %SP
        # use popped value
        putc    %VAL

Don't change anything else. Now translate it and run the program. You should observe the following:

theon$ ulm understand_the_bug
CBAD
theon$ 

I think so far this exercise was simple ;-) So now comes the challenge:

  • Explain why this result could be predicted.

  • Modify at most to characters in this buggy program so that the program prints CBA! followed by a newline.

    To be clear: don't fix the bug, play with the bug. We know that the problem is that more elements are removed than previously added to the stack. You will have the opportunity to fix many bugs in the future.

Exercise: push_64.s

Write a program that does the following (in that order):

  • push the literal 0xFFFFFFFFFFFFFFFF on the stack,

  • push the literal 0x1234567890123456 on the stack.

Hint: The purpose of the that is to “motivate” you to read about how in general a 64-bit address can be loaded into a register. Because in general

1
    ldzwq   function_label, %CALL

is only working if the label refers to an address in the range from 0 to \(2^{16}-1\).