Calling conventions

What is shown here is a simple and efficient way for realizing a function call convention. However, there is a serious catch to it. It only works for so called leaf functions, i.e. functions that do not call another functions. So in particular something like recursive function calls will not work with that.

At the moment we develop concepts for programming a computer in general and do not really focus on efficiency. In Session 8 you saw how difficult it can get to write fairly complex programs without having a possibility to break components into reusable functions (for reading an integer, computing something, printing an integer). What we did there was thinking about a program that consisted of such parts (i.e. functions) but instead of calling a function we just inserted its code where it was needed. In the real world this is an optimization called function inlining. This is an optimization because function calls always have some overhead. You will see here that this overhead consists of additional code for passing arguments, calling the function, returning form the function etc.

With respect to efficiency the function call convention would be the next best thing compared to inlining the call. But again, efficiency is not our main concern here, it is still about programming the ULM in general. So let's say we look at this convention because compared to the general case (which includes non-leaf functions) it is easier to deal with the special case and for this case we seek the simplest, possible solution.

Using the instruction jmp %CALL, %RET for function calls

Key for supporting functions calls on the ULM is the jmp instruction with Opcode 0x40. It has two register operands to which we refer as %CALL and %RET respectively. When executed it does two things:

  • It jumps to the address in %CALL and

  • stores the return address %RET.

So what does return address mean? It is the address of the instruction that follows the jump instruction in memory. For example, when the jump instruction is at address 0x20 then the return address is 0x24.

The sequential execution and the implementation pattern for a program that calls a leaf-function func twice can be visualised as follows:

And analogously cases where different functions are called the flow chart can be represented like this:

You want a more technical explanation?

Recall that the ULM implements this simple von Neumann cycle for executing a program. So when an instruction gets executed it was loaded from address %IP (instruction pointer) into the %IR (instruction register). So what the instruction does when it gets executed is

\[\left( u(\%\text{ip}) + 4 \right) \bmod 2^{64} \to u(\%\text{RET})\quad\text{and}\quad u(\%\text{CALL}) \to u(\%\text{ip})\]

Returning from a function is then possible with a jmp %RET, %0 instruction. Of course this requires that %RET by then still (or again) contains the original return address. So when you program you should know that you can not simply use register %RET without care. So you see that we need some rules, a protocol or call it convention for function calls.

Descriptions of such conventions usually the terms caller and callee. As long as we only deal with leaf functions these terms have a simple meaning. The program text can be divided into a main program and functions. In the main program we only have code that eventually calls a function, i.e. caller code. In the functions we only have code that gets executed when a function was called, i.e. callee code. Now think of two persons (caller and callee) writing a program together. The caller is writing the main program, the callee all the functions. The function call conventions are a contract between the caller and callee and allows to collaborate almost independently from each other. Actually the contract specifies what information one has to provide to another.

Caller

The caller has to know the address of the function, how to pass arguments and eventually how to get a result back from the function. The caller also has to know what registers the function might modify.

In the simplest form the function does not expect any arguments and does not return a result. In this case the caller just loads the address of the function into %CALL and does the (function) jump:

Also shown in this picture is the entry point of the program. The entry point is the address of the instruction that gets executed first when execution of the program starts. Of course a correct program should also have at least one exit point, i.e. some halt instruction. As a programmer you have to make sure that at least on of these exit points gets reached.

Callee

Accordingly the callee needs to know how to receive arguments, how to return and eventually how to give back a result. And of course, what registers can be used and modified for computations.

Again we first consider that the function does not receive any arguments and does not return any result. Then we basically have this (we talk about what registers the function implementation is allowed to modify below):

Example

So here another “hello, world” program:

theon$ ulmas -o func01 func01.s
theon$ ulm func01
hello, world!
hello, world!
theon$ 

This time it uses functions. For the sake of simplicity we first stick to the case where the function does not receive arguments, does not return a result and not even uses any registers. But it has two functions and does four function calls:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
        .equ    CALL,           1
        .equ    RET,            CALL + 1

        .text

/*
        Entry point of the program
*/
        # hello_function();
        ldzwq   hello_function, %CALL
        jmp     %CALL,          %RET

        # world_function();
        ldzwq   world_function, %CALL
        jmp     %CALL,          %RET

        # hello_function();
        ldzwq   hello_function, %CALL
        jmp     %CALL,          %RET

        # world_function();
        ldzwq   world_function, %CALL
        jmp     %CALL,          %RET
/*
        Exit point
*/
        halt    0

//------------------------------------------------------------------------------

/*
        void
        hello_function();
*/

hello_function:
        putc    'h'
        putc    'e'
        putc    'l'
        putc    'l'
        putc    'o'
        putc    ','
        putc    ' '
        jmp     %RET,   %0

/*
        void
        hello_function();
*/
world_function:
        putc    'w'
        putc    'o'
        putc    'r'
        putc    'l'
        putc    'd'
        putc    '!'
        putc    '\n'
        jmp     %RET,   %0

Callee, the perfect roomer

The caller and callee need a protocol that specifies what registers (and as we talk about later what memory region) can be used without conflicts. We keep it simple here: The callee is only allowed to modify registers that we denote as %CALLEE0, %CALLEE1, etc. This will be defined through some .equ directives. For example in the next example we define

1
2
    .equ    CALLEE0,        1
    .equ    CALLEE1,        CALLEE0 + 1

Think of this like a contract between you and a roomer that lives in your apartment. This contract specifies what the roomer is allowed to use in your apartment.

In other examples we give the callee access to more registers but it always will be limited. Of course the ULM would have enough registers to share the registers more equally. But first of all, we want to keep it close enough to reality where most computers have much fewer registers and they can live with that (And so can we: The ULM has only 256 registers because that makes it simple to read the machine code).

Next, a callee can actually use any register as long as the callee can restore them after usage. That means before returning to the caller all registers that do not belong to the callee must have their original value. So no harm done. Like in the example with the roomer in your apartment. You certainly don't mind if someone uses your stuff if it is guarantied that you will never notice ;-)

Furthermore, sooner or later we want to think about functions that are non-leaf functions, i.e functions that call other functions. In this case the callee becomes a caller (and then again a callee before it returns). So just equally sharing registers between caller and callee would be a dead end.

But let's talk about here and now. We still have to specify how arguments are passed and results returned.

Passing arguments

A function receives its arguments (if any) in the callee registers: The first argument in %CALLEE0, the second in %CALLEE1, etc.

The caller has to know how many arguments are expected by the callee. There is no way that the assembler can check if you are passing to few or too many arguments. So the programmer of the function has to document this (or the caller would have to look at the code and figure that out).

Receiving results

If a function returns a value this value will be store in %CALLEE0 after the return. So the callee has to put it there before returning.

Example for a puts function

We continue with a multilingual version of the “hello, world” program

theon$ ulmas -o func02 func02.s
theon$ ulm func02
hello, world!
Hallo Welt!🍺
你好,世界
theon$ 

In this example the function puts expects one argument (in %CALLEE0) which is a pointer to a string that will be printed by the function:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
        # Example for realizing leaf functions (recursive calls will fail!)

        # registers that a callee can use
        .equ    CALLEE0,        1
        .equ    CALLEE1,        CALLEE0 + 1

        # all other registers belong to the caller
        .equ    CALL,           CALLEE1 + 1     
        .equ    RET,            CALL + 1

        .data
str0    .string "hello, world!\n"
str1    .string "Hallo Welt!🍺\n"
str2    .string "你好,世界\n"

        .text
/*
        Entry point of the program
*/

        # puts(str0);
        ldzwq   puts,           %CALL
        ldzwq   str0,           %CALLEE0
        jmp     %CALL,          %RET

        # puts(str1);
        ldzwq   puts,           %CALL
        ldzwq   str1,           %CALLEE0
        jmp     %CALL,          %RET

        # puts(str2);
        ldzwq   puts,           %CALL
        ldzwq   str2,           %CALLEE0
        jmp     %CALL,          %RET

/*
        Exit point
*/
        halt    0

//------------------------------------------------------------------------------

/*
        void
        puts(const char *);
*/

puts:
.L0     movzbq  (%CALLEE0),     %CALLEE1
        subq    0,              %CALLEE1,       %0
        jz      .L1
        putc    %CALLEE1
        addq    1,              %CALLEE0,       %CALLEE0
        jmp     .L0
.L1
        jmp     %RET,           %0

Example with functions puts, readuint and printuint

In general our functions need more than just two registers. Until we don't know how to use a stack to save and restore registers that do not belong to a callee we have to provide more registers to the callee.

The following program reads in an integer and then prints it. The callee can now use up to 4 registers. If you use it interactively it looks like that:

1
2
3
4
theon$ ulm func03
Type in some unsigned integer (that fits in 64 bits): 1234567890123
You typed in: 1234567890123
bye!

When I create this web pages automatically (which has the advantage that you can be sure that what you see in the shell box below is no fake) I can not type in numbers. So I generate the input by piping the output of an echo like here:

theon$ ulmas -o func03 func03.s
theon$ echo 1234567890123 | ulm func03
Type in some unsigned integer (that fits in 64 bits): You typed in: 1234567890123
bye!
theon$ 
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
        # Example for realizing leaf functions (recursive calls will fail!)


        # registers that a callee can use
        .equ    CALLEE0,        1
        .equ    CALLEE1,        CALLEE0 + 1
        .equ    CALLEE2,        CALLEE1 + 1
        .equ    CALLEE3,        CALLEE2 + 1

        # all other registers belong to the caller
        .equ    CALL,           CALLEE3 + 1     
        .equ    RET,            CALL + 1
        .equ    N,              RET + 1

        .data
str0    .string "Type in some unsigned integer (that fits in 64 bits): "
str1    .string "You typed in: "
str2    .string "\nbye!\n"

        .text
/*
        Entry point of the program
*/

        # puts(str0);
        ldzwq   puts,           %CALL
        ldzwq   str0,           %CALLEE0
        jmp     %CALL,          %RET

        # readuint();
        ldzwq   readuint,       %CALL
        jmp     %CALL,          %RET
        addq    %CALLEE0,       %0,             %N

        # puts(str1);
        ldzwq   puts,           %CALL
        ldzwq   str1,           %CALLEE0
        jmp     %CALL,          %RET

        # printuint();
        ldzwq   printuint,      %CALL
        addq    %N,             %0,             %CALLEE0
        jmp     %CALL,          %RET

        # puts(str2);
        ldzwq   puts,           %CALL
        ldzwq   str2,           %CALLEE0
        jmp     %CALL,          %RET
/*
        Exit point
*/
        halt    0


/*
        void
        puts(const char *);
*/

puts:
.L0     movzbq  (%CALLEE0),     %CALLEE1
        subq    0,              %CALLEE1,       %0
        jz      .L1
        putc    %CALLEE1
        addq    1,              %CALLEE0,       %CALLEE0
        jmp     .L0
.L1     jmp     %RET,           %0

/*
        uint64_t
        readuint();
*/
readuint:
        ldzwq   0,              %CALLEE0
.L2     getc    %CALLEE1
        subq    '\n',           %CALLEE1,       %0
        jz      .L3
        subq    '0',            %CALLEE1,       %CALLEE1
        imulq   10,             %CALLEE0,       %CALLEE0
        addq    %CALLEE1,       %CALLEE0,       %CALLEE0
        jmp     .L2
.L3     jmp     %RET,           %0


/*
        void
        printuint(uint64_t);
*/
        .bss
.L4     .space  21

        .text
printuint:
        ldzwq   .L4,            %CALLEE3

        ldzwq   0,              %CALLEE1

.L5     divq    10,             %CALLEE0,       %CALLEE0
        addq    '0',            %CALLEE2,       %CALLEE2
        addq    1,              %CALLEE3,       %CALLEE3
        movb    %CALLEE2,       (%CALLEE3)
        subq    0,              %CALLEE0,       %0
        jnz     .L5

.L6     movzbq  (%CALLEE3),     %CALLEE2
        subq    0,              %CALLEE2,       %0
        jz      .L7
        putc    %CALLEE2
        subq    1,              %CALLEE3,       %CALLEE3
        jmp     .L6
.L7     jmp     %RET,           %0