Using the ULM C Compiler

Before writing makefiles for building libraries and executables, you first should learn how to use the ULM C compiler manually. You will also see some examples illustrating that C code just describes the assembly code that the compiler generates for you. That's nice because it saves typing and avoids the typical errors that often occur when you write assembly code manually. And it's nice because, in principle, assembly code for every sane architecture can be generated from the same C code.

But you should not expect that C does the thinking for you. The language is not designed for incompetent programmers who don't care about details and just want to mindlessly type together some code that hopefully works. Again, writing C code is writing assembly code.

Generating assembly code

The ULM C compiler merely translates C code into assembly, i.e. it acts like gcc with option -S. For example, the C code

extern void
puts(char *str);

int
main()
{
    puts("hello, world!\n");
}

gets translated with

theon$ ulmcc -o xhello_in_c.s xhello_in_c.c
theon$ 

into

        .text
        .globl main
# function header of int32_t main():
main:
        movq %3,0(%2)
        movq %1,8(%2)
        addq $0,%2,%1
# function body of main:
        # puts("hello, world!\n");
        # function call: puts("hello, world!\n")
        subq $24,%2,%2
        ldzwq $.L1,%4
        movq %4,16(%2)
        ldzwq $puts,%4
        jmp %4,%3
        addq $24,%2,%2
        ldzwq $0,%4
        movl %4,16(%1)
# return from function main:
.L0:
        addq $0,%1,%2
        movq 8(%2),%1
        movq 0(%2),%3
        jmp %3,%0
        .data
.L1:
        .string "hello, world!\n"

Actually we could have skipped the “-o xhello_in_c.s” here as by default the output filename is the base name of the input file with extension “.s”.

Creating a minimalistic executable

For creating an executable you need to use the assembler and the linker. In Session 10.8 you already saw the most minimalistic C program

int
main()
{
}

Recall that main returns zero if you leave the function without an explicit return statement. The ULM C compiler translates that source code into

theon$ ulmcc -o main_no_return.s main_no_return.c
theon$ cat main_no_return.s
        .text
        .globl main
# function header of int32_t main():
main:
        movq %3,0(%2)
        movq %1,8(%2)
        addq $0,%2,%1
# function body of main:
        ldzwq $0,%4
        movl %4,16(%1)
# return from function main:
.L0:
        addq $0,%1,%2
        movq 8(%2),%1
        movq 0(%2),%3
        jmp %3,%0
theon$ 

For an executable we have to translate that assembly code into machine code with the ULM assembler and link it with a _start function that initializes the stack, calls main and halts the ULM using the return value as exit code. That is obviously the crt0.s code from Session 11 (also recall quiz13):

        .equ    FP,             1
        .equ    SP,             2
        .equ    RET,            3

//------------------------------------------------------------------------------
// Function _start()
//------------------------------------------------------------------------------
        .equ    ret,            0
        .equ    fp,             8
        .equ    rval,           16

        .text
        .globl _start
_start:
        // begin of the function body

        ldzwq   0,              %SP

        // call function main()
        subq    24,             %SP,            %SP
        ldzwq   main,           %4
        jmp     %4,             %RET
        movzlq  rval(%SP),      %4
        addq    24,             %SP,            %SP

        halt    %4
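To see where the frame for main ends up, here is a small C model of the first instructions of _start. It is only a sketch under the assumption that register arithmetic wraps around modulo 2^64; the names FRAME_SIZE, sp_after_reserve and exit_code_from_rval are hypothetical and not part of crt0.s. Starting with SP = 0, the subq reserves the 24-byte frame at the very top of the address space, and movzlq zero-extends the 32-bit return value of main before it is passed to halt.

```c
#include <stdint.h>

/* Frame slots used by _start (same values as the .equ lines in crt0.s) */
enum { RET_OFF = 0, FP_OFF = 8, RVAL_OFF = 16, FRAME_SIZE = 24 };

/* Hypothetical model of "ldzwq 0, %SP" followed by "subq 24, %SP, %SP".
   Unsigned arithmetic wraps modulo 2^64, like the ULM's 64-bit
   registers (assumption). */
uint64_t
sp_after_reserve(void)
{
    uint64_t sp = 0;        /* ldzwq 0, %SP */
    sp -= FRAME_SIZE;       /* subq 24, %SP, %SP: wraps to 2^64 - 24 */
    return sp;
}

/* Hypothetical model of "movzlq rval(%SP), %4": the 32-bit return value
   of main is zero-extended to 64 bits. */
uint64_t
exit_code_from_rval(int32_t rval)
{
    return (uint64_t)(uint32_t)rval;
}
```

Under these assumptions the frame occupies the last 24 bytes of memory, and the rval slot that main writes with movl lies at SP + 16.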

Because the ULM C compiler uses exactly the same calling conventions that were specified in Session 10.8, we can simply assemble crt0.s and main_no_return.s and link the resulting object files:

theon$ ulmas -o main_no_return.o main_no_return.s
theon$ ulmas -o crt0.o crt0.s
theon$ ulmld -o main_no_return main_no_return.o crt0.o
theon$ 

Running the executable we can check the exit code with echo $?:

theon$ ulm main_no_return; echo $?
0
theon$ 

Another minimalistic example

Just for fun let's return some non-zero value from main:

int
main()
{
    return 42;
}

Now the complete build process is as follows:

theon$ ulmcc -o main_returns_42.s main_returns_42.c
theon$ ulmas -o main_returns_42.o main_returns_42.s
theon$ ulmas -o crt0.o crt0.s
theon$ ulmld -o main_returns_42 main_returns_42.o crt0.o
theon$ 

As expected, we get the exit code 42 after the program has terminated:

theon$ ulm main_returns_42; echo $?
42
theon$ 

Say hello with ULM C

The remaining examples on this page are “hello, world” programs implemented in C. All variants use a puts function for printing the string. Initially we use puts from Session 11.2, implemented in assembly:

        .equ    FP,             1
        .equ    SP,             2
        .equ    RET,            3

//------------------------------------------------------------------------------
// Procedure puts(str)
//------------------------------------------------------------------------------
        .equ    ret,            0
        .equ    fp,             8

        // procedure arguments
        .equ    str,            16

        .text
        .globl  puts
puts:
        // function prologue
        movq    %RET,           ret(%SP)
        movq    %FP,            fp(%SP)
        addq    0,              %SP,            %FP
        // reserve space for 0 local variables.
        subq    0,              %SP,            %SP
        // begin of the function body

        /*
            if (*str == 0)
                goto puts.while.done;
        */
puts.while:
        movq    str(%FP),       %4
        movzbq  (%4),           %4
        subq    0,              %4,             %0
        jz      puts.while.done

        /*
            putchar(*str);
        */
        movq    str(%FP),       %4
        movzbq  (%4),           %4
        putc    %4

        /*
           str = str + 1;
        */
        movq    str(%FP),       %4
        addq    1,              %4,             %4
        movq    %4,             str(%FP)

        /*
           goto puts.while;
        */
        jmp     puts.while
puts.while.done:

        // end of the function body
        // function epilogue
puts.leave:
        addq    0,              %FP,            %SP
        movq    fp(%SP),        %FP
        movq    ret(%SP),       %RET
        jmp     %RET,           %0

The C program hello.c is explained in a bit more detail below:

extern void
puts(char *str);

int
main()
{
    puts("hello, world!\n");
}

If a function or procedure is not defined within a source file, the compiler still needs to know how to call it. The lines

extern void
puts(char *str);

do not generate any code. They just tell the compiler (or think of it as a programmer's promise given to the compiler) that a function puts will be available after linkage. More precisely, they tell the compiler that

  • puts is a procedure, i.e. it does not return a value, and

  • puts expects exactly one argument str which has a size of 8 bytes (and its value is the address of a byte).
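These two facts determine the stack layout that the caller prepares. As a sketch, assuming every slot is one 64-bit word as in the calling convention from Session 10, the 24 bytes reserved with subq $24,%2,%2 can be described by a C struct; the struct name puts_frame is hypothetical and only used for illustration:

```c
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical C view of the stack frame for calling the procedure
   void puts(char *str). Offsets match the .equ lines in puts.s. */
struct puts_frame {
    uint64_t ret;   /* offset  0: return address, stored by puts itself */
    uint64_t fp;    /* offset  8: caller's frame pointer, stored by puts */
    uint64_t str;   /* offset 16: argument, stored by the caller         */
};
```

Because puts is a procedure there is no slot for a return value, so the argument directly follows the saved frame pointer.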

From Session 10 you know that this information is needed for properly preparing the stack before the jump to puts gets done. So let's use the compiler to do that with

theon$ ulmcc -o hello.s hello.c
theon$ 

Look at the generated assembly code to confirm that the compiler generates code from the C source file that you easily could have written yourself (in particular look at how puts gets called):

        .text
        .globl main
# function header of int32_t main():
main:
        movq %3,0(%2)
        movq %1,8(%2)
        addq $0,%2,%1
# function body of main:
        # puts("hello, world!\n");
        # function call: puts("hello, world!\n")
        subq $24,%2,%2
        ldzwq $.L1,%4
        movq %4,16(%2)
        ldzwq $puts,%4
        jmp %4,%3
        addq $24,%2,%2
        ldzwq $0,%4
        movl %4,16(%1)
# return from function main:
.L0:
        addq $0,%1,%2
        movq 8(%2),%1
        movq 0(%2),%3
        jmp %3,%0
        .data
.L1:
        .string "hello, world!\n"

Having the assembly code generated by the C compiler we now can generate the executable as before, i.e. use the assembler for generating object files and then link it:

theon$ ulmas -o hello.o hello.s
theon$ ulmas -o crt0.o crt0.s
theon$ ulmas -o puts.o puts.s
theon$ ulmld -o hello hello.o puts.o crt0.o
theon$ 

And of course check that it is working:

theon$ ulm hello
hello, world!
theon$ 

With great power comes great responsibility

One fundamental principle of C (and C++) is “trust the programmer”. If you deserve this trust, the language is a powerful tool. Otherwise it is insane to use this gun!

Here comes an example that shows how the compiler trusts you. In this variant it is promised that puts is a function that returns an integer:

extern int
puts(char *str);

int
main()
{
    puts("hello, world!\n");
}

The compiler does not know which implementation the linker will later add for resolving the symbol puts. So of course you cannot expect any error or warning:

theon$ ulmcc -o hello2.s hello2.c
theon$ 

The compiler just generates the proper code for calling that function:

        .text
        .globl main
# function header of int32_t main():
main:
        movq %3,0(%2)
        movq %1,8(%2)
        addq $0,%2,%1
# function body of main:
        # puts("hello, world!\n");
        # function call: puts("hello, world!\n")
        subq $32,%2,%2
        ldzwq $.L1,%4
        movq %4,24(%2)
        ldzwq $puts,%4
        jmp %4,%3
        movslq 16(%2),%4
        addq $32,%2,%2
        ldzwq $0,%4
        movl %4,16(%1)
# return from function main:
.L0:
        addq $0,%1,%2
        movq 8(%2),%1
        movq 0(%2),%3
        jmp %3,%0
        .data
.L1:
        .string "hello, world!\n"

Note the difference from before: the assembly code now calls a function that returns a value, so if you don't adapt the implementation of puts, things will go wrong:

theon$ diff hello.s hello2.s
11c11
<    subq $24,%2,%2
---
>    subq $32,%2,%2
13c13
<    movq %4,16(%2)
---
>    movq %4,24(%2)
16c16,17
<    addq $24,%2,%2
---
>    movslq 16(%2),%4
>    addq $32,%2,%2
theon$ 
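The diff can be read with the same kind of C struct sketch (again assuming 64-bit slots; the names below are hypothetical). Declaring puts as a function inserts an 8-byte slot for the return value at offset 16, which pushes the argument to offset 24. The unmodified procedure, however, still reads str from offset 16, i.e. from the return value slot, which contains whatever happens to be stored there:

```c
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical C view of the 32-byte frame for calling int puts(char *str) */
struct puts_int_frame {
    uint64_t ret;   /* offset  0: return address                           */
    uint64_t fp;    /* offset  8: caller's frame pointer                   */
    uint64_t rval;  /* offset 16: return value, read with movslq 16(%2),%4 */
    uint64_t str;   /* offset 24: argument, written with movq %4,24(%2)    */
};

/* Offset at which the unmodified procedure reads its argument
   (".equ str, 16" in puts.s) */
enum { PROCEDURE_STR_OFFSET = 16 };
```

So the procedure interprets an uninitialized return value slot as the address of the string, which explains the garbage printed below.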

Now let's use this code together with the unmodified procedure puts. There is no way that the assembler or linker can check that this will go wrong, so there is no warning or error:

theon$ ulmas -o hello2.o hello2.s
theon$ ulmas -o crt0.o crt0.s
theon$ ulmas -o puts.o puts.s
theon$ ulmld -o hello2 hello2.o puts.o crt0.o
theon$ 

Because the programmer in this case deserved neither respect nor trust, the program does not work as expected:

theon$ ulm hello2
<=.JY<i_6;)"61#k-QGxZhH!yj7ouZJ4U&')@=q-+:^y^       s"u{|uWy)#sTDk}+aV(ek<^)rhnJNYc}Ye:]1V]D!D+6h_q&E$$theon$ 

Some (not so) fun fact

The C standard library actually defines a function puts that returns an int. Apart from a const qualifier (the standard declares the parameter as const char *), its prototype is

extern int
puts(char *str);

So don't get confused: the important thing to learn here is that the declaration (the prototype of the function) and the definition (the implementation of the function) have to match, and that the programmer has to make sure that they match. If you cannot guarantee that, things will go wrong, and you cannot expect any help from the compiler, assembler or linker.

To some degree, using so-called header files will help here. But that's a different story that involves the C preprocessor. Btw, I really hope that I live long enough to see the C preprocessor one day become obsolete in C++.

Using C for writing a puts implementation

It's a good idea to implement functions in C when possible and only in assembly when necessary. That's because the C code then does not depend on the specific platform. And if the compiler is able to do some reasonable optimizations, the generated assembly code is usually efficient enough that writing the assembly code manually provides no advantage.

The following C code requires that a procedure is available for printing a single character:

extern void
putchar(char ch);

void
puts(char *str)
{
    while (*str) {
        putchar(*str);
        ++str;
    }
}
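To experiment with this loop on an ordinary host, the output device can be replaced by a buffer. The following is only a sketch: putchar_sim, puts_sim and out_buf are hypothetical names (chosen so they don't clash with the C library's putchar and puts), and putchar_sim appends to a buffer instead of executing putc:

```c
#include <stddef.h>

/* Hypothetical output device: collects all printed characters */
static char out_buf[256];
static size_t out_len;

/* Stand-in for the assembly procedure putchar (assumption: it only
   appends one byte to out_buf instead of executing putc) */
void
putchar_sim(char ch)
{
    if (out_len < sizeof(out_buf) - 1) {
        out_buf[out_len++] = ch;
        out_buf[out_len] = 0;      /* keep the buffer zero-terminated */
    }
}

/* Same loop as puts.c, calling the stand-in instead of putchar */
void
puts_sim(char *str)
{
    while (*str) {
        putchar_sim(*str);
        ++str;
    }
}
```

puts_sim keeps the structure of puts.c: it only hands over one character at a time, so everything platform dependent stays isolated in the single-character routine.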

The procedure putchar is platform dependent and therefore has to be provided in handwritten assembly:

        .equ    FP,             1
        .equ    SP,             2
        .equ    RET,            3

//------------------------------------------------------------------------------
// Procedure putchar(ch)
//------------------------------------------------------------------------------
        .equ    ret,            0
        .equ    fp,             8

        // procedure arguments
        .equ    ch,             16

        .text
        .globl  putchar
putchar:
        // function prologue
        movq    %RET,           ret(%SP)
        movq    %FP,            fp(%SP)
        addq    0,              %SP,            %FP
        // reserve space for 0 local variables.
        subq    0,              %SP,            %SP
        // begin of the function body

        movzbq  ch(%FP),        %4
        putc    %4

        // end of the function body
        // function epilogue
putchar.leave:
        addq    0,              %FP,            %SP
        movq    fp(%SP),        %FP
        movq    ret(%SP),       %RET
        jmp     %RET,           %0

Now generate assembly code from the C source files

theon$ ulmcc -o hello.s hello.c
theon$ ulmcc -o puts.s puts.c
theon$ 

and look at the assembly code generated from puts.c:

        .text
        .globl puts
# function header of void puts(uint8_t *str):
puts:
        movq %3,0(%2)
        movq %1,8(%2)
        addq $0,%2,%1
# function body of puts:
        # while (*str) { ...
        jmp .L1
.L2:
# putchar(*str);
        # function call: putchar(*str)
        subq $24,%2,%2
        movq 16(%1),%4
        movzbq 0(%4),%4
        movb %4,16(%2)
        ldzwq $putchar,%4
        jmp %4,%3
        addq $24,%2,%2
        # str += 1;
        ldzwq $1,%4
        addq $16,%1,%5
        movq 0(%5),%6
        addq %6,%4,%4
        movq %4,0(%5)
        # condition: *str
.L1:
        movq 16(%1),%4
        movzbq 0(%4),%4
        subq %0,%4,%0
        jne .L2
.L3:
# ... } // while (*str)
        
# return from function puts:
.L0:
        addq $0,%1,%2
        movq 8(%2),%1
        movq 0(%2),%3
        jmp %3,%0

For the most part it is essentially the same as our handwritten assembly code. However, there is one significant difference: for printing a single character we always have a call of putchar instead of a single putc instruction:

  • Calling putchar requires 7 instructions, and

  • the definition of putchar has 10 instructions.

That's a huge overhead and the reason why compilers should provide optimizations like function inlining, and why a C programmer should know how to make use of them. For now we postpone this topic and just believe that such issues can be taken care of later.

Having automatically generated assembly code we generate the executable as usual:

theon$ ulmas -o hello.o hello.s
theon$ ulmas -o crt0.o crt0.s
theon$ ulmas -o puts.o puts.s
theon$ ulmas -o putchar.o putchar.s
theon$ ulmld -o hello hello.o putchar.o puts.o crt0.o
theon$ ulm hello
hello, world!
theon$ 

Let's hide things in a software layer

The ULM has a feature that has been kept secret so far, some kind of firmware. This firmware, for instance, allows copying a memory block character by character to the output device. And this bypasses the CPU, so for printing a string we don't have to copy each byte first from memory into a register and from there to the output device.

Except for the next example, this feature of the ULM will not be used in this lecture. And actually I will not really explain how it works; instead I just want to show you that it can be used for improving the implementation of puts. And by improving the implementation I mean that you can use puts like before without knowing that its implementation changed.

So here comes this magic implementation:

extern int64_t
_trap(uint8_t number, void *param);

void
puts(char *str)
{
    char *begin = str;
    while (*str) {
        ++str;
    }
    struct {
        int fd;
        void* buf;
        uint64_t nbytes;
    } params = {1, begin, str - begin};
    _trap(1, &params);
}

The implementation uses the function _trap which is a wrapper for the ULM's trap instruction:

        .equ    FP,             1
        .equ    SP,             2
        .equ    RET,            3

//------------------------------------------------------------------------------
// Function int64_t _trap(uint8_t number, void *param)
//------------------------------------------------------------------------------
        .equ    ret,            0
        .equ    fp,             8
        .equ    rval,           16

        // function arguments
        .equ    number,         24
        .equ    param,          32

        .text
        .globl _trap
# function header of int64_t _trap(uint8_t number, void *params):
_trap:
        // function prologue
        movq    %RET,           ret(%SP)
        movq    %FP,            fp(%SP)
        addq    0,              %SP,                    %FP
        // begin of the function body

        movzbq  number(%FP),    %4
        movq    param(%FP),     %5
        trap    %4,             %5,                     %6
        movq    %6,             rval(%FP)

        // end of the function body
        // function epilogue
        addq    0,              %FP,                    %SP
        movq    fp(%SP),        %FP
        movq    ret(%SP),       %RET
        jmp     %RET,           %0
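The new puts first determines the string length by advancing a pointer to the terminating zero and subtracting the start address, and then hands buffer and length over in a single call. The parameter block {fd, buf, nbytes} resembles the argument list of the POSIX function write, so on a hosted POSIX system the same idea can be sketched as follows (string_length and puts_write are hypothetical names; this is not how the ULM firmware is implemented):

```c
#include <stddef.h>
#include <unistd.h>   /* POSIX: write(), ssize_t */

/* Length of a zero-terminated string, computed like in puts.c:
   advance a pointer to the terminating zero and subtract the start. */
size_t
string_length(const char *str)
{
    const char *begin = str;
    while (*str) {
        ++str;
    }
    return (size_t)(str - begin);
}

/* One write() call on file descriptor 1 (stdout) replaces the
   character-by-character loop. Returns the number of bytes written,
   or -1 on error. */
ssize_t
puts_write(const char *str)
{
    return write(1, str, string_length(str));
}
```

As with the ULM version, callers of puts_write don't need to know whether the bytes are emitted one by one or as a single block.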

Here is the demo for this example:

theon$ ulmcc -o puts.s puts.c
theon$ ulmas -o puts.o puts.s
theon$ ulmas -o hello.o hello.s
theon$ ulmas -o crt0.o crt0.s
theon$ ulmas -o _trap.o _trap.s
theon$ ulmld -o hello hello.o _trap.o puts.o crt0.o
theon$ ulm hello
hello, world!
theon$