Some standard library for the ULM

Besides its core language, a programming language typically also defines a standard library; in the case of C, this is the C standard library. For the ULM C dialect we will rebuild some parts of it.

Currently the ULM C compiler only translates source code into assembly code for the ULM instruction set. Imagine, however, that it could also produce assembly code for other architectures, e.g. the Intel architecture or the ARM architecture. Code written for the ULM C compiler would then be platform independent in the sense that it just needs to be recompiled on each platform. So obviously we want to write as much as possible in C and as little as possible in assembly. Porting the library to another platform then only requires adapting the part written in assembly.
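A minimal sketch of this split in portable C. The names `platform_putchar` and `lib_puts` are hypothetical: on the ULM the platform part would be the assembly routine putchar.s, while here it is stubbed with an in-memory buffer so that the sketch runs on any host.

```c
#include <assert.h>
#include <stddef.h>

/* --- platform-specific part (hypothetical stub) ---------------------
   On the ULM this role is played by putchar.s, which uses the putc
   instruction. Here characters are captured in a buffer instead. */
static char   out_buf[128];
static size_t out_len;

static void
platform_putchar(char ch)
{
    assert(out_len + 1 < sizeof out_buf);   /* keep room for the NUL */
    out_buf[out_len++] = ch;
    out_buf[out_len] = '\0';
}

/* --- portable part --------------------------------------------------
   Written only in terms of platform_putchar, so porting the library
   means reimplementing that single function for a new architecture. */
static void
lib_puts(const char *str)
{
    while (*str) {
        platform_putchar(*str);
        ++str;
    }
}
```

The more functions are written like lib_puts, the smaller the platform-specific part that has to be rewritten per architecture.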

Initial code base for the library used in this session

On theon, the directory /home/numerik/pub/libulm_initial/ contains all source files for the library, some test programs and a makefile:

theon$ pwd
/home/numerik/pub/libulm_initial
theon$ ls
crt0.s
Makefile
putchar.s
puts.c
putui.s
xanswer.s
xhello_in_c_gcc
xhello_in_c.c
xhello.s
theon$ 

The makefile is derived from the one in Session 11. It allows source files to be written in either C or assembly:

Lib         := libulm.a

CC := ulmcc
AS := ulmas
LD := ulmld
LDFLAGS := $(Lib)
RANLIB := ulmranlib

# files of form x*.s or x*.c are test programs and an executable x* gets
# created.
TestTargets := $(patsubst %.s,%,$(wildcard x*.s)) \
               $(patsubst %.c,%,$(wildcard x*.c))
# machinery to cleanup the directory if test programs were renamed or deleted
XObjRemoves := $(filter-out $(patsubst %,%.o,$(TestTargets)),$(wildcard x*.o))
XTstRemoves := $(patsubst %.o,%,$(XObjRemoves))
XSrcRemoves := $(if $(XObjRemoves),xsrcRemoves)

# all other files of form *.s or *.c are part of the library
LibSources  := $(filter-out x%.s,$(wildcard *.s)) \
               $(filter-out x%.c,$(wildcard *.c))
LibObjects  := $(patsubst %.c,%.o,$(patsubst %.s,%.o,$(LibSources)))

# machinery to cleanup the archive if source files for the library were renamed
# or deleted
LibContent  := $(if $(wildcard $(Lib)),$(shell ar t $(Lib) | grep -v "^__"),)
LibRemoves  := $(filter-out $(LibObjects),$(LibContent))
SrcRemoves  := $(if $(LibRemoves),srcRemoves)
ArDelete    := $(if $(LibRemoves),ar d $(Lib) $(LibRemoves),)

.PHONY: all clean srcRemoves xsrcRemoves

all:    $(TestTargets) $(Lib) $(XSrcRemoves)

clean:
        $(RM) $(TestTargets) *.o $(Lib)

$(TestTargets): % : %.o $(Lib)
        $(LD) -o $@ $^

$(XSrcRemoves) :
        $(RM) $(XObjRemoves) $(XTstRemoves)

%.o : %.c
        $(CC) -o $*.s $^
        $(AS) -o $*.o $*.s
        $(RM) $*.s

# $(Lib)(%) : %
#       $(AR) cr $@ $^

$(SrcRemoves) :
        $(ArDelete)

$(Lib) : $(Lib)($(LibObjects)) $(SrcRemoves)
        $(RANLIB) $(Lib)

As usual, a simple make builds everything, i.e. the library and the test programs:

theon$ make
ulmas   -o xanswer.o xanswer.s
ulmas   -o crt0.o crt0.s
ar rv libulm.a crt0.o
ar: creating libulm.a
a - crt0.o
ulmas   -o putchar.o putchar.s
ar rv libulm.a putchar.o
a - putchar.o
ulmas   -o putui.o putui.s
ar rv libulm.a putui.o
a - putui.o
ulmcc -o puts.s puts.c
ulmas -o puts.o puts.s
rm -f puts.s
ar rv libulm.a puts.o
a - puts.o
ulmranlib libulm.a
ulmld -o xanswer xanswer.o libulm.a
ulmas   -o xhello.o xhello.s
ulmld -o xhello xhello.o libulm.a
ulmcc -o xhello_in_c.s xhello_in_c.c
ulmas -o xhello_in_c.o xhello_in_c.s
rm -f xhello_in_c.s
ulmld -o xhello_in_c xhello_in_c.o libulm.a
rm puts.o crt0.o putui.o putchar.o
theon$ 

For instance, this runs the “hello, world” program implemented in C:

theon$ xhello_in_c
hello, world!
theon$ 

With make clean, all generated files are deleted:

theon$ make clean
rm -f xanswer xhello xhello_in_c *.o libulm.a
theon$ 

Storing test programs and library sources in a single directory is certainly not the right thing to do in a larger project, but it is sufficient in our case. As in the previous session, we use a simple naming convention so that the build system can tell them apart: files that begin with 'x' are test programs, all others are part of the library.
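The naming convention is simple enough to express as a small C function. This is only a sketch mirroring the Makefile's `$(wildcard x*.s)` and `$(filter-out x%.c,$(wildcard *.c))` patterns; the names `classify` and `enum file_kind` are hypothetical and not part of the course code.

```c
#include <assert.h>
#include <string.h>

/* x*.c / x*.s are test programs, all other *.c / *.s files are library
   sources, and anything else (Makefile, executables, ...) is ignored. */
enum file_kind { OTHER_FILE, TEST_PROGRAM, LIBRARY_SOURCE };

static enum file_kind
classify(const char *name)
{
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    /* require at least one character in front of ".c" or ".s" */
    if (len < 3 || name[len - 2] != '.'
        || (name[len - 1] != 'c' && name[len - 1] != 's')) {
        return OTHER_FILE;
    }
    return name[0] == 'x' ? TEST_PROGRAM : LIBRARY_SOURCE;
}
```

For example, xhello.s is classified as a test program, puts.c as a library source, and the executable xhello_in_c_gcc is ignored because it has neither suffix.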

However, you should get at least some impression of what kind of demands a proper build system has to satisfy. First of all, it should be possible to add, delete or rename source files. Also, intermediate files (like object files) should only be kept around if they can be used to speed up rebuilding the project.
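Supporting deleted or renamed sources boils down to a set difference, which the Makefile computes with `LibRemoves := $(filter-out $(LibObjects),$(LibContent))`: every archive member without a matching object file is stale and gets removed with `ar d`. A C sketch of that computation follows; `stale_members` and `contains` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if name occurs in list[0..n-1], otherwise 0. */
static int
contains(const char *const list[], size_t n, const char *name)
{
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(list[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Like $(filter-out $(LibObjects),$(LibContent)): copies each archive
   member that has no corresponding object file into stale[] and returns
   how many were found. These are the members `ar d` would delete. */
static size_t
stale_members(const char *const members[], size_t nmembers,
              const char *const objects[], size_t nobjects,
              const char *stale[])
{
    size_t nstale = 0;

    assert(stale != NULL || nmembers == 0);
    for (size_t i = 0; i < nmembers; ++i) {
        if (!contains(objects, nobjects, members[i])) {
            stale[nstale++] = members[i];
        }
    }
    return nstale;
}
```

If, say, a hypothetical putstr.c had been renamed to puts.c, the archive would still contain putstr.o, and that member would be the only one reported as stale.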

Source code for the library

Files with extension “.c” or “.s” that do not begin with an “x” are library sources, implemented in C or assembly respectively:

theon$ ls [^x]*.[cs]
crt0.s
putchar.s
puts.c
putui.s
theon$ 

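The dependencies for building the static library are as follows: libulm.a depends on its members crt0.o, putchar.o, putui.o and puts.o. Each of these object files is assembled by ulmas from the .s file of the same name, and puts.s is in turn generated by ulmcc from puts.c.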

With make libulm.a the build system only generates or updates the library:

theon$ make libulm.a
ulmas   -o crt0.o crt0.s
ar rv libulm.a crt0.o
ar: creating libulm.a
a - crt0.o
ulmas   -o putchar.o putchar.s
ar rv libulm.a putchar.o
a - putchar.o
ulmas   -o putui.o putui.s
ar rv libulm.a putui.o
a - putui.o
ulmcc -o puts.s puts.c
ulmas -o puts.o puts.s
rm -f puts.s
ar rv libulm.a puts.o
a - puts.o
ulmranlib libulm.a
rm puts.o crt0.o putui.o putchar.o
theon$ 

Note that all object files, as well as assembly files generated from C code, are deleted afterwards (the object code itself survives as members of libulm.a). Keeping these files would not speed up a rebuild. For example, if you modify puts.c, then puts.s and puts.o always have to be regenerated:

theon$ touch puts.c
theon$ make libulm.a
ulmcc -o puts.s puts.c
ulmas -o puts.o puts.s
rm -f puts.s
ar rv libulm.a puts.o
r - puts.o
ulmranlib libulm.a
rm puts.o
theon$ 

So there is no benefit from keeping these intermediate files.
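The argument can be checked with a toy model of make's up-to-date rule applied to the chain puts.c, puts.s, puts.o, libulm.a(puts.o): a target is rebuilt if it is missing or older than its prerequisite. The struct and function names are hypothetical, timestamps are plain integers, and GNU make's special handling of deleted intermediate files is deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct file {
    const char *name;
    bool        exists;
    long        mtime;   /* stand-in for the file's modification time */
};

static bool
out_of_date(const struct file *target, const struct file *prereq)
{
    return !target->exists || target->mtime < prereq->mtime;
}

/* chain[i] is built from chain[i-1]; rebuilt files get timestamp `now`.
   Returns the number of files that had to be regenerated. */
static size_t
rebuild_chain(struct file chain[], size_t n, long now)
{
    size_t rebuilt = 0;

    assert(n > 0);
    for (size_t i = 1; i < n; ++i) {
        if (out_of_date(&chain[i], &chain[i - 1])) {
            chain[i].exists = true;
            chain[i].mtime  = now;
            ++rebuilt;
        }
    }
    return rebuilt;
}
```

Once puts.c is newer than puts.s, each later step becomes out of date in turn, so a kept puts.s or puts.o would be overwritten anyway; a second run then has nothing to do.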

For a quick look, here are the library source files:

crt0.s:

        .equ    FP,             1
        .equ    SP,             2
        .equ    RET,            3

//------------------------------------------------------------------------------
// Function _start()
//------------------------------------------------------------------------------
        .equ    ret,            0
        .equ    fp,             8
        .equ    rval,           16

        .text
        .globl _start
_start:
        // begin of the function body

        ldzwq   0,              %SP

        // call function main()
        subq    24,             %SP,            %SP
        ldzwq   main,           %4
        jmp     %4,             %RET
        movzwq  rval(%SP),      %4
        addq    24,             %SP,            %SP

        halt    %4

putchar.s:

        .equ    FP,             1
        .equ    SP,             2
        .equ    RET,            3

//------------------------------------------------------------------------------
// Procedure putchar(ch)
//------------------------------------------------------------------------------
        .equ    ret,            0
        .equ    fp,             8

        // procedure arguments
        .equ    ch,             16

        .text
        .globl  putchar
putchar:
        // function prologue
        movq    %RET,           ret(%SP)
        movq    %FP,            fp(%SP)
        addq    0,              %SP,            %FP
        // reserve space for 0 local variables.
        subq    0,              %SP,            %SP
        // begin of the function body

        movzbq  ch(%FP),        %4
        putc    %4

        // end of the function body
        // function epilogue
putchar.leave:
        addq    0,              %FP,            %SP
        movq    fp(%SP),        %FP
        movq    ret(%SP),       %RET
        jmp     %RET,           %0

puts.c:

extern void
putchar(char ch);

void
puts(char *str)
{
    while (*str) {
        putchar(*str);
        ++str;
    }
}

putui.s:

        .equ    FP,             1
        .equ    SP,             2
        .equ    RET,            3

//------------------------------------------------------------------------------
// Procedure putui(n)
//------------------------------------------------------------------------------
        .equ    ret,            0
        .equ    fp,             8

        // procedure arguments
        .equ    n,              16

        // local variables
        .equ    p,              -8
        .equ    buf,            p-22

        .text
        .globl  putui
putui:
        // function prologue
        movq    %RET,           ret(%SP)
        movq    %FP,            fp(%SP)
        addq    0,              %SP,            %FP
        // reserve space for pointer p (8 bytes) and array buf with 22
        // characters, rounded up to 32 bytes to keep %SP a multiple of 8
        subq    32,             %SP,            %SP
        // begin of the function body

        /*
                p = buf;
        */
        ldswq   buf,            %4
        addq    %4,             %FP,            %4
        movq    %4,             p(%FP)

        /*
                do {
        */
putui.do:

        /*
                *p = n % 10 + '0';
        */
        movq    n(%FP),         %4
        ldzwq   0,              %5
        divq    10,             %4,             %4
        addq    '0',            %6,             %6
        movq    p(%FP),         %7
        movb    %6,             (%7)

        /*
                ++p;
        */
        movq    p(%FP),         %4
        addq    1,              %4,             %4
        movq    %4,             p(%FP)

        /*
                n /= 10;
        */
        movq    n(%FP),         %4
        divq    10,             %4,             %4
        movq    %4,             n(%FP)

        /*
                } while (n!=0);
        */
        movq    n(%FP),         %4
        subq    0,              %4,             %0
        jnz     putui.do

        /*
                while (p != buf) {
        */
putui.while:
        ldswq   buf,            %4
        addq    %4,             %FP,            %4
        movq    p(%FP),         %5
        subq    %4,             %5,             %0
        jz      putui.while_done

        /*
                --p;
        */
        movq    p(%FP),         %4
        subq    1,              %4,             %4
        movq    %4,             p(%FP)

        /*
                putchar(*p);
        */
        movq    p(%FP),         %4
        movzbq  (%4),           %4
        putc    %4

        jmp     putui.while


putui.while_done:

        // end of the function body
        // function epilogue
putui.leave:
        addq    0,              %FP,            %SP
        movq    fp(%SP),        %FP
        movq    ret(%SP),       %RET
        jmp     %RET,           %0
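The comments in putui.s contain the C code that the assembly implements. Collected into one function it reads roughly as follows. This sketch targets a hosted C compiler and assumes n is a 64-bit unsigned integer (it occupies a quadword on the stack in putui.s); the helper `putui_to`, which writes the digits into a caller buffer so they can be checked, is a hypothetical addition.

```c
#include <assert.h>
#include <stdio.h>

static void
putui_to(unsigned long long n, char *out)
{
    char buf[22];   /* 2^64 - 1 has 20 decimal digits */
    char *p = buf;

    assert(out != NULL);
    do {
        *p = n % 10 + '0';   /* least significant digit comes first */
        ++p;
        n /= 10;
    } while (n != 0);

    while (p != buf) {       /* walk back: most significant digit first */
        --p;
        *out++ = *p;
    }
    *out = '\0';
}

/* Prints n in decimal, like the assembly version (which uses putc). */
void
putui(unsigned long long n)
{
    char digits[22];

    putui_to(n, digits);
    fputs(digits, stdout);
}
```

The do-while loop guarantees that at least one digit is produced, which is why n == 0 prints "0" instead of nothing.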

Test programs

Files that begin with an 'x' are test programs, written in either C or assembly. Hence, these are the initial test programs:

theon$ ls x*.[cs]
xanswer.s
xhello_in_c.c
xhello.s
theon$ 

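Each test program is translated into an object file and linked against the library. For a C test program such as xhello_in_c.c, ulmcc first generates xhello_in_c.s, ulmas assembles it into xhello_in_c.o, and ulmld links that object file with libulm.a into the executable xhello_in_c.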

To build only specific test programs, pass them as arguments to make. For instance, this just creates xhello_in_c:

theon$ make xhello_in_c
ulmcc -o xhello_in_c.s xhello_in_c.c
ulmas -o xhello_in_c.o xhello_in_c.s
rm -f xhello_in_c.s
ulmld -o xhello_in_c xhello_in_c.o libulm.a
theon$ ulm xhello_in_c
hello, world!
theon$ 

Quiz 15: Make puts standard-conforming

In this exercise you are supposed to change the implementation of puts so that it conforms to the C standard library (see the man page of puts). On success puts returns a non-negative value, otherwise it returns EOF. EOF is a macro whose exact value is left to the implementation; the C standard only requires it to be a negative int constant, and most implementations define it as -1. For an idea of why puts can fail in general, read “When will puts() fail?” on Stack Overflow.

On the ULM, calling puts always succeeds, so changing the implementation serves the sole purpose of being compliant. This can be achieved, for instance, by the following implementation, which always returns zero (some real-world implementations return the number of characters written instead):

extern void
putchar(char ch);

int
puts(char *str)
{
    while (*str) {
        putchar(*str);
        ++str;
    }
    return 0;
}
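The ULM version never fails. To illustrate what the EOF part of the contract means for a caller, here is a sketch in which the output primitive is passed in and may fail, as putchar can on a hosted system. `puts_with` and `always_fails` are hypothetical names; like the ULM puts, and unlike ISO C's puts, this sketch does not append a newline.

```c
#include <assert.h>
#include <stdio.h>   /* EOF, putchar */

/* `put` returns the character written or EOF, like putchar. The result
   follows the puts contract: non-negative on success, EOF on error. */
static int
puts_with(const char *str, int (*put)(int))
{
    int count = 0;

    assert(str != NULL && put != NULL);
    while (*str) {
        if (put((unsigned char)*str) == EOF) {
            return EOF;          /* propagate the failure to the caller */
        }
        ++str;
        ++count;
    }
    return count;                /* characters written, always >= 0 */
}

/* Stub that simulates an output error on every call. */
static int
always_fails(int ch)
{
    (void)ch;
    return EOF;
}
```

Returning the character count is one valid choice; returning zero, as the ULM implementation does, conforms just as well because the standard only requires a non-negative value.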

So what do you have to do? You should get a taste of the consequences of changing the interface of a library: adapt the test programs xhello.s and xhello_in_c.c so that they work together with the new puts function. On theon, submit the source files as follows:

submit hpc quiz15 xhello.s xhello_in_c.c