========================
Using the ULM C Compiler
========================

Before writing makefiles for building libraries and executables you should first
learn how to use the ULM C compiler manually. You will also see some examples
here that illustrate that C code is just describing assembly code that the
compiler generates for you. That's nice because it saves typing and avoids
typical errors that often occur when you manually write assembly code. And it's
nice because assembly code can potentially be generated for every sane
architecture.

But you should not expect that C is doing the thinking for you. The language is
not designed for incompetent programmers that don't care about details and just
wanna mindlessly type together some code that hopefully works. Again, writing C
code is writing assembly code.

Generating assembly code
========================

The ULM C compiler merely translates C code into assembly, i.e. it acts like
`gcc` with option `-S`. For example, the C code

---- CODE (file=session12/func/hello.c) ----------------------------------------
extern void puts(char *str);

int main()
{
    puts("hello, world!\n");
}
--------------------------------------------------------------------------------

gets translated with

---- SHELL (path=session12/libulm) ---------------------------------------------
ulmcc -o xhello_in_c.s xhello_in_c.c
--------------------------------------------------------------------------------

into

:import: session12/libulm/xhello_in_c.s

---- SHELL (path=session12/libulm, hide) ---------------------------------------
rm -f xhello_in_c.s
--------------------------------------------------------------------------------

Actually we could have skipped the "`-o xhello_in_c.s`" here, as by default the
output filename is the base name of the input file with extension ".s".

Creating a minimalistic executable
==================================

For creating an executable you need to use the assembler and the linker.
In __Session 10.8__ you already saw the most minimalistic C program

---- CODE (file=session12/func/main_no_return.c) -------------------------------
int main()
{
}
--------------------------------------------------------------------------------

Recall that the `main` function returns zero if you don't leave the function
with an explicit return statement. The ULM C compiler translates that source
code into

---- SHELL (path=session12/func) -----------------------------------------------
ulmcc -o main_no_return.s main_no_return.c
cat main_no_return.s
--------------------------------------------------------------------------------

For an executable we have to translate that assembly code into machine code
with the ULM assembler and link it with a `_start` function that initializes
the stack, calls `main` and halts the ULM using the return value as exit code.
That is obviously the `crt0.s` code from __Session 11__ (also recall quiz13):

---- CODE (file=session12/func/crt0.s, fold) -----------------------------------
        .equ    FP,     1
        .equ    SP,     2
        .equ    RET,    3

//------------------------------------------------------------------------------
// Function _start()
//------------------------------------------------------------------------------
        .equ    ret,    0
        .equ    fp,     8
        .equ    rval,   16

        .text
        .globl  _start
_start:
        // begin of the function body

        ldzwq   0,          %SP

        // call function main()
        subq    24,         %SP,    %SP
        ldzwq   main,       %4
        jmp     %4,         %RET
        movzlq  rval(%SP),  %4
        addq    24,         %SP,    %SP

        halt    %4
--------------------------------------------------------------------------------

Because the ULM C compiler uses exactly the same calling conventions that were
specified in __Session 10.8__ we can simply translate and link `crt0.s` and
`main_no_return.s`:

---- SHELL (path=session12/func) -----------------------------------------------
ulmas -o main_no_return.o main_no_return.s
ulmas -o crt0.o crt0.s
ulmld -o main_no_return main_no_return.o crt0.o
--------------------------------------------------------------------------------
Running the executable we can check the exit code with `echo $?`:

---- SHELL (path=session12/func) -----------------------------------------------
ulm main_no_return; echo $?
--------------------------------------------------------------------------------

Another minimalistic example
============================

Just for fun let's return some non-zero value from `main`:

---- CODE (file=session12/func/main_returns_42.c) ------------------------------
int main()
{
    return 42;
}
--------------------------------------------------------------------------------

Now the complete build process is as follows:

---- SHELL (path=session12/func) -----------------------------------------------
ulmcc -o main_returns_42.s main_returns_42.c
ulmas -o main_returns_42.o main_returns_42.s
ulmas -o crt0.o crt0.s
ulmld -o main_returns_42 main_returns_42.o crt0.o
--------------------------------------------------------------------------------

And as expected we get the exit code 42 after the program terminated:

---- SHELL (path=session12/func) -----------------------------------------------
ulm main_returns_42; echo $?
--------------------------------------------------------------------------------

Say hello with ULM C
====================

The remaining examples on this page will be "hello, world" programs implemented
in C. All variants will use a `puts` function for printing the string.
Initially we will use `puts` from __Session 11.2__ implemented in assembly:

---- CODE (file=session12/func/puts.s, fold) -----------------------------------
        .equ    FP,     1
        .equ    SP,     2
        .equ    RET,    3

//------------------------------------------------------------------------------
// Procedure puts(str)
//------------------------------------------------------------------------------
        .equ    ret,    0
        .equ    fp,     8

        // procedure arguments
        .equ    str,    16

        .text
        .globl  puts
puts:
        // function prologue
        movq    %RET,       ret(%SP)
        movq    %FP,        fp(%SP)
        addq    0,          %SP,    %FP

        // reserve space for 0 local variables.
        subq    0,          %SP,    %SP

        // begin of the function body

        /*
            if (*str == 0) goto puts.while.done;
        */
puts.while:
        movq    str(%FP),   %4
        movzbq  (%4),       %4
        subq    0,          %4,     %0
        jz      puts.while.done

        /*
            putchar(*str);
        */
        movq    str(%FP),   %4
        movzbq  (%4),       %4
        putc    %4

        /*
            str = str + 1;
        */
        movq    str(%FP),   %4
        addq    1,          %4,     %4
        movq    %4,         str(%FP)

        /*
            goto puts.while;
        */
        jmp     puts.while

puts.while.done:
        // end of the function body

        // function epilogue
puts.leave:
        addq    0,          %FP,    %SP
        movq    fp(%SP),    %FP
        movq    ret(%SP),   %RET
        jmp     %RET,       %0
--------------------------------------------------------------------------------

The program is explained in a bit more detail below:

:import: session12/func/hello.c

If a function or procedure is not defined within a source file the compiler
needs to know how to call it. The line

---- CODE (type=c) -------------------------------------------------------------
extern void puts(char *str);
--------------------------------------------------------------------------------

does not generate any code. It just tells the compiler (or think of it as a
programmer's promise given to the compiler) that a function `puts` will be
available after linkage. More precisely it tells that

- `puts` is a procedure, i.e. it does not return a value, and
- `puts` expects exactly one argument `str` which has a size of 8 bytes (and
  its value is the address of a byte).

From Session 10 you know that this information is needed for properly preparing
the stack before the jump to `puts` gets done.
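To make the split between declaration and definition more tangible, here is a
minimal sketch in plain C. It assumes a hosted C compiler instead of `ulmcc`,
and the names `demo_puts`, `say_hello`, `demo_out` and `demo_len` are
hypothetical and only serve the illustration. Instead of printing, the stand-in
for `puts` copies the string into a buffer, so the effect of the call can be
inspected:

```c
#include <stddef.h>

/* Declaration only: generates no code. It merely tells the compiler how a
   call has to be prepared (one pointer argument, no return value). */
extern void demo_puts(char *str);

/* Hypothetical output buffer that stands in for the output device. */
char demo_out[64];
size_t demo_len = 0;

/* The caller is compiled knowing nothing but the declaration above. */
void say_hello(void)
{
    demo_puts("hello, world!\n");
}

/* The definition. It would normally live in another translation unit and be
   added by the linker. It has to match the declaration above. */
void demo_puts(char *str)
{
    while (*str && demo_len + 1 < sizeof demo_out) {
        demo_out[demo_len++] = *str++;
    }
    demo_out[demo_len] = '\0';
}
```

After a call of `say_hello()` the buffer contains the string. The caller was
translated before the compiler had seen the definition, which is the same
situation as with `hello.c` and the separately assembled `puts.s`.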
So let's have the compiler generate the code for that call with

---- SHELL (path=session12/func) -----------------------------------------------
ulmcc -o hello.s hello.c
--------------------------------------------------------------------------------

Look at the generated assembly code to confirm that the compiler generates code
from the C source file that you could easily have written yourself (in
particular look at how `puts` gets called):

:import: session12/func/hello.s [fold]

Having the assembly code generated by the C compiler we can now generate the
executable as before, i.e. use the assembler for generating object files and
then link them:

---- SHELL (path=session12/func) -----------------------------------------------
ulmas -o hello.o hello.s
ulmas -o crt0.o crt0.s
ulmas -o puts.o puts.s
ulmld -o hello hello.o puts.o crt0.o
--------------------------------------------------------------------------------

And of course check that it is working:

---- SHELL (path=session12/func) -----------------------------------------------
ulm hello
--------------------------------------------------------------------------------

With great power comes great responsibility
===========================================

One fundamental principle of C (and C++) is _trust the programmer_. If you
deserve this trust then the language is a powerful tool. Otherwise it is insane
to use this gun!

Here comes an example that shows how the compiler trusts you. In this variant
it is promised that `puts` is a function that returns an integer:

:import: session12/func/hello2.c

The compiler does not know what implementation the linker later adds for
resolving the symbol `puts`.
So of course you cannot expect any error or warning:

---- SHELL (path=session12/func) -----------------------------------------------
ulmcc -o hello2.s hello2.c
--------------------------------------------------------------------------------

The compiler just generates the proper code for calling that function:

:import: session12/func/hello2.s [fold]

Note the difference from before: the assembly code now calls a function, so if
you don't adapt the implementation of `puts` then things will go wrong:

---- SHELL (path=session12/func) -----------------------------------------------
diff hello.s hello2.s
--------------------------------------------------------------------------------

Now let's use this code together with the unmodified procedure `puts`. There is
no way that the assembler or linker can check that this will go wrong, so there
is no warning or error:

---- SHELL (path=session12/func) -----------------------------------------------
ulmas -o hello2.o hello2.s
ulmas -o crt0.o crt0.s
ulmas -o puts.o puts.s
ulmld -o hello2 hello2.o puts.o crt0.o
--------------------------------------------------------------------------------

Because the programmer in this case deserved neither respect nor trust, the
program is not working as expected:

---- SHELL (path=session12/func) -----------------------------------------------
ulm hello2
--------------------------------------------------------------------------------

Some (not so) fun fact
======================

The __C standard library__ actually defines a __function `puts`__ where the
correct prototype is

---- CODE (type=c) -------------------------------------------------------------
extern int puts(const char *str);
--------------------------------------------------------------------------------

So don't get confused, the important thing to learn here is that the
declaration (the prototype for the function) and the definition (the
implementation of the function) have to match.
And the programmer has to make sure that they match. If you cannot guarantee
that, things will go wrong, and you cannot expect any help from the compiler,
assembler or linker.

To some degree using so-called header files will help here. But that's a
different story that involves the C preprocessor. Btw, I really hope that I
live long enough to see that the C preprocessor one day becomes obsolete in
C++.

Using C for writing a `puts` implementation
===========================================

It's a good idea to implement functions in C when possible and only in assembly
if necessary. That's because the code then does not depend on the specific
platform. And if the compiler is able to do some reasonable optimizations the
generated assembly code is usually efficient enough that writing the assembly
code manually provides no advantage.

The following C code requires that a procedure is available for printing a
single character:

---- CODE (file=session12/func/puts.c) -----------------------------------------
extern void putchar(char ch);

void puts(char *str)
{
    while (*str) {
        putchar(*str);
        ++str;
    }
}
--------------------------------------------------------------------------------

This procedure `putchar` is platform dependent and you therefore have to
provide it in handwritten assembly:

---- CODE (file=session12/func/putchar.s, fold) --------------------------------
        .equ    FP,     1
        .equ    SP,     2
        .equ    RET,    3

//------------------------------------------------------------------------------
// Procedure putchar(ch)
//------------------------------------------------------------------------------
        .equ    ret,    0
        .equ    fp,     8

        // procedure arguments
        .equ    ch,     16

        .text
        .globl  putchar
putchar:
        // function prologue
        movq    %RET,       ret(%SP)
        movq    %FP,        fp(%SP)
        addq    0,          %SP,    %FP

        // reserve space for 0 local variables.
        subq    0,          %SP,    %SP

        // begin of the function body

        movzbq  ch(%FP),    %4
        putc    %4

        // end of the function body

        // function epilogue
putchar.leave:
        addq    0,          %FP,    %SP
        movq    fp(%SP),    %FP
        movq    ret(%SP),   %RET
        jmp     %RET,       %0
--------------------------------------------------------------------------------

Now generate assembly code from the C source files

---- SHELL (path=session12/func) -----------------------------------------------
ulmcc -o hello.s hello.c
ulmcc -o puts.s puts.c
--------------------------------------------------------------------------------

and look at the assembly code generated from `puts.c`:

:import: session12/func/puts.s [fold]

For the most part it is essentially identical to our handwritten assembly code.
However, there is one significant difference. For printing a single character
we always have the call of `putchar` instead of directly using a single `putc`
instruction:

- Calling `putchar` requires 7 instructions, and
- the definition of `putchar` has 10 instructions.

That's a huge overhead and the reason why a compiler should provide
optimizations like _function inlining_ and a C programmer should know how to
make use of them. For now we postpone this topic and just believe that such
issues can be taken care of later.

Having automatically generated assembly code we generate the executable as
usual:

---- SHELL (path=session12/func) -----------------------------------------------
ulmas -o hello.o hello.s
ulmas -o crt0.o crt0.s
ulmas -o puts.o puts.s
ulmas -o putchar.o putchar.s
ulmld -o hello hello.o putchar.o puts.o crt0.o
ulm hello
--------------------------------------------------------------------------------

Let's hide things in a software layer
=====================================

The ULM has a feature that was kept secret so far, some kind of firmware. This
firmware for instance allows copying a memory block character by character to
the output device.
And this bypasses the CPU, so for printing a string we don't have to copy each
byte first from memory into a register and from there to the output device.

Except for the next example, this feature of the ULM will not be used in this
lecture. And actually I will not really explain to you how it works. Instead I
just want to show you that it is possible to use it for improving the
implementation of `puts`. And by improving the implementation I mean that you
can use `puts` like before without knowing that its implementation changed.

So here comes this magic implementation:

---- CODE (file=session12/func/puts.c) -----------------------------------------
extern int64_t _trap(uint8_t number, void *param);

void puts(char *str)
{
    char *begin = str;
    while (*str) {
        ++str;
    }
    struct {
        int fd;
        void* buf;
        uint64_t nbytes;
    } params = {1, begin, str - begin};
    _trap(1, &params);
}
--------------------------------------------------------------------------------

The implementation uses the function `_trap` which is a wrapper for the ULM's
__trap__ instruction:

---- CODE (file=session12/func/_trap.s, fold) ----------------------------------
        .equ    FP,     1
        .equ    SP,     2
        .equ    RET,    3

//------------------------------------------------------------------------------
// Function int64_t _trap(uint8_t number, void *param)
//------------------------------------------------------------------------------
        .equ    ret,    0
        .equ    fp,     8
        .equ    rval,   16

        // function arguments
        .equ    number, 24
        .equ    param,  32

        .globl  _trap
# function header of int64_t _trap(uint8_t number, void *params):
_trap:
        // function prologue
        movq    %RET,       ret(%SP)
        movq    %FP,        fp(%SP)
        addq    0,          %SP,    %FP

        // begin of the function body
        movzbq  number(%FP),    %4
        movq    param(%FP),     %5
        trap    %4,         %5,     %6
        movq    %6,         rval(%FP)
        // end of the function body

        // function epilogue
        addq    0,          %FP,    %SP
        movq    fp(%SP),    %FP
        movq    ret(%SP),   %RET
        jmp     %RET,       %0
--------------------------------------------------------------------------------

Here is the demo for this example:

---- SHELL (path=session12/func) -----------------------------------------------
ulmcc -o puts.s puts.c
ulmas -o puts.o puts.s
ulmas -o hello.o hello.s
ulmas -o crt0.o crt0.s
ulmas -o _trap.o _trap.s
ulmld -o hello hello.o _trap.o puts.o crt0.o
ulm hello
--------------------------------------------------------------------------------

:links: Session 10.8 -> doc:session10/page09
        Session 11.2 -> doc:session11/page03
        Session 11 -> doc:session11/page01
        trap -> http://www.mathematik.uni-ulm.de/numerik/hpc/ss20/hpc0/ulm.pdf#page=49
        C standard library -> https://en.wikipedia.org/wiki/C_standard_library
        function `puts`-> https://pubs.opengroup.org/onlinepubs/009695399/functions/puts.html