===========
Supplements [TOC]
===========

Here is some additional information about related topics. I will also collect
questions that arise.

How the `jmp %CALL, %RET` and `jmp %RET, %0` pattern is related to ARM
======================================================================

The __ARM architecture__ provides the "branch and link" instruction `BL` for
calling a function:

---- CODE (type=s) -------------------------------------------------------------
BL function_label
--------------------------------------------------------------------------------

This stores the return address in a register called `lr` (for _link register_)
and then jumps to the address referred to by the label `function_label`. So it
is pretty much the same as `jmp %CALL, %RET`, except that on the ULM you have
to specify the return register explicitly. This is sometimes annoying, as you
have to make sure that all parties agree on the same return register. But I
think for educational purposes a more explicit notation (that requires and
enforces more discipline in programming) is ok.

Returning from the function is done on ARM with a move instruction. You simply
overwrite the instruction pointer (on ARM it is called `PC` for program
counter, on ULM it is `%IP`) with the link register.

---- CODE (type=s) -------------------------------------------------------------
MOV pc, lr      /* Return from subroutine. Note: The MOV copies the
                   right-hand-side to the left-hand-side */
--------------------------------------------------------------------------------

Note that on the ULM you cannot use `%IP` explicitly in an instruction, but
`jmp %RET, %0` is implicitly the exact same thing as you would do on ARM.
Control structures
==================

* If-then-else in an algorithm

  ---- TIKZ -----------
  \begin{adjustbox}{}
  \textcolor{white}{.}
  \begin{varwidth}{7cm}
  \begin{algorithmic}
  \State $A$
  \If{$\text{cond}$}
      \State $B$
  \Else
      \State $C$
  \EndIf
  \State $D$
  \end{algorithmic}
  \end{varwidth}
  \end{adjustbox}
  ---------------------

* Flow chart (Variant 1)

  ---- TIKZ --------------
  \begin{tikzpicture}
  \input{flowchart.tex}
  \SetMargin{1}{1}{0}{5}
  \PutStatement{0}{A}
  \PutJump{1}{$\text{cond}$}
  \PutStatement{2}{C}
  \PutJump{3}{}
  \PutStatement{4}{B}
  \PutStatement{5}{D}
  \AddPath{0}{1}
  \AddPath{1}{2}
  \AddCondJumpPath{1}{4}
  \AddPath{2}{3}
  \AddJumpPathLeft{3}{5}
  \AddPath{4}{5}
  \end{tikzpicture}
  ------------------------

* Flow chart (Variant 2)

  ---- TIKZ ---------------
  \begin{tikzpicture}
  \input{flowchart.tex}
  \SetMargin{1}{1}{0}{5}
  \PutStatement{0}{A}
  \PutJump{1}{$\lnot \text{cond}$}
  \PutStatement{2}{B}
  \PutJump{3}{}
  \PutStatement{4}{C}
  \PutStatement{5}{D}
  \AddPath{0}{1}
  \AddPath{1}{2}
  \AddCondJumpPath{1}{4}
  \AddPath{2}{3}
  \AddJumpPathLeft{3}{5}
  \AddPath{4}{5}
  \end{tikzpicture}
  ------------------------

Loading a function address into `%CALL`
=======================================

In general the address of a function `foo` does not have to fit into 16 bits,
i.e. it can be $2^{16}$ or larger. Then using

---- CODE (type=s) -------------------------------------------------------------
ldzwq foo, %CALL
--------------------------------------------------------------------------------

will fail. You can simulate this by trying to load a literal value that does
not fit into 16 bits, e.g.
0x12345, into a register:

---- CODE (file=session09/load/ldzwq_fail.s) -----------------------------------
ldzwq 0x12345, %1
--------------------------------------------------------------------------------

You will get the following error from the code generator:

---- SHELL (path=session09/load) -----------------------------------------------
ulmas ldzwq_fail.s
--------------------------------------------------------------------------------

So how to solve this?
---------------------

Besides __ldzwq__ you need as ingredients the immediate operators `@w0`, ..,
`@w3` and the __shldwq__ instruction:

- The operator `@w0` picks the least significant 16-bit word of a literal (or
  label). So for example `@w0(0x12345)` picks 0x2345. With the other operators
  you can pick the other words, e.g. `@w1(0x12345)` picks 0x0001. In general,
  if $X$ is some bit pattern then

  ---- LATEX -------------------------------------------------------------------
  \begin{array}{lcl}
  \text{@w0}(X) & = & \left\lfloor u(X) \cdot 2^{-16 \cdot 0} \right\rfloor \bmod 2^{16} \\
  \text{@w1}(X) & = & \left\lfloor u(X) \cdot 2^{-16 \cdot 1} \right\rfloor \bmod 2^{16} \\
  \text{@w2}(X) & = & \left\lfloor u(X) \cdot 2^{-16 \cdot 2} \right\rfloor \bmod 2^{16} \\
  \text{@w3}(X) & = & \left\lfloor u(X) \cdot 2^{-16 \cdot 3} \right\rfloor \bmod 2^{16} \\
  \end{array}
  ------------------------------------------------------------------------------

- The `shldwq` (shift left load) instruction shifts the content of a register
  `%Z` 16 positions to the left and inserts a 16-bit pattern `XY` into the
  least significant bits:

  ---- LATEX -------------------------------------------------------------------
  \left( u(\%\text{Z}) \cdot 2^{16} + u(\text{XY}) \right) \bmod 2^{64}
  \to u(\%\text{Z})
  ------------------------------------------------------------------------------

So for example

---- CODE (file=session09/load/load32.s) -----------------------------------
ldzwq   @w1(0x12345), %1
shldwq  @w0(0x12345), %1
--------------------------------------------------------------------------------

loads the bit pattern 0x12345 into `%1`.
And a 64-bit literal can be loaded like this:

---- CODE (file=session09/load/load64.s) ---------------------------------------
ldzwq   @w3(0x1234567890123456), %1
shldwq  @w2(0x1234567890123456), %1
shldwq  @w1(0x1234567890123456), %1
shldwq  @w0(0x1234567890123456), %1
--------------------------------------------------------------------------------

When you look at the generated machine code you can easily see which bit
patterns were picked from the literal:

---- SHELL (path=session09/load) -----------------------------------------------
ulmas -o load64 load64.s
cat load64
--------------------------------------------------------------------------------

General pattern for loading a label (or address)
------------------------------------------------

In general four instructions are needed to load an arbitrary 64-bit address or
literal into a register, e.g.

---- CODE (file=session09/load/load_label.s) -----------------------------------
ldzwq   @w3(some_label), %1
shldwq  @w2(some_label), %1
shldwq  @w1(some_label), %1
shldwq  @w0(some_label), %1
--------------------------------------------------------------------------------

That is the price to pay for the simplicity and efficiency of a RISC
architecture. On a CISC architecture you would just provide an instruction that
is encoded with more bytes... On the other hand a RISC architecture is simpler,
and that can mean less energy consumption, higher clock rate, etc. And that
usually pays off ... (Even the Intel64 architecture just looks like a CISC
architecture to the outside world but internally has some RISC core).

Function calls on Intel64
=========================

In order to show you that our calling convention is relevant for the real
world, here is an example from the real world.

`foo.s` (Callee code)
---------------------

The instruction format looks a bit different. For example there is an
instruction `addq` but it only takes two register operands.
When executed the first register gets added to the second and the second gets
overwritten with the result (like on the ULM when you would write
`addq %1, %2, %2`). Also the register names are different (and strange). Our
`%CALLEE0` is here `%rdi` and `%CALLEE1` is here `%rsi`. And the result of a
function gets returned by writing it into `%rax`. And there is also this
`.globl` directive that is needed when functions are defined in separate source
files and need to be linked. But don't get confused by the details. Look at
this code:

:import: session09/gcc/foo.s

Can you see the similarities to assembly code for the ULM like this:

---- CODE (type=s) -------------------------------------------------------------
        .text
        .globl foo
foo:
        addq    %CALLEE0,   %CALLEE1,   %CALLEE1    # like addq %rdi, %rsi
        addq    %CALLEE1,   %0,         %CALLEE0    # like movq %rsi, %rax
        jmp     %RET,       %0                      # like ret
--------------------------------------------------------------------------------

You can translate this assembly code into machine code by using `gcc` as a
convenient front end for the GNU assembler (otherwise you need to know about
several options for using it on `theon`):

---- SHELL (path=session09/gcc) ------------------------------------------------
gcc -c foo.s
--------------------------------------------------------------------------------

The generated machine code gets written into an object file `foo.o`. To look at
the machine code you can use the program `objdump` like this:

---- SHELL (path=session09/gcc) ------------------------------------------------
objdump -d foo.o
--------------------------------------------------------------------------------

Looks kind of familiar! Although there are differences, e.g. that instructions
have different sizes.

`main.c` (Caller code)
----------------------

Now some code that calls this function.
And this code is described in C as follows (and I say "described" on purpose,
because we describe what machine code should be generated from it):

:import: session09/gcc/main.c

Analogously this can be translated into machine code, this time using the GNU C
compiler and subsequently the GNU assembler. This tool chain gets invoked by

---- SHELL (path=session09/gcc) ------------------------------------------------
gcc -c -O3 main.c
--------------------------------------------------------------------------------

With the additional option `-O3` I turned on some optimizations. You can skip
that, I was just using it so that the generated machine code is a bit shorter.
Again you can look at the generated machine code with `objdump`:

---- SHELL (path=session09/gcc, fold) ------------------------------------------
objdump -d main.o
--------------------------------------------------------------------------------

Let's link and run that thing
-----------------------------

You can combine (the right expression is "link") these two pieces of machine
code in `main.o` and `foo.o` by using `gcc` again as a convenient front end for
the Solaris linker `ld`:

---- SHELL (path=session09/gcc) ------------------------------------------------
gcc main.o foo.o
--------------------------------------------------------------------------------

As I mentioned in the introduction video to Session 8, actually more than just
these two object files get linked. But let's ignore the details for the moment
and look at the generated executable `a.out`:

---- SHELL (path=session09/gcc) ------------------------------------------------
a.out
--------------------------------------------------------------------------------

You wanna play?
---------------

Replace `addq %rdi, %rsi` with `imulq %rdi, %rsi`.
I guess you know what would happen ;-)

:links:     ldzwq  -> http://www.mathematik.uni-ulm.de/numerik/hpc/ss20/hpc0/ulm.pdf#page=28
            shldwq -> http://www.mathematik.uni-ulm.de/numerik/hpc/ss20/hpc0/ulm.pdf#page=46
            ARM architecture -> https://en.wikipedia.org/wiki/ARM_architecture

:navigate:  up -> doc:index
            back -> doc:session09/page03