Assembly: Loading a Literal into a Register

New `subq` Instruction: Fixing a Problem in `print_uint64`

Currently the instruction

1	subq .print_uint64.buf, %p, %0

is specified in the instruction set as follows

RRR     (OP u 8) (X u 8) (Y u 8) (Z u 8)

# ...

0x05    RRR
:   subq    X, %Y, %Z
    ulm_sub64(X, ulm_regVal(Y), Z);

Hence, for the literal .print_uint64.buf only 8 bits are available. Because of that we will run sooner or later into a problem with our current implementation of print_uint64. Because of this code fragment:

print_uint64

// ...

        .bss
.print_uint64.buf:
        .space  20

        .text

// ...

        subq    .print_uint64.buf,      %p,     %0

// ...

        ret     %RET_ADDR

If the size of our program grows the text segment soon will exceed 256 bytes. That means the addresses of the data and BSS segment can no longer be encode with 8 bits.

In order to fix that we need two things:

Another instruction for subtraction where we can subtract a register %X from %Y

  0x18    RRR
:   subq    %X, %Y, %Z
    ulm_sub64(ulm_regVal(X), ulm_regVal(Y), Z);

And fix the code of print_uint64 so that the literal .print_uint64.buf will be available in some register. Using the @w[0-3] operators we could achieve this as follows:

print_uint64
        .data
        .equ    val,    PARAM0
        .equ    digit,  CALLEE1
        .equ    p,      CALLEE2
        .equ    buf,    CALLEE3

        .bss
.print_uint64.buf:
        .space  20

        .text

        # load .print_uint64.buf into %p
        ldzwq   @w3(.print_uint64.buf), %p
        shldwq  @w2(.print_uint64.buf), %p
        shldwq  @w1(.print_uint64.buf), %p
        shldwq  @w0(.print_uint64.buf), %p

        # copy %p to %buf
        movq    %p,     %buf

        # subtract .print_uint64.buf (stored in %buf) from %p
        subq    %buf,   %p,     %0

        // ...

        ret     %RET_ADDR

Using always four instructions (with one ldzwq and three shldwq) is inconvenient in handwritten code. And it often is unnecessary probably our addresses in the text, data and BSS segment will always fit into 16 bits. But hoping that that 16-bit will just is like calling for hit me again.

The clean solution would be to use a literal pool. As it just contains one literal we can simply that bookkeeping a bit:

print_uint64
        .data
        .equ    val,    PARAM0
        .equ    digit,  CALLEE1
        .equ    p,      CALLEE2
        .equ    buf,    CALLEE3

        .bss
.print_uint64.buf:
        .space  20

        .text

        # load .print_uint64.buf into %p
        ldpa    .print_uint.pool.buf, %p
        ldfp    0(%p),                %p

        # copy %p to %buf
        movq    %p,     %buf

        # subtract .print_uint64.buf (stored in %buf) from %p
        subq    %buf,   %p,     %0

        // ...

        ret     %RET_ADDR

        .align  8
.print_uint.pool.buf:
        .quad   .print_uint64.buf

Of course want to support the special case where the displacemane Y in ldfp Y(%X), %Z equals zero in a more convenient way. Like for movq Y(%X), %Z we simply provide in the instruction set an alternative where Y is %skipped:

0x17    RRR
:   ldfp    Y(%X), %Z
:   ldfp    (%X), %Z
    ulm_fetch64(Y * 8, X, 0, 0, ULM_ZERO_EXT, 8, Z);

Now we can change in function printuint64_ the line

1	ldfp 0(%p), %p

1	ldfp (%p), %p

Provided Material

Here an ULM Instruction Set and its isa.txt source code that contains all the instructions shown in the video (and for ldfp the alternative with a zero displacement):

session13/load64/0_ulm_variants/load64/isa.txt

RRR     (OP u 8) (X u 8) (Y u 8) (Z u 8)
J26     (OP u 8) (XYZ j 24)
U16R    (OP u 8) (XY u 16) (Z u 8)
J18R    (OP u 8) (XY j 16) (Z u 8)


0x01    RRR
:   halt    %X
    ulm_halt(ulm_regVal(X));

0x02    RRR
:   getc    %X
    ulm_setReg(ulm_readChar() & 0xFF, X);

0x03    RRR
:   putc    %X
    ulm_printChar(ulm_regVal(X) & 0xFF);

0x04    J26
:   jmp XYZ
    ulm_unconditionalRelJump(XYZ);

0x05    RRR
:   subq    X, %Y, %Z
    ulm_sub64(X, ulm_regVal(Y), Z);

0x06    J26
:   jnz XYZ
:   jne XYZ
    ulm_conditionalRelJump(ulm_statusReg[ULM_ZF] == 0, XYZ);

0x07    J26
:   jz  XYZ
:   je  XYZ
    ulm_conditionalRelJump(ulm_statusReg[ULM_ZF] == 1, XYZ);

0x08    U16R
:   ldzwq   XY, %Z
    ulm_setReg(XY, Z);

0x09    RRR
:   movzbq (%X), %Z
    ulm_fetch64(0, X, 0, 0, ULM_ZERO_EXT, 1, Z);

0x0A    RRR
:   addq    X, %Y, %Z
    ulm_add64(X, ulm_regVal(Y), Z);

0x0B    RRR
:   imulq   %X, %Y, %Z
    ulm_mul64(ulm_regVal(X), ulm_regVal(Y), Z);

0x0C    J26
:   ja      XYZ
    ulm_conditionalRelJump(ulm_statusReg[ULM_ZF] == 0
                        && ulm_statusReg[ULM_CF] == 0, XYZ);

0x0D    J26
:   jb      XYZ
    ulm_conditionalRelJump(ulm_statusReg[ULM_CF] == 1, XYZ);

0x0E    RRR
:   addq    %X, %Y, %Z
:   movq    %X, %Z
    ulm_add64(ulm_regVal(X), ulm_regVal(Y), Z);

0x0F    RRR
:   imulq   X, %Y, %Z
    ulm_mul64(X, ulm_regVal(Y), Z);

0x10    RRR
#   Uses the 128-bit unsigned integer divider. The 64-bits stored in %Y are
#   therefore zero extended with %0 to 128 bits and then used as numerator.
#   The immediate value X is used as unsigned denominator.
#
#   The result of the operation is the quotient and the remainder. These results
#   will be stored in a register pair: The quotient is stored in %Z and the
#   remainder in %{Z + 1}.
:   divq    X, %Y, %Z
    ulm_div128(X, ulm_regVal(Y), ulm_regVal(0), Z, 0, Z + 1);

0x11    RRR
#   The least significant byte in %X gets stored at address %Z.
:   movb    %X, (%Z)
    ulm_store64(0, Z, 0, 0, 1, X);

0x12    RRR
#   Fetches a quad word from address %X into register %Z
:   movq    Y(%X), %Z
:   movq    (%X), %Z
    ulm_fetch64(Y, X, 0, 0, ULM_ZERO_EXT, 8, Z);

0x13    RRR
:   putc    X
    ulm_printChar(X & 0xFF);

#   void ulm_absJump(uint64_t destAddr, ulm_Reg ret);
0x14    RRR
:   jmp     %X,     %Y
:   call    %X,     %Y
:   ret     %X
    ulm_absJump(ulm_regVal(X), Y);

0x15    U16R
:   shldwq  XY, %Z
    ulm_setReg(ulm_regVal(Z) << 16 | XY, Z);

0x16    J18R
:   ldpa    XY, %Z
    ulm_setReg(ulm_ipVal() + XY, Z);

0x17    RRR
:   ldfp    Y(%X), %Z
:   ldfp    (%X), %Z
    ulm_fetch64(Y * 8, X, 0, 0, ULM_ZERO_EXT, 8, Z);

0x18    RRR
:   subq    %X, %Y, %Z
    ulm_sub64(ulm_regVal(X), ulm_regVal(Y), Z);


# General meaning of the mnemonics

@addq
#   Integer addition.
@getc
#   Get character
@divq
#   64-bit unsigned integer division
@halt
#   Halt program
@imulq
#   64-bit unsigned and signed integer multiplication.
@ja
#   Jump if above (conditional jump).
@jb
#   Jump if below (conditional jump).
@je
#   Jump if equal (conditional jump).
@jmp
#   Jump (unconditional jump).
@jne
#   Jump if not equal (conditional jump).
@jnz
#   Jump if not zero (conditional jump).
@jz
#   Jump if zero (conditional jump).
@ldzwq
#   Load zero extended word to quad word register.
@putc
#   Put character
@subq
#   Integer subtraction.
@movb
#   Move a byte from a register to memory (store instruction).
@movzbq
#   Move zero extended byte to quad word register (fetch instruction).
@movq
#   Move quad word
@shldwq
#   Shift (destination register) and load word
@ldpa
#   Load Pool Address
@ldfp
#   Load from Pool

Quiz 13: Computing the greatest common divisor

Write a program gcd.s that computes for two 64-bit unsigned integers the greatest common divisor (gcd). Provide the user a nice experience, i.e. using the program looks like this:

theon$ 1_ulm_build/load64/ulm gcd
a = 18
b = 12
gcd(18, 12) = 6
theon$ 1_ulm_build/load64/ulm gcd
a = 350982
b = 822647
gcd(350982, 822647) = 527

Use the following algorithm for your implementation:

All labels are in general 64-bit literals. If you require a 64-bit literal as an absolute address use a literal pool. So in particular also fix the problem in print_uint64 as outlined above.

Your program should have an exit code of 0.

Submit your program with

1	submit hpc quiz13 isa.txt gcd.s

Assembly: Loading a Literal into a Register

New subq Instruction: Fixing a Problem in print_uint64

Provided Material

Quiz 13: Computing the greatest common divisor

New `subq` Instruction: Fixing a Problem in `print_uint64`