Assembly: Loading a Literal into a Register
New subq Instruction: Fixing a Problem in print_uint64
Currently the instruction
1 | subq .print_uint64.buf, %p, %0
|
is specified in the instruction set as follows
1 2 3 4 5 6 7 | RRR (OP u 8) (X u 8) (Y u 8) (Z u 8)
# ...
0x05 RRR
: subq X, %Y, %Z
ulm_sub64(X, ulm_regVal(Y), Z);
|
Hence, for the literal .print_uint64.buf only 8 bits are available. Because of that we will run sooner or later into a problem with our current implementation of print_uint64. Because of this code fragment:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | print_uint64
// ...
.bss
.print_uint64.buf:
.space 20
.text
// ...
subq .print_uint64.buf, %p, %0
// ...
ret %RET_ADDR
|
If the size of our program grows the text segment soon will exceed 256 bytes. That means the addresses of the data and BSS segment can no longer be encode with 8 bits.
In order to fix that we need two things:
-
Another instruction for subtraction where we can subtract a register %X from %Y
1 2 3
0x18 RRR : subq %X, %Y, %Z ulm_sub64(ulm_regVal(X), ulm_regVal(Y), Z);
-
And fix the code of print_uint64 so that the literal .print_uint64.buf will be available in some register. Using the @w[0-3] operators we could achieve this as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
print_uint64 .data .equ val, PARAM0 .equ digit, CALLEE1 .equ p, CALLEE2 .equ buf, CALLEE3 .bss .print_uint64.buf: .space 20 .text # load .print_uint64.buf into %p ldzwq @w3(.print_uint64.buf), %p shldwq @w2(.print_uint64.buf), %p shldwq @w1(.print_uint64.buf), %p shldwq @w0(.print_uint64.buf), %p # copy %p to %buf movq %p, %buf # subtract .print_uint64.buf (stored in %buf) from %p subq %buf, %p, %0 // ... ret %RET_ADDR
Using always four instructions (with one ldzwq and three shldwq) is inconvenient in handwritten code. And it often is unnecessary probably our addresses in the text, data and BSS segment will always fit into 16 bits. But hoping that that 16-bit will just is like calling for hit me again.
The clean solution would be to use a literal pool. As it just contains one literal we can simply that bookkeeping a bit:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
print_uint64 .data .equ val, PARAM0 .equ digit, CALLEE1 .equ p, CALLEE2 .equ buf, CALLEE3 .bss .print_uint64.buf: .space 20 .text # load .print_uint64.buf into %p ldpa .print_uint.pool.buf, %p ldfp 0(%p), %p # copy %p to %buf movq %p, %buf # subtract .print_uint64.buf (stored in %buf) from %p subq %buf, %p, %0 // ... ret %RET_ADDR .align 8 .print_uint.pool.buf: .quad .print_uint64.buf
Of course want to support the special case where the displacemane Y in ldfp Y(%X), %Z equals zero in a more convenient way. Like for movq Y(%X), %Z we simply provide in the instruction set an alternative where Y is %skipped:
1 2 3 4 | 0x17 RRR
: ldfp Y(%X), %Z
: ldfp (%X), %Z
ulm_fetch64(Y * 8, X, 0, 0, ULM_ZERO_EXT, 8, Z);
|
Now we can change in function printuint64_ the line
1 | ldfp 0(%p), %p
|
to
1 | ldfp (%p), %p
|
Provided Material
Here an ULM Instruction Set and its isa.txt source code that contains all the instructions shown in the video (and for ldfp the alternative with a zero displacement):
RRR (OP u 8) (X u 8) (Y u 8) (Z u 8)
J26 (OP u 8) (XYZ j 24)
U16R (OP u 8) (XY u 16) (Z u 8)
J18R (OP u 8) (XY j 16) (Z u 8)
0x01 RRR
: halt %X
ulm_halt(ulm_regVal(X));
0x02 RRR
: getc %X
ulm_setReg(ulm_readChar() & 0xFF, X);
0x03 RRR
: putc %X
ulm_printChar(ulm_regVal(X) & 0xFF);
0x04 J26
: jmp XYZ
ulm_unconditionalRelJump(XYZ);
0x05 RRR
: subq X, %Y, %Z
ulm_sub64(X, ulm_regVal(Y), Z);
0x06 J26
: jnz XYZ
: jne XYZ
ulm_conditionalRelJump(ulm_statusReg[ULM_ZF] == 0, XYZ);
0x07 J26
: jz XYZ
: je XYZ
ulm_conditionalRelJump(ulm_statusReg[ULM_ZF] == 1, XYZ);
0x08 U16R
: ldzwq XY, %Z
ulm_setReg(XY, Z);
0x09 RRR
: movzbq (%X), %Z
ulm_fetch64(0, X, 0, 0, ULM_ZERO_EXT, 1, Z);
0x0A RRR
: addq X, %Y, %Z
ulm_add64(X, ulm_regVal(Y), Z);
0x0B RRR
: imulq %X, %Y, %Z
ulm_mul64(ulm_regVal(X), ulm_regVal(Y), Z);
0x0C J26
: ja XYZ
ulm_conditionalRelJump(ulm_statusReg[ULM_ZF] == 0
&& ulm_statusReg[ULM_CF] == 0, XYZ);
0x0D J26
: jb XYZ
ulm_conditionalRelJump(ulm_statusReg[ULM_CF] == 1, XYZ);
0x0E RRR
: addq %X, %Y, %Z
: movq %X, %Z
ulm_add64(ulm_regVal(X), ulm_regVal(Y), Z);
0x0F RRR
: imulq X, %Y, %Z
ulm_mul64(X, ulm_regVal(Y), Z);
0x10 RRR
# Uses the 128-bit unsigned integer divider. The 64-bits stored in %Y are
# therefore zero extended with %0 to 128 bits and then used as numerator.
# The immediate value X is used as unsigned denominator.
#
# The result of the operation is the quotient and the remainder. These results
# will be stored in a register pair: The quotient is stored in %Z and the
# remainder in %{Z + 1}.
: divq X, %Y, %Z
ulm_div128(X, ulm_regVal(Y), ulm_regVal(0), Z, 0, Z + 1);
0x11 RRR
# The least significant byte in %X gets stored at address %Z.
: movb %X, (%Z)
ulm_store64(0, Z, 0, 0, 1, X);
0x12 RRR
# Fetches a quad word from address %X into register %Z
: movq Y(%X), %Z
: movq (%X), %Z
ulm_fetch64(Y, X, 0, 0, ULM_ZERO_EXT, 8, Z);
0x13 RRR
: putc X
ulm_printChar(X & 0xFF);
# void ulm_absJump(uint64_t destAddr, ulm_Reg ret);
0x14 RRR
: jmp %X, %Y
: call %X, %Y
: ret %X
ulm_absJump(ulm_regVal(X), Y);
0x15 U16R
: shldwq XY, %Z
ulm_setReg(ulm_regVal(Z) << 16 | XY, Z);
0x16 J18R
: ldpa XY, %Z
ulm_setReg(ulm_ipVal() + XY, Z);
0x17 RRR
: ldfp Y(%X), %Z
: ldfp (%X), %Z
ulm_fetch64(Y * 8, X, 0, 0, ULM_ZERO_EXT, 8, Z);
0x18 RRR
: subq %X, %Y, %Z
ulm_sub64(ulm_regVal(X), ulm_regVal(Y), Z);
# General meaning of the mnemonics
@addq
# Integer addition.
@getc
# Get character
@divq
# 64-bit unsigned integer division
@halt
# Halt program
@imulq
# 64-bit unsigned and signed integer multiplication.
@ja
# Jump if above (conditional jump).
@jb
# Jump if below (conditional jump).
@je
# Jump if equal (conditional jump).
@jmp
# Jump (unconditional jump).
@jne
# Jump if not equal (conditional jump).
@jnz
# Jump if not zero (conditional jump).
@jz
# Jump if zero (conditional jump).
@ldzwq
# Load zero extended word to quad word register.
@putc
# Put character
@subq
# Integer subtraction.
@movb
# Move a byte from a register to memory (store instruction).
@movzbq
# Move zero extended byte to quad word register (fetch instruction).
@movq
# Move quad word
@shldwq
# Shift (destination register) and load word
@ldpa
# Load Pool Address
@ldfp
# Load from Pool
Quiz 13: Computing the greatest common divisor
Write a program gcd.s that computes for two 64-bit unsigned integers the greatest common divisor (gcd). Provide the user a nice experience, i.e. using the program looks like this:
1 2 3 4 5 6 7 8 | theon$ 1_ulm_build/load64/ulm gcd
a = 18
b = 12
gcd(18, 12) = 6
theon$ 1_ulm_build/load64/ulm gcd
a = 350982
b = 822647
gcd(350982, 822647) = 527
|
Use the following algorithm for your implementation:
All labels are in general 64-bit literals. If you require a 64-bit literal as an absolute address use a literal pool. So in particular also fix the problem in print_uint64 as outlined above.
Your program should have an exit code of 0.
Submit your program with
1 | submit hpc quiz13 isa.txt gcd.s
|