Alignment of data

Fetch and store operations on the ULM require that the address of the data is a multiple of the data size. That means, a quad word (8 bytes) can only be fetched from an address that is a multiple of 8 or stored to an address that is a multiple of 8. Analogously fetching or storing a long word (4 bytes) requires that the address is a multiple of 4, and for a word (2 bytes) a multiple of 2. Only a single byte can be fetched from or stored to an arbitrary address.

Example for a bus error (bad alignment)

Just try to run the following mini program, consisting of a instructions that store data at address \(0x24\) (or \(36\) in decimal):

1
2
3
4
5
    ldzwq   0x24,   %1
    movb    %0,     (%1)
    movw    %0,     (%1)
    movl    %0,     (%1)
    movq    %0,     (%1)

You don't need a halt instruction here because it is sure to crash when the the last instruction (the movq) gets executed:

theon$ ulmas -o bus_error bus_error.s
theon$ ulm bus_error
ulm: execution aborted
[0x0000000000000024] bad alignment, must be a multiple of 8
[0x0000000000000010] movq %0, 0(%1)
theon$ 

Storing a byte always works for any address, so it is clear that the movb will not crash. Storing a word requires an even address, which is the case here

Storing a long word is no problem because the address is also aligned to 4 bytes:

So what happened is that the third instruction tried to store a quad word at address 36 (or 0x24 in hex):

Why is there such a handicap?

It is easier to build the hardware if you have this restriction. And if you want or need, you still can fetch/store any data size from any address, you just need for an unaligned access more then one instruction.

Why did the movq instructions work in the stack example?

The stack pointer was initialized with 0 (which is a multiple of 8) and only decremented or incremented by 8. So the address \(u\left(\%SP\right)\) was always aligned to 8 bytes.

Using aligned data

The directives .word, .long and .quad generate in general additional bytes for padding so that the actual data will be aligned when a program is loaded. Following example

1
2
3
4
5
6
7
    .data
    .byte   1
    .quad   2
    .byte   3
    .long   4
    .byte   5
    .word   6

generates the following memory layout

Translate the above code with

theon$ ulmas -o data data.s
theon$ 

and look at the assembler output:

#DATA 8
0x0000000000000000: 01 #        .byte 1
0x0000000000000001: 00 00 00 00 00 00 00 # padding for alignment
0x0000000000000008: 00 00 00 00 00 00 00 02 #   .quad 2
0x0000000000000010: 03 #        .byte 3
0x0000000000000011: 00 00 00 # padding for alignment
0x0000000000000014: 00 00 00 04 #       .long 4
0x0000000000000018: 05 #        .byte 5
0x0000000000000019: 00 # padding for alignment
0x000000000000001A: 00 06 #     .word 6

Pitfalls when using labels for aligned data

Although the data generated by the directive is aligned using labels like this will not have the effect you might expect:

1
2
3
4
5
6
7
    .data
a   .byte   1
b   .quad   2
c   .byte   3
d   .long   4
e   .byte   5
f   .word   6

Recall, labels refer to the address of the first byte generated for the next instruction or pseudo operation. If a directive requires than this would be the address of a padding byte. Hence, the labels refer to address as illustrated here:

Translate the above code with

theon$ ulmas -o data_label_fail data_label_fail.s
theon$ 

and look at the assembler output:

#DATA 8
0x0000000000000000: 01 #        .byte 1
0x0000000000000001: 00 00 00 00 00 00 00 # padding for alignment
0x0000000000000008: 00 00 00 00 00 00 00 02 #   .quad 2
0x0000000000000010: 03 #        .byte 3
0x0000000000000011: 00 00 00 # padding for alignment
0x0000000000000014: 00 00 00 04 #       .long 4
0x0000000000000018: 05 #        .byte 5
0x0000000000000019: 00 # padding for alignment
0x000000000000001A: 00 06 #     .word 6
#SYMTAB
d a                           0x0000000000000000
d b                           0x0000000000000001
d c                           0x0000000000000010
d d                           0x0000000000000011
d e                           0x0000000000000018
d f                           0x0000000000000019

Look at the entries of the symbol table to see what addresses the assembler associates with the label. Conform that this is equivalent to what you saw in the above sketch of the memory layout.

Using the .align directive

The .align directive instructs the assembler to generate padding bytes until the next address has a specified alignment. This can be used such that data directives don't have to implicitly generate such padding bytes:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
    .data
a   .byte   1

    .align  8       # generate padding bytes until next address is aligned to 8
b   .quad   2

c   .byte   3

    .align  4       # generate padding bytes until next address is aligned to 4
d   .long   4

e   .byte   5

    .align  2       # generate padding bytes until next address is aligned to 2
f   .word   6

Then the labels have the desired effect to refer to the address of the actual data and not to address of a padding byte:

Translate the above code with

theon$ ulmas -o data_align_label data_align_label.s
theon$ 

and look at the assembler output:

#DATA 8
0x0000000000000000: 01 #        .byte 1
0x0000000000000001: 00 00 00 00 00 00 00 # padding for alignment
0x0000000000000008: 00 00 00 00 00 00 00 02 #   .quad 2
0x0000000000000010: 03 #        .byte 3
0x0000000000000011: 00 00 00 # padding for alignment
0x0000000000000014: 00 00 00 04 #       .long 4
0x0000000000000018: 05 #        .byte 5
0x0000000000000019: 00 # padding for alignment
0x000000000000001A: 00 06 #     .word 6
#SYMTAB
d a                           0x0000000000000000
d b                           0x0000000000000008
d c                           0x0000000000000010
d d                           0x0000000000000014
d e                           0x0000000000000018
d f                           0x000000000000001A

Again, compare the entries in the symbol table what was shown above.

Quiz09: Using an exercise about alignment to talk about endianness

When a chunk of data consists of more than one byte the endianness of an architecture specifies in which order they are stored in memory. The ULM uses the so called big endian format. That means the most significant byte gets stored at the address of the memory location, each less significant byte gets stored at the next higher address. Hence, storing the quad word 0x123456789ABCDEF0 at some address \(A\) would have the following memory layout:

The main reason for choosing the big endian format for the ULM is that you see the bytes in memory as you would write them down. Other architectures use the little endian format where the byte order is reversed. So the same quad word would be stored as follows:

When you use instructions to fetch and store the complete chunk of data you actually don't have to care about the endianness. And unless you look at the memory layout, e.g. with the ulm-qui, you wouldn't even notice what endianness is used. Typical practical cases where endianness matter are exchanging data between computers (e.g. sending/receiving data over the internet or by exchanging binary files).

You also have to deal with endianness if you have to cherry pick single bytes from a word, long word or quad word that is stored in memory. And accessing single bytes is an operation that you need when data is not aligned.

What you have to do: Store a quad word byte by byte at an unaligned address

Write a (small) program that stores the quad word 0x123456789ABCDEF0 at address \(u\left(2^{64}-9\right)\) which is in hex representation 0xFFFFFFFFFFFFFFF7. The quad word should be stored in the big endian format, i.e. after the store operation you should have the following memory layout:

On theon submit your program with the command

submit hpc quiz09 bigendian.s

Your answers will not be evaluated automatically but you should get a confirmation email after your submit.

Example: Storing a quad word byte by byte in the little endian format

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
    # store some 64-bit literal in register %1
    ldzwq   @w3(0x123456789ABCDEF0),    %1
    shldwq  @w2(0x123456789ABCDEF0),    %1
    shldwq  @w1(0x123456789ABCDEF0),    %1
    shldwq  @w0(0x123456789ABCDEF0),    %1
    
    # load some unaligned address in register %2
    ldswq   -9,   %2

    # store %1 byte by byte in the quad word at address %2
    movb    %1,     (%2)
    shrq    8,      %1,     %1
    movb    %1,     1(%2)
    shrq    8,      %1,     %1
    movb    %1,     2(%2)
    shrq    8,      %1,     %1
    movb    %1,     3(%2)
    shrq    8,      %1,     %1
    movb    %1,     4(%2)
    shrq    8,      %1,     %1
    movb    %1,     5(%2)
    shrq    8,      %1,     %1
    movb    %1,     6(%2)
    shrq    8,      %1,     %1
    movb    %1,     7(%2)

    halt 0

After storing the complete quad word the memory layout is

Assignment: Pointer chasing

We have to use pointers in assembly all the time, but you never can have to much practise in that respect. That is because we don't have types like in C for doing the bookkeeping. You have to keep track of the size and meaning of a memory location.

In this assignment you will practise that by writing a program that uses the following data structure:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
        .data

        .align  8
a       .quad   c           # pointer to next node
        .long   'A'         # value of the node

        .align  8
b       .quad   d           # pointer to next node
        .long   'B'         # value of the node

        .align  8
c       .quad   b           # pointer to next node
        .long   'C'         # value of the node

        .align  8
d       .quad   a           # pointer to next node
        .long   'D'         # value of the node

This code fragment describes a circular linked list stored in the data segment:

  • When you visualize the data segment you can differentiate that some memory locations are supposed to be interpreted as pointer to a node and others as values of the node:

  • It is helpful to understand how this picture above is technically described in the assembler output. So translate this code fragment

    theon$ ulmas -o circular_data circular_data.s
    theon$ 

    and see how the symbol table in combination with the data segment can be used to infer the meaning of the memory locations:

    #DATA 8
    0x0000000000000000: 00 00 00 00 00 00 00 20 #   .quad c
    0x0000000000000008: 00 00 00 41 #       .long 'A'
    0x000000000000000C: 00 00 00 00 # padding for alignment
    0x0000000000000010: 00 00 00 00 00 00 00 30 #   .quad d
    0x0000000000000018: 00 00 00 42 #       .long 'B'
    0x000000000000001C: 00 00 00 00 # padding for alignment
    0x0000000000000020: 00 00 00 00 00 00 00 10 #   .quad b
    0x0000000000000028: 00 00 00 43 #       .long 'C'
    0x000000000000002C: 00 00 00 00 # padding for alignment
    0x0000000000000030: 00 00 00 00 00 00 00 00 #   .quad a
    0x0000000000000038: 00 00 00 44 #       .long 'D'
    #SYMTAB
    d a                           0x0000000000000000
    d b                           0x0000000000000010
    d c                           0x0000000000000020
    d d                           0x0000000000000030
    #FIXUPS
    data 0x0000000000000000 0 8 absolute [data]+32
    data 0x0000000000000010 0 8 absolute [data]+48
    data 0x0000000000000020 0 8 absolute [data]+16
    data 0x0000000000000030 0 8 absolute [data]
    

For the assignment write a program that prints the values of all nodes, beginning with the node that has label a, and then halts. You can implement the following algorithm:

Keep your program minimalistic, you don't have to use functions here. It is also sufficient to print the least significant byte of the value (this example uses the .long directive because in the picture the characters would not fit in a box that has the size of a single byte). But also make yourself aware that the .long is always aligned to 4 bytes because the preceding .quad is aligned to 8 bytes.

On theon submit your program with the command

submit hpc quiz10 pointer_chasing.s

Your answers will not be evaluated automatically but you should get a confirmation email after your submit.

For getting started

The following program prints the value of the node at label a and the value of its successor:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
        .data

        .align  8
a       .quad   c           # pointer to next node
        .long   'A'         # value of the node

        .align  8
b       .quad   d           # pointer to next node
        .long   'B'         # value of the node

        .align  8
c       .quad   b           # pointer to next node
        .long   'C'         # value of the node

        .align  8
d       .quad   a           # pointer to next node
        .long   'D'         # value of the node

        .text

        ldzwq   a,      %1

        movzlq  8(%1),  %2  // fetch value of node
        movq    (%1),   %1  // fetch pointer to next node
        putc    %2

        movzlq  8(%1),  %2  // fetch value of node
        movq    (%1),   %1  // fetch pointer to next node
        putc    %2

        putc    '\n'

        halt    0

Using the directives

1
2
    .equ    node_next,   0
    .equ    node_value,  8

you can write the instructions for fetching the value and pointer to the next node more readable:

1
2
    movzlq  node_value(%1),      %2     // fetch value of node
    movq    node_next(%1),       %1     // fetch pointer to next node

The following code applies this to the previous getting started example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
        .data

        .align  8
a       .quad   c           # pointer to next node
        .long   'A'         # value of the node

        .align  8
b       .quad   d           # pointer to next node
        .long   'B'         # value of the node

        .align  8
c       .quad   b           # pointer to next node
        .long   'C'         # value of the node

        .align  8
d       .quad   a           # pointer to next node
        .long   'D'         # value of the node


        .equ    next,   0
        .equ    value,  8

        .text

        ldzwq   a,              %1

        movzlq  node_value(%1), %2
        movq    node_next(%1),  %1
        putc    %2

        movzlq  node_value(%1), %2
        movq    node_next(%1),  %1
        putc    %2

        putc    '\n'

        halt    0

How you could define the above circular list in C

In C you could define the same data structure in the data segment with the following code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
/*
    variables of type "struct Node" have two fields:

    - pointer to next node
    - some value
*/

struct Node {
    struct Node    *next;
    long            value;
};

/*
   predecalre all nodes so that we in the initialization of a node we can point
   to a node that gets defined later:
*/
extern struct Node a;
extern struct Node b;
extern struct Node c;
extern struct Node d;

/*
    define all nodes as global variables:
*/

struct Node a = {
    &c,     // &c is the address of global variable b
    'A'
};

struct Node b = {
    &d,     // &d is the address of global variable b
    'B'
};

struct Node c = {
    &b,     // &b is the address of global variable b
    'C'
};

struct Node d = {
    &a,     // &a is the address of global variable b
    'D'
};

Translate this into assembly code with

theon$ gcc -S circular_data.c
theon$ 

and have a look at the code the C compiler generated. Most of what you see should look familiar:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
        .file   "circular_data.c"
        .text
        .globl  a
        .data
        .align 16
        .type   a, @object
        .size   a, 16
a:
        .quad   c
        .quad   65
        .globl  b
        .align 16
        .type   b, @object
        .size   b, 16
b:
        .quad   d
        .quad   66
        .globl  c
        .align 16
        .type   c, @object
        .size   c, 16
c:
        .quad   b
        .quad   67
        .globl  d
        .align 16
        .type   d, @object
        .size   d, 16
d:
        .quad   a
        .quad   68
        .ident  "GCC: (GNU) 7.3.0"