Overview Home Back Next 1 2 3 4 5 Content of this session Browse Files

Using the linker and makefiles

In order to explain what problems the linker solves and how it works we begin with a toy example. The initial program just contains some subprograms that use the simple calling convention from Session 9, hence we don't have to care about the stack, function prologues and epilogues etc.

Printing the strings is done character by character with hard coded putc instructions

session11/ex01/all_in_one.s

    .text
    // do something
    putc    'm'
    putc    'a'
    putc    'i'
    putc    'n'
    putc    '\n'

    // call some function
    ldzwq   foo,    %4
    jmp     %4,     %3

    ldzwq   bar,    %4
    jmp     %4,     %3

    // do something after the call
    putc    'M'
    putc    'A'
    putc    'I'
    putc    'N'
    putc    '\n'
 
    halt    0


foo:
    // do something in foo
    putc    'f'
    putc    'o'
    putc    'o'
    putc    '\n'

    // return
    jmp     %3,     %0

bar:
    // do something in foo
    putc    'b'
    putc    'a'
    putc    'r'
    putc    '\n'

    // return
    jmp     %3,     %0

and the control flow can easily be observed when the program gets executed:

theon$ ulmas -o all_in_one all_in_one.s
theon$ ulm all_in_one
main
foo
bar
MAIN
theon$

Why simply splitting the code is not good enough

In a first approach we basically just split the source file into main.s and foo.s. In the foo.s just a .text was added in the beginning (which is actually not necessary here as that is the default segment):

session11/ex01/main.s

    .text
    // do something
    putc    'm'
    putc    'a'
    putc    'i'
    putc    'n'
    putc    '\n'

    // call some functions
    ldzwq   foo,      %4
    jmp     %4,       %3

    // call some functions
    ldzwq   bar,      %4
    jmp     %4,       %3

    // do something after the call
    putc    'M'
    putc    'A'
    putc    'I'
    putc    'N'
    putc    '\n'

    halt    0

session11/ex01/foo.s

    .text
foo:
    // do something in foo
    putc    'f'
    putc    'o'
    putc    'o'
    putc    '\n'

    // return
    jmp           %3,     %0

    .text
bar:
    // do something in foo
    putc    'b'
    putc    'a'
    putc    'r'
    putc    '\n'

    // return
    jmp           %3,     %0

When you use the ULM assembler you can invoke it with more than one source file. In that case the files are processed in the order they appear and hence joined:

theon$ ulmas -o main01 main.s foo.s
theon$ ulm main01
main
foo
bar
MAIN
theon$

However, that means if you choose the wrong order the code will not run properly:

theon$ ulmas -o main02 foo.s main.s
theon$ #ulm main02
theon$

Let's figure out what is going on. First, compare the two executables:

session11/ex01/main01

#TEXT 4
0x0000000000000000: 69 6D 00 00 #       putc 'm'
0x0000000000000004: 69 61 00 00 #       putc 'a'
0x0000000000000008: 69 69 00 00 #       putc 'i'
0x000000000000000C: 69 6E 00 00 #       putc 'n'
0x0000000000000010: 69 0A 00 00 #       putc '\n'
0x0000000000000014: 56 00 3C 04 #       ldzwq foo, %4
0x0000000000000018: 40 04 03 00 #       jmp %4, %3
0x000000000000001C: 56 00 50 04 #       ldzwq bar, %4
0x0000000000000020: 40 04 03 00 #       jmp %4, %3
0x0000000000000024: 69 4D 00 00 #       putc 'M'
0x0000000000000028: 69 41 00 00 #       putc 'A'
0x000000000000002C: 69 49 00 00 #       putc 'I'
0x0000000000000030: 69 4E 00 00 #       putc 'N'
0x0000000000000034: 69 0A 00 00 #       putc '\n'
0x0000000000000038: 09 00 00 00 #       halt 0
0x000000000000003C: 69 66 00 00 #       putc 'f'
0x0000000000000040: 69 6F 00 00 #       putc 'o'
0x0000000000000044: 69 6F 00 00 #       putc 'o'
0x0000000000000048: 69 0A 00 00 #       putc '\n'
0x000000000000004C: 40 03 00 00 #       jmp %3, %0
0x0000000000000050: 69 62 00 00 #       putc 'b'
0x0000000000000054: 69 61 00 00 #       putc 'a'
0x0000000000000058: 69 72 00 00 #       putc 'r'
0x000000000000005C: 69 0A 00 00 #       putc '\n'
0x0000000000000060: 40 03 00 00 #       jmp %3, %0
#SYMTAB
t bar                         0x0000000000000050
t foo                         0x000000000000003C
#FIXUPS
text 0x0000000000000014 1 2 absolute [text]+60
text 0x000000000000001C 1 2 absolute [text]+80

session11/ex01/main02

#TEXT 4
0x0000000000000000: 69 66 00 00 #       putc 'f'
0x0000000000000004: 69 6F 00 00 #       putc 'o'
0x0000000000000008: 69 6F 00 00 #       putc 'o'
0x000000000000000C: 69 0A 00 00 #       putc '\n'
0x0000000000000010: 40 03 00 00 #       jmp %3, %0
0x0000000000000014: 69 62 00 00 #       putc 'b'
0x0000000000000018: 69 61 00 00 #       putc 'a'
0x000000000000001C: 69 72 00 00 #       putc 'r'
0x0000000000000020: 69 0A 00 00 #       putc '\n'
0x0000000000000024: 40 03 00 00 #       jmp %3, %0
0x0000000000000028: 69 6D 00 00 #       putc 'm'
0x000000000000002C: 69 61 00 00 #       putc 'a'
0x0000000000000030: 69 69 00 00 #       putc 'i'
0x0000000000000034: 69 6E 00 00 #       putc 'n'
0x0000000000000038: 69 0A 00 00 #       putc '\n'
0x000000000000003C: 56 00 00 04 #       ldzwq foo, %4
0x0000000000000040: 40 04 03 00 #       jmp %4, %3
0x0000000000000044: 56 00 14 04 #       ldzwq bar, %4
0x0000000000000048: 40 04 03 00 #       jmp %4, %3
0x000000000000004C: 69 4D 00 00 #       putc 'M'
0x0000000000000050: 69 41 00 00 #       putc 'A'
0x0000000000000054: 69 49 00 00 #       putc 'I'
0x0000000000000058: 69 4E 00 00 #       putc 'N'
0x000000000000005C: 69 0A 00 00 #       putc '\n'
0x0000000000000060: 09 00 00 00 #       halt 0
#SYMTAB
t bar                         0x0000000000000014
t foo                         0x0000000000000000
#FIXUPS
text 0x000000000000003C 1 2 absolute [text]
text 0x0000000000000044 1 2 absolute [text]+20

Both programs have only a text segment and therefore we easily can describe the memory layout when a program gets loaded. For main01 we have

and for main02

Because the entry point is by default always at address zero in the second case the execution begins with the first instruction generated from foo.s. So the program prints “foo” before it is executing the jump instruction. Because registers and memory cells are by default random initialized on the ULM you get an undefined behaviour once the jump instruction gets executed. If you change the defaults so that registers are zero initialized you get an infinite loop printing “foo” all over.

Drawbacks: Ordering and compilation time

The requirement to know in which order the source files need to be passed to the ULM assembler is just on drawback of this approach. In practical application also the compilation time matters. Making changes in one file would also mean that all source files have to be translated again. Just assume that you have a larger project where you measure this in hours not minutes or seconds.

General idea of using a linker

Instead of combining source files each assembly file can be translated into into a so called object file (which basically just denotes that the assembler output is not necessarily executable). The linker then is used to combines these object files into an executable. In our case this process can be described by this flowchart:

The overall translation time can be reduced for two reasons. Because the assembly source files can be translated independent of each over this can be done on a multiprocessor system in parallel. As linkage requires the existence of all object files this step is a potential bottleneck of the process. However, the linker only operates on the object files which have a much simpler structure compared to the assembler code. Basically there are no costs for lexical and syntactical analysis and code generation is mainly limited by the throughput of the file system.

Requirements for using a linker

When you simply apply the workflow described above you get an error message from the linker (not the assembler):

theon$ ulmas -o main.o main.s
theon$ ulmas -o foo.o foo.s
theon$ ulmld -o main main.o foo.o
ulmld: execution aborted
Unresolved symbol _start
theon$

Creating an executable should not depend on the order the linker receives object files. Therefore the linker requires that the (logical) entry point of the program is defined by a symbol _start. The generated executable will contain at the actual entry point instructions to load the address _start into a register followed by a jump to this address. When we are talking about ulmld there is no need to beat about bush, a generated executable will always begin with these five instructions:

    ldzwq       @w3(_start),    %1
    shldwq      @w2(_start),    %1
    shldwq      @w1(_start),    %1
    shldwq      @w0(_start),    %1
    jmp         %1,             %2

So in for our toy example the memory layout of a generated executable can be either described by

or by

Why you have to do more then just add a `_start` label

The logical entry point of our program is the first instruction in main.s. However, just adding the label _start in main.s there will not solve the problem

session11/ex02/main.s

    .text
_start:
    // do something
    putc    'm'
    putc    'a'
    putc    'i'
    putc    'n'
    putc    '\n'

    // call some functions
    ldzwq   foo,      %4
    jmp     %4,       %3

    // call some functions
    ldzwq   bar,      %4
    jmp     %4,       %3

    // do something after the call
    putc    'M'
    putc    'A'
    putc    'I'
    putc    'N'
    putc    '\n'

    halt    0

session11/ex02/foo.s

    .text
foo:
    // do something in foo
    putc    'f'
    putc    'o'
    putc    'o'
    putc    '\n'

    // return
    jmp           %3,     %0

    .text
bar:
    // do something in foo
    putc    'b'
    putc    'a'
    putc    'r'
    putc    '\n'

    // return
    jmp           %3,     %0

From the linker we still get the same error message:

theon$ ulmas -o main.o main.s
theon$ ulmas -o foo.o foo.s
theon$ ulmld -o main main.o foo.o
ulmld: execution aborted
Unresolved symbol _start
theon$

This seems to be wires as looking at the symbol table of main.o clearly shows that a symbol _start exists:

theon$ sed -n '/^#FIXUPS/q;/^#SYMTAB/,$p' main.o
#SYMTAB
t _start                      0x0000000000000000
U bar                         0x0000000000000000
U foo                         0x0000000000000000
theon$

Here another detail becomes relevant, symbols can be local or global. By default each symbol defined in the assembly code is local and the linker ignores all local symbols. The reason for that is to make it more manageable to avoid naming conflicts. Most symbols are the result of labels used to implement control structures (like loops). It is already challenging enough to choose here unique identifiers for the scope of translation unit. It would not be feasible to require uniqueness between different units. Furthermore, it reduces the amount of symbols the linker has to consider, and for the above reasons the job of the linker has to be as simple as possible.

Whether symbols are local or global can be directly inferred from the symbol tables. If symbol type is indicated by a lowercase letter (like here with 't') it is local, otherwise global. In general the type tags of the symbol table have the same meaning as specified for the output of Unix utility nm.

In order to define global symbols the directive .globl <symbol-name> has to be used. So for a working example we apply the following modification:

session11/ex03/main.s

    .text
    .globl _start
_start:
    // do something
    putc    'm'
    putc    'a'
    putc    'i'
    putc    'n'
    putc    '\n'

    // call some functions
    ldzwq   foo,      %4
    jmp     %4,       %3

    // call some functions
    ldzwq   bar,      %4
    jmp     %4,       %3

    // do something after the call
    putc    'M'
    putc    'A'
    putc    'I'
    putc    'N'
    putc    '\n'

    halt    0

session11/ex03/foo.s

    .text
    .globl foo
foo:
    // do something in foo
    putc    'f'
    putc    'o'
    putc    'o'
    putc    '\n'

    // return
    jmp           %3,     %0

    .text
    .globl bar
bar:
    // do something in foo
    putc    'b'
    putc    'a'
    putc    'r'
    putc    '\n'

    // return
    jmp           %3,     %0

With the we finally get an executable from the linker:

theon$ ulmas -o main.o main.s
theon$ ulmas -o foo.o foo.s
theon$ ulmld -o main main.o foo.o
theon$ ulm main
main
foo
bar
MAIN
theon$

Also checkout the difference in the symbol tables of main.o and foo.o:

theon$ sed -n '/^#FIXUPS/q;/^#SYMTAB/,$p' main.o
#SYMTAB
T _start                      0x0000000000000000
U bar                         0x0000000000000000
U foo                         0x0000000000000000
theon$ sed -n '/^#FIXUPS/q;/^#SYMTAB/,$p' foo.o
#SYMTAB
T bar                         0x0000000000000014
T foo                         0x0000000000000000
theon$

How linkage was done in this example

In general the executable produced by the linker consists of a text, data and bss segment where each segment is a linkage of corresponding segments found in the object files. In our example we only have text segments. In a first step these just get get appended to the code produced by the linker for jumping to the logical entry point. The memory layout of this intermediate result can be described by

When you look at the code of main.o you see that the assembler decoded the instructions with the undefined symbols foo and bar as follows:

theon$ egrep "ldzwq.*(foo|bar)" main.o
0x0000000000000014: 56 00 00 04 #       ldzwq foo, %4
0x000000000000001C: 56 00 00 04 #       ldzwq bar, %4
theon$

The fields X and Y of the ldzwq instruction contain a zero bit pattern. The linker has to patch this instructions with the actual addresses of foo and bar. And also in its own code the linker has to patch the instructions for loading the address of _start analogously. In a picture we can indicate what bytes are relevant for fixing the code as follows:

The linker gets the information relevant for patches from the relocation table (#FIXUPS). For example, in main.o you find:

theon$ sed -n '/^#FIXUPS/,$p' main.o
#FIXUPS
text 0x0000000000000014 1 2 absolute foo
text 0x000000000000001C 1 2 absolute bar
theon$

The format of each line basically describes “Where to fix” and “How to fix” in more detail:

1	<segment> <address> <offset> <num bytes> <absolute or relative> <symbol name>

Let's approach these details by example. The code of the linker consists of 5 instructions or 20 bytes. So the address where the text segment of main.o will be located is 0x14. The first patch needs to be applied to the instruction which had within main.o the address 0x14. As the linker relocated the text segment of main.o to address 0x14 this instruction now has address 0x28. At address 0x28 two bytes with an offset of one byte need to be patched, this refers to the bytes with address 0x29 and 0x2A . This answers the question “where to fix”. And you see in main that actually these two bytes where modified:

theon$ egrep "ldzwq.*foo" main
0x0000000000000028: 56 00 50 04 #       ldzwq foo, %4
theon$

Next we can reverse engineer how the bytes where patched. The symbol foo is defined in foo.o:

theon$ sed -n '/^#FIXUPS/q;/^#SYMTAB/,$p' foo.o
#SYMTAB
T bar                         0x0000000000000014
T foo                         0x0000000000000000
theon$

It has the type text and value zero. That means it refers to the “address of the text segment of foo.o plus zero”, and the linker located the text segment of foo.o at address 0x50.

Analogously you can checkout that instruction

theon$ egrep "ldzwq.*bar" main.o
0x000000000000001C: 56 00 00 04 #       ldzwq bar, %4
theon$

was patched with

theon$ egrep "ldzwq.*bar" main
0x0000000000000030: 56 00 64 04 #       ldzwq bar, %4
theon$

Applying all patches can be described as adding these pointers to the pictures describing the memory layout:

Using GNU make for a build system

As soon as the number of translation units grows it is not convenient to manual translate the source files and link the object files. Instead a build system can be used to automatize the process. Such a system should be aware that for a source file an object file needs to be created only once, and recreated only if the source file was modified. Also linkage is only required if an object file was modified or new object files become relevant for creating the target. While there are many build system available, and acclaimed to be superior, we will use GNU make and initially only its most essential features.

Dependencies and action rules

For motivating notations and terminologies the flowchart for building our toy example gets rewritten as follows:

This chart can be interpreted and read as follows:

The executable main depends on the object files main.o and foo.o. The linker generates main from these objects, and main has to be regenerate it if main.o or foo.o was modified.
The object file main.o depends on main.s. The assembler generates main.o from main.s, and main.o has to be regenerated if main.s gets changed.
Analogously, the object file foo.o depends on foo.s, i.e. it gets generated or regenerated depending on foo.s.

This directed graph and action rules for generating files can be described in makefiles that can be interpreted by the make command:

session11/ex03/Makefile

all:    main

main:   main.o  foo.o
        ulmld -o main main.o foo.o

main.o: main.s
        ulmas -o main.o main.s

foo.o:  foo.s
        ulmas -o foo.o foo.s

This file consists of a sequence of rules that have to following format:

<target> : <list of dependencies>
        <command to generate the target>

For example

main.o : main.s
        ulmas -o main.o main.s

It is important to point out that the indenting of the command needs to be done with at least on tab character (make sure your editor does not expand them with spaces).

How make works

When you run the make command it will first search in the current directory for the following files, and in this order: GNUmakefile, makefile and Makefile. The first file, and only this, gets executed by applying the first rule found in his file. In the above makefile that would be

1	all : main

Applying a rule consists of two steps:

If a rule for a dependence exists this rule gets applied first. Hence applying rules happens recursively. Applying the first rule requires that rule
1 2
main : main.o foo.o ulmld -o main main.o foo.o
was applied beforehand. Which in turn requires that the rules
1 2
main.o : main.s ulmas -o main.o main.s
and
1 2
foo.o : foo.s ulmas -o foo.o foo.s
were applied beforehand. And the recursion is terminated by these two rules.
After eventual rules were applied make checks if the command for generating/updating the target needs to be executed. For that make simply checks if the target does not exist or has a modification time that is older than any of the dependent files.

Let's begin with a clean sheet:

theon$ rm -f *.o main
theon$

Running make now causes to generate all object files and then the executable:

theon$ make
ulmas -o main.o main.s
ulmas -o foo.o foo.s
ulmld -o main main.o foo.o
theon$

Running make again will only trigger the first rule for all which has no action command:

theon$ make
make: Nothing to be done for 'all'.
theon$

With the command touch we can update the modification time of a file to now without modifying its content:

theon$ touch main.s
theon$

Now make thinks that main.s has changed and main.o needs an update:

theon$ make
ulmas -o main.o main.s
ulmld -o main main.o foo.o
theon$

Deleting main.o will trigger its generation and the linkage of the executable:

theon$ rm main.o
theon$ make
ulmas -o main.o main.s
ulmld -o main main.o foo.o
theon$

Deleting only the executable will trigger the linkage only:

theon$ rm main
theon$ make
ulmld -o main main.o foo.o
theon$

Quiz 13: Break bubble sort into pieces!

Take the program bubblesort from session 10, i.e.

session10/func/bubblesort.s

        .equ    FP,             1
        .equ    SP,             2
        .equ    RET,            3

//------------------------------------------------------------------------------
// Function _start()
//------------------------------------------------------------------------------
        .equ    ret,            0
        .equ    fp,             8
        .equ    rval,           16

        .text
_start:
        // begin of the function body

        ldzwq   0,              %SP

        // call function main()
        subq    24,             %SP,            %SP
        ldzwq   main,           %4
        jmp     %4,             %RET
        movq    rval(%SP),      %4
        addq    24,             %SP,            %SP

        halt    %4


//------------------------------------------------------------------------------
// Global variables
//------------------------------------------------------------------------------

        .data
msg     .string "hello, world!"

//------------------------------------------------------------------------------
// Function main()
//------------------------------------------------------------------------------
        .equ    ret,            0
        .equ    fp,             8
        .equ    rval,           16

        .text
main:
        // function prologue
        movq    %RET,           ret(%SP)
        movq    %FP,            fp(%SP)
        addq    0,              %SP,            %FP
        // begin of the function body

        /*
           puts(msg);
        */
        // call procedure puts(msg)
        subq    24,             %SP,            %SP
        # store argument msg in 16(%SP)
        ldzwq   msg,            %4
        movq    %4,             16(%SP)
        ldzwq   puts,           %4
        jmp     %4,             %RET
        movq    rval(%SP),      %4
        addq    24,             %SP,            %SP

        /*
           bubblesort(msg, strlen(msg));
        */
        // call procedure bubblesort(msg, strlen(msg))
        subq    32,             %SP,            %SP
        # store argument msg in 16(%SP)
        ldzwq   msg,            %4
        movq    %4,             16(%SP)
        # store argument strlen(msg) in 24(%SP)

        // call function strlen(msg)
        subq    32,             %SP,            %SP
        # store argument msg in 24(%SP)
        ldzwq   msg,            %4
        movq    %4,             24(%SP)
        ldzwq   strlen,         %4
        jmp     %4,             %RET
        movq    rval(%SP),      %4
        addq    32,             %SP,            %SP
        movq    %4,             24(%SP)

        ldzwq   bubblesort,             %4
        jmp     %4,             %RET
        addq    32,             %SP,            %SP

        /*
           puts(msg);
        */
        // call procedure puts(msg)
        subq    24,             %SP,            %SP
        # store argument msg in 16(%SP)
        ldzwq   msg,            %4
        movq    %4,             16(%SP)
        ldzwq   puts,           %4
        jmp     %4,             %RET
        movq    rval(%SP),      %4
        addq    24,             %SP,            %SP

        /*
           return 0;
        */
        ldzwq   0,              %4
        movq    %4,             rval(%FP)
        jmp     main.leave
        
        // end of the function body
        // function epilogue
main.leave:
        addq    0,              %FP,            %SP
        movq    fp(%SP),        %FP
        movq    ret(%SP),       %RET
        jmp     %RET,           %0

//------------------------------------------------------------------------------
// Procedure puts(str)
//------------------------------------------------------------------------------
        .equ    ret,            0
        .equ    fp,             8

        // procedure arguments
        .equ    str,            16

        .text
puts:
        // function prologue
        movq    %RET,           ret(%SP)
        movq    %FP,            fp(%SP)
        addq    0,              %SP,            %FP
        // reserve space for 0 local variables.
        subq    0,              %SP,            %SP
        // begin of the function body

        /*
            if (*str == 0)
                goto puts.leave;
        */
puts.while
        movq    str(%FP),       %4
        movzbq  (%4),           %4
        subq    0,              %4,             %0
        jz      puts.while.done

        /*
            putchar(*str);
        */
        movq    str(%FP),       %4
        movzbq  (%4),           %4
        putc    %4

        /*
           str = str + 1;
        */
        movq    str(%FP),       %4
        addq    1,              %4,             %4
        movq    %4,             str(%FP)

        /*
           goto puts.while;
        */
        jmp     puts.while
puts.while.done:

        /*
           putchar('\n');
        */
        putc    '\n'

        // end of the function body
        // function epilogue
puts.leave:
        addq    0,              %FP,            %SP
        movq    fp(%SP),        %FP
        movq    ret(%SP),       %RET
        jmp     %RET,           %0



//------------------------------------------------------------------------------
// Function strlen(str)
//------------------------------------------------------------------------------
        .equ    ret,            0
        .equ    fp,             8
        .equ    rval,           16

        // function arguments
        .equ    str,            24

        // local variables
        .equ    ch,             -8

        .text
strlen:
        // function prologue
        movq    %RET,           ret(%SP)
        movq    %FP,            fp(%SP)
        addq    0,              %SP,            %FP
        // reserve space for 1 local variables.
        subq    8,              %SP,            %SP
        // begin of the function body

        /*
            ch = str;
        */
        movq    str(%FP),       %4
        movq    %4,             ch(%FP)

        /*
           if (*ch == 0)
                goto strlen.while.done;
        */
strlen.while:
        movq    ch(%FP),        %4
        movzbq  (%4),           %4
        subq    0,              %4,             %0
        jz      strlen.while.done

        /*
            ch = ch + 1;
        */
        movq    ch(%FP),        %4
        addq    1,              %4,             %4
        movq    %4,             ch(%FP)

        /*
           goto strlen.while;
        */
        jmp     strlen.while

strlen.while.done:
        /*
                return ch - str;
        */
        movq    ch(%FP),        %4
        movq    str(%FP),       %5
        subq    %5,             %4,             %4
        movq    %4,             rval(%FP)
        jmp     strlen.leave

        // end of the function body
        // function epilogue
strlen.leave:
        addq    0,              %FP,            %SP
        movq    fp(%SP),        %FP
        movq    ret(%SP),       %RET
        jmp     %RET,           %0

//------------------------------------------------------------------------------
// Procedure bubblesort(A, n)
//------------------------------------------------------------------------------
        .equ    ret,            0
        .equ    fp,             8

        // procedure arguments
        .equ    A,              16
        .equ    n,              A+8

        // local variables
        .equ    swapped,        -8
        .equ    i,              swapped-8
        .equ    tmp,            i-8

        .text
bubblesort:
        // function prologue
        movq    %RET,           ret(%SP)
        movq    %FP,            fp(%SP)
        addq    0,              %SP,            %FP
        // reserve space for 3 local variables.
        subq    24,             %SP,            %SP
        // begin of the function body

bubblesort_do:
        /*
           swapped = 0;
        */
        ldzwq   0,              %4
        movq    %4,             swapped(%FP)

        /*
           i = 1;
        */
        ldzwq   1,              %4
        movq    %4,             i(%FP)

bubblesort_for:
        /*
           if (i >= n)
                goto bubblesort_endfor;
        */
        movq    i(%FP),         %4
        movq    n(%FP),         %5
        subq    %5,             %4,             %0
        jae     bubblesort_endfor

        /*
            if (A[i-1] <= A[i])
                goto bubblesort_endif;
        */
        movq    i(%FP),         %4
        subq    1,              %4,             %4      # i-1
        movq    i(%FP),         %5                      # i
        movq    A(%FP),         %6
        movzbq  (%6, %4),       %4                      # A[i-1]
        movzbq  (%6, %5),       %5                      # A[i]
        subq    %5,             %4,             %0
        jbe     bubblesort_endif

        /*
           tmp = A[i];
        */
        movq    i(%FP),         %4                      # i
        movq    A(%FP),         %5
        movzbq  (%5, %4),       %4                      # A[i]
        movb    %4,             tmp(%FP)

        /*
           A[i] = A[i-1];
        */
        movq    i(%FP),         %4
        subq    1,              %4,             %4      # i-1
        movq    i(%FP),         %5                      # i
        movq    A(%FP),         %6
        movzbq  (%6, %4),       %4                      # fetch A[i-1]
        movb    %4,             (%6, %5)                # store A[i]

        /*
           A[i-1] = tmp;
        */
        movzbq  tmp(%FP),       %4
        movq    i(%FP),         %5
        subq    1,              %5,             %5      # i-1
        movq    A(%FP),         %6
        movb    %4,             (%6, %5)                # store A[i]

        /*
           swapped = 1;
        */
        ldzwq   1,              %4
        movq    %4,             swapped(%FP)

bubblesort_endif

        /*
           i = i + 1;
        */
        movq    i(%FP),         %4
        addq    1,              %4,             %4
        movq    %4,             i(%FP)

        /*
           goto bubblesort_for;
        */
        jmp     bubblesort_for

bubblesort_endfor:

        /*
           n = n - 1;
        */
        movq    n(%FP),         %4
        subq    1,              %4,             %4
        movq    %4,             n(%FP)

        /*
           if (swapped)
                goto bubblesort_do;
        */
        movq    swapped(%FP),   %4
        subq    0,              %4,             %0
        jnz     bubblesort_do

        // end of the function body
        // function epilogue
bubblesort.leave:
        addq    0,              %FP,            %SP
        movq    fp(%SP),        %FP
        movq    ret(%SP),       %RET
        jmp     %RET,           %0

and split it into source files as follows:

Source file crt0.s (see here why it is called: crt0) only contains the code block with label _start for initializing the stack,
main.s contains only the function main and the global variable for the string,
puts.s contains only the function puts,
strlen.s contains only the function strlen and
bubblesort.s contains only the function bubblesort.

Also write a makefile for generating and updating needed obeject files and an executable test_bubble.

On theon submit the files with

submit hpc quiz13 crt0.s main.s puts.s strlen.s bubblesort.s

If you don't get any message it means there was no error.

Btw: It is certainly overkill but just for fun you can use your makefile to translate the assembly files in parallel.