===================================
ULM Assembler (Part 1): First Steps					[TOC]
===================================

---- VIDEO ------------------------------
https://www.youtube.com/embed/chnIUD6551g
-----------------------------------------


Syntax Highlighting in Vim
==========================

You can enable syntax highlighting by:

- Adding the following line to your `~/.vimrc`

  ---- CODE (type=vim) ---------------------------------------------------------
  autocmd FileType asm set syntax=ulmasm
  ------------------------------------------------------------------------------

- And saving the following file `ulmasm.vim` in the directory `~/.vim/syntax/`
  (if this directory does not exist, create it):

  ---- CODE (file=session09/ulmasm.vim, fold) ----------------------------------
  " Vim syntax file
  " Language:	ULM assembler
  " Maintainer:	Michael Christian Lehn <michael.lehn@uni-ulm.de>
  " Last Change:	2020-2-15
  " License:    	Vim (see :h license)
  
  " For version 5.x: Clear all syntax items
  " For version 6.x: Quit when a syntax file was already loaded
  if version < 600
    syntax clear
  elseif exists("b:current_syntax")
    finish
  endif
  
  syntax case match
  
  syntax match asmKeyword /[a-zA-Z_]\+\>/ contained skipwhite
  syntax match asmKeyword /\.align\>/ contained skipwhite
  syntax match asmKeyword /\.bss\>/ contained skipwhite
  syntax match asmKeyword /\.byte\>/ contained skipwhite
  syntax match asmKeyword /\.data\>/ contained skipwhite
  syntax match asmKeyword /\.equ\>/ contained skipwhite
  syntax match asmKeyword /\.equiv\>/ contained skipwhite
  syntax match asmKeyword /\.global\>/ contained skipwhite
  syntax match asmKeyword /\.globl\>/ contained skipwhite
  syntax match asmKeyword /\.long\>/ contained skipwhite
  syntax match asmKeyword /\.quad\>/ contained skipwhite
  syntax match asmKeyword /\.set\>/ contained skipwhite
  syntax match asmKeyword /\.space\>/ contained skipwhite
  syntax match asmKeyword /\.string\>/ contained skipwhite
  syntax match asmKeyword /\.text\>/ contained skipwhite
  syntax match asmKeyword /\.word\>/ contained skipwhite
  
  syntax match asmColon ":" nextgroup=asmKeyword skipwhite
  syntax match asmDelimiter /[.$%,()]\|@w[0-3]/
  syntax match asmLiteral /[1-9][0-9]*/
  syntax match asmLiteral /[0-7][0-7]*/
  syntax match asmLiteral /0x[0-9a-zA-Z][0-9a-zA-Z]*/
  syntax region asmLiteral start=/"/ skip=/\\"/ end=/"/
  syntax match asmIdentifier /[A-Za-z_.][A-Za-z0-9_.]*/
  
  syntax match asmLabel /^[A-Za-z_.][A-Za-z0-9_.]*/ nextgroup=asmKeyword,asmColon,asmComment skipwhite
  syntax match asmLabel /^[ \t][ \t]*/ nextgroup=asmKeyword skipwhite
  
  syntax match asmStillLabel /[A-Za-z_.][A-Za-z0-9_.]*/ contained nextgroup=asmKeyword,asmColon,asmComment skipwhite
  syntax match asmStillLabel /[ \t][ \t]*/ contained nextgroup=asmKeyword skipwhite
  
  
  syntax region asmComment start="^//" end="$"
  syntax region asmComment start="//" end="$"
  syntax region asmComment start="/\*" end="\*/"
  syntax region asmComment start="^/\*" end="\*/" nextgroup=asmStillLabel
  syntax region asmComment start="^#" end="$"
  syntax region asmComment start="#" end="$"
  
  
  highlight link asmComment Comment
  highlight link asmLabel Label
  highlight link asmStillLabel Label
  highlight link asmLiteral Number
  highlight link asmIdentifier Identifier
  highlight link asmKeyword Type
  highlight link asmString String
  ------------------------------------------------------------------------------

Note that this syntax description for Vim is far from perfect. Every
identifier in the mnemonic/pseudo-op field that does not begin with a dot will
be highlighted as a keyword. This is due to this rule:

---- CODE (type=vim) -----------------------------------------------------------
syntax match asmKeyword /[a-zA-Z_]\+\>/ contained skipwhite
--------------------------------------------------------------------------------

A better solution would be a list (generated from the `isa.txt`) that contains
all mnemonics that are actually defined. However, this means that every
`isa.txt` variant would need its own Vim syntax description.
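
As a rough illustration of that idea, the following Python sketch generates one
Vim rule per mnemonic. The function name `vim_rules` and the hard-coded list are
assumptions made for this example. A real generator would read the mnemonics
from the `isa.txt`, whose format is not parsed here.

```python
# Mnemonics of the isa.txt used in the video (hard-coded for this sketch).
MNEMONICS = ["addq", "getc", "halt", "imulq", "je", "jmp", "jne", "jz",
             "ldzwq", "movzbq", "putc", "subq"]


def vim_rules(mnemonics):
    """Return one 'syntax match' line per mnemonic, in the style of ulmasm.vim."""
    return [f"syntax match asmKeyword /\\<{m}\\>/ contained skipwhite"
            for m in sorted(mnemonics)]


print("\n".join(vim_rules(MNEMONICS)))
```

The generated lines would take the place of the generic rule shown above, so
that only mnemonics that are actually defined get highlighted as keywords.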


Instruction Set Used in the Video
=================================

:import: session09/hello/0_ulm_variants/hello/isa.txt [fold]

The Hello World Assembly Program Shown in the Video
===================================================

:import: session09/hello/0_ulm_variants/hello/hello.s [fold]

Translating the Assembly Program into an Executable
===================================================

---- SHELL (path=session09/hello, hide) ----------------------------------------
make
--------------------------------------------------------------------------------

With

---- SHELL (path=session09/hello) ----------------------------------------------
1_ulm_build/hello/ulmas 0_ulm_variants/hello/hello.s
--------------------------------------------------------------------------------

the following executable `a.out` (the default name for the assembler output)
gets created:

:import: session09/hello/a.out


Format of the Assembler Output
==============================
The generated output consists of different sections. Each of these sections
starts with a header (which also separates the section from a previous
section). In this case the assembler output has the following four sections:

`#TEXT <alignment>`
~~~~~~~~~~~~~~~~~~~
is the header for the _text segment_. This section contains the instructions for
the "hello, world" program.

The alignment parameter is (for convenience) a decimal numeral and in this
case 4. It specifies that the loader has to copy this section to a memory block
whose start address is a multiple of 4.  For the moment this is not relevant,
as in this case the start address is 0 (and hence a multiple of any nonzero
integer).

Lines of the text segment have the format

`[address:] instruction  [# comments]`

Addresses and comments are optional (and for convenience). The instructions
are in hexadecimal. By default the loader copies the text segment to memory
beginning at address 0.
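
The line format can be made concrete with a small Python sketch. The function
name `parse_line` and the sample line are hypothetical, and the sketch assumes
that the optional address is written in hexadecimal. Header lines such as
`#TEXT 4` have to be recognized before this function is called, because it
treats everything after a `#` as a comment.

```python
def parse_line(line):
    """Split '[address:] instruction [# comments]' into (address, bytes)."""
    code = line.split("#", 1)[0]             # drop the optional comment
    address = None
    if ":" in code:
        addr_text, code = code.split(":", 1)
        address = int(addr_text, 16)         # assumption: hexadecimal address
    hex_digits = "".join(code.split())       # remove all white space
    return address, bytes.fromhex(hex_digits)


# Hypothetical sample line in the format described above.
print(parse_line("0x1C: 01 00 00 00 # halt 0"))
```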

---- BOX -----------------------------------------------------------------------
Compare the instructions of the text segment with the memory content from
address 0x00 to 0x20.
--------------------------------------------------------------------------------

`#DATA <alignment>`
~~~~~~~~~~~~~~~~~~~
is the header for the _data segment_. This section contains the data for the
"hello, world" program and has the same format as the text segment (i.e.
optional address, actual data, optional comments).

The loader copies the data segment to memory such that it follows the text
segment. Like the text segment, the data segment can have alignment
restrictions. That means in general there can be a gap in memory between the
text and data segment.  However, in the "hello, world" program the alignment of
the data segment is 1. Hence, the loader begins the data segment at address
0x20 (where the text segment ended).
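
The placement rule can be sketched in a few lines of Python. The helper name
`align_up` is an assumption made for this example. It only illustrates the
arithmetic and is not how the loader is actually implemented.

```python
def align_up(address, alignment):
    """Smallest multiple of `alignment` that is not below `address`."""
    return (address + alignment - 1) // alignment * alignment


text_start = align_up(0, 4)           # text segment: alignment 4, starts at 0
text_end = text_start + 0x20          # 8 instructions with 4 bytes each
data_start = align_up(text_end, 1)    # data segment: alignment 1
print(hex(data_start))                # 0x20

# With a stricter alignment there can be a gap after the text segment:
print(hex(align_up(0x21, 4)))         # 0x24
```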

`#SYMTAB`
~~~~~~~~~
is the header for the _symbol table_.

Labels and `.equ` directives generate symbols that have a value and a type:

- Labels in the text segment have type _text_ and the value is an address
  within the text segment. For example, the text symbol `halt` has the value
  0x1C.
- Accordingly, labels in the data segment have type _data_ and the value is the
  address relative to the beginning of the data segment. For example, the data
  symbol `msg` has value 0x00 (and not 0x20).
- The `.equ` directive defines a symbol of type _absolute_ with a given value.
  Here for example, the symbol `addr` has value 1.

For loading (and running) the program the symbol table is not relevant. But it
is relevant for linking (which will be covered in upcoming sessions). 

If an instruction or directive contains an undefined symbol an entry of type
_undefined_ and value 0 is added to the symbol table. 
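
How a symbol's type and value translate into a number at run time can be
sketched in Python. The dictionary layout and the function name
`absolute_value` are assumptions made for this example. The entries are the
ones mentioned in this section, and the segment start addresses are those of
the "hello, world" program.

```python
TEXT_START, DATA_START = 0x00, 0x20    # where the loader places the segments

# (type, value) pairs for some symbols of the "hello, world" program
symtab = {
    "halt": ("text", 0x1C),
    "msg": ("data", 0x00),
    "ch": ("absolute", 2),
}


def absolute_value(name):
    """Add the start address of the segment that the symbol belongs to."""
    kind, value = symtab[name]
    base = {"text": TEXT_START, "data": DATA_START, "absolute": 0}[kind]
    return base + value


print(hex(absolute_value("msg")))      # 0x20
```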

`#FIXUPS`
~~~~~~~~~
is the header for the _relocation table_.

Like the symbol table this will be relevant for linking and will be covered in
upcoming sessions.

Memory layout of the "hello, world" program
-------------------------------------------
The information from the text and data segment, together with the text and data
labels from the symbol table, can be visualized as follows:

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}
\renewcommand\MemCellWidth { 0.6 }

\DrawMemArrayOpenRight{0}{31}
\DrawMemAddress{0}{0x00}
\DrawMemAddress{4}{0x04}
\DrawMemAddress{8}{0x08}
\DrawMemAddress{12}{0x0C}
\DrawMemAddress{16}{0x10}
\DrawMemAddress{20}{0x14}
\DrawMemAddress{24}{0x18}
\DrawMemAddress{28}{0x1C}
\DrawMemAddress{32}{0x20}

\DrawMemLabel{28}{halt}
\DrawMemLabel{4}{load}
\DrawMemVariable[gray!50]{0}{32}{}

\DrawMemCellContent{0}{08}
\DrawMemCellContent{1}{00}
\DrawMemCellContent{2}{20}
\DrawMemCellContent{3}{01}
\DrawMemCellContent{4}{09}
\DrawMemCellContent{5}{01}
\DrawMemCellContent{6}{00}
\DrawMemCellContent{7}{02}
\DrawMemCellContent{8}{05}
\DrawMemCellContent{9}{00}
\DrawMemCellContent{10}{02}
\DrawMemCellContent{11}{00}
\DrawMemCellContent{12}{07}
\DrawMemCellContent{13}{00}
\DrawMemCellContent{14}{00}
\DrawMemCellContent{15}{04}
\DrawMemCellContent{16}{03}
\DrawMemCellContent{17}{02}
\DrawMemCellContent{18}{00}
\DrawMemCellContent{19}{00}
\DrawMemCellContent{20}{0A}
\DrawMemCellContent{21}{01}
\DrawMemCellContent{22}{01}
\DrawMemCellContent{23}{01}
\DrawMemCellContent{24}{04}
\DrawMemCellContent{25}{FF}
\DrawMemCellContent{26}{FF}
\DrawMemCellContent{27}{FB}
\DrawMemCellContent{28}{01}
\DrawMemCellContent{29}{00}
\DrawMemCellContent{30}{00}
\DrawMemCellContent{31}{00}

\DrawAnnotateMemCellAbove[2]{15}{Text segment}
\end{tikzpicture}
--------------------------------------------------------------------------------

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}
\renewcommand\MemCellWidth { 0.6 }

\DrawMemArrayOpen{0}{31}
\DrawMemAddress{0}{0x20}
\DrawMemAddress{4}{0x24}
\DrawMemAddress{8}{0x28}
\DrawMemAddress{12}{0x2C}
\DrawMemAddress{16}{0x30}
\DrawMemAddress{20}{0x34}
\DrawMemAddress{24}{0x38}
\DrawMemAddress{28}{0x3C}
\DrawMemAddress{32}{0x40}

\DrawMemLabel{0}{msg}
\DrawMemVariable[orange!50]{0}{15}{}

\DrawMemCellContent{0}{68}
\DrawMemCellContent{1}{65}
\DrawMemCellContent{2}{6C}
\DrawMemCellContent{3}{6C}
\DrawMemCellContent{4}{6F}
\DrawMemCellContent{5}{2C}
\DrawMemCellContent{6}{20}
\DrawMemCellContent{7}{77}
\DrawMemCellContent{8}{6F}
\DrawMemCellContent{9}{72}
\DrawMemCellContent{10}{6C}
\DrawMemCellContent{11}{64}
\DrawMemCellContent{12}{21}
\DrawMemCellContent{13}{0A}
\DrawMemCellContent{14}{00}

\DrawAnnotateMemCell[2]{5}{Data segment}

\end{tikzpicture}
--------------------------------------------------------------------------------

Disassembling the instructions in the text segment and interpreting the bytes
in the data segment as a zero-terminated string makes it possible to almost see
the original source code in the memory layout (for lack of space the symbols
`addr` and `ch` are not used for the literals $1$ and $2$, respectively):

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}
\renewcommand\MemCellWidth { 0.75 }

\DrawMemArrayOpenRight{0}{31}
\DrawMemAddress{0}{0x00}
\DrawMemAddress{4}{0x04}
\DrawMemAddress{8}{0x08}
\DrawMemAddress{12}{0x0C}
\DrawMemAddress{16}{0x10}
\DrawMemAddress{20}{0x14}
\DrawMemAddress{24}{0x18}
\DrawMemAddress{28}{0x1C}
\DrawMemAddress{32}{0x20}

\DrawMemLabel{28}{halt}
\DrawMemLabel{4}{load}

\begingroup
\renewcommand\PaddingMemVariable {0.05}
\DrawMemVariable[gray!90]{0}{32}{}
\par\endgroup

\DrawLongVariable[gray!50]{0}{ldzwq msg, \%1}
\DrawLongVariable[gray!50]{4}{\small movzbq (\%1), \%2}
\DrawLongVariable[gray!50]{8}{\small subq 0, \%2, \%0}
\DrawLongVariable[gray!50]{12}{je halt}
\DrawLongVariable[gray!50]{16}{putc \%2}
\DrawLongVariable[gray!50]{20}{addq 1, \%1, \%1}
\DrawLongVariable[gray!50]{24}{jmp fetch}
\DrawLongVariable[gray!50]{28}{halt 0}

\end{tikzpicture}
--------------------------------------------------------------------------------

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}
\renewcommand\MemCellWidth { 0.75 }

\DrawMemArrayOpen{0}{31}
\DrawMemAddress{0}{0x20}
\DrawMemAddress{4}{0x24}
\DrawMemAddress{8}{0x28}
\DrawMemAddress{12}{0x2C}
\DrawMemAddress{16}{0x30}
\DrawMemAddress{20}{0x34}
\DrawMemAddress{24}{0x38}
\DrawMemAddress{28}{0x3C}
\DrawMemAddress{32}{0x40}

\DrawMemLabel{0}{msg}

\begingroup
\renewcommand\PaddingMemVariable {0.05}
\DrawMemVariable[orange!90]{0}{15}{}
\par\endgroup


\DrawByteVariable[orange!50]{0}{'h'}
\DrawByteVariable[orange!50]{1}{'e'}
\DrawByteVariable[orange!50]{2}{'l'}
\DrawByteVariable[orange!50]{3}{'l'}
\DrawByteVariable[orange!50]{4}{'o'}
\DrawByteVariable[orange!50]{5}{','}
\DrawByteVariable[orange!50]{6}{' '}
\DrawByteVariable[orange!50]{7}{'w'}
\DrawByteVariable[orange!50]{8}{'o'}
\DrawByteVariable[orange!50]{9}{'r'}
\DrawByteVariable[orange!50]{10}{'l'}
\DrawByteVariable[orange!50]{11}{'d'}
\DrawByteVariable[orange!50]{12}{'!'}
\DrawByteVariable[orange!50]{13}{'\textbackslash n'}
\DrawByteVariable[orange!50]{14}{0}

\end{tikzpicture}
--------------------------------------------------------------------------------

From the symbol table we also know that symbols `addr` and `ch` were defined
with absolute value 1 and 2 respectively. However, we cannot determine where
the symbols were used. So for example, we don't know that the first instruction
was written as `ldzwq msg, %addr` in the source file. In practice that makes it
hard to understand disassembled programs where the original source is not
available (and there are actually legal cases where you have to deal with such
problems).
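
To see what the disassembled listing does, its control flow can be paraphrased
in Python. This is only a sketch: the variables `r1` and `r2` stand in for the
registers `%1` and `%2`, the text segment is filled with zero bytes because its
content is not needed here, and the function name `run` is made up for this
example.

```python
# Text segment (content irrelevant here) followed by the data segment at 0x20.
memory = bytes(0x20) + b"hello, world!\n\0"


def run():
    out = []
    r1 = 0x20                  # ldzwq  msg, %1
    while True:
        r2 = memory[r1]        # movzbq (%1), %2
        if r2 == 0:            # subq   0, %2, %0  followed by  je halt
            break
        out.append(chr(r2))    # putc   %2
        r1 += 1                # addq   1, %1, %1
                               # jmp    back to the movzbq at address 0x04
    return "".join(out)        # halt   0


print(run(), end="")
```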

Pointers! Start learning about them here!
=========================================
Have a look at what the first two instructions are doing and how this can be
represented descriptively.

`ldzwq msg, %addr`
~~~~~~~~~~~~~~~~~~
The assembler replaces the label `msg` with the address of the `h` in the
"hello, world!" string. That means `msg` has value 0x20 (or 32 in decimal).

Obviously the address of the string depends on where the loader copies the
data segment when we run the program, and this in turn depends on the size of
the text segment. Figuring out the actual address of the string just by looking
at the assembly source code requires some bookkeeping. Both we and the
assembler can do it, but it is less error prone if the assembler does it, and
using labels delegates this job to the assembler.

So after this instruction the value in `%addr` has the meaning "address of the
first character in the string". So we think of `%addr` as being a "pointer to
the first character in the string":

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}
\renewcommand\MemCellWidth { 1.2 }

\DrawMemArrayOpen{0}{15}

\DrawMemLabel{0}{msg}

\begingroup
\renewcommand\PaddingMemVariable {0.05}
\DrawMemVariable[orange!90]{0}{15}{}
\par\endgroup


\DrawPointer{0}{\%addr}

\DrawByteVariable[orange!50]{0}{'h'}
\DrawByteVariable[orange!50]{1}{'e'}
\DrawByteVariable[orange!50]{2}{'l'}
\DrawByteVariable[orange!50]{3}{'l'}
\DrawByteVariable[orange!50]{4}{'o'}
\DrawByteVariable[orange!50]{5}{','}
\DrawByteVariable[orange!50]{6}{' '}
\DrawByteVariable[orange!50]{7}{'w'}
\DrawByteVariable[orange!50]{8}{'o'}
\DrawByteVariable[orange!50]{9}{'r'}
\DrawByteVariable[orange!50]{10}{'l'}
\DrawByteVariable[orange!50]{11}{'d'}
\DrawByteVariable[orange!50]{12}{'!'}
\DrawByteVariable[orange!50]{13}{'\textbackslash n'}
\DrawByteVariable[orange!50]{14}{0}

\end{tikzpicture}
--------------------------------------------------------------------------------


`movzbq (%addr), %ch`
~~~~~~~~~~~~~~~~~~~~~
This instruction copies the _value at address `%addr`_ into `%ch`.

In this instruction the pointer gets _dereferenced_. And it is impossible to
overestimate how important it will be to understand what _dereferencing a
pointer_ means. So I will explain and talk about it more than once.

Dereferencing means that we refer to a value at the end of a pointer. And this
requires two pieces of information:

- _Location: "Where does the pointer point to?"_

  This information is the address stored as value in the pointer. So in this
  case the value in `%addr`. Note the difference between "value of `%addr`" and
  "value at address `%addr`".

- _Type information: "What is the value at the end of the pointer?"_

  In the "hello, world" example the value is the byte at the end of the
  pointer, and here this byte has the meaning of being the ASCII value of a
  character. This information is only given by the context and is not stored
  in any register or anywhere else. We know it because the instruction
  `movzbq` is used to copy a single byte from the end of the pointer, zero
  extend it and store it in the destination register `%ch`.

  In general you have to do the bookkeeping: You have to know how many bytes
  the dereferenced value consists of, and whether it is a character, a signed
  or unsigned integer, or something else. The assembly language gives you no
  support for keeping track of this kind of _type information_.

In this context we can illustrate the meaning of `(%addr)` as follows:

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}
\renewcommand\MemCellWidth { 1.2 }

\DrawMemArrayOpen{0}{15}

\DrawMemLabel{0}{msg}

\begingroup
\renewcommand\PaddingMemVariable {0.05}
\DrawMemVariable[orange!90]{0}{15}{}
\par\endgroup


\DrawPointer{0}{\%addr}

\DrawByteVariable[orange!50]{0}{(\%addr)}
\DrawByteVariable[orange!50]{1}{'e'}
\DrawByteVariable[orange!50]{2}{'l'}
\DrawByteVariable[orange!50]{3}{'l'}
\DrawByteVariable[orange!50]{4}{'o'}
\DrawByteVariable[orange!50]{5}{','}
\DrawByteVariable[orange!50]{6}{' '}
\DrawByteVariable[orange!50]{7}{'w'}
\DrawByteVariable[orange!50]{8}{'o'}
\DrawByteVariable[orange!50]{9}{'r'}
\DrawByteVariable[orange!50]{10}{'l'}
\DrawByteVariable[orange!50]{11}{'d'}
\DrawByteVariable[orange!50]{12}{'!'}
\DrawByteVariable[orange!50]{13}{'\textbackslash n'}
\DrawByteVariable[orange!50]{14}{0}

\end{tikzpicture}
--------------------------------------------------------------------------------
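
The difference between the value of `%addr` and the value at address `%addr`
can also be shown with a Python sketch in which memory is modelled as a
sequence of bytes. The variable names are chosen for this example, the text
segment is filled with zero bytes, and the big-endian byte order in the last
step is an assumption made only to show that the same location can be read
with different type information.

```python
# Text segment (content irrelevant here) followed by the data segment at 0x20.
memory = bytes(0x20) + b"hello, world!\n\0"

addr = 0x20                  # value of %addr: an address
ch = memory[addr]            # value at address %addr: a single byte

print(hex(addr), hex(ch), chr(ch))     # 0x20 0x68 h

# Same location, different type information: read two bytes instead of one.
word = int.from_bytes(memory[addr:addr + 2], "big")
print(hex(word))                       # 0x6865
```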

How to know if a register is storing a pointer?
-----------------------------------------------
The assembler does not know whether a register is used as pointer. This is also
up to you. You give the register its meaning. And this meaning can change! You
have to do the bookkeeping. And you have to keep your bookkeeping up to date.

Consider this modification of the first two instructions:

---- CODE (type=s) -------------------------------------------------------------
    ldzwq   msg,    %3
    movzbq  (%3),   %3
--------------------------------------------------------------------------------

After the first instruction, register `%3` has the meaning "pointer to the
string". After the second instruction it has the meaning "first character of
the string".


Some personal opinion/experience
--------------------------------
When learning C/C++ understanding pointers and how to use them is the hardest
part. My rule of thumb is that every non-trivial bug in a C/C++ program is
related to pointers. The dangerous thing is this combination:

- In the C/C++ programming languages the compilers do a lot of the necessary
  bookkeeping. So compared to programming in assembler the C/C++ compilers can
  detect many bugs related to pointers. Some of the bugs that slip through this
  line of defence can be detected by additional tools.

- Slightly exaggerated but true in essence: If a bug slipped through, it is
  impossible to find it. You don't even know if a bug slipped through!
  Because such a bug might only once in a while cause the program to crash, or
  worse, the bug does not crash the program and just causes wrong results.

Because some bugs are detected in C/C++ it is tempting to use these languages
and underestimate the danger.

The advantage of programming in assembly is: You will never underestimate what
can go wrong! And be aware that you are learning some non-trivial concepts. Be
patient now if things don't work out the first time and try to understand the
underlying reason. This will allow you to do some solid programming in C/C++
later.

More about the assembly language: Tokens
========================================
As I said in the video, at first it seems odd that, for example, `halt` can
be used as a label. How can the assembler distinguish the meanings?

Field format of source lines
----------------------------

This is handled by the scanner during the lexical analysis.  The format of the
source lines consists of fields:

`[label] [operator] [operands]`

Tokens for mnemonics (e.g. "addq", "halt", etc.) and pseudo operators
(e.g. ".string", ".byte", etc.) are only generated from the operator field. So
in other fields they can be used as identifiers.  For example, from this code

---- CODE (file=session09/hello/lex_example.s) ---------------------------------
addq					    # some label
	addq	%0,	%12,	%0x3	    // an instruction
	.quad	4			    /* some data
					    */
--------------------------------------------------------------------------------

the scanner (you find the lexer test program in
`1_ulm_build/hello/.build/ulmas1/`) generates the following tokens:

---- SHELL(path=session09/hello/) ----------------------------------------------
1_ulm_build/hello/.build/ulmas1/xtest_lexer < lex_example.s
---------------------------------------------------------------------------------

So note that the character sequence "addq" was first detected as an identifier
(`IDENT`) and in the second case as mnemonic (`ADDQ`).

Comments
--------
You might also notice that comments are removed. Comments come in different
flavors:

- Single-line comments start with "#" or "//"
- "/*" begins a multi-line comment and "*/" ends a multi-line comment
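
How a scanner can drop these comments is sketched below in Python. This is a
simplified illustration and not the code of the actual lexer. The function name
`strip_comments` is made up, string literals are copied unchanged so that a
"#" inside a string is kept, and character constants such as '#' are not
handled.

```python
def strip_comments(src):
    """Remove '#', '//' and '/* ... */' comments from assembly source text."""
    out = []
    i, n = 0, len(src)
    while i < n:
        if src.startswith("/*", i):               # multi-line comment
            end = src.find("*/", i + 2)
            i = n if end < 0 else end + 2
        elif src[i] == "#" or src.startswith("//", i):
            end = src.find("\n", i)               # single-line comment
            i = n if end < 0 else end             # keep the newline itself
        elif src[i] == '"':                       # string literal: copy as is
            j = i + 1
            while j < n and src[j] != '"':
                j += 2 if src[j] == "\\" else 1   # skip escaped characters
            j = min(j + 1, n)
            out.append(src[i:j])
            i = j
        else:
            out.append(src[i])
            i += 1
    return "".join(out)


print(strip_comments("addq # some label\n\taddq %0 // an instruction\n"))
```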

Tokens recognized only in the operator field
--------------------------------------------
As specified in the grammar, a mnemonic is part of an instruction and a pseudo
operator part of a directive. 

Mnemonics
~~~~~~~~~

The mnemonics are specified by the instruction set. In the `isa.txt` for this
video these were

+ `addq`
+ `getc`
+ `halt`
+ `imulq`
+ `je`

* `jmp`
* `jne`
* `jz`
* `ldzwq`
* `movzbq`

+ `putc`
+ `subq`


Pseudo operators
~~~~~~~~~~~~~~~~

+ `.align`
+ `.byte`
+ `.comm`
+ `.data`
+ `.equ`
+ `.equiv`

* `.globl`
* `.global`
* `.lcomm`
* `.long`
* `.quad`
* `.set`

+ `.string`
+ `.text`
+ `.word`

Tokens recognized in the label or operands fields
-------------------------------------------------

Identifiers
~~~~~~~~~~~
Identifiers begin with a letter (i.e. 'A' to 'Z' and 'a' to 'z'), an underscore
'_', or a dot '.' and are optionally continued with a sequence of more letters,
underscores, dots, or decimal digits 0 to 9.

Hence "foo", ".fOo", ".fOo1", "_", "." are allowed but not "2foo".
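
This rule corresponds to a short regular expression. The following Python
sketch (the function name `is_identifier` is chosen for this example) checks
the strings listed above:

```python
import re

# First character: letter, underscore or dot; then also decimal digits.
IDENT = re.compile(r"[A-Za-z_.][A-Za-z0-9_.]*")


def is_identifier(text):
    """True if the complete text is an identifier."""
    return IDENT.fullmatch(text) is not None


for text in ["foo", ".fOo", ".fOo1", "_", ".", "2foo"]:
    print(text, is_identifier(text))
```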


Empty label
~~~~~~~~~~~
You also see some tokens called `SPACE` in this example. This token gets
generated when the label field is empty. For the parser (and describing the
grammar) it is important that every line has in general a label (which can be
empty, but it exists). Otherwise white space characters get consumed by the
scanner.
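
The field-dependent behaviour can be imitated with a few lines of Python. This
is a strongly simplified sketch and not the actual scanner: it assumes that
comments are already removed, that the label field is empty when the line
starts with white space, and it ignores colons, pseudo operators and the
operands field. The function name and the representation of tokens as pairs are
made up for this example.

```python
MNEMONICS = {"addq", "getc", "halt", "imulq", "je", "jmp", "jne", "jz",
             "ldzwq", "movzbq", "putc", "subq"}


def label_and_operator(line):
    """Return the tokens for the label field and the operator field."""
    fields = line.split()
    if line[:1].isspace() or not fields:
        tokens = [("SPACE", "")]                  # empty label field
    else:
        tokens = [("IDENT", fields.pop(0))]       # label: always an identifier
    if fields and fields[0] in MNEMONICS:
        tokens.append((fields[0].upper(), fields[0]))
    return tokens


print(label_and_operator("addq"))                    # [('IDENT', 'addq')]
print(label_and_operator("\taddq\t%0,\t%12,\t%0x3"))
```

In the first call "addq" is in the label field and becomes an identifier. In
the second call the label field is empty and "addq" is recognized as a mnemonic.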

Literals
~~~~~~~~

- Decimal, hexadecimal and octal constants (e.g. 12, 0x2a, 017). These constants
  are all unsigned and encoded with 64 bits.

    - Decimal literals begin with a digit '1' to '9', optionally followed by
      more decimal digits '0' to '9'.

    - Octal literals begin with the digit '0', optionally followed by more
      octal digits '0' to '7'.

    - Hexadecimal literals begin with the prefix '0x' or '0X' followed by one
      or more hexadecimal digits '0' to '9', 'a' to 'f', 'A' to 'F'.

- Character constants (e.g. 'a' or '😎') can be used as integers. The value is
  determined by the ASCII code or, more generally, the UTF-8 code.
- String literals (e.g. "hello, world!")
- End of line (newline character, ASCII code 10)
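
The three notations for integer constants can be told apart by their first
characters. The following Python sketch shows one way to do this. The function
name `parse_constant` is made up, and rejecting values that do not fit into 64
bits is an assumption, as the behaviour of the assembler in that case is not
described here.

```python
import re


def parse_constant(text):
    """Convert a decimal, octal or hexadecimal constant to its value."""
    if re.fullmatch(r"0[xX][0-9a-fA-F]+", text):
        value = int(text[2:], 16)
    elif re.fullmatch(r"0[0-7]*", text):
        value = int(text, 8)
    elif re.fullmatch(r"[1-9][0-9]*", text):
        value = int(text, 10)
    else:
        raise ValueError(f"not an integer constant: {text!r}")
    if value >= 2**64:
        raise ValueError("does not fit into 64 bits")   # assumption: reject
    return value


for text in ["12", "0x2a", "017"]:
    print(text, parse_constant(text))
```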

Punctuators/Delimiters
~~~~~~~~~~~~~~~~~~~~~~

+ `+`
+ `-`
+ `*`
+ `/`
+ `%`
+ `(`

* `)`
* `:`
* `,`
* `$`
* `{`
* `}`

+ `<`
+ `>`

Some of these punctuators are used for expressions (`+`, `-`, `*`, `/`, `%`,
`(` and `)`). You also can use them for your assembly notation. But there is
a restriction:

---- BOX -----------------------------------------------------------------------
If the `'('` punctuator is used in the assembly notation then the next
punctuator has to be the `'%'` punctuator. Hence _movq (X), %Y_ would *not* be
allowed.

As the parentheses are also used in expressions, this requirement makes it
easier (or in my humble opinion possible at all) to use a recursive descent
parser.

If you dislike this restriction use the brackets `'{'` and `'}'` in your
assembly notation instead.
--------------------------------------------------------------------------------

Immediate operator
~~~~~~~~~~~~~~~~~~

+ `@w0`
+ `@w1`
+ `@w2`
+ `@w3`

These operators can be used to extract a particular word from a 64-bit literal
or symbol (e.g. a label):
    
- `@w0(label)` gives the least significant word,
- ...,
- `@w3(label)` the most significant word.
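
Since four words make up a 64-bit value, each word has 16 bits. What the
operators compute can therefore be sketched in Python with a shift and a mask
(the function name `word` and the sample value are chosen for this example):

```python
def word(value, n):
    """Word n (0 = least significant, 3 = most significant) of a 64-bit value."""
    return (value >> (16 * n)) & 0xFFFF


value = 0x1122334455667788
print([hex(word(value, n)) for n in range(4)])
# ['0x7788', '0x5566', '0x3344', '0x1122']
```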


More about the assembly language: Grammar
=========================================
With your definition of the assembly notation you define some part of the
grammar. Basically you define it by example and the generator extracts the
formal grammar for instructions. Your grammar rules are then embedded into
the grammar for the assembler.

Structure of an assembly program
--------------------------------
The grammar describes an assembly program as a sequence of instructions and
directives (pseudo instructions):

---- LATEX -------------------------------------------------------------------
\begin{array}{lcl}
\langle\text{compilation-unit}\rangle
    & \to
    & \langle\text{}\rangle \\
    & \to
    & \langle\text{sequence}\rangle \\
\langle\text{sequence}\rangle
    & \to
    & \langle\text{labelled-op}\rangle \\
    & \to
    & \langle\text{regular-op}\rangle \\
    & \to
    & \langle\text{sequence}\rangle \quad \langle\text{labelled-op}\rangle \\
    & \to
    & \langle\text{sequence}\rangle \quad \langle\text{regular-op}\rangle \\
\langle\text{labelled-op}\rangle
    & \to
    & \langle\text{label}\rangle \quad \langle\text{op}\rangle\\
\langle\text{regular-op}\rangle
    & \to
    & \langle\text{empty-label}\rangle \quad \langle\text{op}\rangle\\
\langle\text{op}\rangle
    & \to
    & \textbf{eol} \\
    & \to
    & \langle\text{instruction}\rangle \quad \textbf{eol}\\
    & \to
    & \langle\text{directive}\rangle \quad \textbf{eol}\\
\end{array}
--------------------------------------------------------------------------------

Instructions
------------
This is the part of the grammar that you define. The fields (e.g. `X`, `Y`, `Z`)
from the instruction format can be used in the notation. For each of these
fields the parser then accepts an expression.


Expressions
-----------

  ---- LATEX -------------------------------------------------------------------
  \begin{array}{lcl}
  \langle\text{expression}\rangle
    & \to
    & \langle\text{simple-expression}\rangle \\
  \langle\text{simple-expression}\rangle
    & \to
    & \langle\text{term}\rangle \\
    & \to
    & \langle\text{simple-expression}\rangle \quad \textbf{+}
	\quad \langle\text{term}\rangle \\
    & \to
    & \langle\text{simple-expression}\rangle \quad \textbf{-}
	\quad \langle\text{term}\rangle \\
  \langle\text{term}\rangle
    & \to
    & \langle\text{factor}\rangle \\
    & \to & \langle\text{term}\rangle
	\quad\textbf{*}\quad \langle\text{factor}\rangle \\
    & \to
    & \langle\text{term}\rangle \quad\textbf{/}\quad
	\langle\text{factor}\rangle \\
    & \to
    & \langle\text{term}\rangle \quad\textbf{\%}\quad
	\langle\text{factor}\rangle \\
  \langle\text{factor}\rangle
    & \to
    & \langle\text{primary}\rangle \\
    & \to
    & \langle\text{unary-minus}\rangle \\
  \langle\text{unary-minus}\rangle
    & \to
    & \textbf{-} \quad \langle\text{primary}\rangle \\
  \langle\text{primary}\rangle
    & \to
    & \langle\text{integer}\rangle \\
    & \to
    & \langle\text{identifier}\rangle \\
    & \to
    & \textbf{(} \quad \langle\text{simple-expression}\rangle \quad
	\textbf{)}\\
  \langle\text{integer}\rangle
    & \to
    & \text{decimal-constant} \\
    & \to
    & \text{hexadecimal-constant} \\
    & \to
    & \text{octal-constant} \\
    & \to
    & \text{char-constant} \\
  \langle\text{identifier}\rangle
    & \to
    & \text{ident} \\
  \end{array}
  ------------------------------------------------------------------------------

In the simplest cases an expression is just an identifier or a constant.  The
constants can be decimal, hexadecimal, octal and character constants.  Here is
an example of halt instructions that all have the same exit code, each given by
a different expression:

---- CODE (file=session09/grammar/ex_expr.s) -----------------------------------
    halt    65		    // exit code as decimal constant
    halt    0x41	    // exit code as hexadecimal constant
    halt    0101	    // exit code as octal constant
    halt    'A'		    // exit code as character constant

    .equ    exit,   'A'	    /* here instead of 'A' you also could write
			       65, 0x41, 0101.
			    */
    halt    exit
--------------------------------------------------------------------------------
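
The expression grammar maps directly to a recursive descent parser in which the
left-recursive rules become loops. The following Python sketch evaluates
expressions this way. It is an illustration and not the parser of the
assembler: all names are made up, symbols are passed in as a dictionary, only
single-character ASCII constants are supported, and the arithmetic uses
Python's unbounded integers instead of 64 bits.

```python
import re

TOKEN = re.compile(
    r"\s*(0[xX][0-9a-fA-F]+|[0-9]+|'[^']'|[A-Za-z_.][A-Za-z0-9_.]*|[-+*/%()])")


def tokenize(text):
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens, symbols):
        self.tokens, self.pos, self.symbols = tokens, 0, symbols

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of expression")
        self.pos += 1
        return token

    def simple_expression(self):          # term { (+|-) term }
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.next() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):                       # factor { (*|/|%) factor }
        value = self.factor()
        while self.peek() in ("*", "/", "%"):
            op, rhs = self.next(), self.factor()
            if op == "*":
                value *= rhs
            elif op == "/":
                value //= rhs
            else:
                value %= rhs
        return value

    def factor(self):                     # primary | - primary
        if self.peek() == "-":
            self.next()
            return -self.primary()
        return self.primary()

    def primary(self):                    # integer | identifier | ( ... )
        token = self.next()
        if token == "(":
            value = self.simple_expression()
            if self.next() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token[0] == "'":
            return ord(token[1])          # character constant
        if token[0].isdigit():
            if token[:2].lower() == "0x":
                return int(token, 16)
            return int(token, 8) if token[0] == "0" else int(token)
        if token in self.symbols:
            return self.symbols[token]
        raise SyntaxError(f"cannot evaluate {token!r}")


def evaluate(text, symbols=None):
    parser = Parser(tokenize(text), symbols or {})
    value = parser.simple_expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


for text in ["65", "0x41", "0101", "'A'", "exit"]:
    print(text, evaluate(text, {"exit": 65}))
```

All five expressions from the example above evaluate to 65.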

Directives
----------

---- LATEX -------------------------------------------------------------------
\begin{array}{lcl}
\langle\text{directive}\rangle
    & \to
    & \langle\text{text-header}\rangle \\
    & \to
    & \langle\text{data-header}\rangle \\
    & \to
    & \langle\text{bss-header}\rangle \\
    & \to
    & \langle\text{pseudo-op-data}\rangle \quad
	\langle\text{expression}\rangle \\
    & \to
    & \langle\text{pseudo-op-string}\rangle \quad
	\textbf{string-literal} \\
    & \to
    & \langle\text{pseudo-op-flag}\rangle \quad
	\langle\text{identifier}\rangle \\
    & \to
    & \langle\text{pseudo-op-def}\rangle \quad
	\langle\text{identifier}\rangle \quad
	\textbf{,} \quad
	\langle\text{expression}\rangle \\
\langle\text{text-header}\rangle
    & \to
    & \textbf{.text} \\
\langle\text{data-header}\rangle
    & \to
    & \textbf{.data} \\
\langle\text{bss-header}\rangle
    & \to
    & \textbf{.bss} \\
\langle\text{pseudo-op-string}\rangle
    & \to
    & \textbf{.string} \\
\langle\text{pseudo-op-def}\rangle
    & \to
    & \textbf{.equiv} \\
    & \to
    & \textbf{.equ} \\
\langle\text{pseudo-op-flag}\rangle
    & \to
    & \textbf{.global} \\
    & \to
    & \textbf{.globl} \\
\langle\text{pseudo-op-data}\rangle
    & \to
    & \textbf{.align} \\
    & \to
    & \textbf{.space} \\
    & \to
    & \textbf{.byte} \\
    & \to
    & \textbf{.long} \\
    & \to
    & \textbf{.quad} \\
    & \to
    & \textbf{.word} \\
\end{array}
--------------------------------------------------------------------------------