===============
Quiz: Quicksort
===============

In this quiz you are supposed to write an assembly program with some more
complex functions. The program should first print a string, then sort the
characters of the string and finally print the sorted result:

---- CODE (type=txt) -----------------------------------------------------------
theon$ quicksort
hello, world!
 !,dehllloorw
--------------------------------------------------------------------------------

For sorting you should write a function that implements the __Quicksort__
algorithm.  However, this assignment is not about the theoretical background of
sorting algorithms. The purpose is to practise the calling conventions that
where introduced in this session. Furthermore, you should get some idea how a
compiler can translate C code into assembly code (with all optimization flags
turned of, i.e. `-O0`). For that sake you will find a complete C implementation
for the assigned problem and also a variant rewritten in __spaghetti code__.
Translate the spaghetti code statement by statement into assembly code.  

On `theon` submit your program with the command

`submit hpc quiz12 quicksort.s`

Reading assignment
==================
This interview with a Microsoft developer is more or less on-topic __Russell
Hadley: The Route to C++ Code Optimization__. He works on the C++ compiler team
and gives a not to technical talk some overview how compilers work, apply
optimizations and finally generate assembly code. I think a lot will make sense
to you after this session, or even sound familiar to you by know. For example,
at minute 1:30 we mentions how nice C++ code gets translated into spaghetti
code).  It's also interesting to see that the guy is not using the Visual
Studio IDE but Emacs (besides Vim the only legit alternative).

I also stumbled over this article __The Mistakes I Made As a Beginner
Programmer__. This is not directly related to the topic of Session 10 but about
programming in general (it does not matter that he uses examples from
JavaScript).  It's great and you should read it. I love and totally agree to
quotes like "For example, if you are not consistent with your indentation and
capitalization, you should simply lose your license to code." or "Expert
programmers love errors. Newbies hate them.".

About the subscript notation used in the C code examples
========================================================
In the functions you will see the notation `A[i]` which is equivalent to
`*(A+i)`.  The variable `A` is declared with type `char *`, hence it is a
pointer to a character

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth {0.9}

\DrawMemArrayOpen{0}{10}
\DrawMemVariable[orange!50]{1}{10}{"hello, world!"}
\DrawHack{0}{$0$}

\begingroup
\renewcommand\PaddingMemVariable {0.01}
\renewcommand\PaddingMemVariable {0.05}
\renewcommand\PointerDisplaceY {2}
\par\endgroup


\DrawMemArrayOpen{15}{25}
\begingroup
\renewcommand\PaddingMemVariable {0.01}
\DrawMemVariable[red!50]{16}{24}{variable A}
\renewcommand\PaddingMemVariable {0.05}
\renewcommand\PointerDisplaceY {2}
\par\endgroup

\DrawMemPointer{18}{1}

\end{tikzpicture}
--------------------------------------------------------------------------------


With `*(A+i)` the character with address `A+i` is dereferenced:

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth {1.5}

\DrawMemArrayOpen{0}{6}
\DrawMemCellContent{1}{*(A+0)}
\DrawMemCellContent{2}{*(A+1)}
\DrawMemCellContent{3}{*(A+2)}
\DrawMemCellContent{4}{*(A+3)}
\DrawMemCellContent{5}{*(A+4)}
\DrawMemCellContent{6}{$\dots$}


\begingroup
\renewcommand\PaddingMemVariable {0.01}
\renewcommand\PaddingMemVariable {0.05}
\renewcommand\PointerDisplaceY {2}
\par\endgroup


\DrawMemArrayOpen{10}{11}
\DrawHack{0}{$0$}
\begingroup
\renewcommand\PaddingMemVariable {0.01}
\DrawMemVariable[red!50]{10}{12}{variable A}
\renewcommand\PaddingMemVariable {0.05}
\renewcommand\PointerDisplaceY {2}
\DrawAnnotateMemCellAbove{10}{for bookkeeping: variable  A has type char *}
\par\endgroup

\DrawMemPointer{10}{1}

\end{tikzpicture}
--------------------------------------------------------------------------------


And `A[i]` is just a more readable notation for that:

---- TIKZ ----------------------------------------------------------------------
\begin{tikzpicture}
\input{memory.tex}

\renewcommand\MemCellWidth {1.5}

\DrawMemArrayOpen{0}{6}
\DrawHack{0}{$0$}

\DrawMemCellContent{1}{A[0]}
\DrawMemCellContent{2}{A[1]}
\DrawMemCellContent{3}{A[2]}
\DrawMemCellContent{4}{A[3]}
\DrawMemCellContent{5}{A[4]}
\DrawMemCellContent{6}{$\dots$}


\begingroup
\renewcommand\PaddingMemVariable {0.01}
\renewcommand\PaddingMemVariable {0.05}
\renewcommand\PointerDisplaceY {2}
\par\endgroup


\DrawMemArrayOpen{10}{11}
\begingroup
\renewcommand\PaddingMemVariable {0.01}
\DrawMemVariable[red!50]{10}{12}{variable A}
\renewcommand\PaddingMemVariable {0.05}
\renewcommand\PointerDisplaceY {2}
\DrawAnnotateMemCellAbove{10}{for bookkeeping: variable  A has type char *}
\par\endgroup

\DrawMemPointer{10}{1}

\end{tikzpicture}
--------------------------------------------------------------------------------

In the assembly code the you can use `movzbq (%X, %Y), %Z` for fetching and
`movb %X, (%Y, %Z)` for storing the i-the character:

- If `%X` contains the value of `A`, i.e. the address of the first character, 
  and `%Y` contains the index `i` then `movzbq (%X, %Y), %Z` fetches `A[i]` into
  register `%Z`.

- If `%Y` contains the value of `A`, i.e. the address of the first character, 
  and `%Z` contains the index `i` then `movb %X, (%Y, %Z)` stores the least
  significant byte from `%X` in the memory location `A[i]`.

When we discuss _pointer arithmetic in C_ you will see why it makes sense that
the fetch and store operations in the instruction set have further variants
with a scaling parameter. Here the secret in short: When you have a pointer `A`
then in general `A+i` is _not_ the address of `A` plus `i`. Instead it is the
address of `A` plus `i` times the size of the elements. For example, if `A`
points to quad word integers, let's say it has type `int64_t *` or type
`uint64_t *`, then "`A + i`" means "address stored in `A` plus `8*i`". In that
case, if the value of variable `A` is in `%X` and the value of `i` in `%Y` then
`movq (%X, %Y, 8), %Z` can be used for fetching a quad word into `%Z`. And vice
versa, with `movq %X, (%Y, %Z, 8)` an instruction is provided that can be used
for storing an array element.


Example for sorting a string with bubble sort
=============================================
This program is basically solving the problem with the __bubble sort__
algorithm instead of the Quicksort. For translating this manually into assembly
code the same approach is followed. Some C code first get rewritten in a
primitive variant that then can be translated statement by statement (and you
will find here all ingredients needed for the quicksort assignment).

This is the initial C code:

:import: session10/func/bubblesort.c [fold]

As we are reimplementing `puts` and `strlen` from the C library (with a
slightly different function signature) you get some annoying warnings when you
compile that. For ignoring this you can use the option
`-Wno-builtin-declaration-mismatch`:

---- SHELL (path=session10/func) -----------------------------------------------
gcc -Wno-builtin-declaration-mismatch -o bubblesort bubblesort.c
bubblesort
--------------------------------------------------------------------------------

The bubble sort example with C spaghetti code
---------------------------------------------
Procedure `bublesort` has 3 local variables. It is conveninet that you can
declare local variables within compound statements. Hovever, technically it is
eqivalent to declare all local variables at the beginning of the function body:

+ ---- CODE --------------------------------------------------------------------
  void
  bubblesort(char *A, uint64_t n)
  {
      uint64_t swapped;
      do {
          swapped = 0;
          for (uint64_t i=1; i<n; ++i) {
              if (A[i-1] > A[i]) {
                  char tmp = A[i];
                  A[i] = A[i-1];
                  A[i-1] = tmp;
                  swapped = 1;
              }
          }
          --n;
      } while (swapped);
  }
  ------------------------------------------------------------------------------

+ ---- CODE --------------------------------------------------------------------
  void
  bubblesort(char *A, uint64_t n)
  {
      uint64_t	swapped;
      uint64_t	i;
      char	tmp;
      do {
          swapped = 0;
          for (i=1; i<n; ++i) {
              if (A[i-1] > A[i]) {
                  tmp = A[i];
                  A[i] = A[i-1];
                  A[i-1] = tmp;
                  swapped = 1;
              }
          }
          --n;
      } while (swapped);
  }
  ------------------------------------------------------------------------------

Next the nice to read loop-statements and the if-statements can be rewritten
with more primitive statements that are easy to translate into chunks of
assembly code:

+ ---- CODE --------------------------------------------------------------------
  void
  bubblesort(char *A, uint64_t n)
  {
      uint64_t	swapped;
      uint64_t	i;
      char	tmp;
      do {
          swapped = 0;
          for (i=1; i<n; ++i) {
              if (A[i-1] > A[i]) {
                  tmp = A[i];
                  A[i] = A[i-1];
                  A[i-1] = tmp;
                  swapped = 1;
              }
          }
          --n;
      } while (swapped);
  }
  ------------------------------------------------------------------------------

+ ---- CODE --------------------------------------------------------------------
  void
  bubblesort(char *A, uint64_t n)
  {
      uint64_t swapped;
      uint64_t i;
      char tmp;
  
  bubblesort_do:
      swapped = 0;
      i = 1;
  bubblesort_for:
      if (i >= n)
          goto bubblesort_endfor;
      if (A[i-1] <= A[i])
          goto bubblesort_endif;
      tmp = A[i];
      A[i] = A[i-1];
      A[i-1] = tmp;
      swapped = 1;
  bubblesort_endif:
      i = i + 1;
      goto bubblesort_for;
      n = n - 1;
  bubblesort_endfor:
      if (swapped)
          goto bubblesort_do;
  }
  ------------------------------------------------------------------------------

This transformation of readable code into spaghetti code gets done by starting
with the most outer control flow statement, describing its meaning with a
flowchart and then expressing it with spaghetti code. You then can continue by
repeating the procedure for nested controlstructures etc. 

So we begin with the do-statement in the bubble sort code which can be
described and rewritten as follows:

+ ---- CODE --------------------------------------------------------------------
  do {
      /* ... */
  } while (swapped);
  ------------------------------------------------------------------------------

+ ---- TIKZ --------------------------------------------------------------------
  \begin{tikzpicture}
  \input{flowchart.tex}

  \renewcommand\BoxWidth {5}
  \renewcommand\BoxHeight {1.2}
  \renewcommand\BoxDistance {1.8}
  \SetMargin{0.01}{0.01}{1}{2}
  \PutStatement{0}{swapped = 0}
  \PutStatement{1}{ ... ;}
  \PutJump{2}{swapped != 0}
  \PutFakeStatement{3}{}

  \PutLabel{0}{bubblesort\_do}

  \AddPath{0}{1}
  \AddPath{1}{2}
  \AddPath{2}{3}

  \AddCondJumpPath{2}{0}

  \end{tikzpicture}
  ------------------------------------------------------------------------------

+ ---- CODE --------------------------------------------------------------------
  bubblesort_do:
      /* ... */
      if (swapped)
          goto bubblesort_do;
  ------------------------------------------------------------------------------

The next inner control flow statement is the for-loop:

+ ---- CODE --------------------------------------------------------------------
  for (i=1; i<n; ++i) {
    /* ... */
  }
  ------------------------------------------------------------------------------

+ ---- TIKZ --------------------------------------------------------------------
  \begin{tikzpicture}
  \input{flowchart.tex}

  \renewcommand\BoxWidth {5}
  \renewcommand\BoxHeight {1.2}
  \renewcommand\BoxDistance {1.8}
  \SetMargin{0.01}{0.01}{1}{5}
  \PutStatement{0}{i = 1;}
  \PutJump{1}{ i >= n}
  \PutStatement{2}{ /* ... */ }
  \PutStatement{3}{ i = i + 1;}
  \PutJump{4}{}
  \PutStatement{5}{/* empty statement*/ ;}

  \PutLabel{1}{bubblesort\_for}
  \PutLabel{5}{bubblesort\_endfor}

  \AddPath{0}{1}
  \AddPath{1}{2}
  \AddPath{2}{3}
  \AddPath{3}{4}

  \AddCondJumpPath{1}{5}
  \AddJumpPathLeft{4}{1}


  \end{tikzpicture}
  ------------------------------------------------------------------------------

And the most inner control stament can be rewritten as follows

+ ---- CODE --------------------------------------------------------------------
  i = 1;
  bubblesort_for:
      if (i >= n)
          goto bubblesort_endfor;
      /* ... */
      i = i + 1;
  bubblesort_endfor:
      ;
  ------------------------------------------------------------------------------

Finally we reached the most inner control statement:

+ ---- CODE --------------------------------------------------------------------
  if (A[i-1] > A[i]) {
    /* ... */
  }
  ------------------------------------------------------------------------------

+ ---- TIKZ --------------------------------------------------------------------
  \begin{tikzpicture}
  \input{flowchart.tex}

  \renewcommand\BoxWidth {5}
  \renewcommand\BoxHeight {1.2}
  \renewcommand\BoxDistance {1.8}
  \SetMargin{0.01}{0.01}{1}{2}
  \PutJump{0}{A[i-1] <= A[i]}
  \PutStatement{1}{ /* ... */ }
  \PutStatement{2}{ /* empty statement */ ;}

  \PutLabel{2}{bubblesort\_endif}

  \AddPath{0}{1}
  \AddPath{1}{2}

  \AddCondJumpPath{0}{2}
  \end{tikzpicture}
  ------------------------------------------------------------------------------

+ ---- CODE --------------------------------------------------------------------
      if (A[i-1] <= A[i])
          goto bubblesort_endif;
      /* ... */
  bubblesort_endif:
      ;
  ------------------------------------------------------------------------------

Why is it called spaghetti code?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I think you already got the idea. But nonetheless, have a look at the
corresponding flowchart:

+ ---- TIKZ --------------------------------------------------------------------
  \begin{tikzpicture}
  \input{flowchart.tex}

  \renewcommand\BoxWidth {5}
  \renewcommand\BoxHeight {1.2}
  \renewcommand\BoxDistance {1.8}
  \SetMargin{3}{1}{1}{5}

  \PutStatement{0}{ swapped = 0; }
  \PutStatement{1}{ i = 1; }
  \PutJump{2}{i >= n}
  \PutJump{3}{A[i-1] <= A[i]}
  \PutStatement{4}{tmp = A[i];}
  \PutStatement{5}{A[i] = A[i-1];}
  \PutStatement{6}{A[i-1] = tmp;}
  \PutStatement{7}{swapped = 1;}
  \PutStatement{8}{i = i + 1;}
  \PutJump{9}{}
  \PutStatement{10}{n = n - 1;}
  \PutJump{11}{swapped != 0}
  \PutFakeStatement{12}{}

  \PutLabel{0}{bubblesort\_do}
  \PutLabel{2}{bubblesort\_for}
  \PutLabel{8}{bubblesort\_endif}
  \PutLabel{10}{bubblesort\_endfor}

  \AddPath{0}{1}
  \AddPath{1}{2}
  \AddPath{2}{3}
  \AddPath{3}{4}
  \AddPath{4}{5}
  \AddPath{5}{6}
  \AddPath{6}{7}
  \AddPath{7}{8}
  \AddPath{8}{9}
  \AddPath{10}{11}
  \AddPath{11}{12}

  \AddCondJumpPath{2}{10}

  \begingroup
  \renewcommand\JumpPathDist {2}
  \AddCondJumpPathLeft{3}{8}
  \par\endgroup

  \begingroup
  \renewcommand\JumpPathDist {2.5}
  \AddJumpPathLeft{9}{2}
  \par\endgroup

  \begingroup
  \renewcommand\JumpPathDist {3}
  \AddCondJumpPathLeft{11}{0}
  \par\endgroup

  \end{tikzpicture}
  ------------------------------------------------------------------------------

+ ---- CODE --------------------------------------------------------------------
  void
  bubblesort(char *A, uint64_t n)
  {
      uint64_t swapped;
      uint64_t i;
      char tmp;
  
  bubblesort_do:
      swapped = 0;
      i = 1;
  bubblesort_for:
      if (i >= n)
          goto bubblesort_endfor;
      if (A[i-1] <= A[i])
          goto bubblesort_endif;
      tmp = A[i];
      A[i] = A[i-1];
      A[i-1] = tmp;
      swapped = 1;
  bubblesort_endif:
      i = i + 1;
      goto bubblesort_for;
      n = n - 1;
  bubblesort_endfor:
      if (swapped)
          goto bubblesort_do;
  }
  ------------------------------------------------------------------------------

Complete spaghetti code
~~~~~~~~~~~~~~~~~~~~~~~
Here the complete beast:

:import: session10/func/bubblesort_spaghetti.c [fold]

---- SHELL (path=session10/func) -----------------------------------------------
gcc -Wno-builtin-declaration-mismatch -o bubblesort bubblesort_spaghetti.c
bubblesort
--------------------------------------------------------------------------------


Implementing the bubble sort example in assembly for the ULM
------------------------------------------------------------

:import: session10/func/bubblesort.s [fold]

---- SHELL (path=session10/func) -----------------------------------------------
ulmas -o bubblesort bubblesort.s
ulm bubblesort
--------------------------------------------------------------------------------

Example in C for sorting a string with Quicksort
================================================

:import: session10/func/quicksort.c [fold]

Compile and run:

---- SHELL (path=session10/func) -----------------------------------------------
gcc -Wno-builtin-declaration-mismatch -o quicksort quicksort.c
quicksort
--------------------------------------------------------------------------------

Some C spaghetti code for the Quicksort example
===============================================

:import: session10/func/quicksort_spaghetti.c [fold]

Compile and run:

---- SHELL (path=session10/func) -----------------------------------------------
gcc -Wno-builtin-declaration-mismatch -o quicksort quicksort_spaghetti.c
quicksort
--------------------------------------------------------------------------------


Skeleton for the Quicksort assembly program
===========================================

For the quiz you can use the following skeleton. Function `quicksort` contains
the labels from the corresponding C spaghetti code and the C statements in
comments:

:import: session10/func/quicksort.s [fold]

In its current form it translates but function `quicksort` just returns.
Therefore you see string printed twice:

---- SHELL (path=session10/func) -----------------------------------------------
ulmas -o quicksort quicksort.s
ulm quicksort
--------------------------------------------------------------------------------

:links: Quicksort -> https://en.wikipedia.org/wiki/Quicksort
	bubble sort -> https://en.wikipedia.org/wiki/Bubble_sort
	spaghetti code -> https://en.wikipedia.org/wiki/Spaghetti_code
	The Mistakes I Made As a Beginner Programmer -> https://medium.com/edge-coders/the-mistakes-i-made-as-a-beginner-programmer-ac8b3e54c312
	Russell Hadley: The Route to C\+\+ Code Optimization -> https://channel9.msdn.com/Shows/Going+Deep/Russell-Hadley-The-Route-to-C-Code-Optimization