===========================================
CBE Part 14: Dynamic Memory Management in C				[TOC]
===========================================

---- VIDEO ------------------------------
https://www.youtube.com/embed/vHFOjvmC1ww
-----------------------------------------

More about _malloc()_ and _free()_
==================================
_man malloc_ or _man free_ both will show __the same manpage__. On this manpage
you will also find references for all other functions from the standard library
for dynamic memory management.

:links: the same manpage -> https://man7.org/linux/man-pages/man3/malloc.3.html

About _calloc()_ and _realloc()_
================================

With _calloc()_ you can allocate a zero initialized block of memory. It is
declared as

---- CODE (type=c) -------------------------------------------------------------
void *calloc(size_t count, size_t size);
--------------------------------------------------------------------------------

where _count_ specifies a number of objects and _size_ the size of one object in
bytes. Hence it will try to allocate a zero initialized block with  _count *
size_ bytes. If such a block could be allocated it returns a pointer to it and
otherwise a null pointer. 

With _realloc()_ you can change the size of a previously allocated block. It is
declared as

---- CODE (type=c) -------------------------------------------------------------
void *realloc(void *ptr, size_t size);
--------------------------------------------------------------------------------

where _ptr_ can be a pointer to a previously allocated block and _size_ the new
size in bytes. If _ptr_ is a null pointer the function behaves like _malloc()_.
In case _ptr_ points to a previously allocated block the implementation of
_realloc()_ might be able to simple enlarge the size. In this case it will
return the same pointer _ptr_. But it also might be required to allocate a new
block with the desired size. Then it will copy data from the previously
allocated block to the new block and releases the block to which _ptr_ points.
It returns a pointer to the new block. If reallocation is not possible it
returns a null pointer.

Requirements for a Vector Class
===============================
Assume that we have some important algorithm from numerical linear algebra that
requires to access elements from a vector. The algorithm has some nested loops
and accessing vector elements is required in the most inner loop. Furthermore,
this algorithm is a building block for other algorithms.

For an efficient implementation it means the have to write a function that will
be used called from other functions. Therefore accessing elements has to be
fast. But it also has to be safe. If we by mistake access elements with an
invalid index (e.g. if we have some _off by one error_) it will be hard to
track the problem. Some nightmare scenario would be that accessing invalid
elements has no effect if it by coincidence has a value of zero. We then would
sometimes pass numerical test, sometimes they would fail, and sometimes we just
get a segmentation fault.

The software pattern for a vector class should therefore provide the following:

- In some _debug mode_ we have some check whether an index is correct, a so
  called _bounce check_. In this mode the numerical tests are done. 

- In some _bench mode_ or _release mode_ the code accesses data directly. No
  overhead for _bounce check_ is tolerated and also no overhead for a function
  call.

This are some special requirements from a special field of application. It
therefore leads to special design patterns. In other fields you can have other
requirements. Therefore discussions about good or bad design always should take
into account the field of application.

Example: Crude Code that We Kind of Want
========================================
Assume we simply want to have a vector $x$ with a certain dimension. Then we
want to initialize the elements and print the elements of the vector. Some
crude code that has minimal overhead that fulfills our requirements could look
like that, where the dimension is hard coded to $\text{dim} = 4$:

:import: session16/vector/xtest_vector_crude.c [fold]

It does the job:

---- SHELL (path=session16/vector) ---------------------------------------------
gcc -Wall -o xtest_vector_crude xtest_vector_crude.c
./xtest_vector
--------------------------------------------------------------------------------

And it actually does things right:

- In the declaration of members _data_ and _dim_ the _const_ qualifier is used
  so that you can not accidentally change them after initialization.
- There is a check if allocation worked. If this is not the case the program
  terminates with an error message.
- It releases the memory at the end of _main()_ before vector $x$ leaves the
  scope.
- All indices are valid because both for loops iterates indices from $0$ to
  $\text{dim} - 1$.

However it is not convenient if you have to deal with vectors in this style.
Assume you have two vectors instead of just one. You at least want to have one
function that constructs a vector, i.e. does in this case the memory allocation
including the error handling.

Also the explicit call of _free()_ for releasing the memory of a vector is not
expressive.  Having an expressive function for releasing memory of vector in
particular becomes important if you have more than on exit point.  Also, if you
have more complex data structure releasing memory might require more than just
calling _free()_. So also destructing should be bundled in a function.

Furthermore, the correct access of vector elements requires knowledge of the
data structure. I.e. you have to know that indices are valid if and only if
they are in the range from 0 to _x.dim -1_. If you have more complex data
structures this no longer might be that obvious. So we want a function that can
be used for accessing elements and in _debug mode_ it should check if a index
is valid.  If data structures become more complicated we just have to adapt a
few places.

Example: Some Vector Class in C
===============================
The following implementation is more convenient to use and more expressive. It
is also better scalable if you have more than just one vector. It uses the same
declaration for a _struct Vector_. But in addition functions for construct a
vector with allocated memory, releasing its memory. It also provides functions
for accessing elements. For checking indices it uses the macro function
_assert_ from the standard library. If you compile with _-DNDEBUG_ the
preprocessor removes _assert_. Otherwise _assert_ will terminate the program
execution with an error and an exit code unequal to zero if the assert
condition is false.

:import: session16/vector/vector.h [fold]
:import: session16/vector/vector.c [fold]
:import: session16/vector/xtest_vector.c [fold]

Here a test run:

---- SHELL (path=session16/vector) ---------------------------------------------
gcc -Wall -o xtest_vector vector.c xtest_vector.c
./xtest_vector
--------------------------------------------------------------------------------

If this implementation further gets improved for using __inline functions__ the
compiler can generate with full optimizations code from this that is equivalent
to the "crude implementation" from above:

- Calling _VectorConstruct()_ normally creates variable of type _struct Vector_
  on its stack that gets returned and then copied. This means 

  ---- CODE (type=c) -----------------------------------------------------------
  struct Vector x = VectorConstruct(4);
  ------------------------------------------------------------------------------

  would be inefficient. However, here the return value is used to initialize a
  variable of the callee that has the same type. The compiler can change the
  function call such that _x_ gets directly initialized with the return value.
  That means no additional copying is required.

  If _VectorConstruct()_ can be inlined the compiler generates code as if you
  would have written

  ---- CODE (type=c) -----------------------------------------------------------
  struct Vector x = {
      malloc(4 * sizeof(x.data)),	// allocate memory for 4 double
      4,				// dim is hard coded to 4
  };

  // check if memory could be allocated
  if (!x.data) {
      fprintf(stderr, "allocating memoy for vector with %zu elemets failed\n",
      	x.dim);
      exit(1);
  }
  ------------------------------------------------------------------------------

- In debug mode the destructor _VectorDestruct()_ also sets the member _data_
  to the null pointer. This helps to detect cases where the destructor was
  called but the object is afterwards still used.

- After inlining the expression statement

  ---- CODE (type=c) -----------------------------------------------------------
  *VectorElementPtr(&x, i) = 42 + i;
  ------------------------------------------------------------------------------

  is equivalent to
  
  ---- CODE (type=c) -----------------------------------------------------------
  assert(i < (&x)->dim);
  (&x)->data[i] = 42 + i;
  ------------------------------------------------------------------------------

  and that is equivalent to
  
  ---- CODE (type=c) -----------------------------------------------------------
  assert(i < x.dim);
  x.data[i] = 42 + i;
  ------------------------------------------------------------------------------

  If you compile with _-DNDEBUG_ it is equivalent to
   
  ---- CODE (type=c) -----------------------------------------------------------
  x.data[i] = 42 + i;
  ------------------------------------------------------------------------------

- Analogously the statement

  ---- CODE (type=c) -----------------------------------------------------------
  printf("%lf", VectorElement(vec, i));
  ------------------------------------------------------------------------------

  in the example can be treated as if you would have written

  ---- CODE (type=c) -----------------------------------------------------------
  assert(i < vec->dim);
  printf("%lf", vec->data[i]);
  ------------------------------------------------------------------------------

  or in bench mode as if you would have written

  ---- CODE (type=c) -----------------------------------------------------------
  printf("%lf", vec->data[i]);
  ------------------------------------------------------------------------------

Inlining a function call is here in particular important for accessing vector
elements. Just think about the addition overhead for calling a function and the
instructions in the function's prologue and epilogue that otherwise would be
executed for each element access.

Using this design is less error prone but still requires discipline. You have
to be sure that _VectorConstruct()_ gets called for each of you variables of
type _struct Vector_ before you use them. You also have to make sure that
_VectorDestruct()_ get called before it leaves scope.


Example: Some Vector Class in C++
=================================
Now basically the same in C++ with more syntactic sugar. But most of all: The
C++ compiler makes sure that the constructor will always be called before the
object can be used (see __RAII__), and that the destructor gets called before
the object leaves the scope.

:import: session16/vector/vector.cpp [fold]

---- SHELL (path=session16/vector) ---------------------------------------------
g++ -Wall -o vector vector.cpp
./vector
--------------------------------------------------------------------------------

:links: RAII -> https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization
        inline functions -> https://en.wikipedia.org/wiki/Inline_function