=============================
First steps with vectors in C [TOC]
=============================
It is important to understand in C the different concepts
of storage and access. Take as example following array
declaration:
---- CODE (type=c) ------------------------------------------------------------
double x[8];
-------------------------------------------------------------------------------
In this example, `x` is an array of 8 elements of type `double`.
Its storage is either global (if it has been declared outside of
a function) or on the stack (if it has been declared inside a
function as local variable which ceases to exist when the
function returns). Its elements can be accessed by index,
i.e. `x[0]` to `x[7]`:
---- CODE (type=c) ------------------------------------------------------------
double x[8];
for (size_t i = 0; i < 8; ++i) {
x[i] = 42;
}
-------------------------------------------------------------------------------
Actually, `x` is considered as a constant pointer of type `double*` to the
begin of the storage area, and notations like `x[i]` are just
shorthands for `*(x+i)` where per pointer arithmetic `i` times the size
of a `double` is added to the address of `x` and then dereferenced by
using the `*` operator. Consequently, pointers can be used like
arrays:
---- CODE (type=c) ------------------------------------------------------------
double x[8];
double* y;
y = x;
for (size_t i = 0; i < 8; ++i) {
y[i] = 42;
}
-------------------------------------------------------------------------------
But pointers do not know anything about the size or organization
of the storage area they point into. While `x` still remembers that
the array consists of 8 elements (of 8 bytes each), `y` has no knowledge
about this. Compare the results of the `sizeof` operator:
:import: session01/size.c
---- SHELL (path=session01,hostname=theon) ----
gcc -Wall -o size size.c
./size
-----------------------------------------------
You have the same problem when you pass an array as parameter to a
function. It is passed as a simple pointer value without any information of
the array's size. Consequently, we have always to maintain the number
of elements using a separate variable or parameter which should be of
type `size_t` (defined in `#include `) which is unsigned. To
add a little bit more flexibility, we do not enforce that vectors are
organized consecutively in memory. (Just think about column and row
vectors in a matrix, at least one kind cannot be stored as a contiguous
block of cells in memory.) Hence it is useful to add a variable or
parameter which specifies the distance between two subsequent elements.
The distance is (in conformance to pointer arithmetic) 1, if the
elements are consecutively in memory, or any other value with is then
implicitly multiplied with the element size. The corresponding type in
C is `ptrdiff_t` (also coming from ``) which, unlike
`size_t`, is signed.
Note that we use in the example above the “%zu” printing format
for the `size_t` type, i.e. “z” as in “si*z*e” and “u” as in “*u*nsigned”.
The corresponding format for `ptrdiff_t` is “%td”, i.e. “t” as in
“p*t*rdiff_t”.
In summary, we specify vectors by
* a pointer to the first element,
* the number of elements,
* and the distance between two consecutive elements of the vector.
Example for a function returning the sum of all elements of
a vector `x` of `double` values:
---- CODE (type=c) ------------------------------------------------------------
double sum(double* x, size_t len, ptrdiff_t incr) {
double result = 0;
for (size_t i = 0; i < len; ++i) {
result += x[i * incr];
}
return result;
}
-------------------------------------------------------------------------------
Alternatively, we could also run through the vector using
a pointer which is incremented at the end of each iteration:
---- CODE (type=c) ------------------------------------------------------------
double sum(double* x, size_t len, ptrdiff_t incr) {
size_t count = len;
double* p = x;
double result = 0;
while (count > 0) {
result += *p;
p += incr; --count;
}
return result;
}
-------------------------------------------------------------------------------
While the running-pointer-style is commonly found in C string processing
code and elsewhere, we prefer the array-style notation where `x[i * incr]`
accesses the _i_-th element of `x`. This is more readable in numerical
applications and readability is an important point.
Using this specification for vectors, we separate two concepts, i.e.
* the abstract concept of a vector, and
* the storage management.
This separation is essential for high performance computing as
this allows us
* to reuse existing storage in various ways (thereby avoiding
the cost of copying data around just for seeing it through
another type),
* to design the storage layout and the access patterns such that
we achieve the best performance, and
* to keeping the algorithms independent from the storage layout.
The challenge with this approach is that we must take care that
the actual storage used for a vector survives as long as we
have pointers pointing into it. This is our responsibility
as the compiler is unable to support us here.
Exercise
========
- A function `init_vector` shall initialize a vector. The vector
is specified by a pointer to its first element, the number of
elements, and the distance between two consecutive elements of
the vector. The first element of the vector is to be initialized
to 1, the second to 2 etc.
- The function `print_vector` shall print a vector which is
specified similarly with three parameters.
Test this
- by defining a global array consisting of 8 elements,
- by initializing the entire array and printing it,
- by using the same global array for two consecutive
vectors of length 4 which are each individually
initialized and printed, and
- by using the same global array for two interleaving
vectors of length 4 which are likewise initialized
and printed.
:navigate: up -> doc:index
next -> doc:session01/page02