First steps with vectors in C

Content

It is important to understand in C the different concepts of storage and access. Take as example following array declaration:

double x[8];

In this example, x is an array of 8 elements of type double. Its storage is either global (if it has been declared outside of a function) or on the stack (if it has been declared inside a function as local variable which ceases to exist when the function returns). Its elements can be accessed by index, i.e. x[0] to x[7]:

double x[8];
for (size_t i = 0; i < 8; ++i) {
   x[i] = 42;
}

Actually, x is considered as a constant pointer of type double* to the begin of the storage area, and notations like x[i] are just shorthands for *(x+i) where per pointer arithmetic i times the size of a double is added to the address of x and then dereferenced by using the * operator. Consequently, pointers can be used like arrays:

double x[8];
double* y;
y = x;
for (size_t i = 0; i < 8; ++i) {
   y[i] = 42;
}

But pointers do not know anything about the size or organization of the storage area they point into. While x still remembers that the array consists of 8 elements (of 8 bytes each), y has no knowledge about this. Compare the results of the sizeof operator:

#include <stdio.h>
#include <stdlib.h>

int main() {
   double x[8];
   double* y;
   y = x;

   printf("sizeof(x) = %zu\n", sizeof(x));
   printf("sizeof(y) = %zu\n", sizeof(y));
}
theon$ gcc -Wall -o size size.c
theon$ ./size
sizeof(x) = 64
sizeof(y) = 8
theon$ 

You have the same problem when you pass an array as parameter to a function. It is passed as a simple pointer value without any information of the array's size. Consequently, we have always to maintain the number of elements using a separate variable or parameter which should be of type size_t (defined in #include <stddef.h>) which is unsigned. To add a little bit more flexibility, we do not enforce that vectors are organized consecutively in memory. (Just think about column and row vectors in a matrix, at least one kind cannot be stored as a contiguous block of cells in memory.) Hence it is useful to add a variable or parameter which specifies the distance between two subsequent elements. The distance is (in conformance to pointer arithmetic) 1, if the elements are consecutively in memory, or any other value with is then implicitly multiplied with the element size. The corresponding type in C is ptrdiff_t (also coming from <stddef.h>) which, unlike size_t, is signed.

Note that we use in the example above the “%zu” printing format for the size_t type, i.e. “z” as in “size” and “u” as in “unsigned”. The corresponding format for ptrdiff_t is “%td”, i.e. “t” as in “ptrdiff_t”.

In summary, we specify vectors by

Example for a function returning the sum of all elements of a vector x of double values:

double sum(double* x, size_t len, ptrdiff_t incr) {
   double result = 0;
   for (size_t i = 0; i < len; ++i) {
      result += x[i * incr];
   }
   return result;
}

Alternatively, we could also run through the vector using a pointer which is incremented at the end of each iteration:

double sum(double* x, size_t len, ptrdiff_t incr) {
   size_t count = len;
   double* p = x;
   double result = 0;
   while (count > 0) {
      result += *p;
      p += incr; --count;
   }
   return result;
}

While the running-pointer-style is commonly found in C string processing code and elsewhere, we prefer the array-style notation where x[i * incr] accesses the i-th element of x. This is more readable in numerical applications and readability is an important point.

Using this specification for vectors, we separate two concepts, i.e.

This separation is essential for high performance computing as this allows us

The challenge with this approach is that we must take care that the actual storage used for a vector survives as long as we have pointers pointing into it. This is our responsibility as the compiler is unable to support us here.

Exercise

Test this