Performance in dependence of access patterns

Content

A long long time ago, there existed computers where the access time for each cell of a random-access memory was independent from earlier accesses. This is no longer the case and in fact a non-trivial topic in which we will delve into in the coming weeks. Nonetheless, we will have a first look at this phenomenon which gives us the opportunity to see how performance can be measured in C.

The UNIX kernel maintains for each running process the CPU time and real time spent since the start of the process in clock ticks. Clock ticks are a system-dependent unit which give the actual granularity for time measurements on a system. The system call sysconf(_SC_CLK_TCK) tells the number of clock ticks per second which permits us to convert between clock ticks and seconds.

Here is a small test program demonstrating the function walltime which delivers the number of real time seconds (represented as double) since the program started:

#include <stdio.h>
#include <stdlib.h>
#include <sys/times.h>
#include <unistd.h>

/* return real time in seconds since start of the process */
double walltime() {
   static int ticks_per_second = 0;
   if (!ticks_per_second) {
      ticks_per_second = sysconf(_SC_CLK_TCK);
   }
   struct tms timebuf;
   /* times returns the number of real time ticks passed since start */
   return (double) times(&timebuf) / ticks_per_second;
}

int main() {
   double t0 = walltime();
   sleep(1);
   double t1 = walltime() - t0;
   printf("sleep(1) took actually %.4lf seconds\n", t1);
}

We use the real time here under the assumption that our process has a dedicated CPU or CPU core while it is running.

Exercise

It is interesting to see how the performance scales using variable vector lengths for powers of two in the range from 8,192 to 67,108,864. To measure the performance for each of the lengths independently from each other, you should for each length