C By Example Part 5: Integer Types in C and Type Aliases

In Session 6 (Page 3) you learned the fundamental concept behind variables in C, for example that every variable has a type. But that example only used the type int. Here you find a list of all the other integer types that are available in C. You will also see that so-called type aliases have to be used to get a guaranteed size (i.e. a certain number of bits). These are declared in headers of the standard library.

Integer types

Here you find a list of all signed and unsigned types that are part of the language. The list also contains the format specifiers for formatted printing.

The tables are basically taken from here and extended with the concrete sizes used by ucc and by the gcc installation on theon. The size is given in bytes, and for both the ULM architecture and the Intel architecture a byte consists of 8 bits. The C standard only guarantees that a byte has at least 8 bits.

Also note that the lists are still not complete. For example, a type like signed long long int can also be written as int long signed long, long int long signed, etc. Personally, I prefer the shortest form for expressing a type anyway, e.g. long long instead of signed long long int and unsigned long long instead of unsigned long long int.

Type char

The C standard specifies that the type char has a size of exactly one byte. However, it does not specify whether char is signed or unsigned. In many cases this does not matter, for example if it is used to represent ASCII characters, which require just 7 bits for encoding (a byte has at least 8 bits, so the sign bit is not used by the encoding). If signedness is relevant you have to use the types signed char or unsigned char explicitly.

Signed Integer Types

Type	Size in bytes	gcc on theon	ucc	Format specifier
signed char	\(1\)	\(1\)	\(1\)	character: %c; decimal: %hhd or %hhi; octal: %hho; hex: %hhx
short, short int, signed short, signed short int	\(\geq 2\)	\(2\)	\(2\)	decimal: %hd or %hi; octal: %ho; hex: %hx
int, signed, signed int	\(\geq 2\)	\(4\)	\(2\)	decimal: %d or %i; octal: %o; hex: %x
long, long int, signed long, signed long int	\(\geq 4\)	\(8\)	\(4\)	decimal: %ld or %li; octal: %lo; hex: %lx
long long, long long int, signed long long, signed long long int	\(\geq 8\)	\(8\)	\(8\)	decimal: %lld or %lli; octal: %llo; hex: %llx

Unsigned Integer Types

Type	Size in bytes	gcc on theon	ucc	Format specifier
unsigned char	\(1\)	\(1\)	\(1\)	character: %c; decimal: %hhu; octal: %hho; hex: %hhx
unsigned short, unsigned short int	\(\geq 2\)	\(2\)	\(2\)	decimal: %hu; octal: %ho; hex: %hx
unsigned, unsigned int	\(\geq 2\)	\(4\)	\(2\)	decimal: %u; octal: %o; hex: %x
unsigned long, unsigned long int	\(\geq 4\)	\(8\)	\(4\)	decimal: %lu; octal: %lo; hex: %lx
unsigned long long, unsigned long long int	\(\geq 8\)	\(8\)	\(8\)	decimal: %llu; octal: %llo; hex: %llx

Standardized type aliases for integer types

Through type aliases the C standard library provides further integer types, for example the unsigned integer type size_t and the signed integer type ptrdiff_t. The exact size of these types depends on the memory model supported by the compiler. size_t is the type of the result of the sizeof operator. It is large enough to store the maximum size of a theoretically possible object of any type (including arrays). ptrdiff_t is the signed integer type of the result of subtracting two pointers.

Other examples of standardized type aliases are the fixed width integer types (e.g. uint8_t, uint16_t, uint32_t, uint64_t, int8_t, int16_t, int32_t, int64_t) and the type alias bool.

These type aliases are declared in certain header files. This page summarizes which header you need to include. Furthermore, it shows how these type aliases are declared in the (incomplete) standard library for ucc.

Types size_t and ptrdiff_t

The declaration of these types can be imported by including the standard header stddef.h. The actual size of both types basically depends on the address space of the architecture for which the compiler generates code.

Since ucc only generates code for the ULM (which has a 64-bit virtual memory space), these types always have 8 bytes. gcc, on the other hand, supports different architectures, so there the sizes depend on the installation and the compiler flags.

stddef.h from the ULM standard library

Using a typedef declaration, size_t and ptrdiff_t are declared as a type alias for uint64_t and int64_t respectively. As you will see below, these in turn are type aliases for unsigned long long and long long.

/home/numerik/pub/ulmcc/include/stddef.h

#ifndef ULMCLIB_STDDEF_H
#define ULMCLIB_STDDEF_H

#include <stdint.h>

typedef uint64_t size_t;
typedef int64_t ptrdiff_t;

#endif // ULMCLIB_STDDEF_H

Example

Let's check the default sizes used by ucc and gcc with the following test:

session07/hpc0_cprog_page8/xprintf_size_t.c

#include <stddef.h>
#include <stdio.h>

int
main()
{
    printf("sizeof(size_t) = %zu\n", sizeof(size_t));
    printf("sizeof(ptrdiff_t) = %zu\n", sizeof(ptrdiff_t));
}

theon$ mkdir -p build_gcc
theon$ gcc  -o build_gcc/xprintf_size_t xprintf_size_t.c
theon$ mkdir -p build_ucc
theon$ ucc  -o build_ucc/xprintf_size_t xprintf_size_t.c
theon$

theon$ ./build_ucc/xprintf_size_t
sizeof(size_t) = 8
sizeof(ptrdiff_t) = 8
theon$ ./build_gcc/xprintf_size_t
sizeof(size_t) = 8
sizeof(ptrdiff_t) = 8
theon$

When gcc is invoked with the -m32 option, it generates code for the 32-bit i386 architecture that Intel introduced in the mid 80s (of the last century). And as Intel's current hardware is actually still backwards compatible, this code actually runs on theon. However, to demonstrate this we have to use an old gcc installation (who would have thought we would ever need that again):

theon$ /opt/ulm/athenry/bin/gcc -m32 xprintf_size_t.c -o build_gcc/xprintf_size_t-32
theon$ ./build_gcc/xprintf_size_t-32
sizeof(size_t) = 4
sizeof(ptrdiff_t) = 4
theon$

Format specifiers

For printing variables of type size_t the correct format specifiers are %zu (decimal), %zx (hexadecimal) and %zo (octal). For variables of type ptrdiff_t the corresponding specifiers are %td (decimal), %tx (hexadecimal) and %to (octal). In the following example the format specifiers contain an optional width (e.g. %20zu has the width 20) so that the numbers are printed nicely in columns:

session07/hpc0_cprog_page8/xprintf_fmt_zu_td.c

#include <stddef.h>
#include <stdio.h>

int
main()
{
    size_t s = -1;      // so that we get the largest value ;-)
    ptrdiff_t p = -1;

    printf("s = %20zu (hex: %16zx, oct: %23zo)\n", s, s, s);
    printf("p = %20td (hex: %16tx, oct: %23to)\n", p, p, p);
}

theon$ mkdir -p build_gcc
theon$ gcc  -o build_gcc/xprintf_fmt_zu_td xprintf_fmt_zu_td.c
theon$ mkdir -p build_ucc
theon$ ucc  -o build_ucc/xprintf_fmt_zu_td xprintf_fmt_zu_td.c
theon$

theon$ ./build_ucc/xprintf_fmt_zu_td
s = 18446744073709551615 (hex: ffffffffffffffff, oct:  1777777777777777777777)
p =                   -1 (hex: ffffffffffffffff, oct:  1777777777777777777777)
theon$ ./build_gcc/xprintf_fmt_zu_td
s = 18446744073709551615 (hex: ffffffffffffffff, oct:  1777777777777777777777)
p =                   -1 (hex: ffffffffffffffff, oct:  1777777777777777777777)
theon$

Fixed width integer types

The fixed width integer types are declared in stdint.h. Here you see the declarations in the ULM library:

/home/numerik/pub/ulmcc/include/stdint.h

#ifndef ULMCLIB_STDINT_H
#define ULMCLIB_STDINT_H

typedef signed char int8_t;
typedef unsigned char uint8_t;

typedef int int16_t;
typedef unsigned uint16_t;

typedef long int32_t;
typedef unsigned long uint32_t;

typedef long long int64_t;
typedef unsigned long long uint64_t;

#endif // ULMCLIB_STDINT_H

Type bool and literals true and false

Since C99 the C language provides an additional integer type _Bool, which is guaranteed to be large enough to store the values 0 and 1. However, its size is at least one byte because a byte is the smallest unit of size. This is just another example of how the C standard is as vague as possible (so that various platforms can be supported) and consistent at the same time.

In stdbool.h of the ULM library the type bool is declared as an alias for _Bool. By including it you also get the integer literals false and true with the values 0 and 1 respectively:

/home/numerik/pub/ulmcc/include/stdbool.h

#ifndef ULMCLIB_STDBOOL_H
#define ULMCLIB_STDBOOL_H

typedef _Bool bool;

enum { false = 0, true = 1 };

#endif // ULMCLIB_STDBOOL_H

Note that the C standard actually requires bool to be a macro and not a typedef (here some discussion why). Well, that is currently not possible as the preprocessor for the ULM compiler is very limited.