CBE Pt.6: Expressions in C (and How to Read Production Rules)
Structure of a C program
C programs are supposed to describe the text, data and BSS segments of the assembly code that the compiler generates. The text segment is used for function definitions, the data segment for initialized global variables and the BSS segment for uninitialized global variables (which are therefore zero initialized).
The C grammar states that a C program is a sequence of declarations, for instance type declarations, variable declarations and function definitions:
\[\begin{array}{rcl}
\langle\text{translation-unit}\rangle & \to & \langle\text{top-level-declaration}\rangle \\
 & \to & \langle\text{translation-unit}\rangle\; \langle\text{top-level-declaration}\rangle \\
\langle\text{top-level-declaration}\rangle & \to & \langle\text{declaration}\rangle \\
 & \to & \langle\text{function-definition}\rangle \\
\langle\text{declaration}\rangle & \to & \langle\text{declaration-specifiers}\rangle\; \langle\text{initialized-declarator-list}\rangle\; \textbf{;} \\
\langle\text{declaration-specifiers}\rangle & \to & \langle\text{storage-class-specifier}\rangle\; \langle\text{declaration-specifiers}\rangle_\text{opt} \\
 & \to & \langle\text{type-specifier}\rangle\; \langle\text{declaration-specifiers}\rangle_\text{opt} \\
 & \to & \langle\text{type-qualifier}\rangle\; \langle\text{declaration-specifiers}\rangle_\text{opt} \\
 & \to & \langle\text{function-specifier}\rangle\; \langle\text{declaration-specifiers}\rangle_\text{opt} \\
\end{array}\]
Recall what the terms declaration and definition have in common and in what respect they differ. Every definition is also a declaration; in other words, a definition is a special case of a declaration. The difference is that a definition has an impact on the (assembly) code generation, whereas a pure declaration only affects the symbol tables within the compiler when it parses the source code.
Expressions are used for initializing variables (hence they are needed to describe an initialized declaration) and can be used for the second simplest kind of statement. The simplest statement is actually the empty statement; in the examples we consider expression statements:
\[\begin{array}{rcl}
\langle\text{statement}\rangle & \to & \langle\text{empty-statement}\rangle\; \\
 & \to & \langle\text{expression-statement}\rangle\; \\
 & \to & \dots \\
\langle\text{empty-statement}\rangle & \to & \textbf{;}\; \\
\langle\text{expression-statement}\rangle & \to & \langle\text{expression}\rangle\; \textbf{;}\; \\
\end{array}\]
To make the descriptions vivid we need examples, and for good examples we need variables and functions. Consequently we need to deal with declarations in C, which is (as mentioned before) the part of the language that can be a pain in the @$$. So for now we will cover that part by examples, for instance:
    int i;                   // declaration (and definition) of an integer variable i

    int foo(int a, int b);   /* declaration (not a definition) of a function 'foo'
                                which returns an integer and has two parameters of
                                type integer.
                             */
What is important about expressions?
You can combine expressions with certain operators into more complex expressions. The most elementary expressions are literals and variables. Using operators like + or * you can build an expression for a sum or a product, respectively. Hence you need to know which operators are available. But you also have to know the precedence of the operators. For example, in C the expressions a+b*c and a+(b*c) are equivalent, so there is no need for the parentheses.
You also have to know the semantics of an operator. For + it seems to be clear that it is used for adding things. However, details might not be so obvious if one of these things is a pointer variable (which is a case we will cover later). In C we also have operators like ++ which can be used as a prefix operator, e.g. ++a, or as a postfix operator, e.g. a++. In some cases both do the same, in other cases it makes a significant difference.
Knowing about the semantics also means to know the requirements for using an operator. For example, the assignment operator = can only be used if the left-hand side refers to a memory location that can be overwritten with the result of the right-hand side. We already dealt with the terms l-value and r-value for expressing this restriction.
When dealing with floating-point arithmetic (we will not deal with it here) it is also important to know about the associativity of operators. Because of round-off errors, a + (b + c) and (a + b) + c will in general not give the same result. So what does it mean if you write a + b + c in your code, skipping the parentheses? Here is the answer: in C the + operator is left associative, hence a + b + c and (a + b) + c are equivalent.
Operator precedence and associativity
In the description of the ULM assembly language you saw how the precedence and associativity of operators can be described by production rules. However, the C language provides many more operators, and production rules are therefore not always the description of choice. The table below shows the relevant details that you can extract from the grammar. The semantics of (some of) the operators will be explained by examples in subsequent sections on this page.
| Precedence | Associativity | Operators | Meaning |
|---|---|---|---|
| 16 (highest) | left | Variables, ++ (increment), -- (decrement), f() (function call), a[i] (index operator), p->member (indirect member access), s.member (direct member access) | Literals and unary postfix |
| 15 | right | * (dereference operator), & (address operator), - (unary minus), + (unary plus), ! (logical not), ~ (bitwise not), ++ (increment), -- (decrement), sizeof | Unary prefix |
| 14 | right | (type) (type cast) | Cast |
| 13 | left | * (multiply), / (divide), % (modulo) | Multiplicative |
| 12 | left | + (add), - (subtract) | Additive |
| 11 | left | << (left shift), >> (right shift) | Bit shift |
| 10 | left | < (less), > (greater), <= (less equal), >= (greater equal) | Relation |
| 9 | left | ==, != | Equality, inequality |
| 8 | left | & | Bitwise and |
| 7 | left | ^ | Bitwise exclusive or |
| 6 | left | \| | Bitwise or |
| 5 | left | && | Logical and |
| 4 | left | \|\| | Logical or |
| 3 | right | ? in conjunction with : | Conditional |
| 2 | right | =, +=, -=, *=, /=, %=, &=, ^=, \|=, <<=, >>= | Assignment |
| 1 (lowest) | left | , | List |
Some simple examples for expression statements
Assume we have some integer variable a, e.g. declared by

    int a;

Because variables as well as literals are expressions,

    a;

and

    42;

are both legal expression statements. However, both statements have no effect. That means if the compiler ignores these statements you would not notice any difference.
Simple expressions that are not assignment expressions but do have an effect are function and procedure calls. Assume foo is declared as a procedure, e.g.

    void foo(void); // no return value, no parameters

then

    foo();

is an expression statement which (possibly) does have an effect. So the compiler will generate code for calling foo.
The fact that function calls are expressions means that return values of a function can be used to build more complex expressions. Assume function foo returns an integer, e.g. let it be declared as

    int foo(void); // return value of type 'int', no parameters

then

    a = foo() + 42;

is an expression statement where the expression is an assignment expression whose right-hand side is an additive expression consisting of a function call and a literal. You see that a verbal description soon becomes cumbersome, but you see the syntax tree in your mind, right?
More about assignment expressions
Besides the assignment operator = there are other operators like +=, *= etc. They all have in common that the l-value gets updated by a certain operation. For example

    a += foo() + 42;

is equivalent to

    a = a + (foo() + 42);

Note the parentheses: the right-hand side is evaluated as a whole before it is combined with a. Actually the ULM compiler is internally doing exactly this kind of transformation.
For incrementing a variable by a constant this comes in handy. For example

    a += 42;

would increment variable a by 42. In conjunction with loops you often need to increment a variable by one, or decrement a variable by one. You certainly can do this with these assignment operators:

    a += 1; // increment a by 1
    a -= 1; // decrement a by 1

However, for these cases the prefix operators ++ and -- are even more convenient:

    ++a; // increment a by 1
    --a; // decrement a by 1

Below you will find some examples for pointing out the difference between the prefix and postfix operators (and my arguments why I always prefer the prefix operator even if the postfix operator has the same effect).
More complex assignments
An assignment expression is an expression. So it can be used wherever an expression is expected, for example as the r-value of another assignment expression:

    int a, b;

    a = b = 42;

Because of the right associativity of the = operator the expression a = b = 42 is equivalent to a = (b = 42) and not to (a = b) = 42. The latter would first read the uninitialized variable b, and in C it does not even compile because the result of an assignment is not an l-value. Of course all this holds for all kinds of assignment operators:

    int a, b;

    a = b = 42;  // assign 42 to b, and assign b to a
    a += b += 2; // increment b by 2, and increment a by b

Can you predict the values of a and b after each statement? Do you remember that programming cannot be learned by merely reading? So check it out:

    #include <stdio.h>

    int
    main()
    {
        int a, b;

        a = b = 42;
        printf("After 'a = b = %d;' we have a = %d, b = %d\n", 42, a, b);
        a += b += 2;
        printf("After 'a += b += %d;' we have a = %d, b = %d\n", 2, a, b);
    }

    theon$ make
    building configuration 'gcc'
    make -C src/ Config=gcc -f ../config/Makefile.template
    make[1]: Entering directory '/home/numerik/hpc/ss22/hpc0/session08/hpc0_cprog_page3/src'
    mkdir -p ../deps_gcc
    Updating dependencies for xassign.c
    mkdir -p ../build_gcc
    gcc -o ../build_gcc/xassign xassign.c ../deps_gcc/xassign.d
    make[1]: Leaving directory '/home/numerik/hpc/ss22/hpc0/session08/hpc0_cprog_page3/src'
    building configuration 'ucc'
    make -C src/ Config=ucc -f ../config/Makefile.template
    make[1]: Entering directory '/home/numerik/hpc/ss22/hpc0/session08/hpc0_cprog_page3/src'
    mkdir -p ../deps_ucc
    Updating dependencies for xassign.c
    mkdir -p ../build_ucc
    ucc -o ../build_ucc/xassign xassign.c /home/numerik/pub/ulmcc/lib/libulm.a ../deps_ucc/xassign.d
    make[1]: Leaving directory '/home/numerik/hpc/ss22/hpc0/session08/hpc0_cprog_page3/src'
    theon$

    theon$ ./build_ucc/xassign
    GGGGGGGGGGGGGGG\ \ \ \ \
    theon$ ./build_gcc/xassign
    After 'a = b = 42;' we have a = 42, b = 42
    After 'a += b += 2;' we have a = 86, b = 44
    theon$
Exercise
Think of other examples with similar expression statements and write a simple example for demonstrating the effects. For example using the operators - for subtraction, * for multiplication, / for division and % for modulo.
Prefix and postfix operators ++ and --
I am sometimes asked why I am using the prefix operator in for-loops. For example, I would write

    for (int i=0; i < n; ++i) { // <- could be my code
        /* ... */
    }

but never this

    for (int i=0; i < n; i++) { // <- I certainly did not write that!!
        /* ... */
    }

First of all, in this case it would not make any difference whether I wrote ++i or i++. The compiler would generate the same code. But I have two rules for using the ++ and -- operators, and I would violate my first rule:
- If the postfix operator and the prefix operator both have the same effect then use the prefix operator.
- If you don't know the difference between the postfix operator and the prefix operator then never use the postfix operator.
What the operators have in common
For expression statements that just contain an increment expression there is no difference. For example, the statements ++a; and a++; will both be treated by the compiler as if you had written a = a + 1;:

    #include <stdio.h>

    int
    main()
    {
        int a;

        a = 42;
        printf("After 'a = %d;' we have a = %d\n", 42, a);

        ++a;
        printf("After '++a;' we have a = %d\n", a);

        a++;
        printf("After 'a++;' we have a = %d\n", a);
    }

    theon$ make
    building configuration 'gcc'
    make -C src/ Config=gcc -f ../config/Makefile.template
    make[1]: Entering directory '/home/numerik/hpc/ss22/hpc0/session08/hpc0_cprog_page3/src'
    Updating dependencies for xincr.c
    gcc -o ../build_gcc/xincr xincr.c ../deps_gcc/xassign.d ../deps_gcc/xincr.d
    make[1]: Leaving directory '/home/numerik/hpc/ss22/hpc0/session08/hpc0_cprog_page3/src'
    building configuration 'ucc'
    make -C src/ Config=ucc -f ../config/Makefile.template
    make[1]: Entering directory '/home/numerik/hpc/ss22/hpc0/session08/hpc0_cprog_page3/src'
    Updating dependencies for xincr.c
    ucc -o ../build_ucc/xincr xincr.c /home/numerik/pub/ulmcc/lib/libulm.a ../deps_ucc/xassign.d ../deps_ucc/xincr.d
    make[1]: Leaving directory '/home/numerik/hpc/ss22/hpc0/session08/hpc0_cprog_page3/src'
    theon$

    theon$ ./build_ucc/xincr
    kkkkkkkkkkkE E
    theon$ ./build_gcc/xincr
    After 'a = 42;' we have a = 42
    After '++a;' we have a = 43
    After 'a++;' we have a = 44
    theon$
Cases where it matters
If a++ is part of a more complex expression like b = a++ then the original value of a is used for evaluating the expression, and the increment happens as a side effect. In other words, the expression b = a++ behaves like b = a followed by an increment of a.
If you see a ++a expression you can substitute it with (a = a + 1) and explain the effect. So if you read b = ++a, think of it as b = (a = a + 1). In this form it already explains itself: the variable a gets incremented first (because a = a + 1 is inside the parentheses) and then the incremented value is used for evaluating the remaining expression.

    #include <stdio.h>

    int
    main()
    {
        int a, b;

        a = 42;
        printf("After 'a = %d;' we have a = %d\n", 42, a);

        b = a++;
        printf("After 'b = a++;' we have a = %d, b = %d\n", a, b);

        b = ++a;
        printf("After 'b = ++a;' we have a = %d, b = %d\n", a, b);
    }

    theon$ make
    building configuration 'gcc'
    make -C src/ Config=gcc -f ../config/Makefile.template
    make[1]: Entering directory '/home/numerik/hpc/ss22/hpc0/session08/hpc0_cprog_page3/src'
    Updating dependencies for xincr2.c
    gcc -o ../build_gcc/xincr2 xincr2.c ../deps_gcc/xassign.d ../deps_gcc/xincr.d ../deps_gcc/xincr2.d
    make[1]: Leaving directory '/home/numerik/hpc/ss22/hpc0/session08/hpc0_cprog_page3/src'
    building configuration 'ucc'
    make -C src/ Config=ucc -f ../config/Makefile.template
    make[1]: Entering directory '/home/numerik/hpc/ss22/hpc0/session08/hpc0_cprog_page3/src'
    Updating dependencies for xincr2.c
    ucc -o ../build_ucc/xincr2 xincr2.c /home/numerik/pub/ulmcc/lib/libulm.a ../deps_ucc/xassign.d ../deps_ucc/xincr.d ../deps_ucc/xincr2.d
    make[1]: Leaving directory '/home/numerik/hpc/ss22/hpc0/session08/hpc0_cprog_page3/src'
    theon$

    theon$ ./build_ucc/xincr2
    XXXXXXXXXXXw w w w w w
    theon$ ./build_gcc/xincr2
    After 'a = 42;' we have a = 42
    After 'b = a++;' we have a = 43, b = 42
    After 'b = ++a;' we have a = 44, b = 44
    theon$
Some fun with expression lists
The comma operator has the lowest precedence. So an expression like a = 2, b = 4 is equivalent to (a = 2), (b = 4). Probably its main application is to initialize several variables within a single expression, as in this for-loop:

    #include <stdio.h>

    int
    main()
    {
        unsigned long i, factorial;

        for (i = 0, factorial = 1; i < 10; ++i) {
            printf("%2lu! = %10lu\n", i, factorial *= i > 0 ? i : 1);
        }
    }

So here the expression list i = 0, factorial = 1 is used in the initial clause of the for-loop statement (statements will be covered next). But the actual star of the show is the expression factorial *= i > 0 ? i : 1, which contains a conditional expression. Let me first show you that this program simply prints the factorials from 0! to 9!:
    theon$ make
    building configuration 'gcc'
    make -C src/ Config=gcc -f ../config/Makefile.template
    make[1]: Entering directory '/home/numerik/hpc/ss22/hpc0/session08/hpc0_cprog_page3/src'
    Updating dependencies for xexpr_list.c
    gcc -o ../build_gcc/xexpr_list xexpr_list.c ../deps_gcc/xassign.d ../deps_gcc/xexpr_list.d ../deps_gcc/xincr.d ../deps_gcc/xincr2.d
    make[1]: Leaving directory '/home/numerik/hpc/ss22/hpc0/session08/hpc0_cprog_page3/src'
    building configuration 'ucc'
    make -C src/ Config=ucc -f ../config/Makefile.template
    make[1]: Entering directory '/home/numerik/hpc/ss22/hpc0/session08/hpc0_cprog_page3/src'
    Updating dependencies for xexpr_list.c
    ucc -o ../build_ucc/xexpr_list xexpr_list.c /home/numerik/pub/ulmcc/lib/libulm.a ../deps_ucc/xassign.d ../deps_ucc/xexpr_list.d ../deps_ucc/xincr.d ../deps_ucc/xincr2.d
    make[1]: Leaving directory '/home/numerik/hpc/ss22/hpc0/session08/hpc0_cprog_page3/src'
    theon$

    theon$ ./build_ucc/xexpr_list
    uuuuuuuuu yyyyyyyy yyyyyyyy VVVVVVV MMMMMM |||||
    theon$ ./build_gcc/xexpr_list
     0! =          1
     1! =          1
     2! =          2
     3! =          6
     4! =         24
     5! =        120
     6! =        720
     7! =       5040
     8! =      40320
     9! =     362880
    theon$
Because of the precedences the expression

    factorial *= i > 0 ? i : 1

is equivalent to

    factorial *= (i > 0) ? i : 1

And the assignment operator could be expanded to

    factorial = factorial * ( (i > 0) ? i : 1 )

Hence we essentially need to understand the meaning of the conditional expression (i > 0) ? i : 1. More generally, it has the form

    cond ? valTrue : valFalse

and evaluates to valTrue if cond is not zero, and otherwise to valFalse.