=============================================================
CBE Pt.6: Expressions in C (and How to Read Production Rules)		[TOC]
=============================================================

---- VIDEO ------------------------------
https://www.youtube.com/embed/s-KNG2XyAoQ
-----------------------------------------


Structure of a C program
========================

C programs are supposed to describe the text, data and BSS segments of the
assembly code that the compiler generates. The text segment is used for
function definitions, the data segment for initialized global variables and the
BSS segment for uninitialized global variables (which are therefore zero
initialized).

The C grammar states that a C program is a sequence of declarations, for
instance type declarations, variable declarations and function definitions:

---- LATEX ---------------------------------------------------------------------
\begin{array}{rcl}
\langle\text{translation-unit}\rangle
    & \to &
    \langle\text{top-level-declaration}\rangle
    \\
    & \to &
    \langle\text{translation-unit}\rangle\;
    \langle\text{top-level-declaration}\rangle
    \\
\langle\text{top-level-declaration}\rangle
    & \to &
    \langle\text{declaration}\rangle
    \\
    & \to &
    \langle\text{function-definition}\rangle
    \\
\langle\text{declaration}\rangle
    & \to &
    \langle\text{declaration-specifiers}\rangle\;
    \langle\text{initialized-declarator-list}\rangle\;
    \textbf{;}
    \\
\langle\text{declaration-specifiers}\rangle
    & \to &
    \langle\text{storage-class-specifier}\rangle\;
    \langle\text{declaration-specifiers}\rangle_\text{opt}
    \\
    & \to &
    \langle\text{type-specifier}\rangle\;
    \langle\text{declaration-specifiers}\rangle_\text{opt}
    \\
    & \to &
    \langle\text{type-qualifier}\rangle\;
    \langle\text{declaration-specifiers}\rangle_\text{opt}
    \\
    & \to &
    \langle\text{function-specifier}\rangle\;
    \langle\text{declaration-specifiers}\rangle_\text{opt}
    \\
\end{array}
--------------------------------------------------------------------------------

Recall what the terms _declaration_ and _definition_ have in common and in what
respect they differ. Any definition is also a declaration, or in other words, a
definition is a special case of a declaration. The difference is that
definitions have an impact on the (assembly) code generation, whereas pure
declarations only affect the symbol tables that the compiler maintains while it
parses the source code.

Expressions are used for initializing variables (hence they are needed to
describe an initialized declaration) and can be used for the second simplest
kind of statement. The simplest statement is actually the empty statement; in
the examples we consider _expression-statements_:

---- LATEX ---------------------------------------------------------------------
\begin{array}{rcl}
\langle\text{statement}\rangle
    & \to &
    \langle\text{empty-statement}\rangle\;
    \\
    & \to &
    \langle\text{expression-statement}\rangle\;
    \\
    & \to &
    \dots
    \\
\langle\text{empty-statement}\rangle
    & \to &
    \textbf{;}\;
    \\
\langle\text{expression-statement}\rangle
    & \to &
    \langle\text{expression}\rangle\;
    \textbf{;}\;
    \\
\end{array}
--------------------------------------------------------------------------------

For making the descriptions vivid we need examples. And for good examples, we
need variables and functions. Consequently we need to deal with declarations
in C, which are (as mentioned before) the part of the language that can be a
pain in the @$$. So for now we will cover that part by examples, like for instance

---- CODE (type=c) -------------------------------------------------------------
int i;			//  declaration (and definition) of an integer variable i

int foo(int a, int b);	/*  declaration (not a definition) of a function 'foo'
			    which returns an integer and has two parameters of
			    type integer.
			*/
--------------------------------------------------------------------------------

---- SHELL (path=session08/,hide) ----------------------------------------------
rm -rf hpc0_cprog_page3
cp -r -P /home/numerik/pub/cprog hpc0_cprog_page3
--------------------------------------------------------------------------------

What is important about expressions?
====================================
You can combine expressions with certain operators into more complex
expressions. The most elementary expressions are literals and variables. Using
operators like `+` or `*` you can build an expression for the sum or product
respectively. Hence you need to know what operators are available in total. But
you also have to know the precedence of the operators. For example in C the
expressions `a+b*c` and `a+(b*c)` are equivalent, so there is no need for the
parentheses.

You also have to know the semantics of an operator. For `+` it seems to be
clear that it is used for adding things. However, details might not be so
obvious if one of these things is a pointer variable (which is a case we will
cover later). In C we also have operators like `++` which can be used as a
prefix operator, e.g. `++a`, or as a postfix operator, e.g. `a++`. In some
cases both will do the same, in other cases it makes a significant difference.

Knowing about the semantics also means knowing the requirements for using an
operator. For example, the assignment operator `=` can only be used if the
left-hand side refers to a memory location that can be overwritten with the
result of the right-hand side. We already dealt with the terms _l-value_ and
_r-value_ for expressing this restriction.

When dealing with floating-point arithmetic (we will not deal with it here) it
is also important to know about the associativity of operators. Because of
round-off errors `a + (b + c)` and `(a + b) + c` will in general not give the
same result. So what does it mean if you write `a + b + c` in your code without
any parentheses? Here is the answer: In C the `+` operator is left
associative, hence `(a + b) + c` and `a + b + c` are equivalent.

Operator precedence and associativity
=====================================

In the description for the __ULM assembly language__ you saw how the precedence
and associativity of operators can be described by production rules. However,
the C language provides many more operators, and production rules are
therefore not always the description of choice. The table below shows the
relevant details that you can extract from the grammar.  The semantics for
(some of) the operators will be explained in subsequent sections on this page
by examples.

:links: ULM assembly language -> doc:session08/page02

+---------------+-------------------+---------------------------------------+---------------+
| Precedence	|   Associativity   |	Operators			    |	Meaning	    |
+---------------+-------------------+---------------------------------------+---------------+
| 16 (highest)	|  left 	    | Variables and literals		    | Primary and   |
|		|		    |					    | Unary postfix |
| 		|      		    | `++` (increment)			    |		    |
|		|		    |					    |		    |
|		|		    | `--` (decrement)			    |		    |
|		|		    |					    |		    |
|		|		    | `f()` (function call)		    |		    |
|		|		    |					    |		    |
|		|		    | `a[i]` (index operator)		    |		    |
|		|		    |					    |		    |
|		|		    | `p->member` (indirect member access)  |		    |
|		|		    |					    |		    |
|		|		    | `s.member` (direct member access)	    |		    |
+---------------+-------------------+---------------------------------------+---------------+
| 15		|  right	    |	`*` (dereference operator)  	    | Unary prefix  |
|		|		    |					    |		    |
|		|		    |	`&` (address operator)	 	    |		    |
|		|		    |					    |		    |
|		|		    |	`-` (unary minus)		    |		    |
|		|		    |					    |		    |
|		|		    |	`+` (unary plus)		    |		    |
|		|		    |					    |		    |
|		|		    |	`!` (logical not)		    |		    |
|		|		    |					    |		    |
|		|		    |	`~` (bitwise not)		    |		    |
|		|		    |					    |		    |
|		|		    |	`++` (increment)		    |		    |
|		|		    |					    |		    |
|		|		    |	`--` (decrement)		    |		    |
|		|		    |					    |		    |
|		|		    |	`sizeof`			    |		    |
+---------------+-------------------+---------------------------------------+---------------+
| 14		|   right	    |	`(`_type_`)` (type cast)	    |		    |
+---------------+-------------------+---------------------------------------+---------------+
| 13		|   left	    |	`*` (multiply)			    |Multiplicative |
|		|		    |					    |		    |
|		|		    |	`/` (divide)			    |		    |
|		|		    |					    |		    |
|		|		    |	`%` (modulo)			    |		    |
+---------------+-------------------+---------------------------------------+---------------+
| 12		|   left	    |	`+` (add)			    |	Additive    |
|		|		    |					    |		    |
|		|		    |	`-` (subtract)			    |		    |
+---------------+-------------------+---------------------------------------+---------------+
| 11		|   left	    |	`<<` (left shift)		    |	Bit shift   |
|		|		    |					    |		    |
|		|		    |	`>>` (right shift)		    |		    |
+---------------+-------------------+---------------------------------------+---------------+
| 10		|   left	    |	`<`  (less)			    |	Relation    |
|		|		    |					    |		    |
|		|		    |	`>`  (greater)	     		    |		    |
|		|		    |					    |		    |
|		|		    |	`<=` (less equal)		    |		    |
|		|		    |					    |		    |
|		|		    |	`>=` (greater equal)		    |		    |
+---------------+-------------------+---------------------------------------+---------------+
| 9		|   left	    |	`==`				    |	Equality    |
|		|		    |					    |		    |
|  		|		    |	`!=`				    |	Inequality  |
+---------------+-------------------+---------------------------------------+---------------+
| 8		|   left	    |	`&`				    |	Bitwise and |
+---------------+-------------------+---------------------------------------+---------------+
| 7		|   left	    |	`^`				    |	Bitwise	    |
|		|		    |					    |	exclusive   |
|		|		    |					    |	or	    |
+---------------+-------------------+---------------------------------------+---------------+
| 6		|   left	    |	`|`				    |	Bitwise or  |
+---------------+-------------------+---------------------------------------+---------------+
| 5		|   left	    |	`&&`				    |	Logical and |
+---------------+-------------------+---------------------------------------+---------------+
| 4		|   left	    |	`||`				    |	Logical or  |
+---------------+-------------------+---------------------------------------+---------------+
| 3		|  right	    |   `?` in conjunction with `:`	    |	Conditional |
+---------------+-------------------+---------------------------------------+---------------+
| 2		|   right	    |	`=`,				    |	Assignment  |
|		|		    |					    |		    |
|		|		    |	`+=`				    |		    |
|		|		    |					    |		    |
|		|		    |	`-=`				    |		    |
|		|		    |					    |		    |
|		|		    |	`*=`				    |		    |
|		|		    |		    			    |		    |
|		|		    |	`/=`	    			    |		    |
|		|		    |		    			    |		    |
|		|		    |	`%=`	    			    |		    |
|		|		    |		    			    |		    |
|		|		    |	`&=`	    			    |		    |
|		|		    |		    			    |		    |
|		|		    |	`^=`	    			    |		    |
|		|		    |		    			    |		    |
|		|		    |	`|=`	    			    |		    |
|		|		    |		    			    |		    |
|		|		    |	`<<=`	    			    |		    |
|		|		    |		    			    |		    |
|		|		    |	`>>=`	    			    |		    |
+---------------+-------------------+---------------------------------------+---------------+
| 1 (lowest)	|   left	    |	`,`	    			    |	List	    |
+---------------+-------------------+---------------------------------------+---------------+


Some simple examples for expression statements
==============================================

Assume that we have some integer variable `a`, e.g. declared by

---- CODE (type=c) -------------------------------------------------------------
int a;
--------------------------------------------------------------------------------

Because variables as well as literals are expressions 

---- CODE (type=c) -------------------------------------------------------------
a;
--------------------------------------------------------------------------------

and

---- CODE (type=c) -------------------------------------------------------------
42;
--------------------------------------------------------------------------------

are both legal expression statements.  However, both statements have no effect.
That means if the compiler ignores these statements you would not notice any
difference.

Simple expressions that are not assignment expressions but do have an effect
are function and procedure calls. Assume `foo` is declared as a procedure, e.g.

---- CODE (type=c) -------------------------------------------------------------
void foo(void);	    // no return value, no parameters
--------------------------------------------------------------------------------

then

---- CODE (type=c) -------------------------------------------------------------
foo();
--------------------------------------------------------------------------------

is an expression statement which (possibly) does have an effect. So the compiler
will generate code for calling `foo`.

The fact that function calls are expressions means that return values of a
function can be used to build more complex expressions. Assume function `foo`
returns an integer, e.g. let it be declared as

---- CODE (type=c) -------------------------------------------------------------
int foo(void);	    // return value of type 'int', no parameters
--------------------------------------------------------------------------------

then

---- CODE (type=c) -------------------------------------------------------------
a = foo() + 42;
--------------------------------------------------------------------------------

is an expression statement where the expression is an assignment expression
whose right-hand side is an additive expression consisting of a function
call and a literal. You see that a verbal description soon becomes cumbersome,
but you see the syntax tree in your mind, right?

More about assignment expressions
---------------------------------

Besides the assignment operator `=` there are other assignment operators like
`+=`, `*=` etc. They all have in common that the l-value gets updated by a
certain operation. For example

---- CODE (type=c) -------------------------------------------------------------
a += foo() + 42;
--------------------------------------------------------------------------------

is equivalent to

---- CODE (type=c) -------------------------------------------------------------
a = a + (foo() + 42);
--------------------------------------------------------------------------------

Actually the ULM compiler is internally doing exactly this kind of
transformation.

For incrementing a variable by a constant this comes in handy. For example

---- CODE (type=c) -------------------------------------------------------------
a += 42;
--------------------------------------------------------------------------------

would increment variable `a` by `42`. In conjunction with loops you often need
to increment a variable by one, or decrement a variable by one. You certainly
can do this with these assignment operators:

---- CODE (type=c) -------------------------------------------------------------
a += 1;		// increment a by 1
a -= 1;		// decrement a by 1
--------------------------------------------------------------------------------

However, for these cases the prefix operators `++` and `--` are even more
convenient:

---- CODE (type=c) -------------------------------------------------------------
++a;		// increment a by 1
--a;		// decrement a by 1
--------------------------------------------------------------------------------

Below you will find some examples for pointing out the difference between the
prefix and postfix operators (and my arguments why I always prefer the prefix
operator even if the postfix operator has the same effect).

More complex assignments
------------------------
An assignment expression is an expression. So it can be used as an expression,
for example as the r-value of another assignment expression:

---- CODE (type=c) -------------------------------------------------------------
int a, b;

a = b = 42;
--------------------------------------------------------------------------------

Because of the right associativity of the `=` operator the expression `a = b =
42` is equivalent to `a = (b = 42)` and not to `(a = b) = 42`. The latter is
not even valid C, because the result of `a = b` is a value and not an l-value.
Of course all this holds for all kinds of assignment operators:

---- CODE (type=c) -------------------------------------------------------------
int a, b;

a = b = 42;		// assign 42 to b, and assign b to a
a += b += 2;		// increment b by 2, and increment a by b
--------------------------------------------------------------------------------

Can you predict the values of `a` and `b` after each statement? Do you remember
that programming cannot be learned by merely reading?  So check it out:

---- CODE (file=session08/hpc0_cprog_page3/src/xassign.c) ----------------------
#include <stdio.h>

int
main()
{
    int a, b;

    a = b = 42;
    printf("After 'a = b = %d;' we have a = %d, b = %d\n", 42, a, b);
    a += b += 2;
    printf("After 'a += b += %d;' we have a = %d, b = %d\n", 2, a, b);
}
--------------------------------------------------------------------------------

---- SHELL (path=session08/hpc0_cprog_page3, fold) -----------------------------
make
--------------------------------------------------------------------------------

---- SHELL (path=session08/hpc0_cprog_page3) -----------------------------------
./build_ucc/xassign
./build_gcc/xassign
--------------------------------------------------------------------------------

Exercise
--------
Think of other examples with similar expression statements and write a simple
example for demonstrating the effects. For example using the operators `-` for
subtraction, `*` for multiplication, `/` for division and `%` for modulo.

Prefix and postfix operators `++` and `--`
==========================================
I am sometimes asked why I am using the prefix operator in for-loops. For
example, I would write

---- CODE (type=c) -----------------------------------------------------------
for (int i=0; i < n; ++i) {		// <- could be my code
    /* ... */
}
------------------------------------------------------------------------------

but _never_ this

---- CODE (type=c) -----------------------------------------------------------
for (int i=0; i < n; i++) {		// <- I certainly did not write that!!
    /* ... */
}
------------------------------------------------------------------------------

First of all, in this case it would not make any difference whether I wrote
`++i` or `i++`. The compiler would generate the same code. But I have two rules
for using the `++` and `--` operators. And I would violate my first rule:

- If the postfix operator and the prefix operator both have the same effect
  then use the prefix operator.
- If you don't know the difference between the postfix operator and the prefix
  operator then _never_ use the postfix operator.


What the operators have in common
---------------------------------
For expression statements that just contain an increment expression there is
no difference. For example `++a` and `a++` will be treated by the compiler as
if you had written `a = a + 1`:

---- CODE (file=session08/hpc0_cprog_page3/src/xincr.c) ------------------------
#include <stdio.h>

int
main()
{
    int a;

    a = 42;
    printf("After 'a = %d;' we have a = %d\n", 42, a);

    ++a;
    printf("After '++a;' we have a = %d\n", a);

    a++;
    printf("After 'a++;' we have a = %d\n", a);
}
--------------------------------------------------------------------------------

---- SHELL (path=session08/hpc0_cprog_page3, fold) -----------------------------
make
--------------------------------------------------------------------------------

---- SHELL (path=session08/hpc0_cprog_page3) -----------------------------------
./build_ucc/xincr
./build_gcc/xincr
--------------------------------------------------------------------------------

Cases where it matters
----------------------
If `a++` is part of a more complex expression like `b = a++` then its original
value is used for evaluating the expression and the increment happens
afterwards. In other words the expression `b = a++` behaves like `b = a` with a
subsequent side effect that `a` also gets incremented afterwards.

If you see an `++a` expression you can substitute it with `(a = a + 1)` and
explain the effect. So if you think `b = ++a` think of it as `b = (a = a + 1)`.
In this form it is already explaining itself: the variable `a` gets incremented
first (because `a = a + 1` is inside these parentheses) and then the
incremented value is used for evaluating the remaining expression.

---- CODE (file=session08/hpc0_cprog_page3/src/xincr2.c) -----------------------
#include <stdio.h>

int
main()
{
    int a, b;

    a = 42;
    printf("After 'a = %d;' we have a = %d\n", 42, a);

    b = a++;
    printf("After 'b = a++;' we have a = %d, b = %d\n", a, b);

    b = ++a;
    printf("After 'b = ++a;' we have a = %d, b = %d\n", a, b);
}
--------------------------------------------------------------------------------

---- SHELL (path=session08/hpc0_cprog_page3, fold) -----------------------------
make
--------------------------------------------------------------------------------

---- SHELL (path=session08/hpc0_cprog_page3) -----------------------------------
./build_ucc/xincr2
./build_gcc/xincr2
--------------------------------------------------------------------------------


Some fun with expression lists
==============================
The comma operator has the lowest precedence. So an expression like `a = 2, b =
4` is equivalent to `(a = 2), (b = 4)`. And this is probably its main
application, i.e. initializing several variables with a single statement, like
in this for-loop:

---- CODE (file=session08/hpc0_cprog_page3/src/xexpr_list.c) -------------------
#include <stdio.h>

int
main()
{
    unsigned long i, factorial; 

    for (i = 0, factorial = 1; i < 10; ++i) {
	printf("%2lu! = %10lu\n", i, factorial *= i > 0 ? i : 1);
    }
}
--------------------------------------------------------------------------------

So the expression list `i = 0, factorial = 1` is used here in the initial
clause of the for-loop statement (and statements will be covered next).  But
certainly the expression `factorial *= i > 0 ? i : 1` with its conditional
operator is the actual star of the show. But let me first show you that this
program simply prints the factorials of 0 to 9:

---- SHELL (path=session08/hpc0_cprog_page3, fold) -----------------------------
make
--------------------------------------------------------------------------------

---- SHELL (path=session08/hpc0_cprog_page3) -----------------------------------
./build_ucc/xexpr_list
./build_gcc/xexpr_list
--------------------------------------------------------------------------------

Because of the precedences the expression

---- CODE (type=c) -------------------------------------------------------------
factorial *= i > 0 ? i : 1
--------------------------------------------------------------------------------

is equivalent to

---- CODE (type=c) -------------------------------------------------------------
factorial *= (i > 0) ? i : 1
--------------------------------------------------------------------------------

And the assignment operator could be expanded to

---- CODE (type=c) -------------------------------------------------------------
factorial = factorial * ( (i > 0) ? i : 1 )
--------------------------------------------------------------------------------

Hence we essentially need to understand the meaning of the conditional
expression `(i > 0) ? i : 1`. Expressed more generally, it has the form

---- CODE (type=c) -------------------------------------------------------------
cond ? valTrue : valFalse 
--------------------------------------------------------------------------------

and evaluates to `valTrue` if `cond` is not zero, and otherwise to `valFalse`.