=============================================================
CBE Pt.6: Expressions in C (and How to Read Production Rules) [TOC]
=============================================================

---- VIDEO ------------------------------
https://www.youtube.com/embed/s-KNG2XyAoQ
-----------------------------------------

Structure of a C program
========================

C programs are supposed to describe the text, data and BSS segments of the
assembly code that the compiler generates. The text segment is used for function
definitions, the data segment for initialized global variables and the BSS
segment for uninitialized global variables (which are therefore zero
initialized).

The C grammar states that a C program is a sequence of declarations, for
instance type declarations, variable declarations and function definitions:

---- LATEX ---------------------------------------------------------------------
\begin{array}{rcl}
\langle\text{translation-unit}\rangle & \to & \langle\text{top-level-declaration}\rangle \\
    & \to & \langle\text{translation-unit}\rangle\;
            \langle\text{top-level-declaration}\rangle \\
\langle\text{top-level-declaration}\rangle & \to & \langle\text{declaration}\rangle \\
    & \to & \langle\text{function-definition}\rangle \\
\langle\text{declaration}\rangle & \to & \langle\text{declaration-specifiers}\rangle\;
            \langle\text{initialized-declarator-list}\rangle\;
            \textbf{;} \\
\langle\text{declaration-specifiers}\rangle & \to & \langle\text{storage-class-specifier}\rangle\;
            \langle\text{declaration-specifiers}\rangle_\text{opt} \\
    & \to & \langle\text{type-specifier}\rangle\;
            \langle\text{declaration-specifiers}\rangle_\text{opt} \\
    & \to & \langle\text{type-qualifier}\rangle\;
            \langle\text{declaration-specifiers}\rangle_\text{opt} \\
    & \to & \langle\text{function-specifier}\rangle\;
            \langle\text{declaration-specifiers}\rangle_\text{opt} \\
\end{array}
--------------------------------------------------------------------------------

Recall what the terms
_declaration_ and _definition_ have in common and in what respect they differ.
Any definition is also a declaration, or in other words, a definition is a
special case of a declaration. The difference is that definitions have an
impact on the (assembly) code generation, whereas mere declarations only affect
the symbol tables that the compiler maintains while it parses the source code.

Expressions are used for initializing variables (hence they are needed to
describe an initialized declaration) and can be used for the second simplest
kind of statement. The simplest statement is the empty statement; in the
examples we consider _expression-statements_:

---- LATEX ---------------------------------------------------------------------
\begin{array}{rcl}
\langle\text{statement}\rangle & \to & \langle\text{empty-statement}\rangle\; \\
    & \to & \langle\text{expression-statement}\rangle\; \\
    & \to & \dots \\
\langle\text{empty-statement}\rangle & \to & \textbf{;}\; \\
\langle\text{expression-statement}\rangle & \to & \langle\text{expression}\rangle\; \textbf{;}\; \\
\end{array}
--------------------------------------------------------------------------------

To make the descriptions vivid we need examples. And for good examples we need
variables and functions. Consequently, we need to deal with declarations in C,
which is (as mentioned before) the part of the language that can be a pain in
the @$$. So for now we will cover that part by examples, like for instance

---- CODE (type=c) -------------------------------------------------------------
int i;                  // declaration (and definition) of an integer variable i
int foo(int a, int b);  /* declaration (not a definition) of a function 'foo'
                           which returns an integer and has two parameters of
                           type integer.
                        */
--------------------------------------------------------------------------------

---- SHELL (path=session08/,hide) ----------------------------------------------
rm -rf hpc0_cprog_page3
cp -r -P /home/numerik/pub/cprog hpc0_cprog_page3
--------------------------------------------------------------------------------

What is important about expressions?
====================================

You can combine expressions with certain operators into more complex
expressions. The most elementary expressions are literals and variables. Using
operators like `+` or `*` you can build expressions for sums or products,
respectively. Hence you need to know which operators are available in total.

But you also have to know the precedence of the operators. For example, in C
the expressions `a+b*c` and `a+(b*c)` are identical, so there is no need for the
parentheses.

You also have to know the semantics of an operator. For `+` it seems to be clear
that it is used for adding things. However, details might not be so obvious if
one of these things is a pointer variable (which is a case we will cover
later). In C we also have operators like `++` which can be used as a prefix
operator, e.g. `++a`, or as a postfix operator, e.g. `a++`. In some cases both
do the same, in other cases it makes a significant difference.

Knowing about the semantics also means knowing the requirements for using an
operator. For example, the assignment operator `=` can only be used if the
left-hand side refers to a memory location that can be overwritten with the
result of the right-hand side. We already dealt with the terms _l-value_ and
_r-value_ for expressing this restriction.

When dealing with floating-point arithmetic (we will not deal with it here) it
is also important to know about the associativity of operators. Because of
round-off errors `a + (b + c)` and `(a + b) + c` will in general not give the
same result. So what is the meaning if you write `a + b + c` in your code,
skipping the parentheses?
Here is the answer: In C the `+` operator is left associative, hence
`(a + b) + c` and `a + b + c` are identical.

Operator precedence and associativity
=====================================

In the description for the __ULM assembly language__ you saw how the precedence
and associativity of operators can be described by production rules. However,
C provides many more operators, and production rules are therefore not always
the description of choice. The table below shows the relevant details that you
can extract from the grammar. The semantics of (some of) the operators will be
explained by examples in subsequent sections on this page.

 :links: ULM assembly language -> doc:session08/page02

+---------------+-------------------+---------------------------------------+---------------+
| Precedence    | Associativity     | Operators                             | Meaning       |
+---------------+-------------------+---------------------------------------+---------------+
| 16 (highest)  | left              | Variables                             | Literals and  |
|               |                   |                                       | Unary postfix |
|               |                   | `++` (increment)                      |               |
|               |                   |                                       |               |
|               |                   | `--` (decrement)                      |               |
|               |                   |                                       |               |
|               |                   | `f()` (function call)                 |               |
|               |                   |                                       |               |
|               |                   | `a[i]` (index operator)               |               |
|               |                   |                                       |               |
|               |                   | `p->member` (indirect member access)  |               |
|               |                   |                                       |               |
|               |                   | `s.member` (direct member access)     |               |
+---------------+-------------------+---------------------------------------+---------------+
| 15            | right             | `*` (dereference operator)            | Unary prefix  |
|               |                   |                                       |               |
|               |                   | `&` (address operator)                |               |
|               |                   |                                       |               |
|               |                   | `-` (unary minus)                     |               |
|               |                   |                                       |               |
|               |                   | `+` (unary plus)                      |               |
|               |                   |                                       |               |
|               |                   | `!` (logical not)                     |               |
|               |                   |                                       |               |
|               |                   | `~` (bitwise not)                     |               |
|               |                   |                                       |               |
|               |                   | `++` (increment)                      |               |
|               |                   |                                       |               |
|               |                   | `--` (decrement)                      |               |
|               |                   |                                       |               |
|               |                   | `sizeof`                              |               |
+---------------+-------------------+---------------------------------------+---------------+
| 14            | right             | `(`_type_`)` (type cast)              | Type cast     |
+---------------+-------------------+---------------------------------------+---------------+
| 13            | left              | `*` (multiply)                        | Multiplicative|
|               |                   |                                       |               |
|               |                   | `/` (divide)                          |               |
|               |                   |                                       |               |
|               |                   | `%` (modulo)                          |               |
+---------------+-------------------+---------------------------------------+---------------+
| 12            | left              | `+` (add)                             | Additive      |
|               |                   |                                       |               |
|               |                   | `-` (subtract)                        |               |
+---------------+-------------------+---------------------------------------+---------------+
| 11            | left              | `<<` (left shift)                     | Bit shift     |
|               |                   |                                       |               |
|               |                   | `>>` (right shift)                    |               |
+---------------+-------------------+---------------------------------------+---------------+
| 10            | left              | `<` (less)                            | Relational    |
|               |                   |                                       |               |
|               |                   | `>` (greater)                         |               |
|               |                   |                                       |               |
|               |                   | `<=` (less equal)                     |               |
|               |                   |                                       |               |
|               |                   | `>=` (greater equal)                  |               |
+---------------+-------------------+---------------------------------------+---------------+
| 9             | left              | `==`                                  | Equality      |
|               |                   |                                       |               |
|               |                   | `!=`                                  | Inequality    |
+---------------+-------------------+---------------------------------------+---------------+
| 8             | left              | `&`                                   | Bitwise and   |
+---------------+-------------------+---------------------------------------+---------------+
| 7             | left              | `^`                                   | Bitwise       |
|               |                   |                                       | exclusive     |
|               |                   |                                       | or            |
+---------------+-------------------+---------------------------------------+---------------+
| 6             | left              | `|`                                   | Bitwise or    |
+---------------+-------------------+---------------------------------------+---------------+
| 5             | left              | `&&`                                  | Logical and   |
+---------------+-------------------+---------------------------------------+---------------+
| 4             | left              | `||`                                  | Logical or    |
+---------------+-------------------+---------------------------------------+---------------+
| 3             | right             | `?` in conjunction with `:`           | Conditional   |
+---------------+-------------------+---------------------------------------+---------------+
| 2             | right             | `=`                                   | Assignment    |
|               |                   |                                       |               |
|               |                   | `+=`                                  |               |
|               |                   |                                       |               |
|               |                   | `-=`                                  |               |
|               |                   |                                       |               |
|               |                   | `*=`                                  |               |
|               |                   |                                       |               |
|               |                   | `/=`                                  |               |
|               |                   |                                       |               |
|               |                   | `%=`                                  |               |
|               |                   |                                       |               |
|               |                   | `&=`                                  |               |
|               |                   |                                       |               |
|               |                   | `^=`                                  |               |
|               |                   |                                       |               |
|               |                   | `|=`                                  |               |
|               |                   |                                       |               |
|               |                   | `<<=`                                 |               |
|               |                   |                                       |               |
|               |                   | `>>=`                                 |               |
+---------------+-------------------+---------------------------------------+---------------+
| 1 (lowest)    | left              | `,`                                   | List          |
+---------------+-------------------+---------------------------------------+---------------+

Some simple examples for expression statements
==============================================

Assume that we have some integer variable `a`, e.g. declared by

---- CODE (type=c) -------------------------------------------------------------
int a;
--------------------------------------------------------------------------------

Because variables as well as literals are expressions,

---- CODE (type=c) -------------------------------------------------------------
a;
--------------------------------------------------------------------------------

and

---- CODE (type=c) -------------------------------------------------------------
42;
--------------------------------------------------------------------------------

are both legal expression statements. However, both statements have no effect.
That means if the compiler ignores these statements you would not notice any
difference.

Simple expressions that are not assignment expressions but do have an effect
are function and procedure calls. Assume `foo` is declared as a procedure, e.g.

---- CODE (type=c) -------------------------------------------------------------
void foo(void);     // no return value, no parameters
--------------------------------------------------------------------------------

then

---- CODE (type=c) -------------------------------------------------------------
foo();
--------------------------------------------------------------------------------

is an expression statement which (possibly) does have an effect. So the compiler
will generate code for calling `foo`.

The fact that function calls are expressions means that return values of a
function can be used to build more complex expressions. Assume function `foo`
returns an integer, e.g.
let it be declared as

---- CODE (type=c) -------------------------------------------------------------
int foo(void);      // return value of type 'int', no parameters
--------------------------------------------------------------------------------

then

---- CODE (type=c) -------------------------------------------------------------
a = foo() + 42;
--------------------------------------------------------------------------------

is an expression statement. Its expression is an assignment expression whose
right-hand side is an additive expression consisting of a function call and a
literal. You see that a verbal description soon becomes cumbersome, but you see
the syntax tree in your mind, right?

More about assignment expressions
---------------------------------

Besides the assignment operator `=` there are other operators like `+=`, `*=`
etc. They all have in common that the l-value gets updated by a certain
operation. For example

---- CODE (type=c) -------------------------------------------------------------
a += foo() + 42;
--------------------------------------------------------------------------------

is equivalent to

---- CODE (type=c) -------------------------------------------------------------
a = a + (foo() + 42);
--------------------------------------------------------------------------------

Actually the ULM compiler is internally doing exactly this kind of
transformation.

For incrementing a variable by a constant this comes in handy. For example

---- CODE (type=c) -------------------------------------------------------------
a += 42;
--------------------------------------------------------------------------------

would increment variable `a` by `42`. In conjunction with loops you often need
to increment a variable by one, or decrement a variable by one.
You certainly can do this with these assignment operators:

---- CODE (type=c) -------------------------------------------------------------
a += 1;     // increment a by 1
a -= 1;     // decrement a by 1
--------------------------------------------------------------------------------

However, for these cases the prefix operators `++` and `--` are even more
convenient:

---- CODE (type=c) -------------------------------------------------------------
++a;        // increment a by 1
--a;        // decrement a by 1
--------------------------------------------------------------------------------

Below you will find some examples that point out the difference between the
prefix and postfix operators (and my arguments why I always prefer the prefix
operator even if the postfix operator has the same effect).

More complex assignments
------------------------

An assignment expression is an expression. So it can be used as an expression,
for example as the r-value of another assignment expression:

---- CODE (type=c) -------------------------------------------------------------
int a, b;

a = b = 42;
--------------------------------------------------------------------------------

Because of the right associativity of the `=` operator the expression
`a = b = 42` is equivalent to `a = (b = 42)` and not to `(a = b) = 42`. The
latter is not even legal C, because the result of an assignment is not an
l-value; and even if it were accepted, `b` would be left uninitialized.

Of course all this holds for all kinds of assignment operators:

---- CODE (type=c) -------------------------------------------------------------
int a, b;

a = b = 42;     // assign 42 to b, and assign b to a
a += b += 2;    // increment b by 2, and increment a by b
--------------------------------------------------------------------------------

Can you predict the values of `a` and `b` after each statement? Do you remember
that programming cannot be learned by merely reading?
So check it out:

---- CODE (file=session08/hpc0_cprog_page3/src/xassign.c) ----------------------
#include <stdio.h>

int
main()
{
    int a, b;

    a = b = 42;
    printf("After 'a = b = %d;' we have a = %d, b = %d\n", 42, a, b);

    a += b += 2;
    printf("After 'a += b += %d;' we have a = %d, b = %d\n", 2, a, b);
}
--------------------------------------------------------------------------------

---- SHELL (path=session08/hpc0_cprog_page3, fold) -----------------------------
make
--------------------------------------------------------------------------------

---- SHELL (path=session08/hpc0_cprog_page3) -----------------------------------
./build_ucc/xassign
./build_gcc/xassign
--------------------------------------------------------------------------------

Exercise
--------

Think of other examples with similar expression statements and write a simple
program that demonstrates their effects. For example, use the assignment
operators based on `-` for subtraction, `*` for multiplication, `/` for division
and `%` for modulo.

Prefix and postfix operators `++` and `--`
==========================================

I am sometimes asked why I am using the prefix operator in for-loops. For
example, I would write

---- CODE (type=c) -------------------------------------------------------------
for (int i=0; i < n; ++i) {   // <- could be my code
    /* ... */
}
--------------------------------------------------------------------------------

but _never_ this

---- CODE (type=c) -------------------------------------------------------------
for (int i=0; i < n; i++) {   // <- I certainly did not write that!!
    /* ... */
}
--------------------------------------------------------------------------------

First of all, in this case it would not make any difference whether I wrote
`++i` or `i++`. The compiler would generate the same code. But I have two rules
for using the `++` and `--` operators, and I would violate my first rule:

- If the postfix operator and the prefix operator both have the same effect
  then use the prefix operator.
- If you don't know the difference between the postfix operator and the prefix
  operator then _never_ use the postfix operator.

What the operators have in common
---------------------------------

For expression statements that just contain an increment expression there is
no difference. For example, `++a;` and `a++;` will be treated by the compiler
as if you had written `a = a + 1;`:

---- CODE (file=session08/hpc0_cprog_page3/src/xincr.c) ------------------------
#include <stdio.h>

int
main()
{
    int a;

    a = 42;
    printf("After 'a = %d;' we have a = %d\n", 42, a);

    ++a;
    printf("After '++a;' we have a = %d\n", a);

    a++;
    printf("After 'a++;' we have a = %d\n", a);
}
--------------------------------------------------------------------------------

---- SHELL (path=session08/hpc0_cprog_page3, fold) -----------------------------
make
--------------------------------------------------------------------------------

---- SHELL (path=session08/hpc0_cprog_page3) -----------------------------------
./build_ucc/xincr
./build_gcc/xincr
--------------------------------------------------------------------------------

Cases where it matters
----------------------

If `a++` is part of a more complex expression like `b = a++`, then its original
value is used for evaluating the expression and the increment happens
afterwards. In other words, the expression `b = a++` behaves like `b = a` with
the additional side effect that `a` also gets incremented.

If you see a `++a` expression you can substitute it with `(a = a + 1)` to
explain its effect. So if you see `b = ++a`, think of it as `b = (a = a + 1)`.
In this form it already explains itself: the variable `a` gets incremented first
(because `a = a + 1` is inside the parentheses) and then the incremented value
is used for evaluating the remaining expression.
---- CODE (file=session08/hpc0_cprog_page3/src/xincr2.c) -----------------------
#include <stdio.h>

int
main()
{
    int a, b;

    a = 42;
    printf("After 'a = %d;' we have a = %d\n", 42, a);

    b = a++;
    printf("After 'b = a++;' we have a = %d, b = %d\n", a, b);

    b = ++a;
    printf("After 'b = ++a;' we have a = %d, b = %d\n", a, b);
}
--------------------------------------------------------------------------------

---- SHELL (path=session08/hpc0_cprog_page3, fold) -----------------------------
make
--------------------------------------------------------------------------------

---- SHELL (path=session08/hpc0_cprog_page3) -----------------------------------
./build_ucc/xincr2
./build_gcc/xincr2
--------------------------------------------------------------------------------

Some fun with expression lists
==============================

The comma operator has the lowest precedence. So an expression like
`a = 2, b = 4` is equivalent to `(a = 2), (b = 4)`. And this is probably its
main application, i.e. initializing several variables with a single statement,
like in this for-loop:

---- CODE (file=session08/hpc0_cprog_page3/src/xexpr_list.c) -------------------
#include <stdio.h>

int
main()
{
    unsigned long i, factorial;

    for (i = 0, factorial = 1; i < 10; ++i) {
        printf("%2lu! = %10lu\n", i, factorial *= i > 0 ? i : 1);
    }
}
--------------------------------------------------------------------------------

So here the expression list `i = 0, factorial = 1` is used in the initial clause
of the for-loop statement (and statements will be covered next). But certainly
the expression `factorial *= i > 0 ? i : 1` with its conditional operator is the
actual star of the show.
But let me first show you that this program simply prints the factorials of 0
to 9:

---- SHELL (path=session08/hpc0_cprog_page3, fold) -----------------------------
make
--------------------------------------------------------------------------------

---- SHELL (path=session08/hpc0_cprog_page3) -----------------------------------
./build_ucc/xexpr_list
./build_gcc/xexpr_list
--------------------------------------------------------------------------------

Because of the precedences the expression

---- CODE (type=c) -------------------------------------------------------------
factorial *= i > 0 ? i : 1
--------------------------------------------------------------------------------

is equivalent to

---- CODE (type=c) -------------------------------------------------------------
factorial *= ((i > 0) ? i : 1)
--------------------------------------------------------------------------------

And the assignment operator could be expanded to

---- CODE (type=c) -------------------------------------------------------------
factorial = factorial * ( (i > 0) ? i : 1 )
--------------------------------------------------------------------------------

Hence we essentially need to understand the meaning of the conditional
expression `(i > 0) ? i : 1`. Expressed more generally, it has the form

---- CODE (type=c) -------------------------------------------------------------
cond ? valTrue : valFalse
--------------------------------------------------------------------------------

and evaluates to `valTrue` if `cond` is not zero, and otherwise to `valFalse`.