C By Example Part 4: Concept of Variables in C

It is important to know that variables in C always have a type, an address and a certain size. Furthermore, a variable is either a global variable (initialized or uninitialized) or a local variable. These properties, and their importance for understanding C, reflect that the language was designed for programming close to the hardware.

Lexical elements

Comments

Comments are lexical elements that are processed by the preprocessor and removed from the source. Comments can be single-line or multi-line comments:

  • Single-line comments start with “#” or “//”

  • “/*” begins a multi-line comment and “*/” ends a multi-line comment

For example

// single line comment
/*
    multi-line
    comment
*/

Keywords

The ULM C compiler recognizes the following keywords, which therefore cannot be used as identifiers:

  auto      break     case      char      const     continue  default   do
  double    else      enum      extern    float     for       goto      if
  inline    int       long      register  restrict  return    short     signed
  sizeof    static    struct    switch    typedef   union     unsigned  void
  volatile  while     _Bool     _Complex  _Imaginary

Identifiers

Identifiers begin with a letter ('A' to 'Z' or 'a' to 'z') or an underscore '_', and are optionally continued with a sequence of further letters, underscores, or decimal digits ('0' to '9').

Hence “foo”, “_fOo”, “_fOo1” and “_” are allowed, but “2foo” and “.foo” are not.

Also note that identifiers beginning with an underscore should only be used within the C standard library.

Literals

  • Decimal, hexadecimal and octal constants (e.g. 12, 0x2a, 017)

  • Character constants (e.g. '\n', 'a')

  • String literals (e.g. “hello, world!”)

Punctuators

  &    &=   &&   ->   *    *=   ^    ^=
  :    ,    .    ...  =    ==   !    >=
  >    >>   >>=  {    [    <=   (    <
  <<   <<=  -    --   -=   !=   %    %=
  +    ++   +=   ?    }    ]    )    ;
  /    /=   ~    ~=   |    |=   ||

Structure of a C Program

C programs are supposed to describe the text, data and BSS segments of the assembly code that the compiler generates. The text segment is used for function definitions, the data segment for initialized global variables, and the BSS segment for uninitialized global variables (which are therefore zero-initialized).

The C grammar states that a C program is a sequence of declarations, for instance type declarations, variable declarations and function definitions:

\[
\begin{array}{rcl}
\langle\text{translation-unit}\rangle & \to & \langle\text{top-level-declaration}\rangle \\
 & \to & \langle\text{translation-unit}\rangle\; \langle\text{top-level-declaration}\rangle \\
\langle\text{top-level-declaration}\rangle & \to & \langle\text{declaration}\rangle \\
 & \to & \langle\text{function-definition}\rangle \\
\langle\text{declaration}\rangle & \to & \langle\text{declaration-specifiers}\rangle\; \langle\text{initialized-declarator-list}\rangle\; \textbf{;} \\
\langle\text{declaration-specifiers}\rangle & \to & \langle\text{storage-class-specifier}\rangle\; \langle\text{declaration-specifiers}\rangle_\text{opt} \\
 & \to & \langle\text{type-specifier}\rangle\; \langle\text{declaration-specifiers}\rangle_\text{opt} \\
 & \to & \langle\text{type-qualifier}\rangle\; \langle\text{declaration-specifiers}\rangle_\text{opt} \\
 & \to & \langle\text{function-specifier}\rangle\; \langle\text{declaration-specifiers}\rangle_\text{opt}
\end{array}
\]

At this point it is worth pointing out what the terms declaration and definition have in common and in what respect they differ. First of all, any definition is also a declaration; in other words, a definition is a special case of a declaration. The difference is that definitions have an impact on the (assembly) code generation, whereas pure declarations only affect the symbol tables that the compiler maintains while it parses the source code.