Storage Classes extern and static

Exercise: Lexer for the abc Compiler

We need to make some improvements to the compiler project that we started in Session 13, Page 4:

  • The lexer should also give us information about the line number and column at which a token was found. It should also be possible to retrieve the current token directly. And yes, we use global variables declared in a header for this (in the end, lexer.h will be the only header that declares a global variable, and it will declare just one).

    The header was therefore modified as follows (clearly, we will soon need a struct and an enum):

    #ifndef ABC_LEXER_H
    #define ABC_LEXER_H
    
    #include <stddef.h>
    
    /*
       Returns token kind:
    
        0 = EOI (end of input)
        1 = BAD_TOKEN
        2 = HEX_LITERAL
        3 = OCT_LITERAL
        4 = DEC_LITERAL
        5 = PLUS ('+')
        6 = MINUS ('-')
        7 = ASTERISK ('*')
        8 = SLASH ('/')
        9 = PERCENT ('%')
       10 = EQUAL ('=')
       11 = LPAREN (left parenthesis '(')
       12 = RPAREN (right parenthesis ')')
       13 = SEMICOLON
       14 = IDENTIFIER
    */
    
    int getToken(void);
    
    
    // direct access to current token
    
    extern int token_kind;
    extern size_t token_line;
    extern size_t token_col;
    
    #endif // ABC_LEXER_H
    
  • This is the modified test program. Only one line of code was added, to also print the token position:

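    #include <stdio.h>
    
    #include "lexer.h"
    
    int
    main(void)
    {
        int token;
        while ((token = getToken()) != 0) {
            // new: print the position of the token
            printf("%zu.%zu: ", token_line, token_col);
            if (token == 1) {
                printf("BAD_TOKEN\n");
            } else if (token == 2) {
                printf("HEX_LITERAL\n");
            } else if (token == 3) {
                printf("OCT_LITERAL\n");
            } else if (token == 4) {
                printf("DEC_LITERAL\n");
            } else if (token == 5) {
                printf("PLUS\n");
            } else if (token == 6) {
                printf("MINUS\n");
            } else if (token == 7) {
                printf("ASTERISK\n");
            } else if (token == 8) {
                printf("SLASH\n");
            } else if (token == 9) {
                printf("PERCENT\n");
            } else if (token == 10) {
                printf("EQUAL\n");
            } else if (token == 11) {
                printf("LPAREN\n");
            } else if (token == 12) {
                printf("RPAREN\n");
            } else if (token == 13) {
                printf("SEMICOLON\n");
            } else if (token == 14) {
                printf("IDENTIFIER\n");
            }
        }
    }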
    
  • In lexer.c, all functions and global variables that are not declared in the header should be declared static.
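
The last point is where the two storage classes of this session meet. The extern lines in lexer.h are only declarations: they tell every file that includes the header that these variables exist somewhere. lexer.c must contain exactly one definition of each of them. Everything else at file scope in lexer.c gets the storage class static and therefore internal linkage. A minimal sketch of how the top of lexer.c could look, assuming the header from above; the variable ch and the helper isOctDigit() are hypothetical names chosen for illustration, not taken from the actual lexer.c:

```c
#include <stddef.h>

/* What lexer.h contributes (in lexer.c this comes from #include "lexer.h").
   These are declarations only: they promise that the objects exist. */
extern int token_kind;
extern size_t token_line;
extern size_t token_col;

/* The matching definitions. Exactly one translation unit (lexer.c) must
   contain them. Without an initializer they start out as 0. */
int token_kind;
size_t token_line;
size_t token_col;

/* Not declared in lexer.h, hence static (internal linkage): other
   translation units cannot see these names, so they cannot clash. */
static int ch;          /* hypothetical: current input character */

static int
isOctDigit(int c)       /* hypothetical internal helper */
{
    return c >= '0' && c <= '7';
}
```

If lexer.c lacked the three definitions, test_lexer.c would still compile, but linking would fail with an undefined reference to token_line and the others. Conversely, a name that lexer.c defines as static cannot be reached from test_lexer.c, even if that file declares it extern.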

Of course, the real task here is to change the implementation in lexer.c so that, after each call of getToken(), the global variables declared in lexer.h hold the correct values.
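
One possible approach: let the function that reads the next character also track the position of that character, and copy this position into token_line and token_col just before the first character of a token is consumed. The sketch below assumes that lexer.c reads its input character by character into a static variable ch; the names ch, curr_line, curr_col, nextCh(), updatePos() and markTokenStart() are hypothetical, not taken from the original lexer.c. The position update is split off into updatePos() so it can be checked without reading any input:

```c
#include <stddef.h>
#include <stdio.h>

int token_kind;                 /* declared extern in lexer.h */
size_t token_line;
size_t token_col;

static int ch;                  /* current character */
static size_t curr_line = 1;    /* line of ch */
static size_t curr_col = 0;     /* column of ch; the first column is 1 */

/* Update the position for a character that was just read. After a
   newline, the next character is in column 1 of the following line. */
static void
updatePos(int c)
{
    if (c == EOF) {
        return;                 /* end of input: position stays as is */
    }
    if (c == '\n') {
        ++curr_line;
        curr_col = 0;
    } else {
        ++curr_col;
    }
}

/* Read the next character into ch and keep the position in sync. */
static int
nextCh(void)
{
    ch = getchar();
    updatePos(ch);
    return ch;
}

/* Call in getToken() after whitespace was skipped, i.e. while ch is
   the first character of the new token. */
static void
markTokenStart(void)
{
    token_line = curr_line;
    token_col = curr_col;
}
```

In getToken(), whitespace would first be skipped with nextCh(); then markTokenStart() is called, and the token kind is returned with something like return token_kind = ...; so that token_kind is correct as well. With this scheme a newline itself ends up in column 0 of the next line, which does no harm because whitespace never starts a token.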

An Example

You can use a file like this one (test_lexer.in) to test your lexer:

    a = 5;
    b = 42;
    c = (a + b) *2;
    123 0123 0xaB12 abc +-/*%^()

Then with xtest_lexer you should get:

    theon$ ./xtest_lexer < test_lexer.in
    1.1: IDENTIFIER
    1.3: EQUAL
    1.5: DEC_LITERAL
    1.6: SEMICOLON
    2.1: IDENTIFIER
    2.3: EQUAL
    2.5: DEC_LITERAL
    2.7: SEMICOLON
    3.1: IDENTIFIER
    3.3: EQUAL
    3.5: LPAREN
    3.6: IDENTIFIER
    3.8: PLUS
    3.10: IDENTIFIER
    3.11: RPAREN
    3.13: ASTERISK
    3.14: DEC_LITERAL
    3.15: SEMICOLON
    4.1: DEC_LITERAL
    4.5: OCT_LITERAL
    4.10: HEX_LITERAL
    4.17: IDENTIFIER
    4.21: PLUS
    4.22: MINUS
    4.23: SLASH
    4.24: ASTERISK
    4.25: PERCENT
    4.26: BAD_TOKEN
    4.27: LPAREN
    4.28: RPAREN

Once we have double-checked that the lexer handles lines and columns correctly, we can save its output in a file:

    theon$ ./xtest_lexer < test_lexer.in > test_lexer.ref.out

Later, we can add a target check to the makefile, so that in the future make check simply compares the result with this trusted output. In this case, the target basically just runs a diff, which prints nothing (and exits with status 0) if the output still matches the reference:

    theon$ ./xtest_lexer < test_lexer.in > test_lexer.out
    theon$ diff test_lexer.out test_lexer.ref.out