===========================
Lexer: Recognizing Keywords [TOC]
===========================

To support control structures, the lexer has to recognize keywords. For now,
the necessary lexer extension will be realized quick and dirty:

- Some new enum constants will be added.
- After the lexer has found an identifier, it checks whether it is actually a
  keyword.

For compound statements, the curly braces "_{_" and "_}_" also need to be
recognized as tokens. Of course, keywords like _begin_ and _end_ could be used
as an alternative.

New Token Kinds
===============

If you use ``tokenkind.txt`` to generate the code for the enum constants used
by the lexer and for the function ``strTokenKind()``, simply add a
corresponding entry like ``WHILE`` or ``TK_WHILE``. Otherwise, manually add a
new enum constant and patch your implementation of ``strTokenKind()``.

Analogously, add enum constants for the punctuators "_{_" and "_}_", for
example ``LBRACE`` and ``RBRACE`` or ``TK_LBRACE`` and ``TK_RBRACE``,
respectively. Recognizing punctuators is nothing new and will not be discussed
here any further ;-)

Recognizing Keywords
====================

In ``lexer.c`` add a static function that checks whether a string represents a
keyword. The quick and dirty approach is to compare the string against all
keywords. The function then returns the proper token kind: either the kind of
the matched keyword or ``IDENTIFIER``. For example:

---- CODE (type=c) -------------------------------------------------------------
static enum TokenKind
checkForKeyword(const char *s)
{
    static bool first = true;
    static const struct UStr *kwWhile;

    // on the first call, add all keywords to the string pool
    if (first) {
        first = false;
        kwWhile = UStrAdd("while");
    }

    // equal strings share one pool entry, so comparing pointers is sufficient
    const struct UStr *id = UStrAdd(s);
    if (id == kwWhile) {
        return WHILE;
    } else {
        return IDENTIFIER;
    }
}
--------------------------------------------------------------------------------

When ``getToken()`` has found an identifier, it now checks whether it is
actually a keyword before it sets and returns the token kind.
For example:

---- CODE (type=c) -------------------------------------------------------------
static enum TokenKind
getToken(void)
{
    // ...
    } else if (isLetter(ch)) {
        do {
            appendCharToStr(&token.val, ch);
            nextCh();
        } while (isLetter(ch) || isDecDigit(ch));
        // an identifier was read: it might actually be a keyword
        return token.kind = checkForKeyword(token.val.cstr);
    }
    // ...
}
--------------------------------------------------------------------------------

Exercise
========

Extend the lexer such that it also recognizes the keywords _for_, _do_, _if_
and _else_. Here is a simple test case with keywords and identifiers:

---- CODE (file=session25/git/abc_step5/test_lexer_kw.in) ----------------------
while {} for do if else ffor doo ddo
--------------------------------------------------------------------------------

---- SHELL (path=session25/git/abc_step5,hide) ---------------------------------
make
--------------------------------------------------------------------------------

Run the lexer test on this input:

---- SHELL (path=session25/git/abc_step5) --------------------------------------
./xtest_lexer < test_lexer_kw.in
--------------------------------------------------------------------------------
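
One possible direction for the exercise is to replace the chain of comparisons with a small keyword table. The following is only a minimal sketch and not the reference solution: it is kept independent of the ``UStr`` string pool and compares with ``strcmp()`` instead of comparing pool pointers. The enum constants ``FOR``, ``DO``, ``IF`` and ``ELSE`` as well as the table ``keyword`` are assumed names; the ``enum TokenKind`` shown here is a stand-in for the one your lexer already has.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the lexer's enum; the constant names are assumptions. */
enum TokenKind
{
    IDENTIFIER,
    WHILE,
    FOR,
    DO,
    IF,
    ELSE,
};

/* Hypothetical table: one entry per keyword with its token kind. */
static const struct Keyword
{
    const char *str;
    enum TokenKind kind;
} keyword[] = {
    {"while", WHILE},
    {"for", FOR},
    {"do", DO},
    {"if", IF},
    {"else", ELSE},
};

/* Returns the kind of the matching keyword, or IDENTIFIER if none matches. */
static enum TokenKind
checkForKeyword(const char *s)
{
    for (size_t i = 0; i < sizeof(keyword) / sizeof(keyword[0]); ++i) {
        /* the whole string has to match, so "ffor" is not a keyword */
        if (strcmp(s, keyword[i].str) == 0) {
            return keyword[i].kind;
        }
    }
    return IDENTIFIER;
}
```

With such a table, adding a further keyword only requires a new enum constant and one more table entry. The linear search with ``strcmp()`` is still quick and dirty; the pointer comparison on interned strings shown earlier avoids repeated string comparisons.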