==========================
Pimped Auto-Generated Code [TOC]
==========================

---- SHELL (path=session26/git/abc_step1,hide) ---------------------------------
make
--------------------------------------------------------------------------------

Since __Session 21__ the source code for ``strTokenKind()`` and the enum
constants for ``enum TokenKind`` are auto-generated. This approach can be
extended. For __parsing left associative binary operators__ the auxiliary
functions ``tokenKindPrec()`` and ``makeBinaryExprKind()`` were implemented
manually in the last session. These functions can also be auto-generated.

But this is not the end of the story. Code for recognizing tokens can also be
auto-generated. Tools like __Flex__ can generate a complete lexer. Our pimped
generator will not go that far. But it will generate code to recognize
punctuators (e.g. '``=``', '``==``', etc.) and keywords (e.g. '``for``',
'``while``', etc.).

:links: Session 21 -> doc:session21/page02
        parsing left associative binary operators -> doc:session25/page02
        Flex -> https://en.wikipedia.org/wiki/Flex_(lexical_analyser_generator)

Updated ``Makefile``, ``tokenkind.txt`` and ``xgen_tokenkind.c``
================================================================

Here are the files that need to be replaced:

:import: session26/git/abc_step1/Makefile [fold]

:import: session26/git/abc_step1/xgen_tokenkind.c [fold]

:import: session26/git/abc_step1/tokenkind.txt [fold]

In addition to ``gen_tokenkind.h`` and ``gen_strtokenkind.c``, the files
``gen_parsepunctuator.c``, ``gen_parsekeyword.c``, ``gen_makebinaryexprkind.c``
and ``gen_tokenkindprec.c`` will now also be generated.

Format of ``tokenkind.txt``
===========================

The description in file ``tokenkind.txt`` can now contain empty lines. These
lines will be ignored.
The field format of a line can be described by

---- CODE (type=txt) -----------------------------------------------------------
[<identifier> [<punctuator-or-keyword> [<precedence> <expression-kind>]]]
--------------------------------------------------------------------------------

where brackets indicate that their content is optional.

Field <identifier>
------------------

As before, the first field of a line contains the identifier of a token. From
this field the enum constants in ``gen_tokenkind.h``

:import: session26/git/abc_step1/gen_tokenkind.h [fold]

and the implementation of ``strTokenKind()`` are generated:

:import: session26/git/abc_step1/gen_strtokenkind.c [fold]

Field <punctuator-or-keyword>
-----------------------------

From this field the code for detecting punctuator and keyword tokens gets
generated.

If the first character of the field is not a letter (i.e. not one of 'a', ...,
'z', 'A', ..., 'Z' or '_'), the field is considered to describe a punctuator.
For the lexer the following code fragment is generated for parsing these
punctuators (this should look familiar):

:import: session26/git/abc_step1/gen_parsepunctuator.c [fold]

Otherwise the field describes a keyword. In the last session parsed identifiers
were compared against a list of reserved strings to check whether they are
actually keywords. For handwritten code this is a reasonable approach. With
generated code this can be done more efficiently. For the lexer the following
code gets generated to detect keywords:

:import: session26/git/abc_step1/gen_parsekeyword.c [fold]

Fields <precedence> and <expression-kind>
-----------------------------------------

Either both fields have to be present or neither.
From these fields ``tokenKindPrec()`` in ``gen_tokenkindprec.c`` and
``makeBinaryExprKind()`` in ``gen_makebinaryexprkind.c`` get generated:

:import: session26/git/abc_step1/gen_tokenkindprec.c [fold]

:import: session26/git/abc_step1/gen_makebinaryexprkind.c [fold]

Updating the Parser
===================

In the parser the implementations of ``tokenKindPrec()`` and
``makeBinaryExprKind()`` are now simply included:

---- CODE (type=c) -------------------------------------------------------------
/*
 * static int tokenKindPrec(enum TokenKind kind);
 *
 * Returns 0 if kind is not a left associative binary operator.
 * Otherwise returns a precedence > 0
 */
#include "gen_tokenkindprec.c"

/*
 * enum ExprKind makeBinaryExprKind(enum TokenKind kind);
 *
 * For left associative binary operators translates 'enum TokenKind' into
 * 'enum ExprKind'
 */
#include "gen_makebinaryexprkind.c"

const struct Expr *
parseLeftAssocBinaryExpr(int prec)
{
    /* ... as before ... */
}
--------------------------------------------------------------------------------

Updating the Lexer
==================

So far function ``getToken()`` has the following structure:

---- CODE (type=c) -------------------------------------------------------------
enum TokenKind
getToken(void)
{
    /* ... */

    if (ch == EOF) {
        return token.kind = EOI;
    } else if (isDecDigit(ch)) {
        // parse literal
        /* ... */

    // parsing punctuators
    } else if (ch == '&') {
        /* ... */

    // parsing keywords and identifiers
    } else if (isLetter(ch)) {
        do {
            appendCharToStr(&token.val, ch);
            nextCh();
        } while (isLetter(ch) || isDecDigit(ch));
        return token.kind = checkForKeyword(token.val.cstr);
    }

    nextCh();
    return token.kind = BAD_TOKEN;
}
--------------------------------------------------------------------------------

The handwritten part for parsing the punctuators can simply be replaced by
including ``gen_parsepunctuator.c``. For detecting keywords function
``checkForKeyword()`` is no longer needed. Instead, the code in
``gen_parsekeyword.c`` is used first to detect keywords.
If a keyword is found, function ``getToken()`` returns. Hence, the code after
the include is only reached if there is an identifier in the input stream. A
while loop then collects the remaining part of the identifier:

---- CODE (type=c) -------------------------------------------------------------
enum TokenKind
getToken(void)
{
    /* ... */

    if (ch == EOF) {
        return token.kind = EOI;
    } else if (isDecDigit(ch)) {
        // parse unsigned integer literal
        /* ... */

    // parsing punctuators
    #include "gen_parsepunctuator.c"

    // parsing keywords and identifiers
    } else if (isLetter(ch)) {
        // First detect keywords ...
        #include "gen_parsekeyword.c"
        // ... if there was no keyword detected it is an identifier
        while (isLetter(ch) || isDecDigit(ch)) {
            appendCharToStr(&token.val, ch);
            nextCh();
        }
        return token.kind = IDENTIFIER;
    }

    nextCh();
    return token.kind = BAD_TOKEN;
}
--------------------------------------------------------------------------------

Here is the complete updated implementation of ``lexer.c``:

:import: session26/git/abc_step1/lexer.c [fold]
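
To make the interplay between keyword detection and the identifier loop more tangible, here is a minimal, self-contained sketch. It is not the content of ``gen_parsekeyword.c``; it only illustrates, under assumptions, what generated code of this kind could look like for the two keywords ``for`` and ``while``. The lexer state (``input``, ``tokenVal``, ``tokenLen``), the helpers ``appendCh()``, ``isIdentCh()`` and ``lexFirst()``, and the reduced ``enum TokenKind`` are hypothetical stand-ins, and the input is read from a string instead of ``stdin``.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum TokenKind { EOI, BAD_TOKEN, IDENTIFIER, FOR, WHILE };

/* Hypothetical, simplified lexer state (a string instead of stdin). */
static const char *input = "";
static int ch;            /* current character, 0 at end of input */
char tokenVal[64];        /* text of the current token */
static size_t tokenLen;

static void
nextCh(void)
{
    ch = *input ? (unsigned char)*input++ : 0;
}

/* Append the current character to the token text and advance. */
static void
appendCh(void)
{
    assert(tokenLen + 1 < sizeof(tokenVal));
    tokenVal[tokenLen++] = (char)ch;
    tokenVal[tokenLen] = '\0';
    nextCh();
}

static int
isIdentCh(int c)
{
    return isalnum(c) || c == '_';
}

enum TokenKind
getToken(void)
{
    tokenLen = 0;
    tokenVal[0] = '\0';
    while (ch == ' ') {
        nextCh();
    }
    if (ch == 0) {
        return EOI;
    } else if (isalpha(ch) || ch == '_') {
        /* begin: what a generator might emit for "for" and "while" */
        if (ch == 'f') {
            appendCh();
            if (ch == 'o') {
                appendCh();
                if (ch == 'r') {
                    appendCh();
                    if (!isIdentCh(ch)) {
                        return FOR;
                    }
                }
            }
        } else if (ch == 'w') {
            appendCh();
            if (ch == 'h') {
                appendCh();
                if (ch == 'i') {
                    appendCh();
                    if (ch == 'l') {
                        appendCh();
                        if (ch == 'e') {
                            appendCh();
                            if (!isIdentCh(ch)) {
                                return WHILE;
                            }
                        }
                    }
                }
            }
        }
        /* end of the generated part */

        /* No keyword matched: collect the rest of the identifier. */
        while (isIdentCh(ch)) {
            appendCh();
        }
        return IDENTIFIER;
    }
    nextCh();
    return BAD_TOKEN;
}

/* Hypothetical helper: lex the first token of the string s. */
enum TokenKind
lexFirst(const char *s)
{
    input = s;
    nextCh();
    return getToken();
}
```

In this sketch ``lexFirst("for")`` yields ``FOR``, while ``lexFirst("format")`` yields ``IDENTIFIER`` with ``tokenVal`` holding ``format``: the characters consumed during the failed keyword match are already in the token text, so the loop only has to collect the rest. Each character is inspected once, and no string comparison against a keyword table is needed.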