==========================
Pimped Auto-Generated Code						[TOC]
==========================

---- SHELL (path=session26/git/abc_step1,hide) ---------------------------------
make
--------------------------------------------------------------------------------

Since __Session 21__ the source code for ``strTokenKind()`` and the enum
constants for ``enum TokenKind`` have been auto-generated. This approach can be
extended. For __parsing left associative binary operators__ the auxiliary
functions ``tokenKindPrec()`` and ``makeBinaryExprKind()`` were implemented
manually in the last session. These functions can also be auto-generated.

But this is not the end of the story. Code for recognizing tokens can also be
auto-generated. Tools like __Flex__ can generate a complete lexer. Our pimped
generator will not go that far, but it will generate code to recognize
punctuators (e.g. '``=``', '``==``', etc.) and keywords (e.g. '``for``',
'``while``', etc.).

:links: Session 21 -> doc:session21/page02
	parsing left associative binary operators -> doc:session25/page02
	Flex -> https://en.wikipedia.org/wiki/Flex_(lexical_analyser_generator)


Updated ``Makefile``, ``tokenkind.txt`` and ``xgen_tokenkind.c``
================================================================
Here are the files that need to be replaced:

:import: session26/git/abc_step1/Makefile [fold]
:import: session26/git/abc_step1/xgen_tokenkind.c [fold]
:import: session26/git/abc_step1/tokenkind.txt [fold]

In addition to ``gen_tokenkind.h`` and ``gen_strtokenkind.c``, the files
``gen_parsepunctuator.c``, ``gen_parsekeyword.c``, ``gen_makebinaryexprkind.c``
and ``gen_tokenkindprec.c`` will now be generated as well.

Format of ``tokenkind.txt``
===========================
The description in file ``tokenkind.txt`` can now contain empty lines. These
lines are ignored. The format of a non-empty line can be described by

---- CODE (type=txt) -----------------------------------------------------------
[<tokenkind> [<tokenkindval> [<prec> <exprkind>]]]
--------------------------------------------------------------------------------

where brackets indicate that their content is optional.
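
How a line in this format could be split into its fields can be sketched as
follows. This is only a minimal illustration, not the code from
``xgen_tokenkind.c``: ``struct Entry`` and ``parseLine()`` are hypothetical
names, and the sketch assumes that fields are separated by white space, contain
no blanks themselves and are shorter than 32 characters. It also assumes that
``<prec>`` has to be a positive number.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical record for one line of tokenkind.txt
struct Entry
{
    char tokenKind[32];     // <tokenkind>
    char tokenKindVal[32];  // <tokenkindval>, "" if absent
    int prec;               // <prec>, 0 if absent
    char exprKind[32];      // <exprkind>, "" if absent
};

/*
 *  Splits a line into its fields.
 *
 *  Returns the number of fields found (0 for an empty line) or -1 if
 *  <prec> is given without <exprkind> or is not a positive number.
 */
int
parseLine(const char *line, struct Entry *entry)
{
    char prec[32] = "";

    entry->tokenKind[0] = 0;
    entry->tokenKindVal[0] = 0;
    entry->exprKind[0] = 0;
    entry->prec = 0;

    int n = sscanf(line, "%31s %31s %31s %31s", entry->tokenKind,
                   entry->tokenKindVal, prec, entry->exprKind);
    if (n == EOF) {
        return 0;   // line is empty or contains only white space
    }
    if (n == 3) {
        return -1;  // <prec> without <exprkind>
    }
    if (n == 4) {
        char *end;
        long val = strtol(prec, &end, 10);
        if (*end != 0 || val <= 0) {
            return -1;  // <prec> is not a positive number
        }
        entry->prec = (int) val;
    }
    return n;
}
```

A caller would skip lines for which ``parseLine()`` returns 0, report an error
for -1, and otherwise decide from the return value which of the generated files
the line contributes to.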

Field <tokenkind>
-----------------
As before the first field of a line contains the identifier of a token. From
this field the enum constants in ``gen_tokenkind.h``

:import: session26/git/abc_step1/gen_tokenkind.h [fold]

and the implementation of ``strTokenKind()`` are generated

:import: session26/git/abc_step1/gen_strtokenkind.c [fold]

Field <tokenkindval>
--------------------
From this field code for detecting punctuator and keyword tokens gets
generated.

If the first character is not a letter (i.e. 'a', ..., 'z', 'A', ..., 'Z' or
'_') the field is considered a punctuator. For the lexer the generator produces
the following code fragment for parsing these punctuators (this should look
familiar):

:import: session26/git/abc_step1/gen_parsepunctuator.c [fold]

Otherwise the field describes a keyword. In the last session parsed identifiers
were compared against a list of reserved strings to check whether they are
actually keywords. For handwritten code this is a reasonable approach. With
generated code this can be done more efficiently. For the lexer the following
code gets generated to detect keywords:

:import: session26/git/abc_step1/gen_parsekeyword.c [fold]
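
The idea behind such keyword detection can be sketched with a small
self-contained example. It is not the content of ``gen_parsekeyword.c``. It
only assumes that the keywords are '``for``' and '``while``', reads from a
string instead of the input stream, and uses hypothetical helpers
(``acceptCh()``, ``isIdentCh()``, ``scanWord()``) and a plain buffer ``val`` as
a stand-in for ``token.val``. Every character is inspected once while it is
read, so no string comparison is needed afterwards:

```c
#include <ctype.h>
#include <stdio.h>

// Hypothetical subset of the token kinds
enum TokenKind { BAD_TOKEN, EOI, IDENTIFIER, FOR, WHILE };

static const char *input;   // stand-in for the input stream
static int ch;              // current character
static char val[64];        // stand-in for token.val
static size_t len;

static void
nextCh(void)
{
    ch = *input ? (unsigned char) *input++ : EOF;
}

static void
appendCh(void)
{
    if (len + 1 < sizeof(val)) {
        val[len++] = (char) ch;
        val[len] = 0;
    }
}

// Returns true if c is a letter, '_' or a digit
static int
isIdentCh(int c)
{
    return c == '_' || isalnum(c);
}

// If the current character equals c it gets appended and consumed
static int
acceptCh(int c)
{
    if (ch != c) {
        return 0;
    }
    appendCh();
    nextCh();
    return 1;
}

// Precondition: str begins with a letter
enum TokenKind
scanWord(const char *str)
{
    input = str;
    len = 0;
    val[0] = 0;
    nextCh();

    // First detect keywords ...
    if (acceptCh('f')) {
        if (acceptCh('o') && acceptCh('r') && !isIdentCh(ch)) {
            return FOR;
        }
    } else if (acceptCh('w')) {
        if (acceptCh('h') && acceptCh('i') && acceptCh('l') && acceptCh('e')
            && !isIdentCh(ch)) {
            return WHILE;
        }
    }
    // ... otherwise collect the remaining part of the identifier
    while (isIdentCh(ch)) {
        appendCh();
        nextCh();
    }
    return IDENTIFIER;
}
```

In this sketch ``scanWord("for")`` yields ``FOR``, whereas ``scanWord("forty")``
first consumes '``for``', notices that another identifier character follows and
then collects the rest as an ``IDENTIFIER``. Characters consumed during the
failed keyword match are already stored in ``val``, so nothing has to be read
twice.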

Fields <prec> and <exprkind>
----------------------------
Either both fields have to be present or neither. From these fields
``tokenKindPrec()`` in ``gen_tokenkindprec.c`` and ``makeBinaryExprKind()`` in
``gen_makebinaryexprkind.c`` get generated:

:import: session26/git/abc_step1/gen_tokenkindprec.c [fold]
:import: session26/git/abc_step1/gen_makebinaryexprkind.c [fold]
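
What the generator has to produce from these two fields can be sketched with
two switch statements. The following is a hand-written illustration under
assumptions, not the generated code: the enum constants, the precedence values
and the description lines quoted in the comments are all hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical subsets of 'enum TokenKind' and 'enum ExprKind'
enum TokenKind { BAD_TOKEN, EOI, PLUS, MINUS, ASTERISK, SLASH };
enum ExprKind { EK_ADD, EK_SUB, EK_MUL, EK_DIV };

/*
 *  Assumed description lines:
 *
 *      PLUS      +   11  EK_ADD
 *      MINUS     -   11  EK_SUB
 *      ASTERISK  *   13  EK_MUL
 *      SLASH     /   13  EK_DIV
 */

// One case per line with a <prec> field, 0 for all other token kinds
int
tokenKindPrec(enum TokenKind kind)
{
    switch (kind) {
        case PLUS: return 11;
        case MINUS: return 11;
        case ASTERISK: return 13;
        case SLASH: return 13;
        default: return 0;
    }
}

// One case per line with an <exprkind> field
enum ExprKind
makeBinaryExprKind(enum TokenKind kind)
{
    switch (kind) {
        case PLUS: return EK_ADD;
        case MINUS: return EK_SUB;
        case ASTERISK: return EK_MUL;
        case SLASH: return EK_DIV;
        default:
            fprintf(stderr, "internal error: not a binary operator\n");
            abort();
    }
}
```

Because both functions are derived from the same lines, a token kind with a
precedence greater than zero always has a matching expression kind. This is why
the two fields may only appear together.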

Updating the Parser
===================
In the parser the implementations of ``tokenKindPrec()`` and
``makeBinaryExprKind()`` are now simply included:

---- CODE (type=c) -------------------------------------------------------------
/*
 *  static int tokenKindPrec(enum TokenKind kind);
 *  
 *  Returns 0 if kind is not a left associative binary operator.
 *  Otherwise returns a precedence > 0
 */
#include "gen_tokenkindprec.c"

/*
 *  enum ExprKind makeBinaryExprKind(enum TokenKind kind);
 *
 *  For left associative binary operators translates 'enum TokenKind' into
 *  'enum ExprKind'
 */
#include "gen_makebinaryexprkind.c"

const struct Expr *
parseLeftAssocBinaryExpr(int prec)
{
    /* ... as before ... */
}
--------------------------------------------------------------------------------

Updating the Lexer
==================
So far function ``getToken()`` has had the following structure:

---- CODE (type=c) -------------------------------------------------------------
enum TokenKind
getToken(void)
{
    /* ... */

    if (ch == EOF) {
	return token.kind = EOI;
    } else if (isDecDigit(ch)) {
	// parse literal
	/* ... */
    // parsing punctuators
    } else if (ch == '&') {
	/* ... */
    // parsing keywords and identifiers
    } else if (isLetter(ch)) {
	do {
	    appendCharToStr(&token.val, ch);
	    nextCh();
	} while (isLetter(ch) || isDecDigit(ch));
	return token.kind = checkForKeyword(token.val.cstr);
    }

    nextCh();
    return token.kind = BAD_TOKEN;
}
--------------------------------------------------------------------------------

The part for parsing the punctuators can simply be included. For detecting
keywords function ``checkForKeyword()`` is no longer needed. Instead, the code
in ``gen_parsekeyword.c`` is used first to detect keywords. If a keyword is
found, function ``getToken()`` returns. Hence the code afterwards is only
reached if there is an identifier in the input stream. A while loop then
collects the remaining part of the identifier:

---- CODE (type=c) -------------------------------------------------------------
enum TokenKind
getToken(void)
{
    /* ... */

    if (ch == EOF) {
	return token.kind = EOI;
    } else if (isDecDigit(ch)) {
	// parse unsigned integer literal
	/* ... */
    // parsing punctuators
	#include "gen_parsepunctuator.c"
    // parsing keywords and identifiers
    } else if (isLetter(ch)) {
	// First detect keywords ...
	#include "gen_parsekeyword.c"
	// ... if there was no keyword detected it is an identifier
	while (isLetter(ch) || isDecDigit(ch)) {
	    appendCharToStr(&token.val, ch);
	    nextCh();
	}
	return token.kind = IDENTIFIER;
    }

    nextCh();
    return token.kind = BAD_TOKEN;
}
--------------------------------------------------------------------------------

Here is the complete updated implementation of ``lexer.c``:

:import: session26/git/abc_step1/lexer.c [fold]