======================
Another Grammar Update                                                     [TOC]
======================

There will actually be more than one update. The first one allows empty
expression statements, as in C. Currently, this program

---- CODE (file=session25/git/abc_step1/empty_expr_statement.abc) --------------
;
--------------------------------------------------------------------------------

produces a syntax error:

---- SHELL (path=session25/git/abc_step1,hide) ---------------------------------
make
--------------------------------------------------------------------------------

---- SHELL (path=session25/git/abc_step1) --------------------------------------
./xtest_abc < empty_expr_statement.abc
--------------------------------------------------------------------------------

In the grammar, we just have to change the production for an expression
statement to

---- LATEX ---------------------------------------------------------------------
\begin{array}{rcl}
\text{expr-statement} & = & [ \text{assignment-expr} ]\; \texttt{";"}\; \\
\end{array}
--------------------------------------------------------------------------------

Hence, the assignment expression becomes optional, and we have to update the
parser once again.

Updating the Parse Functions: The Obvious Approach
==================================================

In ``parseExprStatement()`` we could first check whether the current token is
a semicolon and, if so, consume it and return:

---- CODE (type=c) -------------------------------------------------------------
static void
parseExprStatement(void)
{
    if (token.kind == SEMICOLON) {
        getToken();
        return;
    }
    /* rest as before */
}
--------------------------------------------------------------------------------

This solves the problem:

---- SHELL (path=session25/git/abc_step2,hide) ---------------------------------
make
echo ";" > empty_expr_statement.abc
--------------------------------------------------------------------------------

---- SHELL (path=session25/git/abc_step2) --------------------------------------
./xtest_abc < empty_expr_statement.abc
--------------------------------------------------------------------------------

However, it will later turn out to be more convenient to take an approach that
right now seems unnecessarily complicated. To be clear: do not apply the above
change to ``parseExprStatement()``.

Updating the Parse Functions: What Helps Us Later
=================================================

Later it will be more convenient if ``parseAssignmentExpr()`` returns a null
pointer when no expression was found. For now, this means that we have to work
through a few changes. In short: whenever a new expression node gets created,
we first have to check that none of the pointers to its child nodes is a null
pointer.

Changes in ``parsePrimaryExpr()``
---------------------------------

Currently, an assertion fails if no primary expression can be parsed. Remove
the assertion and simply return a null pointer in this case.

Changes in ``parseUnaryExpr()``
-------------------------------

If a unary operator was found, it has to be followed by a non-empty expression.
Otherwise, it is a syntax error:

---- CODE (type=c) -------------------------------------------------------------
static const struct Expr *
parseUnaryExpr(void)
{
    if (token.kind == PLUS || token.kind == MINUS) {
        enum TokenKind op = token.kind;
        getToken();
        const struct Expr *expr = parseUnaryExpr();
        if (!expr) {
            expectedError("non-empty expression");
        }
        if (op == MINUS) {
            return newUnaryExpr(EK_UNARY_MINUS, expr);
        }
        return newUnaryExpr(EK_UNARY_PLUS, expr);
    }
    return parsePrimaryExpr();
}
--------------------------------------------------------------------------------

Changes in ``parseLeftAssocBinaryExpr()``
-----------------------------------------

If a binary operator was found, it has to be followed by a non-empty
expression. Add a check similar to the one in ``parseUnaryExpr()``.

Changes in ``parseAssignmentExpr()``
------------------------------------

If an assignment operator was found, it has to be followed by a non-empty
expression. Add a check similar to the ones in ``parseUnaryExpr()`` and
``parseLeftAssocBinaryExpr()``.

Changes in ``parseExprStatement()``
-----------------------------------

Code only gets generated for a non-empty expression. In either case a semicolon
is expected; it gets consumed before the parse function returns:

---- CODE (type=c) -------------------------------------------------------------
static void
parseExprStatement(void)
{
    const struct Expr *expr = parseAssignmentExpr();
    if (expr) {
        const struct Expr *folded = constFoldExpr(expr);
        GenReg dest = genGetReg();
        /* ... as before ... */
    }
    expected(SEMICOLON);
    getToken();
}
--------------------------------------------------------------------------------

Changes in ``parse()``
----------------------

Currently, our I/O hack is parsed here. Both operators "_$>_" and "_$<_" have
to be followed by a non-empty assignment expression. Further below, parsing of
the I/O hack will be moved into a parse function of its own. But before
applying another change, the current modifications should be tested.
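The null-pointer convention can be tried out in isolation with the following
self-contained sketch. It is a hypothetical toy, not the abc compiler: every
character is one token, only single-digit literals, binary ``+``, unary ``-``
and parentheses are supported, and instead of calling ``expectedError()`` the
functions set the flag ``syntaxError``. The names ``struct Node``,
``parsePrimary()``, ``parseUnary()``, ``parseSum()``, ``eval()`` and
``parseExprStatementString()`` are made up for this example. As in the changes
described above, each parse function returns a null pointer if no expression
was found, and a new node only gets created after its child pointers have been
checked:

---- CODE (type=c) -------------------------------------------------------------
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical toy: single-digit literals, binary '+', unary '-' and
// parentheses; every character is one token (no whitespace).
struct Node {
    char op;                          // 0: literal, '+': sum, '-': negation
    int value;                        // only used by literals
    const struct Node *left, *right;  // children (right is NULL for '-')
};

static struct Node pool[64];  // node pool; resetting numNodes frees all nodes
static size_t numNodes;
static const char *cur;       // current position ("current token")
static bool syntaxError;      // set instead of aborting like expectedError()

static const struct Node *
newNode(char op, int value, const struct Node *left, const struct Node *right)
{
    assert(numNodes < sizeof pool / sizeof pool[0]);
    pool[numNodes] = (struct Node){op, value, left, right};
    return &pool[numNodes++];
}

static const struct Node *parseSum(void);

static const struct Node *
parsePrimary(void)
{
    if (isdigit((unsigned char)*cur)) {
        return newNode(0, *cur++ - '0', NULL, NULL);
    }
    if (*cur != '(') {
        return NULL;  // nothing consumed: no primary expression starts here
    }
    ++cur;
    const struct Node *expr = parseSum();
    if (!expr || *cur != ')') {
        syntaxError = true;  // "(" needs a non-empty expression and ")"
        return NULL;
    }
    ++cur;
    return expr;
}

static const struct Node *
parseUnary(void)
{
    if (*cur != '-') {
        return parsePrimary();
    }
    ++cur;
    const struct Node *expr = parseUnary();
    if (!expr) {
        syntaxError = true;  // "-" must be followed by a non-empty expression
        return NULL;
    }
    return newNode('-', 0, expr, NULL);
}

static const struct Node *
parseSum(void)
{
    const struct Node *left = parseUnary();
    while (left && *cur == '+') {
        ++cur;
        const struct Node *right = parseUnary();
        if (!right) {
            syntaxError = true;  // check the child before creating the node
            return NULL;
        }
        left = newNode('+', 0, left, right);
    }
    return left;
}

static int
eval(const struct Node *n)
{
    return n->op == '+' ? eval(n->left) + eval(n->right)
         : n->op == '-' ? -eval(n->left) : n->value;
}

// Parses "[ expr ] ;"; NULL means empty statement or syntax error.
static const struct Node *
parseExprStatementString(const char *input)
{
    cur = input;
    numNodes = 0;  // like deleteAllExpr(): release nodes of earlier statements
    syntaxError = false;
    const struct Node *expr = parseSum();
    if (cur[0] != ';' || cur[1] != '\0') {
        syntaxError = true;
    }
    return syntaxError ? NULL : expr;
}
--------------------------------------------------------------------------------

For example, ``parseExprStatementString("1+2;")`` yields a tree that ``eval()``
turns into 3, ``parseExprStatementString(";")`` returns a null pointer without
setting ``syntaxError`` (an empty expression statement), and inputs such as
``"3+;"`` or ``"-;"`` are rejected, similar to the tests below.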

Tests for the Parser Update
===========================

For the following inputs the compiler should report a syntax error in some
cases, but it should never trigger an assertion failure:

---- SHELL (path=session25/git/abc_step3,hide) ---------------------------------
make
--------------------------------------------------------------------------------

---- SHELL (path=session25/git/abc_step3) --------------------------------------
# this should give a syntax error
echo "$>;" | ./xtest_abc foo.s || echo "ok! Error was expected"
# this should give a syntax error
echo "$<;" | ./xtest_abc foo.s || echo "ok! Error was expected"
# this should give a syntax error
echo "3 +;" | ./xtest_abc foo.s || echo "ok! Error was expected"
# this should also give a syntax error
echo "+;" | ./xtest_abc foo.s || echo "ok! Error was expected"
# this should be ok
echo ";" | ./xtest_abc foo.s || echo "Oops! This should compile"
--------------------------------------------------------------------------------

Some Cleanup: Grammar and Parse Functions for Statements
========================================================

So far, our grammar has focused on expressions. We only had two kinds of
statements: expression statements and the I/O hack. This will change soon when
we add statements for control structures, so let's prepare the grammar for that
now:

---- LATEX ---------------------------------------------------------------------
\begin{array}{rcl}
\text{input-sequence}    & = & \{\; \text{statement}\; \}\; \\
\text{statement}         & = & \text{io-hack-statement}\; \\
                         & | & \text{expr-statement}\; \\
\text{io-hack-statement} & = & (\; \texttt{"\$>"} \; | \; \texttt{"\$<"} \; ) \; \text{assignment-expr}\; \texttt{";"}\; \\
\text{expr-statement}    & = & [\; \text{assignment-expr}\; ]\; \texttt{";"}\; \\
\end{array}
--------------------------------------------------------------------------------

It might actually be a good idea to split the parse functions into two source
files: ``parse_expr.c`` for parsing expressions and ``parse_stmnt.c`` for
parsing statements.
However, for now we keep everything in ``parse.c``. Conceptually, though, this
separation will be reflected in how the code gets reorganized.

Furthermore, parse functions for statements will have ``bool`` as return type.
If a parse function for a statement returns _false_, this means two things:

- no token was consumed, and
- the current token cannot begin the corresponding kind of statement.

If the parse function returns _true_, the corresponding statement was parsed
and all of its tokens have been consumed when the function returns. This works
because in our grammar the kind of a statement can be determined from its first
token. For example, a _while-statement_ will begin with a _while_ token, a
_for-statement_ with a _for_ token, etc.

Memory Management
-----------------

So far, expressions could only occur within an expression statement (or the
I/O hack). This will no longer be the case: other kinds of statements will
contain expressions as well. Therefore, memory for expressions no longer gets
released in ``parseExprStatement()`` but after any statement has been parsed
and code for it has been generated.

Function ``parse()`` and ``parseStatement()``
---------------------------------------------

Here are the forward declarations of all parse functions and the "main" parse
function _parse()_. It parses a sequence of statements until the end of input
is reached.
After each statement, the memory for its expressions gets released:

---- CODE (type=c) -------------------------------------------------------------
#include <stdbool.h>    // for bool

// for parsing statements
static bool parseStatement(void);
static bool parseIOHackStatement(void);
static bool parseExprStatement(void);

// for parsing expressions
static const struct Expr *parseAssignmentExpr(void);
static const struct Expr *parseLeftAssocBinaryExpr(int prec);
static const struct Expr *parseUnaryExpr(void);
static const struct Expr *parsePrimaryExpr(void);

void
parse(void)
{
    while (token.kind != EOI) {
        if (!parseStatement()) {
            expectedError("statement");
        }
        deleteAllExpr();
    }
}
--------------------------------------------------------------------------------

Function ``parseStatement()`` tries to parse each kind of statement in turn. It
returns _true_ as soon as one of them succeeds and _false_ otherwise. Note that
the order in which the different kinds of statements are tried does not
matter: at most one kind can begin with the current token, and a parse
function that returns _false_ has not consumed any token, so the next one
starts at the same token.

---- CODE (type=c) -------------------------------------------------------------
static bool
parseStatement(void)
{
    if (parseExprStatement()) {
        return true;
    } else if (parseIOHackStatement()) {
        return true;
    } else {
        return false;
    }
}
--------------------------------------------------------------------------------

Function ``parseExprStatement()``
---------------------------------

If no expression could be parsed and the current token is not a semicolon,
the function returns _false_. Otherwise, the implementation is essentially
unchanged:

---- CODE (type=c) -------------------------------------------------------------
static bool
parseExprStatement(void)
{
    const struct Expr *expr = parseAssignmentExpr();
    if (expr) {
        /* ... almost as before: no longer call deleteAllExpr() here ...
         */
    } else if (token.kind != SEMICOLON) {
        return false;
    }
    expected(SEMICOLON);
    getToken();
    return true;
}
--------------------------------------------------------------------------------

Function ``parseIOHackStatement()``
-----------------------------------

If the current token is not a dollar sign, the function immediately returns
_false_. Otherwise, it parses the I/O hack:

---- CODE (type=c) -------------------------------------------------------------
static bool
parseIOHackStatement(void)
{
    if (token.kind != DOLLAR) {
        return false;
    }
    getToken();
    if (token.kind == GREATER) {
        getToken();
        // read unsigned integer
        /* ... as before ... */
    } else if (token.kind == LESS) {
        getToken();
        // print unsigned integer
        /* ... as before ... */
    } else {
        expectedError("'>' or '<'");
    }
    expected(SEMICOLON);
    getToken();
    return true;
}
--------------------------------------------------------------------------------

Exercise
========

Rearrange the parse functions as outlined above. Test your implementation, for
example, with

---- CODE (file=session25/git/abc_step4/test2.abc) ------------------------------
a + b * (c + d);
x + 1 == y;
x + 1 != y * z;
x + 1 <= y * z;
x + 1 >= y * z;
x + 1 < y * z;
x + 1 > y * z;
;
--------------------------------------------------------------------------------

After generating LaTeX code for the expression trees with

---- SHELL (path=session25/git/abc_step4/) -------------------------------------
./xtest_abc test2.s test2.tex < test2.abc
lualatex test2.tex > /dev/null
--------------------------------------------------------------------------------

---- SHELL (path=session25/git/abc_step4/, hide) -------------------------------
cp test2.pdf /home/www/htdocs/numerik/hpc/ss22/hpc0/session25/
--------------------------------------------------------------------------------

you should get this __pdf__. Note that the empty expression statement does not
show up in the document since it contains no expression.
:links: pdf -> https://www.mathematik.uni-ulm.de/numerik/hpc/ss22/hpc0/session25/test2.pdf
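
The statement-level scheme (parse functions that return _false_ without
consuming a token, and a loop in ``parse()`` that reports an error if no kind
of statement matches) can be tried out in a small, self-contained C model. It
is a hypothetical sketch, not the abc compiler: tokens are single characters,
an "expression" is a single digit, and ``expectedError()`` is emulated with
``setjmp()``/``longjmp()`` so that an error aborts parsing without terminating
the program. The array ``trace`` and the helpers ``record()`` and
``expectSemicolon()`` are made up to make the parsed statement kinds
observable:

---- CODE (type=c) -------------------------------------------------------------
#include <assert.h>
#include <ctype.h>
#include <setjmp.h>
#include <stdbool.h>
#include <string.h>

// Hypothetical toy model: every character is one token, an "expression" is a
// single digit, and '\0' plays the role of EOI. 'trace' gets one character
// per statement: 'e' (expression), '0' (empty), '>' or '<' (I/O hack).
static const char *cur;  // current token
static char trace[32];
static size_t traceLen;
static jmp_buf onError;  // lets expectedError() abort like the real one

static _Noreturn void
expectedError(void)
{
    longjmp(onError, 1);
}

static void
record(char kind)
{
    assert(traceLen + 1 < sizeof trace);
    trace[traceLen++] = kind;
    trace[traceLen] = '\0';
}

static void
expectSemicolon(void)
{
    if (*cur != ';') {
        expectedError();
    }
    ++cur;
}

// Returns false, consuming nothing, unless the current token is a digit or ';'.
static bool
parseExprStatement(void)
{
    if (isdigit((unsigned char)*cur)) {
        ++cur;  // the (single-digit) expression
        record('e');
    } else if (*cur == ';') {
        record('0');  // empty expression statement
    } else {
        return false;
    }
    expectSemicolon();
    return true;
}

// Returns false, consuming nothing, unless the current token is '$'.
static bool
parseIOHackStatement(void)
{
    if (*cur != '$') {
        return false;
    }
    ++cur;
    if (*cur != '>' && *cur != '<') {
        expectedError();  // expected '>' or '<'
    }
    record(*cur++);
    if (!isdigit((unsigned char)*cur)) {
        expectedError();  // expected a non-empty expression
    }
    ++cur;
    expectSemicolon();
    return true;
}

static bool
parseStatement(void)
{
    // The order does not matter: a failed attempt consumes no token.
    return parseExprStatement() || parseIOHackStatement();
}

// Parses a whole input sequence; returns false on a syntax error.
static bool
parse(const char *input)
{
    cur = input;
    traceLen = 0;
    trace[0] = '\0';
    if (setjmp(onError)) {
        return false;
    }
    while (*cur != '\0') {
        if (!parseStatement()) {
            expectedError();  // expected a statement
        }
    }
    return true;
}
--------------------------------------------------------------------------------

For instance, ``parse("1;;$>2;$<3;")`` returns _true_ and leaves ``"e0><"`` in
``trace``, while ``parse("$>;")`` fails because the I/O hack requires a
non-empty expression. Writing ``parseStatement()`` with ``||`` behaves like the
``if``/``else`` chain in the section on ``parseStatement()``, since ``||`` only
evaluates its right operand if the left one is false; the chain is merely
easier to extend with further kinds of statements.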