======================
Another Grammar Update
There actually will be more than just one update. The first update will allow empty expression statements like in C. Currently this program
produces a syntax error:
theon$ ./xtest_abc < empty_expr_statement.abc 1.1: error expected 'primary expressin' got 'SEMICOLON' warning: register 6 not released ldzwq 0x0, %2 parseBinaryExpr(1) theon$
In the grammer we just have to change the production for an expression statement to
\[\begin{array}{rcl}\text{expr-statement} & = & [ \text{assignment-expr} ]\; \texttt{";"}\; \\\end{array}\]Hence, we have to update again the parser.
Updating the Parse Functions: The Obvious Approach
In parseExprStatement we could first check if the current token is the semicolon and in this case return after consuming the token:
1 2 3 4 5 6 7 8 9 | static void
parseExprStatement(void)
{
if (token.kind == SEMICOLON) {
getToken();
return;
}
/* rest as before */
}
|
This will solve the problem
theon$ ./xtest_abc < empty_expr_statement.abc ldzwq 0x0, %2 halt %0 .data .bss theon$
However, it will later turn out to be more convenient to approach things in a way that now seems to be unnecessarily complicated. To be clear: do not apply the above change to parseExprStatement().
Updating the Parse Functions: What Helps Us Later
It will be later more convenient if parseAssignmentExpr() returns a null pointer if no expression was found. For now it means that we have to struggle through a few changes. In short if a new expression node gets created we have to check first if all pointers to child node are not null pointers.
Changes in parsePrimaryExpr()
Currently an assertion failure gets triggered if no primary expression could be parsed. Remove the assertion and simply return a null pointer in this case.
Changes in parseUnaryExpr()
If an unary operator was found it has to be followed by a non-empty expression. Otherwise it is a syntax error:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | static const struct Expr *
parseUnaryExpr(void)
{
if (token.kind == PLUS || token.kind == MINUS) {
enum TokenKind op = token.kind;
getToken();
const struct Expr *expr = parseUnaryExpr();
if (!expr) {
expectedError("non-empty expression");
}
if (op == MINUS) {
return newUnaryExpr(EK_UNARY_MINUS, expr);
}
return newUnaryExpr(EK_UNARY_PLUS, expr);
}
return parsePrimaryExpr();
}
|
Changed in parseLeftAssocBinaryExpr()
If a binary operator was found it has to be followed by a non-empty expression. Add a check similarly to the one in parseUnaryExpr().
Changes in parseAssignmentExpr()
If an assignment operator was found it has to be followed by a non-empty expression. Add a check similarly to the one in parseUnaryExpr() and parseLeftAssocBinaryExpr().
Changes in parseExprStatement(void)
If the expression is empty a semicolon is expected. In this case it gets consumed and the parse function returns:
1 2 3 4 5 6 7 8 9 10 11 12 | static void
parseExprStatement(void)
{
const struct Expr *expr = parseAssignmentExpr();
if (expr) {
const struct Expr *folded = constFoldExpr(expr);
GenReg dest = genGetReg();
/* ... as before ... */
}
expected(SEMICOLON);
getToken();
}
|
Changes in parse()
We are currently parsing here our I/O hack. Both operators “$>” and “$<” have to be followed by a non-empty assignment expression. Below it will be suggested to outsource parsing of the I/O hack. But before another change gets applied the current modifications should be tested.
Tests for the Parser Update
In the following cases the compiler should in some cases generate syntax errors but never an assertion failure:
theon$ # this should give a syntax error theon$ echo "$>;" | ./xtest_abc foo.s || echo "ok! Error was expected" 1.3: error expected 'non-empty expression' got 'SEMICOLON' warning: register 6 not released warning: register 7 not released theon$ # this should give a syntax error theon$ echo "$<;" | ./xtest_abc foo.s || echo "ok! Error was expected" 1.3: error expected 'non-empty expression' got 'SEMICOLON' warning: register 6 not released ok! Error was expected theon$ # this should give a syntax error theon$ echo "3 +;" | ./xtest_abc foo.s || echo "ok! Error was expected" 1.4: error expected 'non-empty expression' got 'SEMICOLON' ok! Error was expected theon$ # also this theon$ echo "+;" | ./xtest_abc foo.s || echo "ok! Error was expected" 1.2: error expected 'non-empty expression' got 'SEMICOLON' ok! Error was expected theon$ # this should be ok theon$ echo ";" | ./xtest_abc foo.s || echo "Ups! This should compile" theon$
Some Cleanup: Grammar and Parse Functions for Statements
So far our grammar was focused on expressions. We only had two kind of statements: expression statements and the I/O hack. This will change soon when we add statements for control structures. Let's prepare the grammar already for that:
\[\begin{array}{rcl}\text{input-sequence} & = & \{\; \text{statement}\; \}\; \\\text{statement} & = & \text{io-hack-statement}\; \\ & | & \text{expr-statement}\; \\\text{io-hack-statement} & = & (\; $> \; | \; $< \; ) \; \text{assignment-expr}\; \texttt{;}\; \\\text{expr-statement} & = & \text{assignment-expr}\; \texttt{;}\; \\\end{array}\]It might actually be a good idea to have two source files for parse functions: parse_expr.c for parsing expressions and parsestmnt.c for parsing statements. However, for now we keep everything in parse.c``. But conceptually this separation will be reflected in how the code gets reorganized.
Furthermore, parse functions for statements will have a boolean as return type. If a parse function for a statement returns false it means two things:
-
no token was consumed and
-
the current token does not initiate a corresponding statement.
If the parse function returns true the corresponding statement could be parsed and all its tokens are consumed when the function returns.
This is possible because from our grammar the kind of statement can be inferred by the first token of the statement. For example, a while-statement will begin with a while token, a for-statements with a for token, etc.
Memory Management
So far expressions only could occur within an expression statement (or an IO hack). This will no longer be the case. Every statement can contain an expression. Memory for expressions can be released after a statement was parsed and code for it generated.
Function parse() and parseStatement()
Here all the forward declarations of parse functions and the “main” parse function parse(). It parses a sequence of statements until the end of input is reached. Memory for expressions can be released after a statement was parsed:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | // for parsing statements
static bool parseStatement(void);
static bool parseIOHackStatement(void);
static bool parseExprStatement(void);
// for parsing expressions
static const struct Expr *parseAssignmentExpr(void);
static const struct Expr *parseLeftAssocBinaryExpr(int prec);
static const struct Expr *parseUnaryExpr(void);
static const struct Expr *parsePrimaryExpr(void);
void
parse(void)
{
while (token.kind != EOI) {
if (!parseStatement()) {
expectedError("statement");
}
deleteAllExpr();
}
}
|
Function parseStatement() tries to parse any kind of statements and returns true if this succeeds and otherwise false. Note that it will not matter in which order it tries to parse the different kinds of statements.
1 2 3 4 5 6 7 8 9 10 11 | static bool
parseStatement(void)
{
if (parseExprStatement()) {
return true;
} else if (parseIOHackStatement()) {
return true;
} else {
return false;
}
}
|
Function parseExprStatement()
If no expression could be parsed and the current token is not a semicolon the function returns false. The rest of the implementation is unchanged:
1 2 3 4 5 6 7 8 9 10 11 12 13 | static bool
parseExprStatement(void)
{
const struct Expr *expr = parseAssignmentExpr();
if (expr) {
/* ... almost as before: no longer call deleteAllExpr() here ... */
} else if (token.kind != SEMICOLON) {
return false;
}
expected(SEMICOLON);
getToken();
return true;
}
|
Function parseIOHackStatement()
If the current token is not a dollar sign it immediately returns false. Otherwise it parses the IO hack:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | static bool
parseIOHackStatement(void)
{
if (token.kind != DOLLAR) {
return false;
}
getToken();
if (token.kind == GREATER) {
getToken();
// read unsigned integer
/* ... as before ... */
} else if (token.kind == LESS) {
getToken();
// print unsigned integer
/* ... as before ... */
} else {
expectedError("'>' or '<'");
}
expected(SEMICOLON);
getToken();
return true;
}
|
Exercise
Rearrange the parse functions as outlined above. Test your implementation for example with
1 2 3 4 5 6 7 8 | a + b * (c + d);
x + 1 == y;
x + 1 != y * z;
x + 1 <= y * z;
x + 1 >= y * z;
x + 1 < y * z;
x + 1 > y * z;
;
|
After generating Latex code for the expression tree representations with
theon$ ./xtest_abc test2.s test2.tex < test2.abc theon$ lualatex test2.tex > /dev/null theon$
you should get this pdf. Note that the empty expression statement will not show up in the document.