

Ulm's Modula-2 System:
myacc


NAME

myacc - LALR(1) parser generator for Modula-2

SYNOPSIS

myacc [-vrelkKtbld] [-omodname] [-Wdir] grammar

DESCRIPTION

Myacc is an LALR(1) parser generator that has been developed from yacc(1). From a grammar containing definitions, rules, actions and additional Modula-2 text, myacc constructs a complete parser written in Modula-2.

Myacc has essentially the same capabilities as yacc, although some changes in the declaration part were necessary to reflect the specific demands of Modula-2. Compared to yacc, some features have been added to improve the error handling of the generated parser.

Myacc supports the following options:

-v
generate a verbose file that contains a detailed description of the parser, the specification of all conflicts and some informal statistics.
-t
cause the constructed parser to trace syntax analysis
-r
rename the suffix of the generated definition module from .d to .dy. May be used to prevent make(1) from recompiling dependent modules in cases where the contents of the parser definition module have not changed.
-k
keep (do not delete) output files. By default all output files are removed if things go wrong.
-K
keep (do not delete) any file. This option applies also to all temporary files generated by myacc.
-b
invoke m2b as an output filter. May be helpful for debugging and for checking the syntax of actions in advance.
-e
enable parser to issue detailed error messages (see below)
-l
cause the parser to read its tables from a data file instead of assigning the values element by element during initialization. This option considerably reduces the compilation and loading time of the parser and should be used for every nontrivial application.
-o[]modname
make modname the name of the output modules. All filenames are built by concatenating modname and a suitable suffix (see below for details). Since this option affects all files created, more than one myacc may be active in a directory. The default modname is 'Myacc'.
-W[]dir
specify the directory dir where the parser will find its data file. Required only if the parser and data file are installed in different directories.
grammar
input file

GRAMMAR

Identifiers and single character constants must match the Modula-2 format. Comments extend from (* to *) and may be nested. Keywords and character sequences of special meaning are always introduced by '%' and do not contain any space characters.

Input files to myacc are separated by %% marks into an (optional) declaration section, the grammar rules and some additional Modula-2 text (in the following always referred to as ANY).

grammar
: [declarations] %% rules [%% ANY]

DECLARATIONS

%imports ANY
the following text up to the next myacc keyword is copied into both the implementation and the definition module of the parser. Intended to enable import of objects that are used within both modules (e.g. types used for the value stack definition). Should be used only once in a grammar and must precede any other declaration except %init (see below).
%defs ANY
the following text up to the next myacc keyword is copied into the definition module. Can be used to export objects declared within the parser.
%{ ANY %}
the text between the two marks is copied into the implementation module. Intended for the definition of constants, types, variables and procedures. The first %{ ANY %} in a grammar file may also contain import statements.

%case variant { ; variant } %end

defines the type of the value stack elements as a case variant record. Each variant must be specified as
identifier : simpletype
where simpletype is a Modula-2 identifier designating the type of a variant.

The following declarations have exactly the same semantics as the corresponding declarations of yacc, but follow a slightly modified syntax. Myacc will not recognize keywords other than those listed. Please note that typename always refers to the left hand side of a variant in the %case..%end construction. Single character constants (char) may be specified in octal or as a string of length one.

%token [ <typename> ] (identifier|char) {identifier|char}

Definition of terminal symbols and their value type. Character constants (which need not be declared) may appear in order to determine their type.

%left [ <typename> ] (identifier|char) {identifier|char}
%right [ <typename> ] (identifier|char) {identifier|char}
%non [ <typename> ] (identifier|char) {identifier|char}

Define the precedence table for terminal symbols with the indicated associativity. These definitions do not require a previous %token definition.

%type <typename> (identifier|char) {identifier|char}

Value type definition for the listed grammar symbols.
%start identifier
Designates the start symbol of the grammar, which is by default the left hand side of the first rule.

The following two declarations are intended to support keyword recognition.

%keyword [ <typename> ] identifier {identifier}

Has the same effect on language recognition as a %token definition. In addition, the listed tokens are marked as keywords.
%init [module.] procedure
Defines module.procedure to be called for each keyword during the initialization phase of the parser. The expected procedure heading is as follows (Keywords.DefineKey could be used):
PROCEDURE procedure (keywordtext : ARRAY OF CHAR; tokenvalue : INTEGER);
%init must not be preceded by anything other than a %imports definition.

RULES

Apart from some lexical aspects already mentioned, there are no differences between yacc and myacc concerning the syntax and semantics of rules and actions:
error
is a reserved token name for error handling.
%prec
is available to resolve ambiguities by the precedence of operators (tokens).
{ ANY }
represents an action

$[<typename>][-]number

within an action refers to the value of the corresponding grammar symbol, $$ to the nonterminal symbol on the left hand side of the rule.
;
is the optional end of rule mark
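
The declarations and rule notation described above might combine into a small grammar file like the following sketch. The symbol names (expr, number), the variant name int and the exact form of the actions are illustrative assumptions, not taken from the myacc distribution; in particular it is assumed that $$ and $n may be used as ordinary Modula-2 designators, as in yacc.

```modula2
(* calc.y: hypothetical myacc input for integer expressions *)

%case
   int : INTEGER          (* the only value stack variant *)
%end

%token <int> number       (* terminal symbol carrying an INTEGER value *)
%left '+' '-'             (* lowest precedence, left associative *)
%left '*' '/'             (* higher precedence, left associative *)
%type <int> expr

%%

expr : expr '+' expr   { $$ := $1 + $3 }
     | expr '-' expr   { $$ := $1 - $3 }
     | expr '*' expr   { $$ := $1 * $3 }
     | expr '/' expr   { $$ := $1 DIV $3 }
     | '(' expr ')'    { $$ := $2 }
     | number          { $$ := $1 }
     ;
```

Each action is a plain Modula-2 statement sequence, so assignment uses := and integer division uses DIV. The ambiguity between the binary operators is left to the precedence declarations instead of being encoded in additional nonterminals.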

PROGRAM

Myacc is unable to check the syntax or semantics of Modula-2 program text. The following remarks should be observed to avoid problems when compiling the parser:

Internally used names begin with yy.

All objects declared by the user are global to the module. Statements must be part of a procedure. Initialization of global variables can be performed in an action of the first (empty) rule.

Actions must contain nothing but legal statement sequences.

The output files contain line number information for debugging of compilation errors.

PARSER

A definition module created by myacc will (more or less) look like this:
TYPE 
   YYSTYPE = 
      RECORD 
         CASE : CARDINAL OF 
         | 1: var1: type1;
         | 2: var2: type2;
         END 
      END;

(* Token definitions *)
CONST
   token1 = 257;
   token2 = 258;
   (* ... *)

VAR
   yylval: YYSTYPE;

PROCEDURE yyparse() : INTEGER;
PROCEDURE yytoktext(tok : INTEGER; VAR text : ARRAY OF CHAR);

CONST
   YYERRCODE = 256;

TYPE
   YYTOKSET = SET OF [0..511];

VAR
   yylex  : PROCEDURE () : INTEGER;
   yyerror: PROCEDURE (
               (* errorno:    *) INTEGER,        (* no. of error *)
               (* line:       *) INTEGER,        (* -1 if unknown *)
               (* col:        *) INTEGER,        (* -1 if unknown *)
               (* errortoken: *) INTEGER,        (* illegal input sym. *)
               (* errortext:  *) ARRAY OF CHAR,  (* "" if unknown *)
               (* expected:   *) YYTOKSET);      (* empty without -e option *)
   yytext : POINTER TO CHAR;
   yyline : POINTER TO CARDINAL;
   yycol  : POINTER TO CARDINAL;

YYSTYPE defines the type of the value stack. The case variants result from the %case definition.

The input symbols to be recognized by lexical analysis are defined as constants. Their names are equal to those used in the token definitions of the input grammar. Occasionally myacc adds the suffix sy to avoid conflicts with predefined Modula-2 names.

yyparse performs the syntax analysis, repeatedly calling yylex to obtain the next symbol from the input stream. A suitable procedure has to be assigned to yylex outside the parser module. yyparse expects this procedure to return the code of the next input symbol.

yylval is the value associated with the current input symbol and must be set by the lexical analyzer.

yyparse returns 0 on successful completion of syntax analysis, -1 in case of an unrecovered syntax error, and a value < -1 if the parsing tables could not be loaded.

yytoktext yields a printable text for any token tok.

Any syntax error causes yyparse to call yyerror, whether or not the error is recovered. Myacc assigns a default error handling routine to yyerror, which receives the indicated arguments from the parser. By default yyparse provides information about the illegal symbol ('errortoken') and the consecutive number ('errorno') of the error currently treated. If myacc was called with option -e, a set of legal input symbols ('expected') is computed as well. Information about the position ('line', 'col') and text ('errortext') of erroneous input symbols is available only if yyparse can dereference yyline, yycol and yytext, i.e. if these pointers have been assigned the addresses of variables that hold this information.
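
The interface described above could be used from a main module roughly as in the following sketch. It assumes a parser generated with the default module name Myacc, a hypothetical token number and a hypothetical variant int in the %case definition, and the availability of the InOut library module. The end-of-input code 0 is borrowed from yacc and is an assumption, since the convention is not stated here. The scanner is a stub that delivers a single token.

```modula2
MODULE Driver;

   FROM SYSTEM IMPORT ADR;
   FROM InOut IMPORT WriteString, WriteLn;   (* assumed output library *)
   FROM Myacc IMPORT yylex, yylval, yyline, yycol, yyparse,
      number;                                (* hypothetical token constant *)

   VAR
      line, col: CARDINAL;   (* input position maintained by the scanner *)
      calls: CARDINAL;       (* number of scanner calls so far *)
      result: INTEGER;

   (* Stub scanner: one number token, then end of input. *)
   PROCEDURE NextSymbol() : INTEGER;
   BEGIN
      INC(calls);
      col := calls;
      IF calls = 1 THEN
         yylval.int := 42;   (* value of the token; int is hypothetical *)
         RETURN number
      ELSE
         RETURN 0            (* assumed end-of-input code, as in yacc *)
      END
   END NextSymbol;

BEGIN
   line := 1; col := 0; calls := 0;
   yylex := NextSymbol;      (* must be assigned outside the parser module *)
   yyline := ADR(line);      (* lets yyparse report the error position *)
   yycol := ADR(col);
   result := yyparse();
   IF result = 0 THEN
      WriteString("input accepted")
   ELSIF result = -1 THEN
      WriteString("unrecovered syntax error")
   ELSE
      WriteString("parsing tables could not be loaded")
   END;
   WriteLn
END Driver.
```

Because yyline and yycol point at variables owned by the scanner, the parser sees the current position without any further calls; a real scanner would update line and col while reading its input.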

ERROR

The default error handling reports syntax errors to StdIO.stderr. The messages will contain all information currently available to yyparse (see above):

[syntax error in line 1] near identifier 'a' (column 1). Expected: token1 token2 token3

Of course the set of legal input symbols created by yyparse (option -e required) depends on the current parsing state. It will contain all tokens listed in the verbose file as legal input symbols to cause a shift or reduce action of the parser.

Legal input symbols tend to hide behind the default parsing actions marked $else in the verbose file (yacc outputs . instead). Myacc cannot include these symbols in the set of expected tokens, but adding some more error tokens to the grammar rules may uncover them.
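
A user-supplied replacement for the default error handling might look like the following sketch. The parameter list follows the yyerror procedure type shown in the definition module; the module name Myacc is the default, the use of InOut is an assumption, and the message layout is purely illustrative.

```modula2
MODULE ErrDemo;

   FROM InOut IMPORT WriteString, WriteInt, WriteLn;   (* assumed library *)
   FROM Myacc IMPORT YYTOKSET, yyerror, yytoktext;

   VAR
      errors: CARDINAL;   (* number of syntax errors reported so far *)

   PROCEDURE ReportError(errorno, line, col, errortoken: INTEGER;
                         errortext: ARRAY OF CHAR;
                         expected: YYTOKSET);
      VAR
         tok: CARDINAL;
         first: BOOLEAN;
         name: ARRAY [0..63] OF CHAR;
   BEGIN
      INC(errors);
      WriteString("syntax error");
      IF line >= 0 THEN                 (* -1 means position unknown *)
         WriteString(" in line "); WriteInt(line, 1)
      END;
      yytoktext(errortoken, name);
      WriteString(" near "); WriteString(name);
      WriteLn;
      (* expected stays empty unless myacc was called with -e *)
      first := TRUE;
      FOR tok := 0 TO 511 DO
         IF tok IN expected THEN
            IF first THEN
               WriteString("expected:"); first := FALSE
            END;
            yytoktext(tok, name);
            WriteString(" "); WriteString(name)
         END
      END;
      IF NOT first THEN
         WriteLn
      END
   END ReportError;

BEGIN
   errors := 0;
   yyerror := ReportError
   (* a complete program would now assign yylex and call yyparse *)
END ErrDemo.
```

Counting the calls in errors gives the caller a total that includes recovered errors, which the result of yyparse alone does not reveal.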

PREDEFINED

PROCEDURE yyclearin();
PROCEDURE yyerrorok();
PROCEDURE yyreset();
PROCEDURE yyexit (exitcode : INTEGER);
PROCEDURE yyshowerror(yyno, yyline, yycol, yybad : INTEGER;
                      yytext : ARRAY OF CHAR;
                      yyexpected : YYTOKSET);

The procedures listed above are available within actions: yyclearin and yyerrorok have the same meaning as the corresponding macros of yacc, yyreset restores the initial parser state, and yyexit terminates syntax analysis, with exitcode becoming the result returned by yyparse. yyshowerror is the default error handling routine of any parser created by myacc.

DIAGNOSTICS

Errors in the input file are reported to StdIO.stderr. The number of detected shift/reduce and reduce/reduce conflicts is reported the same way. A specification of all conflicts can be obtained from the verbose file if option -v is used.

AUTHOR

Werner Stanglow

SEE ALSO

yacc(1), Keywords

With the exception of the language dependent features, any description of yacc applies to myacc as well. Thus you may refer to the following references for an introduction to the usage of myacc:

A. T. Schreiner & H. G. Friedman, Jr.
Introduction to Compiler Construction with UNIX
Prentice-Hall 1985.
A German translation is available as well (Hanser 1985).

Stephen C. Johnson
Yacc: Yet Another Compiler-Compiler
Programmers Workbench (Edition VII)

FILES

Myacc.m2                   parser implementation module
Myacc.d                    parser definition module
Myacc.dy                   parser definition module (option -r)
Myacc.t                    parsing tables (option -l)
Myacc.out                  verbose file (option -v)
Myacc.act                  temporary
Myacc.dat                  temporary
Myacc.loc                  temporary
Myacc.tmp                  temporary
/usr/local/lib/myaccpar    parser skeleton

BUGS

Unfortunately myacc has inherited from yacc not only the capabilities but some bugs as well:

Error messages issued by myacc are intended to be self-explanatory but sometimes they are not.

Ambiguous declarations are not always detected.

Myacc does not check whether the types given in a <typename> construction are legal.

Unterminated actions tend to produce cascades of error messages (the last line will indicate their beginning).

If things go wrong, myacc occasionally complains about nonexistent streams it cannot close. These messages should be ignored.

Option -b should be used only if the input grammar is accepted by myacc without a fatal error message.


Edited by: borchert, last change: 1997/03/10, revision: 1.1, converted to HTML: 1997/04/28
