Oberon || Compiler & Tools || Library || Module Index || Search Engine
This report is to be seen as supplement of the revised Oberon report in respect to Ulm's implementation of Oberon.
Specifications of data representation, alignment and space usage are implementation specific. Thus, modules relying on assumptions about internal representations are nonportable. However, because implementations of low level modules sometimes need some knowledge about internals, they have been included.
The enclosed numbers of the section headings are references to the paragraphs of the revised Oberon report.
The command line invocation of the Oberon compiler is described in oc.
White space consists of blanks (20X), tabs (09X), newlines (0AX), and form feeds (0CX). Source file tabs (outside of string constants) are interpreted (for listing generation) in conformance to the UNIX standard tab setting, i.e. tab positions at 9, 17, 25, ... The line length (after tab expansion and including the terminating 0AX) is limited to 32,768 characters. The last source line does not need to be terminated by a newline.
User defined identifiers are significant in their whole length. The maximal identifier length is limited by the maximal line length.
Comments may be nested (yet undocumented revision of Oberon).
Compile options may be specified at the beginning of a comment:
$ ControllingComment = "(*" { CompileOption } AnyText "*)" . $ CompileOption = WhiteSpace "$" OptionLetter ("+" | "-" | "=") . $ OptionLetter = UpperCaseLetter .Options are boolean values with an initial value of TRUE (if not overridden by oc). Option settings may be nested: "+" sets the option value to TRUE, "-" to FALSE, and "=" restores the old value.
Following option letters are currently implemented:
Integer constants range from -2147483648 (-2^31) to 2147483647 (2^31-1). The type of small integer constants which fits into SHORTINT depends on their usage. Usually they are of the smallest integer type possible but in expressions they adapt the appropriate data type; e.g. 12*12 yields 144, 12 fits into SHORTINT but not 144.
Floating point constants of REAL type range from -1.797693134862314e+308 to 1.797693134862314e+308. The least positive value of type REAL is 2.225073858507201e-308. LONGREAL floating point constants are not yet implemented. Scale factors with leading "D" are accepted and cause the constant to be of type LONGREAL but the constant is represented as REAL. Thus, the extended precision is lost and constant values outside of the REAL range are not accepted.
Character constants in hexadecimal notation range from 0X to 0FFX.
The length of strings is not limited. Strings must not contain newlines (0AX). "" denotes the empty string and consists of the terminating 0X character.
Export marks are not yet implemented. Instead of marking items which are to be exported a textually separated definition is necessary as defined in the first Oberon report.
$ identdef = ident .
Standard functions including some functions exported from SYSTEM are permitted in constant expressions. Exceptions are SYSTEM.ADR, SYSTEM.BIT, SYSTEM.TAS, SYSTEM.UNIXCALL, SYSTEM.UNIXFORK, and SYSTEM.UNIXSIGNAL.
Floating point operations in constant expressions are currently implemented without secure evaluation. Thus, overflow or division by zero are not detected.
type size representation __________________________________________________________ BOOLEAN 1 TRUE is represented as 1 (least significant bit), FALSE as 0. CHAR 1 characters ranges from 0X to 0FFX. SHORTINT 1 is signed and ranges from -128 to 127. INTEGER 4 is signed and ranges from -2147483648 (-2^31) to 2147483647 (2^31-1). LONGINT 4 is represented like INTEGER. REAL 8 is represented in double precision (MC68881). Values range from -1.797693134862314e+308 to 1.797693134862314e+308. Floating point exceptions are disabled on default and can be enabled by usage of the MC68881 system module. LONGREAL 12 is represented in extended precision (internal format of the MC68881). SET 4 MAX(SET) equals 31. {0} is represented as a 4-byte unsigned integer with the most significant bit set. {31} is represented with the least significant bit set. BYTE 1 represents one byte (8 bits).
Records and arrays occupy multiples of 4 bytes. Array and record components are represented without extra packing. Array elements and record components are aligned on 4-byte-boundaries if the size of elements or components is greater or equal to 4 bytes.
Arrays are implemented in row major order. Dynamic arrays consist of a pointer to the array and a dope vector. The dope vector consists of 4-byte integers which equal the values returned by the LEN standard function.
Empty records have zero length. Type tag pointers are added at run time. They precede the record value (offset -4) in case of allocation by NEW or follow (in stack direction) the pointer to the record value in case of variable parameters. Type tags are pointers to following structure:
TypeTag = POINTER TO TypeTagRec; TypeTagRec = RECORD size: LONGINT; (* size of the type *) module: SysModules.Module; typeno: INTEGER; (* of corresponding reference file *) hierlen: LONGINT; (* number of base types *) basetype: ARRAY hierlen OF TypeTag; (* the 2nd basetype extends the 1st etc. *) (* pointer list *) sentinel: LONGINT; END;
size is the size of the type as returned by SIZE and not the size of allocated area for objects of this type. module and typeno allow debuggers to identify the type. Detailed informations are given in SysModules and reffile(5). hierlen gives the number of base types (not counting the type itself) and basetype is an array of type tags of the base types. These informations are followed by a pointer list which is terminated by sentinel (-2147483648 = MIN(INTEGER)).
Pointer lists conform to following grammar:
PointerList = { PtrDescriptor } . PtrDescriptor = SimplePointer | PointerArray | RecordArray | Other . SimplePointer = offset . PointerArray = offset ElementCount . RecordArray = offset ElementCount typeTag . ElementCount = [ numberOfDynDimensions ] count . Other = offset ( TmpPointer | SimpleAddress | AddressArray ) . TmpPointer = TmpPointerSY StartAddr EndAddr . SimpleAddress = SimpleAddressSY AddressArray = AddressArraySY ElementCount . StartAddr = addr . EndAddr = addr .
Different pointer descriptors are recognized by their two least significant bits:
| bit1 bit0| 0 1 ____|_______________|_____________ 0 | SimplePointer| Other 1 | PointerArray | RecordArray
Other types of pointers are determined by a selector (4 bytes):
TmpPointerSY = 0 SimpleAddressSY = 1 AddressArraySY = 2
Type tags are not only generated for record types and arrays but also for procedures and global variables which can be found by the tables of SysModules.
Some additional notes: Temporary pointers are only to be considered if the associated program counter ranges inside [StartAddr..EndAddr). While pointers are guaranteed to point to the beginning of a structure, addresses are free to point inside a structure and they may reference other areas (e.g. global or local data). Base types are encoded as RecordArray with a length of 1 if a record contains private parts. Dynamic arrays are encoded with numberOfDynDimensions = - n for n-dimensional arrays.
Pointer types are represented as 4-byte unsigned integers. NIL is represented as 0. Pointers to records as variable parameters are followed (in stack direction) by a type tag. The Oberon runtime system guarantees pointer values to be initialized to NIL. While the compiler does not generate checks against references through NIL, they are catched by the memory management and lead to segmentation violations.
are represented by a pointer to the starting address of the procedure.
An asterisk following the symbol PROCEDURE is defined to be a hint for the compiler that the procedure is to be usable as parameter and assignable to variables of a compatible procedure type. This implementation accepts these hints but does not require them.
Even exported procedures need a forward declaration when their usage precedes the actual declaration. The compiler requires identical parameter identifiers for forward declarations and exports.
The result type of a procedure cannot be a record or an array.
The relaxed parameter compatibility rule for the formal type ARRAY OF BYTE applies for variable and value parameters.
The actual parameter type must be identical to the type of the corresponding formal parameter in case of variable parameters. Exceptions are records and pointers, where the formal parameter type must be a base type of the actual parameter type. This conforms to unrevised Oberon but revised Oberon restricts the exception to record types.
ASH(x, n) (arithmetic shift) shifts from left to right in case of negative n (x DIV 2^(-n)).
LEN requires a variable identifier as first argument. Thus designators with index expressions are not permitted.
COPY(x, v) guarantees 0X-termination of v.
HALT(x) causes Process.Exit(x) to be called.
ASSERT(assertion: BOOLEAN) has been added in conformance to newer Oberon compilers from ETH Zürich. There exist an optional second parameter (of type INTEGER) which is ignored by this implementation. Assertions, if switched on by the corresponding option, test assertion for being TRUE and generate a run time error otherwise (see RTErrors).
Compilation units are either definitions or modules. The definition is an excerpt of the module and specifies the items visible to all clients importing the module.
$ definition = DEFINITION ident ";" $ [ImportList] Definitions END ident "." . $ Definitions = { CONST { ConstantDeclaration ";" } | $ TYPE { TypeDeclaration ";" } | VAR { VariableDeclaration ";" } } $ { ProcedureHeading ";" } .All declarations of the definition are to be repeated in the module.
The syntax of the import list has been changed to:
$ ImportList = IMPORT import { "," import } ";" $ import = identdef [ ":=" ident ] .
The initialization of the modules depends on SysMain. The default implementation initializes all modules in linkage order. Modules can be initialized (and in later revisions possibly loaded) by Loader if their name is known.
Parameterless procedures exported by modules compiled with -C are inserted into the structure exported by SysCommands during the early startup phase. The lists of SysCommands and SysModules are completed before any modules will be initialized.
All standard procedures and functions of the SYSTEM module as defined in the report (for NS32000) are implemented with the exception of CC. CC cannot be implemented on a MC68020. Note that the return type of SYSTEM.ADR has been changed to SYSTEM.ADDRESS (see below). Following procedures and functions have been added:
TAS(v) is of boolean type and corresponds to the equally named machine instruction (test and set). The operation is undividable and returns the value of the boolean variable v before setting it to TRUE.
UNIXCALL(syscall, d0, d1, arg, ...) returns a BOOLEAN value and allows to code all UNIX system calls (except fork and signal). syscall is a small integer constant which determines the sort of call. Valid system call numbers are exported by Sys. d0 and d1 are variable parameters. Their values are stored into the corresponding data registers of the MC68020. After execution of the system call the register values are copied back to the variable parameters. Typically d0 contains the return value of the system call or the error code (if UNIXCALL returns FALSE). The arguments following should be INTEGER sized and are pushed in reversed order (in C convention) onto the stack. Character strings must be 0X-terminated and given by their address.
UNIXFORK(pid) returns a BOOLEAN value and realizes the fork system call. pid is a variable parameter and gets the process id of the child process (parent process) or 0 (child process). In error case (return of FALSE) the error number is stored into pid.
UNIXSIGNAL(signo, p, old, error) returns a BOOLEAN value and interfaces the signal handling runtime routines. signo must be a valid signal number (see signal(2) for details), p a signal handling Oberon procedure (or 0 for SIG_DFL and 1 for SIG_IGN). Signal handling Oberon procedures are of following type:
SignalHandler = PROCEDURE(signo: INTEGER; sigcode: ARRAY OF BYTE); (* signo: signal number sigcode: additional information (e.g. faulting address in case of segmentation violation) *)An INTEGER argument type is also accepted for convenience of SIG_DFL and SIG_IGN.
old and error are output parameters. old returns the previous reaction on signo and error returns the error number in case of an error (return of FALSE).
These system procedures are not intended to be used outside of the Oberon system library. Events, SysProcess, SysSignals, SysIO and other modules offer more portable interfaces to the UNIX system.
HALT(exitcode) causes an immediate exit of the calling process without any cleanup procedures (i.e. trap to the operating system).
WMOVE(from, to, n) copies n 4-byte words from address from to to. WMOVE is much faster than MOVE.
WCLEAR(ptr, n) zeroes n 4-byte words beginning from the address pointed to by ptr.
A coroutine scheme is offered by the two coroutine primitives CRSPAWN and CRSWITCH. The main difference to the conventional scheme of Modula-2 (NEWPROCESS and TRANSFER) is that coroutines declare themselves to be coroutines. This allows parametrized coroutines.
CRSPAWN(newcr) performs following steps:
The newcr parameter is of type COROUTINE and points to the coroutine structure. The reference to the coroutine structure remains valid as long the coroutine exists. CRSPAWN returns like CRSWITCH if the newly created coroutine resumes execution.
CRSWITCH(dest) switches the currently active coroutine. The parameter dest must be either the result of a previous CRSPAWN call or it must equal Coroutines.main. After switching, Coroutines.current equals dest and Coroutines.source references the previously active coroutine.
Coroutines point to a structure which contains following components:
TYPE Coroutine = POINTER TO CoroutineRec; CoroutineRec = RECORD base, (* current activation record *) top, (* top of stack *) pc: SYSTEM.UNTRACEDADDRESS; (* current program counter *) interrupts: INTEGER; (* # of saved contexts due to interrupts *) END;Coroutines have a type tag which points always to Coroutines.tag. The stack of a coroutine may be examined by following the chain of dynamic links. Relative to base, the next base pointer is at offset 0, and the return address at 4. The chain is terminated by a dynamic link which equals NIL. Static links (in case of nested procedures) are given as implicit parameter at offset 8.
Additional informations are given in Coroutines and Ulm's Coroutine Scheme.
SYSTEM exports INT16, an integer type which ranges between SHORTINT and INTEGER and occupies two bytes in storage. INT16 may be used like other integer types; SHORT converts INT16 values to SHORTINT and LONG converts to INTEGER. INT16 variables or components will not be aligned by the compiler.
To distinguish usual integer values from addresses and for better support by the garbage collector, two address types have been introduced in SYSTEM: ADDRESS and UNTRACEDADDRESS. Both are compatible to each other and to LONGINT but the compiler issues a warning when an address is assigned to LONGINT. Addresses of type ADDRESS (but not of type UNTRACEDADDRESS) are part of pointer lists and recognized by the garbage collector. Copying garbage collectors are required to check the validity of addresses (of type ADDRESS) and to update them if they point to living objects even if they do not point to the beginning of an object. Garbage collectors are not required to decide whether an object lives or not on base of addresses only, i.e. normal pointers are needed to keep an object alive.
UNIX stores command line arguments at the beginning of the main stack and lets the stack pointer point to the number of arguments and the NIL-terminated lists of arguments and environment parameters. The module SysArgs serves as interface and is initialized during the runtime start. SysArgs.argc and SysArgs.envc are set to the number of arguments (including the command name) and the number of environment parameters. SysArgs.argv and SysArgs.environ point to the list of arguments and environment parameters, respectively. These pointers may be modified but not the objects they point to. Oberon has no appropriate data type for pointers to character arrays with unknown length. The problem is fixed by using pointers to array types with sufficient length (to avoid range check faults). But this restricts access to the basic elements (i.e. characters). Access of these arrays in their entirety could cause memory faults.
SysArgs is very system dependent. More portable interfaces are provided by Args, UnixArguments and UnixEnvironment.
The compiler uses the interface of Storage for NEW and SYSTEM.NEW.
Some errors are detected at runtime:
called procedure error number of module RTErrors description _______________________________________________________________ case CaseError attempt to find case failed return NoReturn function does not return any value range RangeError array index out of range typeguard TypeGuardFailure conversion ConversionError SHORT(val) failed assertion FailedAssertion ASSERT(assertion) failedRuntime errors cause the associated procedures of RTErrors to be called. These procedures are expected to raise events which allow to customize runtime error handling (default reaction is termination with core dump: Process.Abort). These procedures must not return.
Oberon || Compiler & Tools || Library || Module Index || Search Engine