-
Notifications
You must be signed in to change notification settings - Fork 3
AST Specification
M2Sharp encodes valid Modula-2 input in memory as an abstract syntax tree (AST).
The AST uses three kinds of nodes.
- an
EMPTYsentinel node - non-terminal nodes
- terminal nodes
The sentinel node EMPTY,
is used to encode the absence of an optional sub-node in non-terminal nodes.
Non-terminal nodes are used to encode non-terminal symbols such as modules, imports, definitions, declarations, statements and expressions.
AST,
DEFMOD,
IMPLIST,
IMPORT,
UNQIMP,
DEFLIST,
CONSTDEF,
TYPEDEF,
PROCDEF,
SUBR,
ENUM,
SET,
ARRAY,
RECORD,
EXTREC,
VRNTREC,
POINTER,
PROCTYPE,
INDEXLIST,
FIELDLISTSEQ,
FIELDLIST,
VFLISTSEQ,
VFLIST,
VARIANTLIST,
VARIANT,
CLABELLIST,
CLABELS,
FTYPELIST,
OPENARRAY,
CONSTP,
VARP,
FPARAMLIST,
FPARAMS,
IMPMOD,
BLOCK,
DECLLIST,
TYPEDECL,
VARDECL,
PROC,
MODDECL,
VSR,
VSFIELD,
EXPORT,
QUALEXP,
STMTSEQ,
ASSIGN,
PCALL,
RETURN,
WITH,
IF,
SWITCH,
LOOP,
WHILE,
REPEAT,
FORTO,
EXIT,
ARGS,
ELSIFSEQ,
ELSIF,
CASELIST,
CASE,
ELEMLIST,
RANGE,
FIELD,
INDEX,
DESIG,
DEREF,
NOT,
AND,
OR,
EQ,
NEQ,
LT,
LTEQ,
GT,
GTEQ,
IN,
NEG,
PLUS,
MINUS,
STAR,
SLASH,
DIV,
MOD,
SETDIFF,
FCALL,
SETVAL.
Terminal nodes are used to encode terminal symbols such as filenames, options, identifiers, integer literals, real number literals, character code and quoted literals.
FILENAME,
OPTIONS,
IDENT,
IDENTLIST,
QUALIDENT,
INTVAL,
REALVAL,
CHRVAL,
QUOTEDVAL.
AST nodes are represented graphically as follows:
Nodes are represented by rectangular boxes. Arrows indicate subnode connections.

The subnodes of terminal nodes are filled in grey.

Labels are represented by rounded boxes.

Labels with multiple connections represent alternatives.

Any AST node may be represented in a serialised format of the form
( nodetype subnode-0 subnode-1 subnode-2 ... subnode-N )
where the actual number of sub-nodes is dependent on the node type.
This form of tree representation is called an S-expression.
The structure of the common AST is described in detail below.
The EMPTY node encodes the absence of an optional sub-node.

emptyNode :=
'(' EMPTY ')'
;
The AST node encodes the root of the syntax tree.

astRootNode :=
'(' AST filename options compilationUnit ')'
;
filename := filenameNode ; /* terminal node */
options := optionsNode ; /* terminal node */
compilationUnit :=
defModuleNode | impModuleNode
;
The DEFMOD node encodes a definition module.

defModuleNode :=
'(' DEFMOD moduleIdent importList definitionList ')'
;
moduleIdent := identNode ; /* terminal node */
importList :=
importListNode | emptyNode
;
definitionList :=
definitionListNode | emptyNode
;
The IMPLIST node encodes the entirety of import directives in a module.

importListNode :=
'(' IMPLIST import+ ')'
;
There are two types of import nodes:

import :=
importNode | unqImportNode
;
The IMPORT node encodes a qualified import directive.

importNode :=
'(' IMPORT identList ')'
;
identList := identListNode ; /* terminal node */
The UNQIMP node encodes an unqualified import directive.

unqImportNode :=
'(' UNQIMP moduleIdent identList ')'
;
The DEFLIST node encodes one or more definitions.

definitionListNode :=
'(' DEFLIST definition+ ')'
;
There are four types of definition nodes:
CONSTDEF,
TYPEDEF,
VARDECL,
PROCDEF.

definition :=
constDefNode | typeDefNode | varDeclNode | ProcDefNode
;
The CONSTDEF node encodes a constant definition.

constDefNode :=
'(' CONSTDEF identNode exprNode ')'
;
The TYPEDEF node encodes a type definition.

typeDefNode :=
'(' TYPEDEF identNode ( typeNode | emptyNode ) ')'
;
The PROCDEF node encodes a procedure definition.

procDefNode :=
'(' PROCDEF identNode formalParamList returnedType ')'
;
formalParamList :=
formalParamListNode | emptyNode
;
returnedType := qualidentNode ; /* terminal node */
There are eleven nodes that may represent the definition part of a type definition or an anonymous type.
IDENT,
QUALIDENT,
SUBR,
ENUM,
SET,
ARRAY,
RECORD,
EXTREC,
VRNTREC,
POINTER,
PROCTYPE.

typeNode :=
identNode | qualidentNode |
subrTypeNode | enumTypeNode | setTypeNode | arrayTypeNode | recTypeNode |
extRecTypeNode | vrntRecTypeNode | pointerTypeNode | procTypeNode
;
Nine nodes may represent the type definition part of a field definition or an array base type.
IDENT,
QUALIDENT,
SUBR,
ENUM,
SET,
ARRAY,
RECORD,
POINTER,
PROCTYPE.

fieldType :=
identNode | qualidentNode | subrTypeNode | enumTypeNode | setTypeNode |
arrayTypeNode | recTypeNode | pointerTypeNode | procTypeNode
;
Three nodes may represent the type definition part of a derived type.

derivedType :=
identNode | qualidentNode | subrTypeNode
;
Three nodes represent the type definition part of a record type.

recordType :=
recTypeNode | extRecTypeNode | vrntRecTypeNode
;
The SUBR node encodes a subrange type definition.

subrTypeNode :=
'(' SUBR lowerBound upperBound subrBaseType ')'
;
lowerBound := exprNode ;
upperBound := exprNode ;
subrBaseType :=
qualidentNode | emptyNode
;
The ENUM node encodes an enumeration type definition.

enumTypeNode :=
'(' ENUM identListNode ')'
;
The SET node encodes a set type definition.

setTypeNode :=
'(' SET countableType ')'
;
countableType :=
qualidentNode | subrTypeNode | enumTypeNode
;
The ARRAY node encodes an array type definition.

arrayTypeNode :=
'(' ARRAY indexTypeListNode arrayBaseType ')'
;
arrayBaseType := fieldType ;
The RECORD node encodes a non-variant non-extensible record type definition.

recTypeNode :=
'(' RECORD fieldListSeqNode ')'
;
The POINTER node encodes a pointer type definition.

pointerTypeNode :=
'(' POINTER typeNode ')'
;
The PROCTYPE node encodes a procedure type definition.

procTypeNode :=
'(' PROCTYPE formalTypeList returnedType ')'
;
formalTypeList :=
formalTypeListNode | emptyNode
;
returnedType :=
qualidentNode | emptyNode
;
The EXTREC node encodes an extensible record type definition.

extRecTypeNode :=
'(' EXTREC recBaseType fieldListSeqNode ')'
;
recBaseType := qualidentNode ; /* terminal node */
The VRNTREC node encodes a variant record type definition.

variantRecTypeNode :=
'(' VRNTREC variantFieldListSeqNode ')'
;
The INDEXLIST node encodes one or more index types within an array type definition.

indexTypeListNode :=
'(' INDEXLIST indexType+ ')'
;
indexType := countableType ;
The FIELDLISTSEQ node encodes a non-variant field list sequence within a record type definition.

fieldListSeqNode :=
'(' FIELDLISTSEQ fieldListNode+ ')'
;
The FIELDLIST node encodes a non-variant field list within a field list sequence

fieldListNode :=
'(' FIELDLIST identListNode fieldType ')'
;
The VFLISTSEQ node encodes a field list sequence within a variant record type definition.

variantFieldListSeqNode :=
'(' VFLISTSEQ ( fieldListNode | variantFieldListNode )+ ')'
;
The VFLIST node encodes a variant field list within a variant record field list sequence.

variantFieldListNode :=
'(' VFLIST caseIdent caseType variantList defaultFieldListSeq ')'
;
caseIdent :=
identNode | emptyNode
;
caseType := qualidentNode ; /* terminal node */
defaultFieldListSeq :=
fieldListSeqNode | emptyNode
;
The VARIANTLIST node encodes a variant list within a variant field list.

variantList :=
'(' VARIANTLIST variantNode+ ')'
;
The VARIANT node encodes a variant within a variant list.

variantNode :=
'(' VARIANT caseLabelListNode fieldListSeqNode ')'
;
The CLABELLIST node encodes a case label list within a variant record definition or case statement.

caseLabelListNode :=
'(' CLABELLIST caseLabelsNode+ ')
;
The CLABELS node encodes start and end labels within a case label list.

caseLabelsNode :=
'(' CLABELS startLabel endLabel ')'
;
startLabel := exprNode ;
endLabel :=
exprNode | emptyNode
;
The FTYPELIST node encodes a formal type list within a procedure type definition.

formalTypeListNode :
'(' FTYPELIST formalType+ ')'
;
formalType :=
simpleFormalType | attrFormalType
;
simpleFormalType :=
typeIdent | openArrayTypeNode
;
attrFormalType :=
constAttrFormalTypeNode | varAttrFormalTypeNode
;
The OPENARRAY node encodes an open array parameter within a formal type list.

openArrayTypeNode :=
'(' OPENARRAY typeIdent ')'
;
typeIdent := qualidentNode ; /* terminal node */
The CONSTP node encodes a CONST parameter within a formal type list.

constAttrFormalTypeNode :=
'(' CONSTP simpleFormalType ')'
;
The VARP node encodes a VAR parameter within a formal type list.

varAttrFormalTypeNode :=
'(' VARP simpleFormalType ')'
;
The FPARAMLIST node encodes a formal parameter list within a procedure type definition or procedure signature.

formalParamListNode :=
'(' FPARAMLIST formalParamsNode+ ')'
;
The FPARAMS node encodes formal parameters within a formal parameter list.

formalParamsNode :=
'(' FPARAMS identListNode formalTypeNode ')'
;
The IMPMOD node encodes an implementation module.

impModuleNode :=
'(' IMPMOD moduleIdent importListNode blockNode ')'
;
The BLOCK node encodes a block within a module or procedure.

blockNode :=
'(' BLOCK declarationList body ')'
;
declarationList :=
declarationListNode | emptyNode
;
body :=
statementSeqNode | emptyNode
;
The DECLLIST node encodes one or more declarations within a block.

declarationListNode :=
'(' DECLLIST declarationNode+ ')'
;
There are five types of declaration nodes:
CONSTDEF,
TYPEDECL,
VARDECL,
PROC,
MODDECL.

declarationNode :=
constDefNode | typeDeclNode | varDeclNode | procDeclNode | modDeclNode
;
The TYPEDECL node encodes a type declaration.

typeDeclNode :=
'(' TYPEDECL identNode ( typeNode | vsrTypeNode ) ')'
;
The VARDECL node encodes a variable or field declaration.

varDeclNode :=
'(' VARDECL identListNode fieldType ')'
;
The PROC node encodes a procedure declaration.

procDeclNode :=
'(' PROC identNode formalParamListNode returnedType blockNode ')'
;
The MODDECL node encodes a local module declaration.

modDeclNode :=
'(' MODDECL moduleIdent importListNode exportList blockNode ')'
;
exportList :=
unqualExportNode | qualExportNode | emptyNode
;
The VSR node encodes a variable size record type declaration.

vsrTypeNode :=
'(' VSR fieldListSeqNode varSizeFieldNode ')'
;
The VSFIELD node encodes the indeterminate field of a variable size record type.

varSizeFieldNode :=
'(' VSFIELD varSizeField determinantField varSizeFieldType ')'
;
varSizeField := identNode ; /* terminal node */
determinantField := identNode ; /* terminal node */
varSizeFieldType := qualidentNode ; /* terminal node */
The EXPORT node encodes an unqualified export directive within a local module declaration.

unqualExportNode :=
'(' EXPORT identListNode ')'
;
The QUALEXP node encodes a qualified export directive within a local module declaration.

qualExportNode :=
'(' QUALEXP identListNode ')'
;
The STMTSEQ node encodes a statement sequence.

statementSeqNode :=
'(' STMTSEQ statementNode+ ')'
;
There are eleven types of statement nodes:
ASSIGN,
PCALL,
RETURN,
WITH,
IF,
SWITCH,
LOOP,
WHILE,
REPEAT,
FORTO,
EXIT.

statementNode :=
assignmentNode | controlStmt | withStmtNode
;
Nine statement nodes represent control statements:
PCALL,
RETURN,
IF,
SWITCH,
LOOP,
WHILE,
REPEAT,
FORTO,
EXIT.

controlStmt :=
procCallNode | returnStmtNode | ifStmtNode | caseStmtNode | loopCtrlStmt
;
Five statement nodes represent loop control statements:
LOOP,
WHILE,
REPEAT,
FORTO,
EXIT.

loopCtrlStatement :=
loopStmtNode | whileStmtNode | repeatStmtNode | ForToStmtNode | exitStmtNode
;
The ASSIGN node encodes an assignment statement.

assignmentNode :=
'(' ASSIGN designator exprNode ')'
;
designator :=
identNode | qualidentNode | derefNode | designatorNode
;
The PCALL node encodes a procedure call statement.

procCallNode :=
'( PCALL designator actualParams ')'
;
actualParams :=
actualParamsNode | emptyNode
;
The RETURN node encodes a RETURN statement.

returnStmtNode :=
'(' RETURN returnValue ')'
;
returnValue :=
exprNode | emptyNode
;
The WITH node encodes a WITH statement.

withStmtNode :=
'(' WITH designator statementSeqNode ')'
;
The IF node encodes an IF statement.

ifStmtNode :=
'(' IF exprNode ifBranch elsifSeq elseBranch ')'
;
ifBranch := statementSeqNode ;
elsifSeq :=
elsifSeqNode | emptyNode
;
elseBranch :=
statementSeqNode | emptyNode
;
The SWITCH node encodes a CASE statement.

caseStmtNode :=
'(' SWITCH designator caseListNode elseBranch ')'
;
The LOOP node encodes a LOOP statement.

loopStmtNode :=
'(' LOOP statementSeqNode ')'
;
The WHILE node encodes a WHILE statement.

whileStmtNode :=
'(' WHILE exprNode statementSeqNode ')'
;
The REPEAT node encodes a REPEAT statement.

repeatStmtNode :=
'(' REPEAT statementSeqNode exprNode ')'
;
The FORTO node encodes a FOR statement.

forToStmtNode :=
'(' FORTO identNode startValue endValue stepValue statementSeqNode ')'
;
startValue : exprNode ;
endValue := expNode ;
stepValue :=
exprNode | emptyNode
;
The EXIT node encodes an EXIT statement.

exitStmtNode :=
'(' EXIT ')'
;
The ARGS node encodes actual parameters in a procedure or function call.

actualParamsNode :=
'(' ARGS exprNode+ ')'
;
The ELSIFSEQ node encodes an ELSIF sequence within an IF statement.

elsifSeqNode :=
'(' ELSIFSEQ elsifNode+ ')'
;
The ELSIF node encodes a single ELSIF branch within an IF statement.

elsifNode :=
'(' ELSIF exprNode statementSeqNode ')'
;
The CASELIST node encodes a case list within a CASE statement.

caseListNode :=
'(' CASELIST caseBranchNode+ ')'
;
The CASE node encodes a case branch within a CASE statement.

caseBranchNode :=
'(' CASE caseLabelListNode statementSeqNode ')'
;
The ELEMLIST node encodes the element list within a set value.

elementListNode :=
'(' ELEMLIST element+ ')'
;
element :=
expr | range
;
The RANGE node encodes a value range.

range :=
'(' RANGE lowerValue upperValue ')'
;
lowerValue := exprNode ;
upperValue := exprNode ;
The FIELD node encodes a record field selector.

fieldSelectorNode :=
'(' FIELD selector ')'
;
selector :=
qualidentNode | designatorNode
;
The INDEX node encodes an array subscript selector.

arrayIndexNode :=
'(' INDEX subscript+ ')'
;
subscript := exprNode ;
The DESIG node encodes a designator.

designatorNode :=
'(' DESIG head tail ')'
;
head :=
qualidentNode | derefNode
;
tail :=
fieldSelectorNode | arrayIndexNode | emptyNode
;
The DEREF node encodes a pointer dereference.

derefNode :=
'(' DEREF pointer ')'
;
pointer :=
qualident | derefNode | designatorNode
;
There are 25 nodes that may represent expressions or sub-expressions:
NOT,
AND,
OR,
EQ,
NEQ,
LT,
LTEQ,
GT,
GTEQ,
IN,
NEG,
PLUS,
MINUS,
STAR,
SLASH,
DIV,
MOD,
SETDIFF,
DESIG,
FCALL,
SETVAL.
INTVAL,
REALVAL,
CHRVAL,
QUOTEDVAL.

expr :=
boolExpr | relationalExpr | arithmeticExpr |
designator | funcCallNode | setValNode | literalValue
;
There are three boolean expression nodes:

boolExpr :=
notNode | andNode | orNode
;
There are seven relational expression nodes:
EQ,
NEQ,
LT,
LTEQ,
GT,
GTEQ,
IN.

relationalExpr :=
eqNode | neqNode | ltNode | ltEqNode | gtNode | gtEqNode | inNode
;
There are eight arithmetic expression nodes:
NEG,
PLUS,
MINUS,
STAR,
SLASH,
DIV,
MOD,
SETDIFF.

arithmeticExpr :=
negNode | plusNode | minusNode | starNode | slashNode |
divNode | modulusNode | setDiffNode
;
There are four literal value expression nodes:
INTVAL,
REALVAL,
CHRVAL,
QUOTEDVAL.

literalValue :=
intValNode | realValNode | chrValNode | quotedValNode
;
The NOT node encodes an expression of the form NOT expr.

notNode :=
'(' NOT right ')'
;
The AND node encodes an expression of the form expr1 AND expr2.

andNode :=
'(' AND left right ')'
;
The OR node encodes an expression of the form expr1 OR expr2.

orNode :=
'(' OR left right ')'
;
The EQ node encodes an expression of the form expr1 = expr2.

eqNode :=
'(' EQ left right ')'
;
left : exprNode ;
The NEQ node encodes an expression of the form expr1 # expr2.

neqNode :=
'(' NEQ left right ')'
;
The LT node encodes an expression of the form expr1 < expr2.

ltNode :=
'(' LT left right ')'
;
The LTEQ node encodes an expression of the form expr1 <= expr2.

ltEqNode :=
'(' LTEQ left right ')'
;
The GT node encodes an expression of the form expr1 > expr2.

gtNode :=
'(' GT left right ')'
;
The GTEQ node encodes an expression of the form expr1 >= expr2.

gtEqNode :=
'(' GTEQ left right ')'
;
The IN node encodes an expression of the form expr1 IN expr2.

inNode :=
'(' IN left right ')'
;
The NEG node encodes an expression of the form - expr.

negNode :=
'(' NEG right ')'
;
right : exprNode ;
The PLUS node encodes an expression of the form expr1 + expr2.

plusNode :=
'(' PLUS left right ')'
;
The MINUS node encodes an expression of the form expr1 - expr2.

minusNode :=
'(' MINUS left right ')'
;
The STAR node encodes an expression of the form expr1 * expr2.

starNode :=
'(' STAR left right ')'
;
The SLASH node encodes an expression of the form expr1 / expr2.

slashNode :=
'(' SLASH left right ')'
;
The DIV node encodes an expression of the form expr1 DIV expr2.

divNode :=
'(' DIV left right ')'
;
The MOD node encodes an expression of the form expr1 MOD expr2.

modulusNode :=
'(' MOD left right ')'
;
The SETDIFF node encodes an expression of the form expr1 \ expr2.

setDiffNode :=
'(' SETDIFF left right ')'
;
The FCALL node encodes a function call expression.

funcCallNode :=
'( FCALL designator actualParams ')'
;
The SETVAL node encodes a set value expression.

setValNode :=
'( SETVAL elementList setTypeIdent ')'
;
elementList :=
actualParams | emptyNode
;
setTypeIdent :=
ident | qualident | emptyNode
;
The FILENAME node encodes the filename of the source file.

filenameNode :=
'(' FILENAME '"' filename '"' ')'
;
(FILENAME "parser.mod")
The OPTIONS node encodes the compiler options used when compiling the source file.

optionsNode :=
'(' OPTIONS ( '"' option-name '"' )+ ')'
;
(OPTIONS "--pim4" "--no-synonyms" "--no-coroutines" ...)
The IDENT node encodes an identifier.

identNode :=
'(' IDENT '"' Ident '"' ')'
;
(IDENT "foobar")
The IDENTLIST node encodes an identifier list.

identListNode :=
'(' IDENTLIST ( '"' Ident '"' )+ ')'
;
(IDENTLIST "foo" "bar" "baz" ...)
The QUALIDENT node encodes the component identifiers of a qualified identifier.

qualidentNode :=
'(' QUALIDENT ( '"' Ident '"' ) ( '"' Ident '"' )+ ')'
;
(QUALIDENT "foo" "bar" ...)
The INTVAL node encodes a whole number value.

intValNode :=
'(' INTVAL ( lexeme | '#' lexeme ) ')'
;
(INTVAL 12345)
(INTVAL #0x7FFF)
The REALVAL node encodes a real number value.

realValNode :=
'(' REALVAL lexeme ')'
;
(REALVAL 1.234)
(REALVAL 5.678e9)
The CHRVAL node encodes a character code value.

chrValNode :=
'(' CHRVAL '#' lexeme ')'
;
(CHRVAL #0u7F)
The QUOTEDVAL node encodes a quoted character or string value.

quotedValNode :=
'(' QUOTEDVAL '"' lexeme '"' ')'
;
(QUOTEDVAL "quoted character or string")
+++
Copyright (C) 2017 Modula-2 Software Foundation