Common Lisp Package: COM.INFORMATIMAGO.RDP

This package implements a simple recursive descent parser. Copyright Pascal Bourguignon 2006 - 2012 This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

README:

FUNCTION

Public

COMPUTE-FIRST-SETS (GRAMMAR)

PRE: The GRAMMAR must be normalized. (ie. containly only SEQ rules) DO: Signals an error if there are duplicates in the first set of a non-terminal. RETURN: A hash-table containing the first-set for each symbol of the grammar. (terminals and non terminals).

COMPUTE-FOLLOW-SETS (GRAMMAR)

PRE: The GRAMMAR must be normalized. (ie. containly only SEQ rules) RETURN: A hash-table containing the follow-set for each non-terminal of the grammar.

FIND-RULE (GRAMMAR NON-TERMINAL)

RETURN: the productions with NON-TERMINAL as left-hand-side as a single ALT production (or just the production if there's only one). PRE: (non-terminal-p non-terminal)

GENERATE-GRAMMAR (NAME &KEY TERMINALS (SCANNER T) (SKIP-SPACES T) START RULES (TARGET-LANGUAGE LISP) (TRACE NIL))

SEE ALSO: The docstring of DEFGRAMMAR. RETURN: A form that defines the grammar object and its parser functions.

GRAMMAR-ALL-NON-TERMINALS (INSTANCE)

@arg[extid]{A @class{extid}} @return[sytemid]{puri:uri or nil} Returns the System ID part of this External ID.

GRAMMAR-ALL-TERMINALS (INSTANCE)

@arg[extid]{A @class{extid}} @return[sytemid]{puri:uri or nil} Returns the System ID part of this External ID.

GRAMMAR-NAME (INSTANCE)

@arg[extid]{A @class{extid}} @return[sytemid]{puri:uri or nil} Returns the System ID part of this External ID.

GRAMMAR-NAMED (NAME)

Returns the grammar named NAME, or NIL if none.

GRAMMAR-RULES (INSTANCE)

@arg[extid]{A @class{extid}} @return[sytemid]{puri:uri or nil} Returns the System ID part of this External ID.

GRAMMAR-SKIP-SPACES (INSTANCE)

@arg[extid]{A @class{extid}} @return[sytemid]{puri:uri or nil} Returns the System ID part of this External ID.

GRAMMAR-START (INSTANCE)

@arg[extid]{A @class{extid}} @return[sytemid]{puri:uri or nil} Returns the System ID part of this External ID.

GRAMMAR-TERMINALS (INSTANCE)

@arg[extid]{A @class{extid}} @return[sytemid]{puri:uri or nil} Returns the System ID part of this External ID.

NORMALIZE-GRAMMAR (GRAMMAR)

Return a new normalized grammar parsing the same language as GRAMMAR.

Undocumented

CLEAN-RULES (RULES)

COPY-GRAMMAR (INSTANCE)

FIRST-SET (GRAMMAR SYMBOL)

FOLLOW-SET (GRAMMAR NON-TERMINAL)

SETFGRAMMAR-ALL-NON-TERMINALS (NEW-VALUE INSTANCE)

SETFGRAMMAR-ALL-TERMINALS (NEW-VALUE INSTANCE)

SETFGRAMMAR-NAME (NEW-VALUE INSTANCE)

SETFGRAMMAR-RULES (NEW-VALUE INSTANCE)

SETFGRAMMAR-SKIP-SPACES (NEW-VALUE INSTANCE)

SETFGRAMMAR-START (NEW-VALUE INSTANCE)

SETFGRAMMAR-TERMINALS (NEW-VALUE INSTANCE)

MAKE-GRAMMAR (&KEY ((NAME DUM0) NIL) ((TERMINALS DUM1) NIL) ((START DUM2) NIL) ((RULES DUM3) NIL) ((ALL-TERMINALS DUM4) NIL) ((ALL-NON-TERMINALS DUM5) NIL) ((SCANNER DUM6) T) ((SKIP-SPACES DUM7) T) ((FIRST-FUNCTION DUM8) NIL) ((FOLLOW-FUNCTION DUM9) NIL))

NON-TERMINAL-P (GRAMMAR ITEM)

NULLABLEP (GRAMMAR SENTENCE)

TERMINALP (GRAMMAR ITEM)

Private

COMPUTE-FIRST-FUNCTION (GRAMMAR)

PRE: The GRAMMAR must be normalized. (ie. containly only SEQ rules) RETURN: The first-set function for the grammar symbols.

COMPUTE-FOLLOW-FUNCTION (GRAMMAR &OPTIONAL NON-TERMINALS)

PRE: The GRAMMAR must be normalized. (ie. containly only SEQ rules) NON-TERMINAL: When given, it's the list of non-terminals of the non-normalized grammar which we are interested in. RETURN: The follow-set function for the grammar non-terminals.

FIND-RULES (GRAMMAR NON-TERMINAL)

RETURN: all the produtions with NON-TERMINAL as left-hand-side. PRE: (non-terminal-p non-terminal)

GRAMMAR-FIRST-FUNCTION (INSTANCE)

@arg[extid]{A @class{extid}} @return[sytemid]{puri:uri or nil} Returns the System ID part of this External ID.

GRAMMAR-FOLLOW-FUNCTION (INSTANCE)

@arg[extid]{A @class{extid}} @return[sytemid]{puri:uri or nil} Returns the System ID part of this External ID.

GRAMMAR-SCANNER (INSTANCE)

@arg[extid]{A @class{extid}} @return[sytemid]{puri:uri or nil} Returns the System ID part of this External ID.

NORMALIZE-GRAMMAR-RULES (RULES NON-TERMINALS)

Substitute any sub-expressions in the rhs with a new non-terminal and a new production. Then replace the rep, opt, and alt rules with seq rules and new produtions. Returns the new production set.

SENTENCE-NODE-ATTRIBUTE (INSTANCE)

@arg[extid]{A @class{extid}} @return[sytemid]{puri:uri or nil} Returns the System ID part of this External ID.

SENTENCE-NODE-WORD (INSTANCE)

@arg[extid]{A @class{extid}} @return[sytemid]{puri:uri or nil} Returns the System ID part of this External ID.

Undocumented

CLEAN (EXPR)

CLEAN-ALT (EXPR)

CLEAN-OPT (EXPR)

CLEAN-REP (EXPR)

CLEAN-SEQ (EXPR)

CLEAN-WITH-ACTION (EXPR)

COMPUTE-ALL-NON-TERMINALS (GRAMMAR)

COMPUTE-ALL-TERMINALS (GRAMMAR)

COMPUTE-FIRST-FOLLOW (GRAMMAR)

COPY-SENTENCE-NODE (INSTANCE)

GEN-TRACE (FNAME FORM TRACE)

SETFGRAMMAR-FIRST-FUNCTION (NEW-VALUE INSTANCE)

SETFGRAMMAR-FOLLOW-FUNCTION (NEW-VALUE INSTANCE)

GRAMMAR-P (OBJECT)

SETFGRAMMAR-SCANNER (NEW-VALUE INSTANCE)

MAKE-NEW-NON-TERMINAL (BASE-NT NON-TERMINALS)

MAKE-SENTENCE-NODE (&KEY ((WORD DUM1349) NIL) ((ATTRIBUTE DUM1350) NIL))

MATCH-BEGINNING (INDEX &OPTIONAL (MATCH-RESULTS *STRING-MATCH-RESULTS*))

MATCH-END (INDEX &OPTIONAL (MATCH-RESULTS *STRING-MATCH-RESULTS*))

MATCH-STRING (INDEX STRING &OPTIONAL (MATCH-RESULTS *STRING-MATCH-RESULTS*))

PREPEND (OLD &REST NEW-LISTS)

PROCESS-SENTENCE (GRAMMAR NT SENTENCE)

REGEXP-COMPILE (REGEXP)

REGEXP-MATCH-ANY (GROUPSP)

REGEXP-QUOTE-EXTENDED (STRING)

SETFSENTENCE-NODE-ATTRIBUTE (NEW-VALUE INSTANCE)

SENTENCE-NODE-P (OBJECT)

SETFSENTENCE-NODE-WORD (NEW-VALUE INSTANCE)

SPLIT-ACTION (RHS)

SPLIT-STRING (STRING REGEXP)

STRING-MATCH (REGEXP STRING &KEY (START 0) (END NIL))

TRACEP (KEYWORD TRACE)

MACRO

Public

DEFGRAMMAR (NAME &KEY TERMINALS (SCANNER T) (SKIP-SPACES T) START RULES (TARGET-LANGUAGE LISP) (TRACE NIL))

DO: This macros generates a simple scanner and recursive decent parser for the language described by this grammar. For each <non-terminal> in the grammar, a function named <name>/PARSE-<non-terminal> is generated, in addition to functions SCAN-<name> and PARSE-<name>. The grammar structure is also generated for run-time in the global special variable <name>. TERMINALS: A liste of couples (name-of-terminals regexp-of-terminal). Note that terminals don't necessarily have a name since they may be written directly in the grammar rules as strings. SCANNER: Can be either: - T A scanner is generated. - NIL No scanner is generated. - a class-name subclass of COM.INFORMATIMAGO.COMMON-LISP.PARSER.SCANNER which is used to get tokens. SKIP-SPACES: When false, the spaces are not eaten between token. It's up to the grammar or the scanner to deal with them. (Only when a scanner is generated with :scanner t). START: The start symbol (non-terminal). RULES: A list of grammar rules. TARGET-LANGUAGE: Specifies the language into which the code is generated, as a keyword. The actions must still be written as lisp expressions, but to generate the language specified. There must be a set of methods specialized on this target-language keyword. TRACE: When true, the parser functions are traced.. SYNTAX: (defgrammar <name> :terminals (( <terminal> "regexp" [ / "regexp" ]) ...) :start <non-terminal> :rules ((--> <non-terminal> <items> ) ...)) <items> ::= | <item> <items> <item> ::= <seq> | <rep> | <alt> | <opt> | <non-terminal> | <literal-terminal> | <terminal> <seq> ::= (SEQ <items> <action>) ; <items> may be empty. <rep> ::= (REP <item> <items> <action>) <opt> ::= (OPT <item> <items> <action>) <alt> ::= (ALT <item> <items>) <action> ::= | :ACTION <forms> <forms> ::= | <form> <forms> <form> ::= form -- any lisp form. <non-terminal> ::= symbol -- any lisp symbol (keywords reserved). <terminal> ::= symbol -- any lisp symbol (keywords reserved). <literal-terminal> ::= string -- any lisp string. SEMANTICS: The terminals are either named terminals listed in the :TERMINALS clause, or literal terminals written directly in the productions as lisp strings. They are matched as-is. An extended regular expression regex(7) may be given that will be matched by the scanner to infer the given terminal. The literal terminals are matched first, the longest first, and with ([A-Za-z0-9]|$) appended to terminals ending in a letter or digit (so that they don't match when part of a longer identifier). Then the regular expressions are matched in the order given. A second regexp can be given for a terminal, which must not be matched, after the first regexp, for the terminal to be recognized. (:terminals ((alpha "[A-Za-z]+" / "[0-9]"))) will recognize "abcd, efgh" as the ALPHA "abcd", but won't recognize "abcd42, efgh" as an ALPHA. :START specifies the start non-terminal symbol. The non-terminal symbols are infered implicitely from the grammar rules. If there are more than one subforms, or an action, the REP and OPT forms take an implicit SEQ: (REP a b c :ACTION f) --> (REP (SEQ a b c :ACTION f)) (OPT a b c :ACTION f) --> (OPT (SEQ a b c :ACTION f)) (REP a :ACTION f) --> (REP (SEQ a :ACTION f)) (OPT a :ACTION f) --> (OPT (SEQ a :ACTION f)) (REP a) --> (REP a) (OPT a) --> (OPT a) Embedded ALT are flattened: (ALT a b (ALT c d) e f) --> (ALT a b c d e f) Actions are executed in a lexical environment where the symbols $1, $2, etc are bound to the results of the subforms. $0 is bound to the list of the results of the subforms. In addition, for each non-terminal the non-terminal symbol is bound to the result of the corresponding subform. When the non-terminal is present several times in the production, a dot and an index number is appended. (--> example (seq thing item "," item (opt "," item :action (list item)) :action (list thing item.1 item.2 $5))) would build a list with the results of parsing thing, the two items, and the optional part. The action for REP is to return a possibly empty list of the results of its single subform repeated. (REP is 0 or more). The action for OPT is to return either NIL, or the result of its single subform unchanged. The action for an ALT is to return the result of the selected alternative unchanged. The default action for an internal SEQ is to return the list of the results of its subforms. The default action for an implicit SEQ for a given <non-terminal> lhs is to return the list of the results of its subforms prefixed by the <non-terminal> symbol. TODO: We could also flatten sequences without action, or even sequences with actions with renumbering.

Private

APPENDF (PLACE &REST ARGS &ENVIRONMENT ENV)

Append onto list

PREPENDF (PLACE &REST ARGS &ENVIRONMENT ENV)

Prepend onto list

Undocumented

WITH-NON-TERMINAL (NON-TERMINAL &BODY BODY)

GENERIC-FUNCTION

Public

SCAN-NEXT-TOKEN (SCANNER &OPTIONAL PARSER-DATA)

DO: Scans a new token and store it into (scanner-current-token scanner) PARSER-DATA: Some parsers give information to the scanner. RETURN: (scanner-current-token scanner).

SKIP-SPACES (SCANNER)

DO: Skips over the spaces in the input stream. Updates line and column slots. RETURN: line; column

Undocumented

ACCEPT (SCANNER TOKEN)

ADVANCE-LINE (SCANNER)

PARSER-ERROR-COLUMN (CONDITION)

PARSER-ERROR-FORMAT-ARGUMENTS (CONDITION)

PARSER-ERROR-FORMAT-CONTROL (CONDITION)

PARSER-ERROR-GRAMMAR (CONDITION)

PARSER-ERROR-LINE (CONDITION)

PARSER-ERROR-NON-TERMINAL-STACK (CONDITION)

PARSER-ERROR-SCANNER (CONDITION)

SCANNER-END-OF-SOURCE-P (SCANNER)

UNEXPECTED-TOKEN-ERROR-EXPECTED-TOKEN (CONDITION)

UNEXPECTED-TOKEN-ERROR-NON-TERMINAL-STACK (CONDITION)

WORD-EQUAL (A B)

Private

GENERATE-BOILERPLATE (TARGET-LANGUAGE GRAMMAR &KEY TRACE)

Generate the boilerplate code needed by the scanner and parser. This code must be a single lisp form. In the case of the :lisp target-language, this form is the code of the boilerplate itself. For another language, this form is lisp code used to generate that other language boilerplate.

GENERATE-NT-PARSER (TARGET-LANGUAGE GRAMMAR NON-TERMINAL &KEY TRACE (TRACE NIL))

Generate the parser code for the given non-terminal. This code must be a single lisp form. In the case of the :lisp target-language, this form is the code of the boilerplate itself. For another language, this form is lisp code used to generate that other language boilerplate.

GENERATE-PARSER (TARGET-LANGUAGE GRAMMAR &KEY TRACE (TRACE NIL))

Generate the toplevel parser code. This code must be a single lisp form. In the case of the :lisp target-language, this form is the code of the boilerplate itself. For another language, this form is lisp code used to generate that other language boilerplate.

GENERATE-SCANNER (TARGET-LANGUAGE GRAMMAR &KEY TRACE (TRACE NIL))

Generate the scanner code, when (grammar-scanner grammar) is T. (grammar-scanner grammar) may be the class-name of a scanner to use. This code must be a single lisp form. In the case of the :lisp target-language, this form is the code of the boilerplate itself. For another language, this form is lisp code used to generate that other language boilerplate.

Undocumented

GEN-IN-FIRSTS (TARGET FIRSTS)

GEN-PARSE-FUNCTION-NAME (TARGET GRAMMAR NON-TERMINAL)

GEN-PARSING-STATEMENT (TARGET GRAMMAR ITEM)

GEN-SCANNER-CLASS-NAME (TARGET GRAMMAR)

GEN-SCANNER-FUNCTION-NAME (TARGET GRAMMAR)

SCANNER-END-OF-LINE-P (SCANNER)

SLOT-ACCESSOR

Public

SCANNER-COLUMN (SCANNER)

The The number of the current column.

SETFSCANNER-COLUMN (NEW-VALUE OBJECT)

Set the The number of the current column.

SCANNER-CURRENT-TOKEN (SCANNER)

The The last token read.

SETFSCANNER-CURRENT-TOKEN (NEW-VALUE OBJECT)

Set the The last token read.

SCANNER-LINE (SCANNER)

The The number of the current line.

SETFSCANNER-LINE (NEW-VALUE OBJECT)

Set the The number of the current line.

SCANNER-SPACES (SCANNER)

The A string containing the characters considered space by SKIP-SPACES.

SETFSCANNER-SPACES (NEW-VALUE OBJECT)

Set the A string containing the characters considered space by SKIP-SPACES.

SCANNER-STATE (SCANNER)

The The state of the scanner.

SETFSCANNER-STATE (NEW-VALUE OBJECT)

Set the The state of the scanner.

SCANNER-TAB-WIDTH (SCANNER)

The TAB aligns to column number modulo TAB-WIDTH.

SETFSCANNER-TAB-WIDTH (NEW-VALUE OBJECT)

Set the TAB aligns to column number modulo TAB-WIDTH.

TOKEN-COLUMN (TOKEN)

The Returns the column of the first character of the token.

SETFTOKEN-COLUMN (NEW-VALUE OBJECT)

Set the Returns the column of the first character of the token.

TOKEN-KIND (TOKEN)

The Returns the kind of the token.

SETFTOKEN-KIND (NEW-VALUE OBJECT)

Set the Returns the kind of the token.

TOKEN-LINE (TOKEN)

The Returns the line where the token was found.

SETFTOKEN-LINE (NEW-VALUE OBJECT)

Set the Returns the line where the token was found.

TOKEN-TEXT (TOKEN)

The Returns the literal text the token.

SETFTOKEN-TEXT (NEW-VALUE OBJECT)

Set the Returns the literal text the token.

Undocumented

SCANNER-BUFFER (OBJECT)

SETFSCANNER-BUFFER (NEW-VALUE OBJECT)

SCANNER-CURRENT-TEXT (OBJECT)

SETFSCANNER-CURRENT-TEXT (NEW-VALUE OBJECT)

VARIABLE

Public

*NON-TERMINAL-STACK*

For error reporting.

Private

*EOF-SYMBOL*

The symbol used to denote the End-Of-Source in the follow-sets.

*GRAMMARS*

Records the variables defined with DEFGRAMMAR. Use (GRAMMAR-NAMED name) to look up a grammar.

Undocumented

*LINENUM*

*SPACES*

*STRING-MATCH-RESULTS*

CLASS

Public

TOKEN

A syntactic element.

Undocumented

GRAMMAR

RDP-SCANNER

Private

Undocumented

SENTENCE-NODE

CONDITION

Public

Undocumented

PARSER-END-OF-SOURCE-NOT-REACHED

PARSER-ERROR

UNEXPECTED-TOKEN-ERROR