Common Lisp Package: DOCUTILS.PARSER.RST

reStructuredText parser for docutils

FUNCTION

Private

CHECK-ATTRIBUTION (INDENTED LINE-OFFSET)

Check for an attribution in the last contiguous block of `indented`.
* The first line after the last blank line must begin with '--' (etc.).
* Every line after that must have consistent indentation.
Returns multiple values: block quote lines, attribution lines, and attribution offset.
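
The two rules above can be sketched in plain Common Lisp. This is a simplified illustration, not the package's implementation: every `sketch-` name is hypothetical, only a literal "--" prefix is recognised, the input is assumed to have no trailing blank lines, and the offset returned is simply the index of the first attribution line.

```lisp
;; Hypothetical helpers; none of these names come from the package itself.
(defun sketch-blank-line-p (line)
  "True when LINE is empty or contains only spaces and tabs."
  (every (lambda (ch) (member ch '(#\Space #\Tab))) line))

(defun sketch-line-indent (line)
  "Number of leading spaces in LINE."
  (or (position-if (lambda (ch) (char/= ch #\Space)) line)
      (length line)))

(defun sketch-attribution-start-p (line)
  "True when LINE, ignoring leading spaces, begins with \"--\" plus some text."
  (let ((text (string-left-trim " " line)))
    (and (> (length text) 2)
         (string= "--" text :end2 2))))

(defun sketch-check-attribution (indented)
  "Return three values: quote lines, attribution lines, attribution index.
When no attribution is found, return INDENTED, NIL and NIL."
  (let* ((last-blank (position-if #'sketch-blank-line-p indented :from-end t))
         (candidate (and last-blank (nthcdr (1+ last-blank) indented)))
         (continuation (rest candidate)))
    (if (and candidate
             ;; Rule 1: first line after the last blank line starts with "--".
             (sketch-attribution-start-p (first candidate))
             ;; Rule 2: every following line shares one indentation.
             (or (null continuation)
                 (let ((indent (sketch-line-indent (first continuation))))
                   (every (lambda (line) (= (sketch-line-indent line) indent))
                          continuation))))
        (values (subseq indented 0 last-blank) candidate (1+ last-blank))
        (values indented nil nil))))
```

For example, `(sketch-check-attribution '("To be or not to be." "" "-- Hamlet" "   Act 3"))` returns the quote lines `("To be or not to be.")`, the attribution lines `("-- Hamlet" "   Act 3")`, and the index 2.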

CHECK-SUBSECTION (STATE STYLE SOURCE LINENO)

Check for a valid subsection header. When a new section is reached that isn't a subsection of the current section, back up the line count (use ``previous_line(-x)``), then terminate-state-machine so the calling StateMachine can re-examine the title. This will work its way back up the calling chain until the correct section level is reached.

DIRECTIVE (STATE MATCH)

A directive block

DIRECTIVE-ALLOW-SPACES-P (INSTANCE)

Accessor for the ALLOW-SPACES-P slot of a DIRECTIVE structure instance.

DIRECTIVE-ARGUMENTS (INSTANCE)

Accessor for the ARGUMENTS slot of a DIRECTIVE structure instance.

DIRECTIVE-CONTENT-P (INSTANCE)

Accessor for the CONTENT-P slot of a DIRECTIVE structure instance.

DIRECTIVE-FUNCTION (INSTANCE)

Accessor for the FUNCTION slot of a DIRECTIVE structure instance.

DIRECTIVE-NAME (INSTANCE)

Accessor for the NAME slot of a DIRECTIVE structure instance.

DIRECTIVE-OPTIONS (INSTANCE)

Accessor for the OPTIONS slot of a DIRECTIVE structure instance.

ENUM-ARGS (MATCH)

Given a match returned from enum-matcher, return the format, sequence, ordinal, and text parameters.

ENUM-MATCHER (STRING &KEY (START 0) (END (LENGTH STRING)))

Pattern matcher for enumerated lists. Determines the format, type, and count.

EXPLICIT-LIST (STATE)

Create a nested state machine for a series of explicit markup constructs (including anonymous hyperlink targets).

EXTRACT-EXTENSION-OPTIONS (FIELD-LIST)

Given a field list, return an alist of names and values.

EXTRACT-NAME-VALUE (LINE)

Return a list of (:name value) from a line of the form name=value
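
As an illustration of the `name=value` convention described above, here is a minimal sketch in plain Common Lisp. The function name `sketch-extract-name-value` is hypothetical, and the handling of whitespace and of lines without "=" is an assumption rather than documented behaviour of the package.

```lisp
;; Hypothetical sketch; the package's own EXTRACT-NAME-VALUE may differ.
(defun sketch-extract-name-value (line)
  "Split LINE at its first \"=\" and return (:NAME \"value\").
Return NIL if LINE contains no \"=\"."
  (let ((pos (position #\= line)))
    (when pos
      (list
       ;; The name becomes an upper-cased keyword, e.g. "width" -> :WIDTH.
       (intern (string-upcase (string-trim " " (subseq line 0 pos))) :keyword)
       ;; The value is kept as a string with surrounding spaces removed.
       (string-trim " " (subseq line (1+ pos)))))))
```

With this sketch, `(sketch-extract-name-value "width = 80")` returns `(:WIDTH "80")`.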

FROM-ROMAN (S)

Convert a Roman numeral to an integer.

INSERT-METADATA (METADATA PARENT-NODE)

Helper function that can be called to insert field data into a document.

INVALID-SPECIALISED-INPUT (STATE MATCH)

Not a compound element member. Abort this state machine.

IS-ENUMERATED-LIST-ITEM (STATE ORDINAL SEQUENCE FORMAT)

Check validity based on the ordinal value and the second line. Return true if the ordinal is valid and the second line is blank, indented, or starts with the next enumerator.
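
The check described above can be sketched for the simplest case, an arabic enumerator written as "N.". This is an assumption-laden illustration: `sketch-enumerated-list-item-p` is a hypothetical name, it takes the following line as a string rather than reading it from a state object, and it treats end of input (a NIL line) as valid.

```lisp
;; Hypothetical sketch for arabic "N." enumerators only.
(defun sketch-enumerated-list-item-p (ordinal next-line)
  "ORDINAL is the parsed number, or NIL if the enumerator text was invalid.
NEXT-LINE is the line after the candidate item, or NIL at end of input."
  (and ordinal
       (or (null next-line)                                  ; end of input
           (zerop (length (string-trim " " next-line)))      ; blank line
           (char= (char next-line 0) #\Space)                ; indented line
           (let ((next (format nil "~D." (1+ ordinal))))     ; next enumerator
             (and (>= (length next-line) (length next))
                  (string= next next-line :end2 (length next)))))
       t))
```

A line such as "1. first" followed by "2. second" passes the check, while the same line followed by unindented paragraph text does not, so it would be treated as ordinary text.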

LINE-BLOCK-LINE (STATE MATCH LINENO)

Return one line element of a line_block.

LITERAL-BLOCK (STATE)

Return a list of nodes.

NESTED-PARSE (STATE BLOCK INPUT-OFFSET NODE &KEY (MATCH-TITLES NIL) (STATES *RST-STATE-CLASSES*) INITIAL-STATE)

Create a new StateMachine rooted at `node` and run it over the input `block`.

NOT-QUOTED (&REST EXPR)

Return a parse tree expression for EXPR when it is not quoted.

PARAGRAPH (LINES LINENO)

Return a paragraph node and a boolean indicating whether a literal block follows.

PARSE-DIRECTIVE-BLOCK (DIRECTIVE STATE INDENTED LINE-OFFSET)

Parse a directive block made up of arguments, keyword options, and content.

PARSE-INLINE (PATTERNS STRING &KEY LINE (START 0) (END (LENGTH STRING)) (LANGUAGE *LANGUAGE*))

Parse a string for inline patterns. Return a list of inline markup elements and system messages. PATTERNS is a list of patterns to be applied in turn. Each pattern is either a symbol naming both a parse-tree synonym and a function, or a cons of a regular expression pattern and a function. The pattern functions are called with two arguments: the match object for the regexp match, and a list of the remaining patterns to be applied recursively. They should return a list of inline elements to be inserted. The :start and :end keyword arguments have their usual meanings.
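
The "apply each pattern in turn, passing the remaining patterns to the handler" scheme can be sketched without any regular expression library. In this simplified illustration each pattern is a cons of a literal delimiter string and a function, the handler receives the delimited text instead of a match object, and system messages are not modelled. All `sketch-` names are hypothetical.

```lisp
;; Hypothetical sketch: literal delimiters stand in for regular expressions.
(defun sketch-parse-inline (patterns string)
  "Return a list of strings and (:TYPE . children) elements parsed from STRING."
  (if (null patterns)
      ;; No patterns left: the text is emitted as-is (empty text is dropped).
      (if (string= string "") nil (list string))
      (destructuring-bind ((delim . fn) &rest remaining) patterns
        (let* ((start (search delim string))
               (end (and start
                         (search delim string
                                 :start2 (+ start (length delim))))))
          (if (not end)
              ;; This pattern does not match: try the next one.
              (sketch-parse-inline remaining string)
              (append
               ;; Text before the match only sees the remaining patterns.
               (sketch-parse-inline remaining (subseq string 0 start))
               ;; The handler gets the matched text and the remaining patterns.
               (funcall fn (subseq string (+ start (length delim)) end)
                        remaining)
               ;; Text after the match is parsed with the full pattern list.
               (sketch-parse-inline patterns
                                    (subseq string (+ end (length delim))))))))))

(defun sketch-strong (text remaining)
  (list (list* :strong (sketch-parse-inline remaining text))))

(defun sketch-emphasis (text remaining)
  (list (list* :emphasis (sketch-parse-inline remaining text))))

(defparameter *sketch-patterns*
  (list (cons "**" #'sketch-strong)
        (cons "*" #'sketch-emphasis)))

;; (sketch-parse-inline *sketch-patterns* "a **b** and *c*")
;; => ("a " (:STRONG "b") " and " (:EMPHASIS "c"))
```

Listing the longer delimiter first matters in this sketch: "**" has to be tried before "*", which mirrors the idea that patterns are applied in the order given.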

PARSE-OPTION-MARKER (MATCH)

Return a list of `node.option` and `node.option_argument` objects, parsed from an option marker match.

PARSE-TARGET (TEXT-BLOCK)

Determine the type of reference of a target. Returns two values: either :refname and the indirect reference name, or :refuri and the URI.

QUOTED (STATE MATCH)

Match consistent quotes on subsequent lines.

QUOTED-PATTERN (&REST EXPR)

Return a parse tree expression for a quoted expression.

SECTION (STATE TITLE SOURCE STYLE LINENO)

Check for a valid subsection and create one if it checks out.

SPLIT (REGEX TARGET-STRING &KEY (START 0) (END (LENGTH TARGET-STRING)) LIMIT WITH-REGISTERS-P OMIT-UNMATCHED-P SHAREDP)

Matches REGEX against TARGET-STRING as often as possible and returns a list of the substrings between the matches. If WITH-REGISTERS-P is true, substrings corresponding to matched registers are inserted into the list as well. If OMIT-UNMATCHED-P is true, unmatched registers will simply be left out, otherwise they will show up as NIL. LIMIT limits the number of elements returned - registers aren't counted. If LIMIT is NIL (or 0 which is equivalent), trailing empty strings are removed from the result list. If REGEX matches an empty string the scan is continued one position behind this match. If SHAREDP is true, the substrings may share structure with TARGET-STRING.

TO-ROMAN (N)

Convert an integer to a Roman numeral.
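
For readers unfamiliar with the conversion performed by TO-ROMAN and FROM-ROMAN, the following is a minimal sketch of the usual table-driven approach. It is not taken from the package: the `sketch-` names, the 1 to 3999 range check, and the choice to return NIL for unparseable input are all assumptions.

```lisp
;; Hypothetical value/numeral table, largest value first.
(defparameter *sketch-roman-map*
  '((1000 . "M") (900 . "CM") (500 . "D") (400 . "CD")
    (100 . "C") (90 . "XC") (50 . "L") (40 . "XL")
    (10 . "X") (9 . "IX") (5 . "V") (4 . "IV") (1 . "I")))

(defun sketch-to-roman (n)
  "Convert the integer N (1 to 3999) to an upper-case Roman numeral string."
  (check-type n (integer 1 3999))
  (with-output-to-string (out)
    (loop for (value . numeral) in *sketch-roman-map*
          ;; Emit each numeral as many times as its value fits into N.
          do (loop while (>= n value)
                   do (write-string numeral out)
                      (decf n value)))))

(defun sketch-from-roman (s)
  "Convert the Roman numeral string S to an integer, ignoring case.
Return NIL if some part of S cannot be consumed."
  (let ((s (string-upcase s))
        (pos 0)
        (total 0))
    (loop for (value . numeral) in *sketch-roman-map*
          ;; Consume each numeral greedily from the current position.
          do (loop while (let ((end (+ pos (length numeral))))
                           (and (<= end (length s))
                                (string= numeral s :start2 pos :end2 end)))
                   do (incf total value)
                      (incf pos (length numeral))))
    (if (= pos (length s)) total nil)))
```

For example, `(sketch-to-roman 1994)` returns "MCMXCIV" and `(sketch-from-roman "xiv")` returns 14.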

Undocumented

ADD-ATTRIBUTES (NODE ATTRIBUTES)

ADD-FIELD (PARENT STATE MATCH)

ADD-TABLE-ROWS (STATE PARENT ARRAY TABLELINE)

ADD-TARGET (TARGETNAME REFURI TARGET LINENO)

ADD-TRANSFORM (TRANSFORM)

ANONYMOUS-TARGET (STATE MATCH)

BARE-LITERAL-BLOCK (STATE MATCH)

BLOCK-QUOTE (STATE INDENTED LINE-OFFSET)

BUILD-TABLE (STATE TABLEDATA TABLELINE)

BUILD-TABLE-ROW (STATE ROWDATA TABLELINE)

CITATION (STATE MATCH)

COMMENT (STATE MATCH)

COPY-DIRECTIVE (INSTANCE)

DEFINITION-LIST-ITEM (STATE TERMLINE)

(SETF DIRECTIVE-ALLOW-SPACES-P) (NEW-VALUE INSTANCE)

(SETF DIRECTIVE-ARGUMENTS) (NEW-VALUE INSTANCE)

(SETF DIRECTIVE-CONTENT-P) (NEW-VALUE INSTANCE)

(SETF DIRECTIVE-FUNCTION) (NEW-VALUE INSTANCE)

(SETF DIRECTIVE-NAME) (NEW-VALUE INSTANCE)

(SETF DIRECTIVE-OPTIONS) (NEW-VALUE INSTANCE)

DIRECTIVE-P (OBJECT)

EMPHASIS (MATCH &REST ATTRIBUTES)

EXPLICIT-CONSTRUCT (STATE MATCH)

FOOTNOTE (STATE MATCH)

FOOTNOTE-REFERENCE (MATCH &REST ATTRIBUTES)

INLINE-TEXT (TEXT LINENO)

INTERNAL-TARGET (MATCH &REST ATTRIBUTES)

INTERPRETED (MATCH &REST ATTRIBUTES)

IS-NIL-STRING (STRING)

IS-REFERENCE (REFERENCE)

ISOLATE-GRID-TABLE (STATE)

ISOLATE-SIMPLE-TABLE (STATE)

LITERAL (MATCH &REST ATTRIBUTES)

MAKE-DIRECTIVE (&KEY ((NAME DUM0) ) ((ARGUMENTS DUM1) NIL) ((ALLOW-SPACES-P DUM2) T) ((OPTIONS DUM3) NIL) ((CONTENT-P DUM4) NIL) ((FUNCTION DUM5) NIL))

MAKE-ENUMERATOR (ORDINAL SEQUENCE FORMAT)

MAKE-IMAGE-NODES (URI ALT HEIGHT WIDTH SCALE ALIGN TARGET CLASS ANGLE)

MAKE-TARGET (TEXT-BLOCK LINENO TARGET-NAME)

MALFORMED-TABLE (STATE BLOCK &OPTIONAL DETAIL)

MATH (MATCH &REST ATTRIBUTES)

NEST-LINE-BLOCK-LINES (LINE-BLOCK)

NEW-SUBSECTION (STATE TITLE LINENO)

OPTION-LIST-ITEM (STATE MATCH)

PARSE-ATTRIBUTION (STATE INDENTED LINE-OFFSET)

PARSE-EXTENSION-OPTIONS (STATE DATALINES)

QUOTED-LITERAL-BLOCK (STATE)

REFERENCE (MATCH &REST ATTRIBUTES)

RFC2822-FIELD (STATE MATCH)

SELECT-RST-TRANSITIONS (&REST REST)

SHORT-OVERLINE (STATE LINENO &OPTIONAL (LINES 1))

STRONG (MATCH &REST ATTRIBUTES)

SUBSTITUTION-DEF (STATE MATCH)

SUBSTITUTION-REFERENCE (MATCH &REST ATTRIBUTES)

TABLE-TOP (STATE ISOLATOR PARSER)

TERM (TERMLINE LINENO)

TOPIC (NAME PARENT TITLE SUBTITLE CLASS CONTENT PARSER &OPTIONAL (NODECLASS 'TOPIC))

UNINDENT-WARNING (STATE NODE-NAME)

URI (MATCH &REST ATTRIBUTES)

MACRO

Public

DEF-DIRECTIVE (NAME (NODEVAR &REST LAMBDA-LIST) &BODY BODY)

Define a directive handler for the directive named NAME. LAMBDA-LIST is a directive lambda list of the following form:
lambda-list ::= ({var | (var [specializer])}*
                 [&allow-spaces]
                 [&option {var | (var [specializer])}*]
                 [&content var]
                 [&parsed-content var [kwargs]])

DEF-ROLE (NAME (TEXTVAR &REST LAMBDA-LIST) &BODY BODY)

Define a role handler for the role with canonical name NAME. Content and options come from the role directive. LAMBDA-LIST has the following form:
lambda-list ::= ({var | (var [[specializer] [default]])}*
                 [&content {var [[specializer] [default]]}])

Private

DEFINE-PARSE-TREE-SYNONYM (NAME PARSE-TREE)

Defines the symbol NAME to be a synonym for the parse tree PARSE-TREE. Both arguments are quoted.

DEFINE-RECURSIVE-ELEMENT-PARSE-TREE (NAME START &OPTIONAL (END START))

Matches a recursive element, i.e. one that uses a greedy match.

DO-SCANS ((MATCH-START MATCH-END REG-STARTS REG-ENDS REGEX TARGET-STRING &OPTIONAL RESULT-FORM &KEY START END) &BODY BODY &ENVIRONMENT ENV)

Iterates over TARGET-STRING and tries to match REGEX as often as possible evaluating BODY with MATCH-START, MATCH-END, REG-STARTS, and REG-ENDS bound to the four return values of each match in turn. After the last match, returns RESULT-FORM if provided or NIL otherwise. An implicit block named NIL surrounds DO-SCANS; RETURN may be used to terminate the loop immediately. If REGEX matches an empty string the scan is continued one position behind this match. BODY may start with declarations.

Undocumented

DEF-ADMONITION (NAME &OPTIONAL (NODE-CLASS (INTERN (STRING-UPCASE NAME))))

DEF-GENERIC-ROLE (NAME)

DEFINE-INLINE-ELEMENT-PARSE-TREE (NAME START &OPTIONAL (END START) (MIDDLE '(REGISTER (NON-GREEDY-REPETITION 0 NIL EVERYTHING))))

MAKE-INLINE-NODES (TYPE ATTRIBUTES &OPTIONAL CHILDREN)

MAKE-NODE (TYPE &REST ARGUMENTS)

GENERIC-FUNCTION

Private

CREATE-SCANNER (REGEX &KEY CASE-INSENSITIVE-MODE MULTI-LINE-MODE SINGLE-LINE-MODE EXTENDED-MODE DESTRUCTIVE)

Accepts a regular expression - either as a parse-tree or as a string - and returns a scan closure which will scan strings for this regular expression and a list mapping registers to their names (NIL stands for unnamed ones). The "mode" keyword arguments are equivalent to the imsx modifiers in Perl. If DESTRUCTIVE is not NIL, the function is allowed to destructively modify its first argument (but only if it's a parse tree).

ENUMERATOR (STATE MATCH)

Parse an enumerated list item

GET-DIRECTIVE (NAME DIRECTIVES)

Given a directive name and a directives entity return the directive function. This is implemented as a generic function so that the directives can be stored in a class 'shadowing' the main rst directives hash.

(SETF GET-DIRECTIVE) (VALUE NAME DIRECTIVES)

Set the directive function for a directive name within a directives entity. Like GET-DIRECTIVE, this is implemented as a generic function so that the directives can be stored in a class 'shadowing' the main rst directives hash.
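
The "shadowing" arrangement mentioned above can be sketched with plain CLOS. This is an illustration of the idea only: the class `sketch-shadowing-directives`, its slots, and the `sketch-get-directive` generic function are hypothetical and are not the package's definitions.

```lisp
;; Hypothetical generic function pair; hash tables play the main directive table.
(defgeneric sketch-get-directive (name directives)
  (:documentation "Return the directive function stored under NAME, or NIL."))

(defgeneric (setf sketch-get-directive) (value name directives)
  (:documentation "Store VALUE as the directive function for NAME."))

(defmethod sketch-get-directive (name (directives hash-table))
  (values (gethash name directives)))

(defmethod (setf sketch-get-directive) (value name (directives hash-table))
  (setf (gethash name directives) value))

;; A container whose local entries take precedence over a parent table.
(defclass sketch-shadowing-directives ()
  ((local :initform (make-hash-table :test #'equal)
          :reader sketch-local-directives)
   (parent :initarg :parent :reader sketch-parent-directives)))

(defmethod sketch-get-directive (name (directives sketch-shadowing-directives))
  ;; Look locally first, then fall back to the shadowed parent.
  (or (gethash name (sketch-local-directives directives))
      (sketch-get-directive name (sketch-parent-directives directives))))

(defmethod (setf sketch-get-directive)
    (value name (directives sketch-shadowing-directives))
  ;; Definitions made through the shadowing object never touch the parent.
  (setf (gethash name (sketch-local-directives directives)) value))
```

Because lookup and assignment both dispatch on the container, callers can use the same accessor whether they hold the main table or a shadowing object, which is the design motive the docstring describes.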

INSERT-SUBSECTION (SOURCE PARENT-NODE TITLE)

Called for each subsection to insert appropriate nodes into a parent node. By default inserts a section entity with given title.

PARSE-ROLE (ROLE TEXT &OPTIONAL OPTION-VALUES SUPPLIED-CONTENT)

Apply given role to text, returning a list of markup elements to be inserted in place

RFC2822 (STATE MATCH)

RFC2822-style field list item.

SCAN (REGEX TARGET-STRING &KEY (START 0) (END (LENGTH TARGET-STRING)) REAL-START-POS)

Searches TARGET-STRING from START to END and tries to match REGEX. On success returns four values - the start of the match, the end of the match, and two arrays denoting the beginnings and ends of register matches. On failure returns NIL. REGEX can be a string which will be parsed according to Perl syntax, a parse tree, or a pre-compiled scanner created by CREATE-SCANNER. TARGET-STRING will be coerced to a simple string if it isn't one already. The REAL-START-POS parameter should be ignored - it exists only for internal purposes.

SUBSECTIONS (SOURCE)

Return an ordered list of subsection sources for a source

TITLE (SOURCE)

Return the subsection title for a source

Undocumented

ANONYMOUS (STATE MATCH)

BULLET (STATE MATCH)

DOCTEST (STATE MATCH)

EMBEDDED-DIRECTIVE (STATE MATCH)

EXPLICIT-MARKUP (STATE MATCH)

FIELD-MARKER (STATE MATCH)

GRID-TABLE-TOP (STATE MATCH)

INITIAL-QUOTED (STATE MATCH)

LINE (STATE MATCH)

LINE-BLOCK (STATE MATCH)

LIST-ITEM (STATE MATCH)

OPTION-MARKER (STATE MATCH)

PARSE-FIELD-BODY (STATE INDENTED OFFSET NODE)

SIMPLE-TABLE-TOP (STATE MATCH)

TEXT (STATE MATCH)

UNDERLINE (STATE MATCH)

SLOT-ACCESSOR

Private

Undocumented

CONTEXT (OBJECT)

(SETF CONTEXT) (NEW-VALUE OBJECT)

INITIAL-LINENO (OBJECT)

(SETF INITIAL-LINENO) (NEW-VALUE OBJECT)

LASTORDINAL (OBJECT)

(SETF LASTORDINAL) (NEW-VALUE OBJECT)

LIST-FORMAT (OBJECT)

(SETF LIST-FORMAT) (NEW-VALUE OBJECT)

MATCH-TITLES (OBJECT)

(SETF MATCH-TITLES) (NEW-VALUE OBJECT)

VARIABLE

Private

*CONTEXT*

Context passed to text state

*DEFAULT-INTERPRETED-ROLE*

The canonical name of the default interpreted role. This role is used when no role is specified for a piece of interpreted text.

*DIRECTIVES*

Mapping of directive types to directive functions, which take four arguments: an argument string, an alist of options, an unparsed content block, and a callback to parse the content block.

*INTERPRETED-ROLES*

Mapping of roles to role functions, which take the interpreted text content and an alist of directive options to be interpreted by the role function.

*RST-STATE-CLASSES*

Set of State classes used with `rst-state-machine`.

*SECTION-LEVEL*

Current section level - index in styles

*TITLE-STYLES*

List of title styles in order

+ENUM-SCANNER+

Regex for matching against enumerated lists.

+ENUMERATED-LISTS+

List of enumerated list types. For each we have its label, regex fragment, function to convert to ordinal, function to convert from ordinal and the type value to be given in html

Undocumented

*PEP-URL*

*RFC-URL*

*SECTION-BUBBLE-UP-KLUDGE*

+EXPLICIT-REFERENCE-SCANNER+

+EXPLICIT-SUBSTITUTION-SCANNER+

+EXPLICIT-TARGET-SCANNER+

+ROMAN-NUMERAL-MAP+

+RST-TRANSITIONS+

CLOSERS

EXPLICIT-CONSTRUCTS

GRID-TABLE-TOP-PATTERN

OPENERS

RST-PATTERNS

SIMPLE-TABLE-TOP-PATTERN

CLASS

Public

RST-READER

The reStructuredText parser

Private

BODY

Generic classifier of the first line of a block.

BULLET-LIST

Second and subsequent bullet_list list_items.

DEFINITION

Second line of potential definition-list-item.

DEFINITION-LIST

Second and subsequent definition_list_items.

DIRECTIVE

Directive Specification

ENUMERATED-LIST

Second and subsequent enumerated-list list items.

EXPLICIT

Second and subsequent explicit markup construct.

FIELD-LIST

Second and subsequent field_list fields.

LINE

Second line of over- & underlined section title or transition marker.

LINE-BLOCK

Second and subsequent lines of a line_block.

METADATA-READER

An rst reader which reads only title and docinfo

NESTED-STATE-MACHINE

StateMachine run from within other StateMachine runs, to parse nested document structures.

OPTION-LIST

Second and subsequent option_list option_list_items.

QUOTED-LITERAL-BLOCK

Nested parse handler for quoted (unindented) literal blocks. Special-purpose. Not for inclusion in `state_classes`.

RECURSIVE-RST-READER

A reader which will recursively read from an entity and recurse down through subsections, reading them in turn

RFC2822-BODY

RFC2822 headers are only valid as the first constructs in documents. As soon as anything else appears, the `Body` state should take over.

RFC2822-LIST

Second and subsequent RFC2822-style field_list fields.

RST-STATE

Associated methods used by all State subclasses.

RST-STATE-MACHINE

reStructuredText's master StateMachine.

SPECIALIZED-BODY

Superclass for second and subsequent compound element members. Compound elements are lists and list-like constructs.

All transition methods are disabled (redefined as `invalid_input`). Override individual methods in subclasses to re-enable.

For example, once an initial bullet list item, say, is recognized, the `BulletList` subclass takes over, with a "bullet_list" node as its container. Upon encountering the initial bullet list item, `Body.bullet` calls its ``self.nested_list_parse`` (`RSTState.nested_list_parse`), which starts up a nested parsing session with `BulletList` as the initial state. Only the ``bullet`` transition method is enabled in `BulletList`; as long as only bullet list items are encountered, they are parsed and inserted into the container. The first construct which is *not* a bullet list item triggers the `invalid_input` method, which ends the nested parse and closes the container. `BulletList` needs to recognize input that is invalid in the context of a bullet list, which means everything *other than* bullet list items, so it inherits the transition list created in `Body`.
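
The control flow described above (a nested parse that accepts only one kind of input and ends on the first line of any other kind) can be sketched with a plain function. This is a deliberately reduced illustration: there are no state classes or transitions here, bullets are assumed to be lines starting with "- ", and both `sketch-` names are hypothetical.

```lisp
;; Hypothetical sketch of "parse bullets until something else appears".
(defun sketch-bullet-line-p (line)
  "True when LINE starts with the bullet marker \"- \"."
  (and (>= (length line) 2)
       (string= "- " line :end2 2)))

(defun sketch-nested-bullet-parse (lines)
  "Consume bullet items from the front of LINES.
Returns two values: the item texts collected into the container, and the
remaining lines, starting with the first line that is not a bullet item."
  (let ((items '()))
    ;; The only enabled 'transition': accept another bullet item.
    (loop while (and lines (sketch-bullet-line-p (first lines)))
          do (push (subseq (pop lines) 2) items))
    ;; Anything else plays the role of invalid input: stop and hand the
    ;; unconsumed lines back to the caller.
    (values (nreverse items) lines)))
```

Given `'("- one" "- two" "Paragraph." "- three")`, the sketch returns `("one" "two")` as the container contents and `("Paragraph." "- three")` as the lines left for the outer parser.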

SPECIALIZED-TEXT

Superclass for second and subsequent lines of Text-variants.

SUBSTITUTION-DEF

Parser for the contents of a substitution_definition element.

TEXT

Classifier of second line of a text block. Could be a paragraph, a definition list item, or a title.

Undocumented

CUSTOM-ROLE

EXTENSION-OPTIONS

GENERIC-CUSTOM-ROLE

META-BODY

ROLE

STANDARD-ROLE