Common Lisp Package: DOCUTILS.PARSER

Library for docutils parsers

FUNCTION

Public

ABS-LINE-OFFSET (STATE-MACHINE)

Return the line offset of the current line, from the beginning of the file.

ADD-STATES (STATE-MACHINE STATE-CLASSNAMES)

Register state classes with this state machine.

ADD-TRANSITIONS (STATE TRANSITIONS)

Add a list of transitions to the start of the transition list.

GET-INDENTED (STATE-MACHINE &KEY (UNTIL-BLANK NIL) (STRIP-INDENT T) FIRST-INDENT BLOCK-INDENT (STRIP-TOP FIRST-INDENT))

Return an indented block and info. Extract an indented block where the indent is known for all lines. Starting with the current line, extract the entire text block with at least `indent` indentation (which must be whitespace, except for the first line).

Parameters:
- `block-indent`: the number of indent columns/characters, if the indent is known for all lines.
- `first-indent`: the indent of the first line, if the indent is known for the first line and unknown for all other lines.
- `until-blank`: stop collecting at the first blank line if true.
- `strip-indent`: strip `indent` characters of indentation if true (the default).
- `strip-top`: strip blank lines from the beginning of the block.

Return values:
- the indented block,
- its first line offset from the beginning of the file,
- whether or not it finished with a blank line, and
- the indent.
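The known-indent case described above can be sketched in plain Common Lisp. This is only an illustration under stated assumptions: `sketch-get-indented` and its helpers are hypothetical names, the input is a plain vector of strings rather than a state machine, and only the `block-indent`, `until-blank` and `strip-indent` options are modelled.

```lisp
;; Hypothetical helpers, not part of DOCUTILS.PARSER.
(defun sketch-blank-line-p (line)
  "True if LINE is empty or holds only spaces and tabs."
  (every (lambda (ch) (member ch '(#\Space #\Tab))) line))

(defun sketch-line-indent (line)
  "Number of leading space characters in LINE."
  (or (position #\Space line :test #'char/=) (length line)))

(defun sketch-get-indented (lines start block-indent
                            &key until-blank (strip-indent t))
  "Collect lines of the vector LINES from START that are blank or indented
by at least BLOCK-INDENT. Returns four values: the block as a list of
strings, the offset of its first line, whether it finished with a blank
line (or end of input), and the indent."
  (let ((collected '())
        (blank-finish t))
    (loop for i from start below (length lines)
          do (let ((line (aref lines i)))
               (cond ((sketch-blank-line-p line)
                      (when until-blank (return))
                      (push "" collected))
                     ((< (sketch-line-indent line) block-indent)
                      ;; A dedent ends the block; it finished properly only
                      ;; if the last collected line was blank.
                      (setf blank-finish
                            (and collected (string= (first collected) "") t))
                      (return))
                     (t
                      (push (if strip-indent (subseq line block-indent) line)
                            collected)))))
    ;; Trailing blank lines separate blocks, so drop them from the result.
    (loop while (and collected (string= (first collected) ""))
          do (pop collected))
    (values (nreverse collected) start blank-finish block-indent)))
```

For example, `(sketch-get-indented (vector "  a" "    b" "c") 0 2)` collects the first two lines with two columns of indentation stripped and reports that the block did not finish with a blank line.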

GET-TEXT-BLOCK (STATE-MACHINE &KEY FLUSH-LEFT (START (LINE-OFFSET STATE-MACHINE)))

Return a contiguous block of text. If `flush-left` is true, signal UNEXPECTED-INDENTATION if an indented line is encountered before the text block ends (with a blank line).
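A minimal sketch of this behaviour, assuming a plain vector of strings as input. The function `sketch-get-text-block` and the condition `sketch-unexpected-indentation` are hypothetical stand-ins; the library's own condition may be defined differently and carry other slots.

```lisp
;; Hypothetical stand-in for the library's UNEXPECTED-INDENTATION condition.
(define-condition sketch-unexpected-indentation (error)
  ((line-offset :initarg :line-offset :reader sketch-line-offset)))

(defun sketch-get-text-block (lines start &key flush-left)
  "Collect consecutive non-blank lines of the vector LINES beginning at START.
With FLUSH-LEFT true, signal SKETCH-UNEXPECTED-INDENTATION as soon as a
collected line starts with a space."
  (let ((collected '()))
    (loop for i from start below (length lines)
          do (let ((line (aref lines i)))
               ;; A blank line (or end of input) terminates the block.
               (when (string= (string-trim '(#\Space #\Tab) line) "")
                 (return))
               (when (and flush-left (char= (char line 0) #\Space))
                 (error 'sketch-unexpected-indentation :line-offset i))
               (push line collected)))
    (nreverse collected)))
```

A caller that wants to recover from the indentation case can wrap the call in `handler-case` and read the offending offset with `sketch-line-offset`.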

GOTO-LINE (STATE ABS-LINE-OFFSET)

Jump to absolute line offset abs-line-offset, load and return it.

KNOWN-FIRST-INDENT (STATE MATCH)

Handle a known-indent text block (first line's indent known). Extend or override in subclasses. Recursively runs the state machine for indented blocks.

KNOWN-INDENT (STATE MATCH)

Handle a known-indent text block. Extend or override in subclasses. Recursively runs the state machine for indented blocks.

MATCH-END (INSTANCE)

Accessor for the END slot of a MATCH structure (the end of the match).

MATCH-GROUP (MATCH &OPTIONAL (N NIL))

Return a new subsequence corresponding to match group n of match. If n is not specified, return the entire match.

MATCH-GROUP-LENGTH (MATCH N)

Return the length of the subsequence corresponding to match group n of match.
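The relationship between a match record and its groups can be sketched as follows. The structure `sketch-match` is hypothetical: its slots simply mirror the keyword arguments of MAKE-MATCH, and indexing groups from zero into the register arrays is an assumption, not something the listing confirms.

```lisp
;; Hypothetical structure; the real MATCH layout is assumed, not known.
(defstruct (sketch-match (:conc-name sm-))
  (start 0) (end 0) (string nil) (reg-starts nil) (reg-ends nil))

(defun sketch-match-group (match &optional n)
  "Return a fresh subsequence for register N, or the whole match when N is NIL."
  (if n
      (subseq (sm-string match)
              (aref (sm-reg-starts match) n)
              (aref (sm-reg-ends match) n))
      (subseq (sm-string match) (sm-start match) (sm-end match))))

(defun sketch-match-group-length (match n)
  "Length of register N without consing a new string."
  (- (aref (sm-reg-ends match) n)
     (aref (sm-reg-starts match) n)))
```

Computing the length from the register bounds avoids allocating the subsequence when only its size is needed, which is presumably why a separate length function is offered.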

MATCH-REG-ENDS (INSTANCE)

Accessor for the REG-ENDS slot of a MATCH structure (the ends of the register matches).

MATCH-REG-STARTS (INSTANCE)

Accessor for the REG-STARTS slot of a MATCH structure (the beginnings of the register matches).

MATCH-START (INSTANCE)

Accessor for the START slot of a MATCH structure (the start of the match).

MATCH-STRING (INSTANCE)

Accessor for the STRING slot of a MATCH structure (the string that was matched against).

NEXT-LINE-BLANK-P (STATE-MACHINE)

True if the next line is blank or nonexistent.

REMOVE-TRANSITION (STATE NAME)

Remove a transition by `name`

Undocumented

ABS-LINE-NUMBER (STATE-MACHINE)

CURRENT-LINE (STATE-MACHINE &OPTIONAL (INDEX (LINE-OFFSET STATE-MACHINE)))

INSERT-LINES (STATE-MACHINE LINES &OPTIONAL (OFFSET (1+ (LINE-OFFSET STATE-MACHINE))))

MATCH (PATTERN STRING &KEY (START 0) (END (LENGTH STRING)))

(SETF MATCH-END) (NEW-VALUE INSTANCE)

(SETF MATCH-REG-ENDS) (NEW-VALUE INSTANCE)

(SETF MATCH-REG-STARTS) (NEW-VALUE INSTANCE)

(SETF MATCH-START) (NEW-VALUE INSTANCE)

(SETF MATCH-STRING) (NEW-VALUE INSTANCE)

MATCHES (PATTERN STRING &KEY (START 0) (END (LENGTH STRING)))

NEXT-LINE (STATE-MACHINE &OPTIONAL (N 1))

PREVIOUS-LINE (STATE-MACHINE &OPTIONAL (N 1))

STATE-CORRECTION (&OPTIONAL (LINES 1))

TRANSITION-MATCH (TRANSITION STRING)

TRANSITION-NAME (TRANSITION)

Private

AT-BOF (STATE-MACHINE)

True if the input is at or before beginning-of-file.

AT-EOF (STATE-MACHINE)

True if the input is at or past end-of-file.

GET-STATE (STATE-MACHINE &OPTIONAL NEXT-STATE)

Return new state class object

Undocumented

CHECK-LINE (STATE-MACHINE STATE &OPTIONAL (TRANSITIONS (TRANSITIONS STATE)))

COPY-MATCH (INSTANCE)

(SETF CURRENT-STATE) (STATE STATE-MACHINE)

MAKE-INDENT-STATE-MACHINE (STATE)

MAKE-KNOWN-INDENT-STATE-MACHINE (STATE)

MAKE-MATCH (&KEY (START 0) (END 0) (STRING NIL) (REG-STARTS NIL) (REG-ENDS NIL))

MATCH-P (OBJECT)

TRANSITION-FUNCTION (TRANSITION)

TRANSITION-NEXT-STATE (TRANSITION)

TRANSITION-PATTERN (TRANSITION)

GENERIC-FUNCTION

Public

APPLY-TRANSITION (STATE TRANSITION MATCH)

Execute transition from state with match

BLANK (STATE MATCH)

Handle blank lines.

BOF (STATE)

Beginning of file transition

EOF (STATE)

End of file transition

INDENT (STATE MATCH)

Handle an indented text block. Extend or override in subclasses. Recursively runs the state machine for indented blocks.

NO-MATCH (STATE TRANSITIONS)

Called when CHECK-LINE finds no matching transition.

NOP (STATE MATCH)

A do-nothing transition method.

STATE-MACHINE-RUN (STATE-MACHINE INPUT-LINES &KEY (INITIAL-STATE (INITIAL-STATE STATE-MACHINE)) (INPUT-OFFSET 0) (INLINER RST-PATTERNS) (DOCUMENT (MAKE-NODE 'DOCUMENT)) (MATCH-TITLES T) NODE &ALLOW-OTHER-KEYS)

Run state machine over input lines filling in document

Private

MAKE-NESTED-STATE-MACHINE (STATE &OPTIONAL INITIAL-STATE)

Create a nested state machine to parse nested document structures.

SCAN (REGEX TARGET-STRING &KEY (START 0) (END (LENGTH TARGET-STRING)) ((REAL-START-POS *REAL-START-POS*) NIL))

Searches TARGET-STRING from START to END and tries to match REGEX. On success returns four values - the start of the match, the end of the match, and two arrays denoting the beginnings and ends of register matches. On failure returns NIL. REGEX can be a string which will be parsed according to Perl syntax, a parse tree, or a pre-compiled scanner created by CREATE-SCANNER. TARGET-STRING will be coerced to a simple string if it isn't one already. The REAL-START-POS parameter should be ignored - it exists only for internal purposes.

SLOT-ACCESSOR

Public

BLANK-FINISH (OBJECT)

Used to keep track of blank lines

(SETF BLANK-FINISH) (NEW-VALUE OBJECT)

Used to keep track of blank lines

INITIAL-STATE (OBJECT)

The initial state name.

INITIAL-TRANSITIONS (OBJECT)

The initial set of transitions for this state

INPUT-LINES (OBJECT)

Vector of input lines (without newlines).

LINE-OFFSET (OBJECT)

Current input line offset from beginning of input-lines.

(SETF LINE-OFFSET) (NEW-VALUE OBJECT)

Current input line offset from beginning of input-lines.

STATE-MACHINE (OBJECT)

A reference to the controlling StateMachine object.

TRANSITIONS (OBJECT)

List of transitions in order.

(SETF TRANSITIONS) (NEW-VALUE OBJECT)

List of transitions in order.

Private

CURRENT-STATE (OBJECT)

The current state.

INPUT-OFFSET (OBJECT)

Offset of input-lines from the beginning of the file.

STATES (OBJECT)

A list of allowed state class names.

(SETF STATES) (NEW-VALUE OBJECT)

A list of allowed state class names.

VARIABLE

Public

+WSP-TRANSITIONS+

Transitions for a wsp state machine.

Private

*GOTO-LINE-HOOKS*

A list of functions called when the state machine moves to another line. Each function takes two arguments: the state machine and the absolute line offset.

*STATE-CHANGE-HOOKS*

A list of functions called when the state is changed. Each function is called with the state machine and the new state object.
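How such a hook list might be driven can be sketched like this. All names below are hypothetical; only the calling convention (state machine, then absolute line offset) comes from the descriptions above, and the order in which the library calls its hooks is not stated.

```lisp
;; Hypothetical hook list with the same calling convention as *GOTO-LINE-HOOKS*.
(defvar *sketch-goto-line-hooks* '()
  "Functions of two arguments: the state machine and the absolute line offset.")

(defun sketch-run-goto-line-hooks (state-machine abs-line-offset)
  "Call every registered hook, in list order, with the two arguments."
  (dolist (hook *sketch-goto-line-hooks*)
    (funcall hook state-machine abs-line-offset)))

;; Example hook: record every line offset the machine visits.
(defvar *sketch-visited* '())

(push (lambda (machine offset)
        (declare (ignore machine))
        (push offset *sketch-visited*))
      *sketch-goto-line-hooks*)
```

A hook of this kind is useful for tracing or for keeping an external line counter in step with the parser.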

Undocumented

*SCAN-CACHE*

CLASS

Public

MATCH

Results of a transition match

STATE

State superclass.

STATE-MACHINE

A finite state machine for text filters using matching functions. The input is provided in the form of a list of one-line strings (no newlines) which may be modified. States are subclasses of the STATE class. Transitions consist of regular expression patterns and transition methods, and are defined in each state. The state machine is started with STATE-MACHINE-RUN, which returns the results of processing in a list.
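The control flow described above can be sketched in miniature. This is not the DOCUTILS.PARSER implementation: the table `*sketch-states*` and the function `sketch-run` are hypothetical, states are keywords rather than classes, and predicate functions stand in for regular expression patterns.

```lisp
(defun sketch-blank-p (line)
  (string= (string-trim '(#\Space #\Tab) line) ""))

;; Each state maps to an ordered list of (name test handler) transitions.
;; A handler returns two values: the next state and an optional result.
(defparameter *sketch-states*
  (list
   (cons :body
         (list (list :blank #'sketch-blank-p
                     (lambda (line)
                       (declare (ignore line))
                       (values :body nil)))
               (list :text (constantly t)
                     (lambda (line) (values :paragraph (list :start line))))))
   (cons :paragraph
         (list (list :blank #'sketch-blank-p
                     (lambda (line)
                       (declare (ignore line))
                       (values :body (list :end))))
               (list :text (constantly t)
                     (lambda (line) (values :paragraph (list :more line))))))))

(defun sketch-run (input-lines &optional (initial-state :body))
  "Feed each line to the first matching transition of the current state.
Returns the list of results produced by the transition handlers."
  (let ((state initial-state)
        (results '()))
    (loop for line across input-lines
          do (let ((transition
                     ;; The first transition whose test accepts the line wins.
                     (find-if (lambda (tr) (funcall (second tr) line))
                              (cdr (assoc state *sketch-states*)))))
               (multiple-value-bind (next-state result)
                   (funcall (third transition) line)
                 (setf state next-state)
                 (when result (push result results)))))
    (nreverse results)))
```

Every state here ends with a catch-all `:text` transition, so a line always matches; the library instead has NO-MATCH for the case where no transition applies.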

WSP-STATE

State superclass specialized for whitespace (blank lines and indents). Use this class with WSP-STATE-MACHINE. The transitions 'blank' (for blank lines) and 'indent' (for indented text blocks) are added automatically, before any other transitions. The transition method BLANK handles blank lines and INDENT handles nested indented blocks. Indented blocks trigger a new state machine to be created by INDENT and run. The class of the state machine to be created is in `indent_sm`, and the constructor keyword arguments are in the dictionary `indent_sm_kwargs`.

WSP-STATE-MACHINE

STATE-MACHINE subclass specialized for whitespace recognition.

CONDITION

Public

Undocumented

INSERT-LINES

UNEXPECTED-INDENTATION