Common Lisp Package: DOCUTILS.PARSER.TABLES

CALS Table parser

README:

FUNCTION

Public

Undocumented

PARSE-TABLE (PARSER-CLASS-NAME BLOCK)

Private

SCAN-DOWN (TEXT-BLOCK TOP LEFT RIGHT)

Look for the bottom-right corner of the cell, making note of all row boundaries.

SCAN-LEFT (TEXT-BLOCK TOP LEFT BOTTOM RIGHT)

Noting column boundaries, look for the bottom-left corner of the cell. It must line up with the starting point.

SCAN-RIGHT (TEXT-BLOCK TOP LEFT)

Look for the top-right corner of the cell, and make note of all column boundaries ('+').

SCAN-UP (TEXT-BLOCK TOP LEFT BOTTOM)

Noting row boundaries, see if we can return to the starting point.
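Read together, the four scanners appear to trace a cell's border clockwise from its top-left corner: right along the top edge, down the right edge, left along the bottom edge, and back up the left edge. The following is a minimal sketch of the first two steps only, not the package's implementation. `sketch-scan-right`, `sketch-scan-down` and `*sketch-table*` are hypothetical names, and the table is assumed to be a list of equal-length strings.

```lisp
;; Hypothetical helpers illustrating the border walk; they are not part of
;; DOCUTILS.PARSER.TABLES. LINES is assumed to be a list of strings that
;; all have the same length.

(defun sketch-scan-right (lines top left)
  "Collect the column of every '+' to the right of LEFT on row TOP.
Stops at the first character that is neither '+' nor '-'."
  (let ((line (elt lines top)))
    (loop for col from (1+ left) below (length line)
          for ch = (char line col)
          while (member ch '(#\+ #\-))
          when (char= ch #\+) collect col)))

(defun sketch-scan-down (lines top right)
  "Collect the row of every '+' below TOP in column RIGHT.
Stops at the first character that is neither '+' nor '|'."
  (loop for row from (1+ top) below (length lines)
        for ch = (char (elt lines row) right)
        while (member ch '(#\+ #\|))
        when (char= ch #\+) collect row))

(defparameter *sketch-table*
  '("+----+---+"
    "| a  | b |"
    "+----+---+"))

(sketch-scan-right *sketch-table* 0 0) ; => (5 9)
(sketch-scan-down *sketch-table* 0 5)  ; => (2)
```

Each collected position is only a candidate corner. A full parser would still have to confirm that the bottom and left edges close the rectangle.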

Undocumented

BOTTOM (TEXT-BLOCK)

CHECK-PARSE-COMPLETE (PARSER)

DO-PARSE-TABLE (PARSER)

GET-2D-BLOCK (TEXT TOP LEFT BOTTOM RIGHT &KEY (STRIP-INDENT T))

MARK-DONE (DONE TOP LEFT BOTTOM RIGHT)

SCAN-CELL (PARSER TOP LEFT)

STRUCTURE-FROM-CELLS (PARSER)

GENERIC-FUNCTION

Private

CREATE-SCANNER (REGEX &KEY CASE-INSENSITIVE-MODE MULTI-LINE-MODE SINGLE-LINE-MODE EXTENDED-MODE DESTRUCTIVE)

Accepts a regular expression - either as a parse-tree or as a string - and returns a scan closure which will scan strings for this regular expression and a list mapping registers to their names (NIL stands for unnamed ones). The "mode" keyword arguments are equivalent to the imsx modifiers in Perl. If DESTRUCTIVE is not NIL, the function is allowed to destructively modify its first argument (but only if it's a parse tree).
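The docstrings for CREATE-SCANNER and SCAN match those of the CL-PPCRE library, so the sketch below assumes these are CL-PPCRE's functions and that the library can be loaded through Quicklisp. It builds one scanner from a string and one from a parse tree. The regexes and variable names are illustrative only.

```lisp
;; Assumption: Quicklisp is available; load CL-PPCRE some other way if not.
(ql:quickload "cl-ppcre" :silent t)

;; From a string, with the equivalent of Perl's /i modifier.
(defparameter *header-scanner*
  (cl-ppcre:create-scanner "header" :case-insensitive-mode t))

;; From a parse tree: a '+', one or more '-', then another '+'.
(defparameter *border-scanner*
  (cl-ppcre:create-scanner
   '(:sequence #\+ (:greedy-repetition 1 nil #\-) #\+)))

(cl-ppcre:scan *header-scanner* "| Header 2 |") ; match starts at 2, ends at 8
(cl-ppcre:scan *border-scanner* "+---+")        ; match starts at 0, ends at 5
(cl-ppcre:scan *border-scanner* "| a |")        ; => NIL
```

Compiling a scanner once and reusing it avoids re-parsing the regex on every call, which matters when the same pattern is applied to every line of a table.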

SCAN (REGEX TARGET-STRING &KEY (START 0) (END (LENGTH TARGET-STRING)) ((:REAL-START-POS *REAL-START-POS*) NIL))

Searches TARGET-STRING from START to END and tries to match REGEX. On success returns four values - the start of the match, the end of the match, and two arrays denoting the beginnings and ends of register matches. On failure returns NIL. REGEX can be a string which will be parsed according to Perl syntax, a parse tree, or a pre-compiled scanner created by CREATE-SCANNER. TARGET-STRING will be coerced to a simple string if it isn't one already. The REAL-START-POS parameter should be ignored - it exists only for internal purposes.
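As a usage sketch, and again assuming SCAN here is CL-PPCRE's function loaded through Quicklisp, the four return values can be captured with MULTIPLE-VALUE-BIND. The sample strings are made up for illustration.

```lisp
;; Assumption: Quicklisp is available; load CL-PPCRE some other way if not.
(ql:quickload "cl-ppcre" :silent t)

;; Success: match start, match end, and the register start/end arrays.
(multiple-value-bind (start end reg-starts reg-ends)
    (cl-ppcre:scan "(a)*b" "xaaabd")
  (list start end reg-starts reg-ends)) ; => (1 5 #(3) #(4))

;; START restricts where the search begins.
(cl-ppcre:scan "=+" "+====+" :start 2) ; match starts at 2, ends at 5

;; Failure returns NIL.
(cl-ppcre:scan "=" "+----+") ; => NIL
```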

SLOT-ACCESSOR

Private

HEAD-BODY-SEP (OBJECT)

Index of the head/body separator

TEXT-BLOCK (OBJECT)

Block of text to be parsed

Undocumented

CELLS (OBJECT)

(SETF CELLS) (NEW-VALUE OBJECT)

COLSEPS (OBJECT)

(SETF COLSEPS) (NEW-VALUE OBJECT)

DONE (OBJECT)

ROWSEPS (OBJECT)

(SETF ROWSEPS) (NEW-VALUE OBJECT)

CLASS

Public

GRID-TABLE-PARSER

Parse a grid table using `parse()`.

Here's an example of a grid table::

    +------------------------+------------+----------+----------+
    | Header row, column 1   | Header 2   | Header 3 | Header 4 |
    +========================+============+==========+==========+
    | body row 1, column 1   | column 2   | column 3 | column 4 |
    +------------------------+------------+----------+----------+
    | body row 2             | Cells may span columns.          |
    +------------------------+------------+---------------------+
    | body row 3             | Cells may  | - Table cells       |
    +------------------------+ span rows. | - contain           |
    | body row 4             |            | - body elements.    |
    +------------------------+------------+---------------------+

Intersections use '+', row separators use '-' (except for one optional head/body row separator, which uses '='), and column separators use '|'.

Passing the above table to the `parse()` method will result in the following data structure::

    ([24, 12, 10, 10],
     [[(0, 0, 1, ['Header row, column 1']),
       (0, 0, 1, ['Header 2']),
       (0, 0, 1, ['Header 3']),
       (0, 0, 1, ['Header 4'])]],
     [[(0, 0, 3, ['body row 1, column 1']),
       (0, 0, 3, ['column 2']),
       (0, 0, 3, ['column 3']),
       (0, 0, 3, ['column 4'])],
      [(0, 0, 5, ['body row 2']),
       (0, 2, 5, ['Cells may span columns.']),
       None,
       None],
      [(0, 0, 7, ['body row 3']),
       (1, 0, 7, ['Cells may', 'span rows.', '']),
       (1, 1, 7, ['- Table cells', '- contain', '- body elements.']),
       None],
      [(0, 0, 9, ['body row 4']), None, None, None]])

The first item is a list containing column widths (colspecs). The second item is a list of head rows, and the third is a list of body rows. Each row contains a list of cells. Each cell is either None (for a cell unused because of another cell's span), or a tuple. A cell tuple contains four items: the number of extra rows used by the cell in a vertical span (morerows); the number of extra columns used by the cell in a horizontal span (morecols); the line offset of the first line of the cell contents; and the cell contents, a list of lines of text.
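The column widths in the example follow from the positions of the '+' characters on the top border: each width is the number of characters between two neighbouring separators. A minimal sketch of that arithmetic, where `sketch-column-widths` is a hypothetical helper and not a function of this package:

```lisp
;; Hypothetical helper; not part of DOCUTILS.PARSER.TABLES.
(defun sketch-column-widths (colseps)
  "COLSEPS is an ascending list of column-separator positions.
Returns the number of characters between each neighbouring pair."
  (loop for (left right) on colseps
        while right
        collect (- right left 1)))

;; In the example table the '+' characters of the top border sit at
;; columns 0, 25, 38, 49 and 60.
(sketch-column-widths '(0 25 38 49 60)) ; => (24 12 10 10)
```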

Private

TABLE-PARSER

Abstract superclass for the common parts of the syntax-specific parsers.

CONDITION

Public

Undocumented

TABLE-CONDITION