Common Lisp Package: HTML5-PARSER

README:

FUNCTION

Public

NODE-TO-XMLS (NODE &OPTIONAL INCLUDE-NAMESPACE-P)

Convert a node into an XMLS-compatible tree of conses, starting at NODE. If the node is a document fragment, a list of XMLS trees is returned.

Undocumented

ELEMENT-ATTRIBUTE (NODE ATTRIBUTE &OPTIONAL NAMESPACE)

(SETF ELEMENT-ATTRIBUTE) (NEW-VALUE NODE ATTRIBUTE &OPTIONAL NAMESPACE)

ELEMENT-MAP-ATTRIBUTES (FUNCTION NODE)

ELEMENT-MAP-CHILDREN (FUNCTION NODE)

MAKE-COMMENT (DATA)

MAKE-DOCTYPE (NAME PUBLIC-ID SYSTEM-ID)

MAKE-DOCUMENT

MAKE-ELEMENT (NAME NAMESPACE)

MAKE-FRAGMENT

MAKE-TEXT-NODE (DATA)

NODE-APPEND-CHILD (NODE CHILD)

NODE-FIRST-CHILD (NODE)

NODE-INSERT-BEFORE (NODE CHILD INSERT-BEFORE)

NODE-LAST-CHILD (NODE)

NODE-NEXT-SIBLING (NODE)

NODE-PREVIOUS-SIBLING (NODE)

NODE-REMOVE-CHILD (NODE CHILD)

PARSE-HTML5 (SOURCE &KEY ENCODING STRICTP CONTAINER)

PARSE-HTML5-FRAGMENT (SOURCE &KEY ENCODING STRICTP (CONTAINER "div"))

Private

ASCII-ICHAR= (CHAR1 CHAR2)

ASCII case-insensitive char=

ASCII-ISTRING= (STRING1 STRING2)

ASCII case-insensitive string=
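
As an illustration of what "ASCII case-insensitive" means for the two functions above, here is a minimal sketch. The SKETCH- names are hypothetical and this is not the package's source; it assumes an implementation whose character codes are ASCII-compatible. Unlike CHAR-EQUAL, which may also fold non-ASCII letters, only A-Z and a-z are treated as equivalent.

```lisp
;; Sketch only: hypothetical names, not taken from HTML5-PARSER.
(defun sketch-ascii-downcase (char)
  "Map A-Z to a-z; return every other character unchanged."
  (if (<= 65 (char-code char) 90)          ; assumes ASCII-compatible codes
      (code-char (+ (char-code char) 32))
      char))

(defun sketch-ascii-ichar= (char1 char2)
  "True when CHAR1 and CHAR2 differ at most in ASCII case."
  (char= (sketch-ascii-downcase char1) (sketch-ascii-downcase char2)))

(defun sketch-ascii-istring= (string1 string2)
  "True when the strings have equal length and match character by character."
  (and (= (length string1) (length string2))
       (every #'sketch-ascii-ichar= string1 string2)))
```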

CONSUME-NUMBER-ENTITY (SELF IS-HEX)

This function returns either U+FFFD or the character based on the decimal or hexadecimal representation. It also discards ";" if present. If it is not present, a token (:type :parse-error) is emitted.
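
The decoding step described above can be sketched in isolation. The function below is hypothetical: it takes the digits as a string instead of reading them from the tokenizer's stream, it leaves out the ";" handling and the parse-error token, and it may omit remappings that the real function performs. It assumes an implementation whose characters cover Unicode.

```lisp
;; Sketch only: a hypothetical helper, not the package's CONSUME-NUMBER-ENTITY.
(defun sketch-decode-number-entity (digits is-hex)
  "Return the character named by DIGITS, or U+FFFD when the code is unusable."
  (let ((code (parse-integer digits :radix (if is-hex 16 10)))
        (replacement (code-char #xFFFD)))
    (cond ((zerop code) replacement)                          ; NUL
          ((<= #xD800 code #xDFFF) replacement)               ; surrogate range
          ((>= code (min #x110000 char-code-limit)) replacement) ; out of range
          (t (or (code-char code) replacement)))))
```

For example, (sketch-decode-number-entity "41" t) and (sketch-decode-number-entity "65" nil) both denote the letter A.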

CREATE-ELEMENT (TOKEN)

Create an element but don't insert it anywhere

ELEMENT-IN-ACTIVE-FORMATTING-ELEMENTS (NAME)

Check if an element exists between the end of the active formatting elements and the last marker. If it does, return it; otherwise return false.
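
The search described above walks the list backwards from its end and stops at the most recent marker. A minimal sketch follows, assuming for illustration that the list holds element names as strings and markers as the keyword :MARKER, oldest entry first; the package's real data representation is not documented here and may differ.

```lisp
;; Sketch only: the list layout and the :MARKER keyword are assumptions.
(defun sketch-find-after-last-marker (name entries)
  "Return the newest entry equal to NAME that follows the last :MARKER, else NIL."
  (loop for entry in (reverse entries)
        until (eq entry :marker)            ; stop at the most recent marker
        when (and (stringp entry) (string= entry name))
          return entry))
```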

EMIT-CURRENT-TOKEN (SELF)

This method is a generic handler for emitting the tags. It also sets the state to :data because that's what's needed after a token has been emitted.

GET-TABLE-MISNESTED-NODEPOSITION

Get the foster parent element and the sibling to insert before (or NIL) when inserting a misnested table node.

HTML5-STREAM-CHARS-UNTIL (STREAM CHARACTERS &OPTIONAL OPPOSITE-P)

Returns a string of characters from the stream up to but not including any character in characters or end of file.
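
A minimal sketch of this kind of scan over a standard character stream is shown below. The function name is hypothetical, and the reading of OPPOSITE-P (consume only characters that are in CHARACTERS) is an assumption, since the argument is not documented above. The real function operates on the package's own HTML-INPUT-STREAM, not on a plain Lisp stream.

```lisp
;; Sketch only: works on ordinary character streams, not HTML-INPUT-STREAM.
(defun sketch-chars-until (stream characters &optional opposite-p)
  "Collect characters from STREAM until one in CHARACTERS (or end of file) is next.
With OPPOSITE-P (assumed meaning), collect only characters that are in CHARACTERS."
  (with-output-to-string (out)
    (loop for char = (peek-char nil stream nil nil)   ; NIL at end of file
          while (and char
                     (if opposite-p
                         (find char characters)
                         (not (find char characters))))
          do (write-char (read-char stream) out))))
```

The stopping character is only peeked at, so it remains available to the next read.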

MAP-TOKENS (TOKENIZER FUNCTION)

Return next token or NIL on eof

PARSE-CONTENT-ATTR (STRING)

The algorithm for extracting an encoding from a meta element
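
For orientation, here is a simplified sketch in the general shape of the HTML specification's algorithm for extracting a character encoding from a meta element's content value: find "charset", require "=", then read a quoted or unquoted value. All names are hypothetical, the sketch returns the encoding name as a string without resolving it, and it is not claimed to match PARSE-CONTENT-ATTR in detail.

```lisp
;; Sketch only: hypothetical helpers, not the package's PARSE-CONTENT-ATTR.
(defun sketch-space-p (char)
  (member char '(#\Space #\Tab #\Newline #\Return #\Page)))

(defun sketch-skip-spaces (string pos)
  "Index of the first non-space character at or after POS, or the length."
  (or (position-if-not #'sketch-space-p string :start pos)
      (length string)))

(defun sketch-charset-from-content (string)
  "Return the charset named in a meta content value, or NIL."
  (let ((pos 0) (len (length string)))
    (loop
      ;; CHAR-EQUAL is used for brevity; a stricter version would fold ASCII only.
      (let ((hit (search "charset" string :start2 pos :test #'char-equal)))
        (unless hit (return nil))
        (setf pos (sketch-skip-spaces string (+ hit 7)))
        (when (and (< pos len) (char= (char string pos) #\=))
          (setf pos (sketch-skip-spaces string (1+ pos)))
          (return
            (cond ((>= pos len) nil)
                  ;; Quoted value: needs a matching closing quote.
                  ((find (char string pos) "\"'")
                   (let ((close (position (char string pos) string
                                          :start (1+ pos))))
                     (and close (subseq string (1+ pos) close))))
                  ;; Unquoted value: ends at whitespace, ";" or end of string.
                  (t (subseq string pos
                             (or (position-if
                                  (lambda (c)
                                    (or (sketch-space-p c) (char= c #\;)))
                                  string :start pos)
                                 len))))))))))
```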

PARSER-INSERT-TEXT (DATA &OPTIONAL PARENT)

Insert text data.

PUSH-TOKEN* (SELF TYPE &REST DATA)

Push a token with :type TYPE and :data set to the string concatenation of DATA.
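
The token construction described above can be sketched as follows, assuming tokens are property lists (suggested by the "(:type :parse-error)" notation used earlier in this listing, but not confirmed). The function name is hypothetical, and the step that pushes the token onto the tokenizer's queue is left out.

```lisp
;; Sketch only: the plist token layout is an assumption.
(defun sketch-make-token (type &rest data)
  "Build a token whose :DATA is the concatenation of the DATA strings."
  (list :type type
        :data (apply #'concatenate 'string data)))
```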

Undocumented

ADD-ATTRIBUTE (TOKEN NAME)

ADD-FORMATTING-ELEMENT (TOKEN &KEY (PHASE *PHASE*))

ADD-TO (TOKEN INDICATOR &REST DATA)

ADD-TO-ATTR-NAME (TOKEN &REST DATA)

ADD-TO-ATTR-VALUE (TOKEN &REST DATA)

ADJUST-ATTRIBUTES (TOKEN REPLACEMENTS)

ADJUST-FOREIGN-ATTRIBUTES (TOKEN)

ADJUST-MATH-ML-ATTRIBUTES (TOKEN)

ADJUST-SVG-ATTRIBUTES (TOKEN)

ADJUST-SVG-TAG-NAMES (TOKEN)

CALL-PHASE-METHOD (NAME PHASE TOKEN)

CDATA-SWITCH-HELPER

CHAR-RANGE (CHAR1 CHAR2)

CLEAR-ACTIVE-FORMATTING-ELEMENTS

CONSUME-ENTITY (SELF &KEY ALLOWED-CHAR FROM-ATTRIBUTE)

CONVERT-ENTITIES-LIST (ENTITIES)

CONVERT-TO-TRIE (CHAR-LIST VALUE)

DETECT-BOM (SELF)

DETECT-ENCODING (STREAM OVERRIDE-ENCODING FALLBACK-ENCODING)

DOCUMENT*

ELEMENT-IN-SCOPE (TARGET &OPTIONAL VARIANT)

END-TAG-APPLET-MARQUEE-OBJECT (TOKEN &KEY (PHASE *PHASE*))

END-TAG-BLOCK (TOKEN &KEY (PHASE *PHASE*))

END-TAG-BODY (TOKEN &KEY (PHASE *PHASE*))

END-TAG-BR (TOKEN &KEY (PHASE *PHASE*))

END-TAG-CAPTION (TOKEN &KEY (PHASE *PHASE*))

END-TAG-COL (TOKEN &KEY (PHASE *PHASE*))

END-TAG-COLGROUP (TOKEN &KEY (PHASE *PHASE*))

END-TAG-FORM (TOKEN &KEY (PHASE *PHASE*))

END-TAG-FORMATTING (TOKEN &KEY (PHASE *PHASE*))

END-TAG-FRAMESET (TOKEN &KEY (PHASE *PHASE*))

END-TAG-HEAD (TOKEN &KEY (PHASE *PHASE*))

END-TAG-HEADING (TOKEN &KEY (PHASE *PHASE*))

END-TAG-HTML (TOKEN &KEY (PHASE *PHASE*))

END-TAG-HTML-BODY-BR (TOKEN &KEY (PHASE *PHASE*))

END-TAG-IGNORE (TOKEN &KEY (PHASE *PHASE*))

END-TAG-IMPLY (TOKEN &KEY (PHASE *PHASE*))

END-TAG-IMPLY-HEAD (TOKEN &KEY (PHASE *PHASE*))

END-TAG-LIST-ITEM (TOKEN &KEY (PHASE *PHASE*))

END-TAG-OPTGROUP (TOKEN &KEY (PHASE *PHASE*))

END-TAG-OPTION (TOKEN &KEY (PHASE *PHASE*))

END-TAG-OTHER (TOKEN &KEY (PHASE *PHASE*))

END-TAG-P (TOKEN &KEY (PHASE *PHASE*))

END-TAG-SCRIPT (TOKEN &KEY (PHASE *PHASE*))

END-TAG-SELECT (TOKEN &KEY (PHASE *PHASE*))

END-TAG-TABLE (TOKEN &KEY (PHASE *PHASE*))

END-TAG-TABLE-CELL (TOKEN &KEY (PHASE *PHASE*))

END-TAG-TABLE-ROW-GROUP (TOKEN &KEY (PHASE *PHASE*))

END-TAG-TR (TOKEN &KEY (PHASE *PHASE*))

FIND-ENCODING (ENCODING-NAME)

FLUSH-CHARACTERS

GENERATE-IMPLIED-END-TAGS (&OPTIONAL EXCLUDE)

HTML5-STREAM-CHANGE-ENCODING (STREAM NEW-ENCODING)

HTML5-STREAM-CHAR (STREAM)

HTML5-STREAM-UNGET (STREAM CHAR)

IMPLIED-TAG-TOKEN (NAME &OPTIONAL (TYPE END-TAG))

IMPLIED-TAG-TOKEN/FULL (NAME TYPE &KEY (ATTRIBUTES 'NIL) (SELF-CLOSING NIL))

INSERT-COMMENT (TOKEN &OPTIONAL PARENT)

INSERT-DOCTYPE (TOKEN)

INSERT-ELEMENT (TOKEN)

INSERT-ELEMENT-NORMAL (TOKEN)

INSERT-ELEMENT-TABLE (TOKEN)

INSERT-INTO-TRIE (CHAR-LIST VALUE TRIE)

INSERT-ROOT (TOKEN)

INSERT-TEXT (TOKEN &KEY (PHASE *PHASE*))

IS-HTML-INTEGRATION-POINT (ELEMENT)

IS-MATH-ML-TEXT-INTEGRATION-POINT (ELEMENT)

LAST-OPEN-ELEMENT

MAIN-LOOP

MAKE-ENTITIES-TRIE (ENTITIES)

MAKE-HTML-INPUT-STREAM (SOURCE &KEY OVERRIDE-ENCODING FALLBACK-ENCODING)

MAKE-HTML-TOKENIZER (SOURCE &KEY ENCODING CDATA-SWITCH-HELPER)

NODE-APPEND-CHILD* (NODE CHILD)

NODE-ATTRIBUTES= (NODE1 NODE2)

NODE-CLONE* (NODE)

NODE-COUNT (TREE)

NODE-HAS-CONTENT (NODE)

NODE-INSERT-BEFORE* (NODE CHILD INSERT-BEFORE)

NODE-INSERT-TEXT (NODE DATA &OPTIONAL INSERT-BEFORE)

NODE-NAME-TUPLE (NODE)

NODE-REPARENT-CHILDREN (NODE NEW-PARENT)

NORMALIZE-TOKEN (TOKEN)

ONLY-SPACE-CHARACTERS-P (STRING)

OPEN-CHAR-STREAM (SELF)

OUR-SCAN (CHARS OPPOSITE-P CHUNK &KEY START)

PARSE-HTML5-FROM-SOURCE (SOURCE &KEY CONTAINER ENCODING STRICTP)

PARSE-RC-DATA-RAW-TEXT (TOKEN CONTENT-TYPE)

PARSER-PARSE (SOURCE &KEY FRAGMENT-P ENCODING)

PARSER-PARSE-ERROR (ERROR-CODE &OPTIONAL DATAVARS)

PARSER-RESET

PERROR (ERROR-CODE &REST DATAVARS)

PROCESS-CHARACTERS (TOKEN &KEY (PHASE *PHASE*))

PROCESS-COMMENT (TOKEN &KEY (PHASE *PHASE*))

PROCESS-DOCTYPE (TOKEN &KEY (PHASE *PHASE*))

PROCESS-END-TAG (TOKEN &KEY (PHASE *PHASE*))

PROCESS-ENTITY-IN-ATTRIBUTE (SELF &KEY ALLOWED-CHAR)

PROCESS-EOF (TOKEN &KEY (PHASE *PHASE*))

PROCESS-SPACE-CHARACTERS (TOKEN &KEY (PHASE *PHASE*))

PROCESS-START-TAG (TOKEN &KEY (PHASE *PHASE*))

PROCESS-TOKEN (TOKEN)

PUSH-TOKEN (SELF TOKEN)

READ-CHUNK (STREAM)

RECONSTRUCT-ACTIVE-FORMATTING-ELEMENTS

REPORT-CHARACTER-ERRORS (STREAM DATA)

RESET-INSERTION-MODE

RUN-STATE (TOKENIZER)

START-TAG-A (TOKEN &KEY (PHASE *PHASE*))

START-TAG-APPLET-MARQUEE-OBJECT (TOKEN &KEY (PHASE *PHASE*))

START-TAG-BODY (TOKEN &KEY (PHASE *PHASE*))

START-TAG-BUTTON (TOKEN &KEY (PHASE *PHASE*))

START-TAG-CAPTION (TOKEN &KEY (PHASE *PHASE*))

START-TAG-CLOSE-P (TOKEN &KEY (PHASE *PHASE*))

START-TAG-COL (TOKEN &KEY (PHASE *PHASE*))

START-TAG-COLGROUP (TOKEN &KEY (PHASE *PHASE*))

START-TAG-FORM (TOKEN &KEY (PHASE *PHASE*))

START-TAG-FORMATTING (TOKEN &KEY (PHASE *PHASE*))

START-TAG-FRAME (TOKEN &KEY (PHASE *PHASE*))

START-TAG-FRAMESET (TOKEN &KEY (PHASE *PHASE*))

START-TAG-FROM-HEAD (TOKEN &KEY (PHASE *PHASE*))

START-TAG-HEAD (TOKEN &KEY (PHASE *PHASE*))

START-TAG-HEADING (TOKEN &KEY (PHASE *PHASE*))

START-TAG-HR (TOKEN &KEY (PHASE *PHASE*))

START-TAG-HTML (TOKEN &KEY (PHASE *PHASE*))

START-TAG-I-FRAME (TOKEN &KEY (PHASE *PHASE*))

START-TAG-IMAGE (TOKEN &KEY (PHASE *PHASE*))

START-TAG-IMPLY-TBODY (TOKEN &KEY (PHASE *PHASE*))

START-TAG-INPUT (TOKEN &KEY (PHASE *PHASE*))

START-TAG-IS-INDEX (TOKEN &KEY (PHASE *PHASE*))

START-TAG-LIST-ITEM (TOKEN &KEY (PHASE *PHASE*))

START-TAG-MATH (TOKEN &KEY (PHASE *PHASE*))

START-TAG-META (TOKEN &KEY (PHASE *PHASE*))

START-TAG-MISPLACED (TOKEN &KEY (PHASE *PHASE*))

START-TAG-NO-SCRIPT-NO-FRAMES-STYLE (TOKEN &KEY (PHASE *PHASE*))

START-TAG-NOBR (TOKEN &KEY (PHASE *PHASE*))

START-TAG-NOFRAMES (TOKEN &KEY (PHASE *PHASE*))

START-TAG-OPT (TOKEN &KEY (PHASE *PHASE*))

START-TAG-OPTGROUP (TOKEN &KEY (PHASE *PHASE*))

START-TAG-OPTION (TOKEN &KEY (PHASE *PHASE*))

START-TAG-OTHER (TOKEN &KEY (PHASE *PHASE*))

START-TAG-PARAM-SOURCE (TOKEN &KEY (PHASE *PHASE*))

START-TAG-PLAINTEXT (TOKEN &KEY (PHASE *PHASE*))

START-TAG-PRE-LISTING (TOKEN &KEY (PHASE *PHASE*))

START-TAG-PROCESS-IN-HEAD (TOKEN &KEY (PHASE *PHASE*))

START-TAG-RAWTEXT (TOKEN &KEY (PHASE *PHASE*))

START-TAG-ROW-GROUP (TOKEN &KEY (PHASE *PHASE*))

START-TAG-RP-RT (TOKEN &KEY (PHASE *PHASE*))

START-TAG-SCRIPT (TOKEN &KEY (PHASE *PHASE*))

START-TAG-SELECT (TOKEN &KEY (PHASE *PHASE*))

START-TAG-STYLE-SCRIPT (TOKEN &KEY (PHASE *PHASE*))

START-TAG-SVG (TOKEN &KEY (PHASE *PHASE*))

START-TAG-TABLE (TOKEN &KEY (PHASE *PHASE*))

START-TAG-TABLE-CELL (TOKEN &KEY (PHASE *PHASE*))

START-TAG-TABLE-ELEMENT (TOKEN &KEY (PHASE *PHASE*))

START-TAG-TABLE-OTHER (TOKEN &KEY (PHASE *PHASE*))

START-TAG-TEXTAREA (TOKEN &KEY (PHASE *PHASE*))

START-TAG-TITLE (TOKEN &KEY (PHASE *PHASE*))

START-TAG-TR (TOKEN &KEY (PHASE *PHASE*))

START-TAG-VOID-FORMATTING (TOKEN &KEY (PHASE *PHASE*))

START-TAG-XMP (TOKEN &KEY (PHASE *PHASE*))

MACRO

Private

POP-END (PLACE)

Pop from the end of list

PUSH-END (OBJECT PLACE)

Push to the end of list
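
One straightforward way to write a pair of macros with this behavior is sketched below. The SKETCH- names are hypothetical and the package's own definitions may differ. For brevity, PLACE is evaluated more than once, so the sketch is only suitable for simple places such as variables.

```lisp
;; Sketch only: hypothetical macros, not the package's PUSH-END and POP-END.
(defmacro sketch-push-end (object place)
  "Append OBJECT as the new last element of the list stored in PLACE."
  `(setf ,place (append ,place (list ,object))))

(defmacro sketch-pop-end (place)
  "Remove the last element of the list stored in PLACE and return it."
  `(prog1 (car (last ,place))
     (setf ,place (butlast ,place))))
```

Both operations copy the list, so they take time proportional to its length.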

Undocumented

DEF (PHASE NAME (&REST SLOTS) &BODY BODY)

DEFINE-PHASE-PROCESS-FUNCTIONS (&BODY DEFS)

DEFSTATE (STATE (&REST SLOTS) &BODY BODY)

HANDLE-ENCODING-ERRORS (STREAM &BODY BODY)

INSERT-ELT-AT (OBJECT INDEX PLACE)

TAGNAME-DISPATCH (PHASE NAME &BODY CASES)

GENERIC-FUNCTION

Private

Undocumented

%ADD-FORMATTING-ELEMENT (PHASE TOKEN)

%END-TAG-APPLET-MARQUEE-OBJECT (PHASE TOKEN)

%END-TAG-BLOCK (PHASE TOKEN)

%END-TAG-BODY (PHASE TOKEN)

%END-TAG-BR (PHASE TOKEN)

%END-TAG-CAPTION (PHASE TOKEN)

%END-TAG-COL (PHASE TOKEN)

%END-TAG-COLGROUP (PHASE TOKEN)

%END-TAG-FORM (PHASE TOKEN)

%END-TAG-FORMATTING (PHASE TOKEN)

%END-TAG-FRAMESET (PHASE TOKEN)

%END-TAG-HEAD (PHASE TOKEN)

%END-TAG-HEADING (PHASE TOKEN)

%END-TAG-HTML (PHASE TOKEN)

%END-TAG-HTML-BODY-BR (PHASE TOKEN)

%END-TAG-IGNORE (PHASE TOKEN)

%END-TAG-IMPLY (PHASE TOKEN)

%END-TAG-IMPLY-HEAD (PHASE TOKEN)

%END-TAG-LIST-ITEM (PHASE TOKEN)

%END-TAG-OPTGROUP (PHASE TOKEN)

%END-TAG-OPTION (PHASE TOKEN)

%END-TAG-OTHER (PHASE TOKEN)

%END-TAG-P (PHASE TOKEN)

%END-TAG-SCRIPT (PHASE TOKEN)

%END-TAG-SELECT (PHASE TOKEN)

%END-TAG-TABLE (PHASE TOKEN)

%END-TAG-TABLE-CELL (PHASE TOKEN)

%END-TAG-TABLE-ROW-GROUP (PHASE TOKEN)

%END-TAG-TR (PHASE TOKEN)

%INSERT-TEXT (PHASE TOKEN)

%PROCESS-CHARACTERS (PHASE TOKEN)

%PROCESS-COMMENT (PHASE TOKEN)

%PROCESS-DOCTYPE (PHASE TOKEN)

%PROCESS-END-TAG (PHASE TOKEN)

%PROCESS-EOF (PHASE TOKEN)

%PROCESS-SPACE-CHARACTERS (PHASE TOKEN)

%PROCESS-START-TAG (PHASE TOKEN)

%START-TAG-A (PHASE TOKEN)

%START-TAG-APPLET-MARQUEE-OBJECT (PHASE TOKEN)

%START-TAG-BODY (PHASE TOKEN)

%START-TAG-BUTTON (PHASE TOKEN)

%START-TAG-CAPTION (PHASE TOKEN)

%START-TAG-CLOSE-P (PHASE TOKEN)

%START-TAG-COL (PHASE TOKEN)

%START-TAG-COLGROUP (PHASE TOKEN)

%START-TAG-FORM (PHASE TOKEN)

%START-TAG-FORMATTING (PHASE TOKEN)

%START-TAG-FRAME (PHASE TOKEN)

%START-TAG-FRAMESET (PHASE TOKEN)

%START-TAG-FROM-HEAD (PHASE TOKEN)

%START-TAG-HEAD (PHASE TOKEN)

%START-TAG-HEADING (PHASE TOKEN)

%START-TAG-HR (PHASE TOKEN)

%START-TAG-HTML (PHASE TOKEN)

%START-TAG-I-FRAME (PHASE TOKEN)

%START-TAG-IMAGE (PHASE TOKEN)

%START-TAG-IMPLY-TBODY (PHASE TOKEN)

%START-TAG-INPUT (PHASE TOKEN)

%START-TAG-IS-INDEX (PHASE TOKEN)

%START-TAG-LIST-ITEM (PHASE TOKEN)

%START-TAG-MATH (PHASE TOKEN)

%START-TAG-META (PHASE TOKEN)

%START-TAG-MISPLACED (PHASE TOKEN)

%START-TAG-NO-SCRIPT-NO-FRAMES-STYLE (PHASE TOKEN)

%START-TAG-NOBR (PHASE TOKEN)

%START-TAG-NOFRAMES (PHASE TOKEN)

%START-TAG-OPT (PHASE TOKEN)

%START-TAG-OPTGROUP (PHASE TOKEN)

%START-TAG-OPTION (PHASE TOKEN)

%START-TAG-OTHER (PHASE TOKEN)

%START-TAG-PARAM-SOURCE (PHASE TOKEN)

%START-TAG-PLAINTEXT (PHASE TOKEN)

%START-TAG-PRE-LISTING (PHASE TOKEN)

%START-TAG-PROCESS-IN-HEAD (PHASE TOKEN)

%START-TAG-RAWTEXT (PHASE TOKEN)

%START-TAG-ROW-GROUP (PHASE TOKEN)

%START-TAG-RP-RT (PHASE TOKEN)

%START-TAG-SCRIPT (PHASE TOKEN)

%START-TAG-SELECT (PHASE TOKEN)

%START-TAG-STYLE-SCRIPT (PHASE TOKEN)

%START-TAG-SVG (PHASE TOKEN)

%START-TAG-TABLE (PHASE TOKEN)

%START-TAG-TABLE-CELL (PHASE TOKEN)

%START-TAG-TABLE-ELEMENT (PHASE TOKEN)

%START-TAG-TABLE-OTHER (PHASE TOKEN)

%START-TAG-TEXTAREA (PHASE TOKEN)

%START-TAG-TITLE (PHASE TOKEN)

%START-TAG-TR (PHASE TOKEN)

%START-TAG-VOID-FORMATTING (PHASE TOKEN)

%START-TAG-XMP (PHASE TOKEN)

RUN-STATE* (TOKENIZER STATE)

SLOT-ACCESSOR

Public

Undocumented

NODE-NAME (OBJECT)

NODE-NAMESPACE (OBJECT)

NODE-PARENT (OBJECT)

NODE-PUBLIC-ID (OBJECT)

NODE-SYSTEM-ID (OBJECT)

NODE-TYPE (OBJECT)

NODE-VALUE (OBJECT)

Private

Undocumented

%NODE-ATTRIBUTES (OBJECT)

(SETF %NODE-ATTRIBUTES) (NEW-VALUE OBJECT)

%NODE-CHILD-NODES (OBJECT)

(SETF %NODE-CHILD-NODES) (NEW-VALUE OBJECT)

HTML5-STREAM-ENCODING (OBJECT)

HTML5-STREAM-ERRORS (OBJECT)

(SETF HTML5-STREAM-ERRORS) (NEW-VALUE OBJECT)

PARSER-PHASE (OBJECT)

(SETF PARSER-PHASE) (NEW-VALUE OBJECT)

TOKENIZER-STATE (OBJECT)

(SETF TOKENIZER-STATE) (NEW-VALUE OBJECT)

TOKENIZER-STREAM (OBJECT)

VARIABLE

Private

Undocumented

*DEFAULT-ENCODING*

*ENTITIES*

*ENTITIES-TREE*

*INVALID-UNICODE*

*INVALID-UNICODE-HASH*

*PHASE-INDENT*

+BREAKOUT-ELEMENTS+

+ONLY-SPACE-CHARACTERS-REGEXP+

CLASS

Private

Undocumented

COMMENT-NODE

DOCUMENT

DOCUMENT-FRAGMENT

DOCUMENT-TYPE

ELEMENT

HTML-INPUT-STREAM

HTML-PARSER

HTML-TOKENIZER

NODE

TEXT-NODE