Common Lisp Package: COM.INFORMATIMAGO.COMMON-LISP.REGEXP.REGEXP-EMACS

NOT COMPLETE YET.

This package implements REGEXP in COMMON-LISP, which is interesting because it is then available on any COMMON-LISP platform whether the external C regexp library is available or not; moreover, it is the same (that is, it is compatible) on all COMMON-LISP platforms.

Posix Regexp implemented in Common-Lisp. See specifications at: http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap09.html

This is a strict implementation that will work both in clisp (Common-Lisp) and emacs (with cl and pjb-cl Common-Lisp extensions).

This implementation is entirely in lisp, contrary to the regexp packages available under clisp or emacs. Thus it has the advantage of portability and availability (you don't have to compile or link a lisp system written in some barbarous language, and you get the same regexp features in all programs including this module).

License: AGPL3

Copyright Pascal J. Bourguignon 2002 - 2012

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see <http://www.gnu.org/licenses/>

README:

FUNCTION

Public

MATCH (REGEXP STRING &OPTIONAL START END)

Common-Lisp: This function returns as first value a match structure containing the indices of the start and end of the first match for the regular expression REGEXP in STRING, or nil if there is no match. If START is non-nil, the search starts at that index in STRING. If END is non-nil, only (subseq STRING START END) is considered. The next values are match structures for every '(...)' construct in REGEXP, in the order that the open parentheses appear in REGEXP.

start: the first character of STRING to be considered (defaults to 0).
end: the position just after the last character of STRING to be considered (defaults to (length string)).
RETURN: index of start of first match for REGEXP in STRING, or nil.

MATCH-END (INSTANCE)

Returns the end index stored in the match structure INSTANCE.

MATCH-START (INSTANCE)

Returns the start index stored in the match structure INSTANCE.

MATCH-STRING (STRING MATCH)

Extracts the substring of STRING corresponding to a given pair of start and end indices. The result is shared with STRING. If you want a freshly consed string, use copy-string or (coerce (match-string ...) 'simple-string).
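The "shared with STRING" behaviour described above can be obtained in Common Lisp with a displaced array. The following is only a minimal sketch, assuming a two-slot structure like the one described for MATCH; the names demo-match and demo-match-string are hypothetical and are not the package's own definitions.

```lisp
;; Hypothetical stand-in for the package's MATCH structure.
(defstruct demo-match
  (start nil)
  (end   nil))

(defun demo-match-string (string match)
  "Return the part of STRING delimited by MATCH as a displaced array,
so that the result shares its storage with STRING (assumed behaviour)."
  (let ((start (demo-match-start match))
        (end   (demo-match-end match)))
    (make-array (- end start)
                :element-type (array-element-type string)
                :displaced-to string
                :displaced-index-offset start)))

;; COPY-SEQ gives a string that may legally be modified later.
(defparameter *demo-text* (copy-seq "hello world"))
(defparameter *demo-word*
  (demo-match-string *demo-text* (make-demo-match :start 6 :end 11)))
```

Because the result is displaced, a later change to the characters of *demo-text* in positions 6 to 10 is visible through *demo-word*; copying the result breaks that link.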

Undocumented

(SETF MATCH-END) (NEW-VALUE INSTANCE)

(SETF MATCH-START) (NEW-VALUE INSTANCE)

REGEXP-QUOTE (STRING)

Private

PJB-RE-ALTERNATIVE-MATCH (NODE)

Matches an alternative. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-ANY-CATEGORY-MATCH (NODE)

Matches an any-category. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-ANY-CHARACTER-MATCH (NODE)

Matches an any-character. That is, anything but a NEW-LINE! RETURNS: nil when no match, or the next unmatched position when there's a match. A period ( '.' ), when used outside a bracket expression, is a BRE that shall match any character in the supported character set except NUL.

PJB-RE-ANY-NOT-CATEGORY-MATCH (NODE)

Matches an any-not-category. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-ANY-NOT-SYNTAX-CLASS-MATCH (NODE)

Matches an any-not-syntax-class. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-ANY-NOT-WORD-CHARACTER-MATCH (NODE)

Matches an any-not-word-character. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-ANY-SYNTAX-CLASS-MATCH (NODE)

Matches an any-syntax-class. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-ANY-WORD-CHARACTER-MATCH (NODE)

Matches an any-word-character. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-CHAR-SET-MATCH (NODE)

Matches a char-set. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-CHARACTER-MATCH (NODE)

Matches a character. RETURNS: nil when no match, or the next unmatched position when there's a match.
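All the pjb-re-*-match functions share the same return contract: nil on failure, otherwise the next unmatched position. As a rough illustration of that contract only, here is a sketch for a single literal character. The real function takes a decorated NODE; in this sketch the string, position and character are passed explicitly, and the name demo-character-match is hypothetical.

```lisp
(defun demo-character-match (string position char)
  "Return the next unmatched position if STRING has CHAR at POSITION,
otherwise return NIL. Illustrates the return contract only."
  (when (and (< position (length string))
             (char= char (char string position)))
    (1+ position)))
```

For instance, (demo-character-match "abc" 0 #\a) succeeds and yields the position of #\b, whereas the same call at position 1 yields nil.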

PJB-RE-COLLAPSE-STRINGS (TREE)

RETURNS: A new list where all sequences of characters are collapsed into strings. Single characters are not collapsed. NOTE: Does not work recursively because recursive sequences are built bottom-up.
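The collapsing step can be pictured with the following sketch, written from the description above and not taken from the package source. It assumes TREE is a flat list mixing characters with other items (keywords, sub-lists); the name demo-collapse-strings is hypothetical.

```lisp
(defun demo-collapse-strings (tree)
  "Return a new list where runs of two or more characters in TREE become
strings; lone characters and all other items are kept as they are.
Sub-lists are not visited (no recursion)."
  (let ((result '())
        (run    '()))                      ; pending characters, reversed
    (flet ((flush ()
             (cond ((null run))            ; nothing pending
                   ((null (rest run))      ; a single character stays as is
                    (push (first run) result))
                   (t
                    (push (coerce (reverse run) 'string) result)))
             (setf run '())))
      (dolist (item tree)
        (if (characterp item)
            (push item run)
            (progn (flush)
                   (push item result))))
      (flush))
    (nreverse result)))
```

With this sketch, (:sequence #\a #\b #\c :any-character #\d) becomes (:sequence "abc" :any-character #\d).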

PJB-RE-DECORATE-TREE (TREE STRING)

RETURN: A decorated tree that can be used for matching the string.

PJB-RE-EMPTY-AT-BEGINNING-OF-LINE-MATCH (NODE)

Matches an empty-at-beginning-of-line. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-EMPTY-AT-BEGINNING-OF-STRING-MATCH (NODE)

Matches an empty-at-beginning-of-string. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-EMPTY-AT-BEGINNING-OF-WORD-MATCH (NODE)

Matches an empty-at-beginning-of-word. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-EMPTY-AT-END-OF-LINE-MATCH (NODE)

Matches an empty-at-end-of-line. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-EMPTY-AT-END-OF-STRING-MATCH (NODE)

Matches an empty-at-end-of-string. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-EMPTY-AT-END-OF-WORD-MATCH (NODE)

Matches an empty-at-end-of-word. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-EMPTY-AT-LIMIT-OF-WORD-MATCH (NODE)

Matches an empty-at-limit-of-word. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-EMPTY-AT-POINT-MATCH (NODE)

Matches an empty-at-point. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-EMPTY-NOT-AT-LIMIT-OF-WORD-MATCH (NODE)

Matches an empty-not-at-limit-of-word. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-ERROR-MATCH (NODE)

Matches an error. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-GROUP-MATCH (NODE)

Matches a group. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-INVERSE-CHAR-SET-MATCH (NODE)

Matches an inverse-char-set. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-MAKE-PJB-RE-SYMBOL (KEY EXT)

RETURN: A symbol corresponding to one of the pjb-re-*-{init,match} functions defined here.
ext: A string, either "init" or "match".
key: A keyword, one of those used in the regexp syntactic trees.
NOTE:
                          emacs         Common-Lisp
    --------------------  ------------  ------------
    (symbol-name 'key)    "key"         "KEY"
    (symbol-name :key)    ":key"        "KEY"
    (eq 'key 'KEY)        nil           T
URL: <http://www.informatimago.com/local/lisp/HyperSpec/Body/02_cd.htm>
     <http://www.informatimago.com/local/lisp/HyperSpec/Body/f_intern.htm#intern>
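On the Common Lisp side, building such a symbol could look like the sketch below. It is an assumption about the approach, not the package's code: the name demo-make-pjb-re-symbol is hypothetical, and the upcasing reflects the Common-Lisp column of the note above (the emacs variant would keep lower case).

```lisp
(defun demo-make-pjb-re-symbol (key ext)
  "Build the symbol PJB-RE-<KEY>-<EXT> in the current package.
KEY is a keyword such as :GROUP; EXT is \"init\" or \"match\"."
  (intern (format nil "PJB-RE-~A-~A"
                  (string-upcase (symbol-name key))
                  (string-upcase ext))))
```

Calling (demo-make-pjb-re-symbol :group "match") yields a symbol whose name is "PJB-RE-GROUP-MATCH", which can then be passed to funcall if a function of that name is defined.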

PJB-RE-NON-GREEDY-ONE-OR-MORE-MATCH (NODE)

Matches a non-greedy-one-or-more. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-NON-GREEDY-OPTIONAL-MATCH (NODE)

Matches a non-greedy-optional. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-NON-GREEDY-ZERO-OR-MORE-MATCH (NODE)

Matches a non-greedy-zero-or-more. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-NULL-MATCH (NODE)

Matches a null. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-ONE-OR-MORE-MATCH (NODE)

Matches a one-or-more. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-OPTIONAL-MATCH (NODE)

Matches an optional. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-PARSE-ELEMENT (SC)

DO: Parses a regexp element.
RETURNS: A parse tree.

    element ::= simple .                                 simple
    element ::= simple '*' .                             (:zero-or-more simple)
    element ::= simple '+' .                             (:one-or-more simple)
    element ::= simple '?' .                             (:optional simple)
    element ::= simple '*?' .                            (:non-greedy-zero-or-more simple)
    element ::= simple '+?' .                            (:non-greedy-one-or-more simple)
    element ::= simple '??' .                            (:non-greedy-optional simple)
    element ::= simple '{' number '}' .                  (:repeat-exact simple number)
    element ::= simple '{' number ',' [ number ] '}' .   (:repeat-between simple number [number])
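The mapping from postfix operator to tree node in the grammar above can be sketched as follows. This is an illustration only: it works on a plain string and index instead of the package's scanner object, it leaves out the '{n}' and '{n,m}' forms, and the name demo-parse-suffix is hypothetical.

```lisp
(defun demo-parse-suffix (simple string position)
  "Wrap SIMPLE according to the repetition operator found in STRING at
POSITION. Returns two values: the parse tree and the position just after
the operator (or POSITION itself when there is no operator)."
  (let ((op   (and (< position (length string))
                   (char string position)))
        ;; A '?' right after the operator selects the non-greedy variant.
        (lazy (and (< (1+ position) (length string))
                   (char= #\? (char string (1+ position))))))
    (flet ((wrap (greedy non-greedy)
             (if lazy
                 (values (list non-greedy simple) (+ position 2))
                 (values (list greedy simple)     (+ position 1)))))
      (case op
        (#\* (wrap :zero-or-more :non-greedy-zero-or-more))
        (#\+ (wrap :one-or-more  :non-greedy-one-or-more))
        (#\? (wrap :optional     :non-greedy-optional))
        (t   (values simple position))))))
```

For the input "a*?b" with the simple #\a already parsed and POSITION 1, the sketch returns (:non-greedy-zero-or-more #\a) and 3.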

PJB-RE-PARSE-REGEXP (SC)

DO: Parses a regexp.
RETURNS: A parse tree.
NOTE: The result may contain the symbol :error followed by a string.

    regexp ::= sequence '|' regexp .   (:alternative sequence sequence...)
    regexp ::= sequence .              sequence

PJB-RE-PARSE-SEQUENCE (SC)

DO: Parses a regexp sequence.
RETURNS: A parse tree.

    sequence ::= element sequence .   (:sequence element element ...)
    sequence ::= element .            element
    sequence ::= .                    nil

PJB-RE-PARSE-SIMPLE (SC)

DO: Parses a regexp simple.
RETURN: A parse tree.

    simple ::= '\(' regexp '\)' .       (:group regexp)
    simple ::= '\(?:' regexp '\)' .     (:shy-group regexp)
    simple ::= '\0' |'\1' |'\2' |'\3' | '\4' |'\5' |'\6' |'\7' |'\8' | '\9' .
                                        (:reference number)
    simple ::= regular-character .      regular-character
    simple ::= '.' | '\w' | '\W' | '\sC' | '\SC' | '\cC' | '\CC' .
                                        :any-character
                                        :any-word-character
                                        :any-not-word-character
                                        (:any-syntax-class class)
                                        (:any-not-syntax-class class)
                                        (:any-category category)
                                        (:any-not-category category)
    simple ::= '\=' | '\b' | '\B' | '\<' | '\>' .
                                        :empty-at-point    NEVER MATCH IN STRING!
                                        :empty-at-limit-of-word
                                        :empty-not-at-limit-of-word
                                        :empty-at-beginning-of-word
                                        :empty-at-end-of-word
    simple ::= '^' | '\`' .             :empty-at-beginning-of-line
                                        :empty-at-beginning-of-string
    simple ::= '$' | '\'' .             :empty-at-end-of-line
                                        :empty-at-end-of-string
    simple ::= '\$' | '\^' | '\.' | '\*' | '\+' | '\?' | '\[' | '\]' | '\\' .
                                        regular-character
    simple ::= '[' '^' character-set ']' .
                                        (:inverse-char-set char-or-char-interval )
    simple ::= '[' character-set ']' .  (:char-set char-or-char-interval )

PJB-RE-REFERENCE-MATCH (NODE)

Matches a reference. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-REPEAT-BETWEEN-MATCH (NODE)

Matches a repeat-between. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-REPEAT-EXACT-MATCH (NODE)

Matches a repeat-exact. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-SEQUENCE-MATCH (NODE)

Matches a sequence. RETURNS: nil when no match, or the next unmatched position when there's a match.
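To make the idea of matching a sequence concrete, the sketch below chains matchers that follow the "nil or next unmatched position" contract. It is a simplified model, not the package's implementation: matchers are plain closures rather than decorated nodes, there is no backtracking, and both function names are hypothetical.

```lisp
(defun demo-char-matcher (char)
  "Return a matcher: a function of (string position) that returns the
next unmatched position, or NIL when CHAR is not found at POSITION."
  (lambda (string position)
    (when (and (< position (length string))
               (char= char (char string position)))
      (1+ position))))

(defun demo-sequence-match (matchers string position)
  "Apply MATCHERS in order, each starting where the previous one stopped.
Return the position after the last one, or NIL as soon as one fails."
  (dolist (matcher matchers position)
    (setf position (funcall matcher string position))
    (when (null position)
      (return nil))))
```

An empty list of matchers succeeds without consuming anything, which mirrors the empty production of the sequence grammar.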

PJB-RE-SHY-GROUP-MATCH (NODE)

Matches a shy-group. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-SPLIT-STRING (STRING &OPTIONAL SEPARATORS)

DO: Splits STRING into substrings where there are matches for SEPARATORS.
RETURNS: A list of substrings.
separators: A regexp matching the sub-string separators. Defaults to "[ \f\t\n\r\v]+".
NOTE: Current implementation only accepts as separators a literal string containing only one character.
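Given the restriction to a one-character literal separator, the splitting can be sketched as below. This is an assumption about the behaviour, not the package's code: the name demo-split-string and the default of a single space are hypothetical, and empty substrings between adjacent separators are kept, which may differ from the real function.

```lisp
(defun demo-split-string (string &optional (separators " "))
  "Split STRING at each occurrence of the single character contained in
SEPARATORS. Returns a list of fresh substrings, empty ones included."
  (let ((separator (char separators 0)))
    (loop with start = 0
          for pos = (position separator string :start start)
          collect (subseq string start pos)   ; POS is NIL on the last piece
          while pos
          do (setf start (1+ pos)))))
```

For example, splitting "a b c" with the default separator gives ("a" "b" "c"), and splitting "a,,b" on "," gives ("a" "" "b").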

PJB-RE-STRING-MATCH (NODE)

Matches a string. RETURNS: nil when no match, or the next unmatched position when there's a match.

PJB-RE-ZERO-OR-MORE-MATCH (NODE)

Matches a zero-or-more. RETURNS: nil when no match, or the next unmatched position when there's a match.

SC-ADVANCE (SC)

PRE: (= p (sc-position sc)) POST: (= (1+ p) (sc-position sc)) RETURN: The character at position 1+p.

SC-CURR-CHAR (SC)

RETURN: The current character, or nil if EOS.

SC-NEXT-CHAR (SC)

RETURN: The next character, or nil if EOS.

SC-POSITION (SC)

RETURN: The current position.

SC-SCAN-TO-CHAR (SC CHAR)

RETURN: the substring of (sc-string sc) starting from current position to the position just before the first character equal to `char' found from this position.
PRE:  (= p (sc-position sc))
POST: (and (<= p (sc-position sc))
           (or (and (< (sc-position sc) (length (sc-string sc)))
                    (char= char (sc-curr-char sc)))
               (= (sc-position sc) (length (sc-string sc))))
           (forall i between p and (1- (sc-position sc))
              (char/= char (char (sc-string sc) i))))

SC-STRING (SC)

RETURN: The string being scanned.
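The sc-* functions above describe a small string scanner. The sketch below models it from those descriptions; it is not the package's source. All demo-* names are hypothetical, and stopping demo-sc-advance at the end of the string is an assumption, since the documented post-condition does not say what happens there.

```lisp
;; Hypothetical scanner: a string plus a current position.
(defstruct (demo-sc (:constructor make-demo-sc (string)))
  (string   "" :type string)
  (position 0  :type fixnum))

(defun demo-sc-curr-char (sc)
  "Current character, or NIL at end of string."
  (let ((p (demo-sc-position sc))
        (s (demo-sc-string sc)))
    (when (< p (length s)) (char s p))))

(defun demo-sc-next-char (sc)
  "Character after the current one, or NIL if there is none."
  (let ((p (1+ (demo-sc-position sc)))
        (s (demo-sc-string sc)))
    (when (< p (length s)) (char s p))))

(defun demo-sc-advance (sc)
  "Move one position forward (never past the end, by assumption) and
return the new current character."
  (when (< (demo-sc-position sc) (length (demo-sc-string sc)))
    (incf (demo-sc-position sc)))
  (demo-sc-curr-char sc))

(defun demo-sc-scan-to-char (sc char)
  "Advance to the first CHAR at or after the current position, or to the
end of the string, and return the substring that was skipped."
  (let* ((s     (demo-sc-string sc))
         (start (demo-sc-position sc))
         (end   (or (position char s :start start) (length s))))
    (setf (demo-sc-position sc) end)
    (subseq s start end)))
```

After demo-sc-scan-to-char returns, either the current character is CHAR or the scanner is at the end of the string, which is the post-condition stated for SC-SCAN-TO-CHAR.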

Undocumented

COPY-MATCH (INSTANCE)

MAKE-MATCH (&KEY START END)

MAKE-SC (STRING)

MATCH-P (OBJECT)

PJB-RE-COLLECT-GROUPS (DEC-TREE &OPTIONAL GROUPS)

PJB-RE-PARSE-WHOLE-REGEXP (SC)

MACRO

Private

Undocumented

PJB-RE-INIT (NODE POSITION)

PJB-RE-MATCH (NODE)

PJB-RE-SLOT-BEGIN (OBJ)

PJB-RE-SLOT-CHILDREN (OBJ)

PJB-RE-SLOT-END (OBJ)

PJB-RE-SLOT-MATCH (OBJ)

PJB-RE-SLOT-NODE (OBJ)

PJB-RE-SLOT-PRIVATE (OBJ)

PJB-RE-SLOT-STRING (OBJ)

PJB-RE-SLOT-TRY (OBJ)

VARIABLE

Private

PJB-RE-NEW-LINE

A new-line.

CLASS

Public

MATCH

This structure stores a (start,end) couple specifying the range matched by a group (or the whole regexp).