Common Lisp Package: MGL-UTIL

Simple utilities, types.

README:

FUNCTION

Public

ADD-CONFUSION-MATRIX (MATRIX RESULT-MATRIX)

Add MATRIX into RESULT-MATRIX.

BACKING-ARRAY (ARRAY)

Return the array in which the contents of ARRAY are stored. For simple arrays, this is always the array itself. The second value is the displacement.
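As an illustration of what "backing array" and "displacement" mean, here is a minimal sketch built on the standard ARRAY-DISPLACEMENT function. The name SKETCH-BACKING-ARRAY is hypothetical, and following chains of displaced arrays is an assumption; the actual implementation may differ.

```lisp
;; Hypothetical sketch, not the package's implementation.
(defun sketch-backing-array (array)
  "Return the array that ultimately stores ARRAY's contents and, as the
second value, the total offset into it."
  (multiple-value-bind (target offset) (array-displacement array)
    (if target
        ;; ARRAY is displaced: keep following the chain, summing offsets.
        (multiple-value-bind (base base-offset) (sketch-backing-array target)
          (values base (+ offset base-offset)))
        ;; Not displaced: the array stores its own contents.
        (values array 0))))
```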

BINARIZE-RANDOMLY (X)

Return 1 with probability X and 0 otherwise.
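The behaviour described above amounts to a Bernoulli draw. A minimal sketch, assuming X is a real number in [0, 1] (SKETCH-BINARIZE-RANDOMLY is a hypothetical name):

```lisp
;; Hypothetical sketch of a Bernoulli draw with success probability X.
(defun sketch-binarize-randomly (x)
  ;; (RANDOM 1.0) is uniform on [0, 1), so the test succeeds with probability X.
  (if (< (random 1.0) x) 1 0))
```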

BINOMIAL-LOG-LIKELIHOOD-RATIO (K1 N1 K2 N2)

See "Accurate Methods for the Statistics of Surprise and Coincidence" by Ted Dunning (http://citeseer.ist.psu.edu/29096.html). All classes must have non-zero counts, that is, K1, N1-K1, K2, N2-K2 are positive integers. To ensure this, and also as a kind of prior, add a small number such as 1 to K1 and K2, and 2 to N1 and N2, before calling.
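For orientation, the statistic from Dunning's paper can be sketched as follows. This is the textbook formula, 2 * (log L(p1, k1, n1) + log L(p2, k2, n2) - log L(p, k1, n1) - log L(p, k2, n2)), written with hypothetical function names; whether the package scales or rounds its result the same way is not stated in the docstring.

```lisp
;; Hypothetical sketch of the binomial log likelihood ratio.
(defun sketch-binomial-log-likelihood (p k n)
  "Log of p^k (1-p)^(n-k); the binomial coefficient cancels in the ratio."
  (let ((p (coerce p 'double-float)))
    (+ (* k (log p))
       (* (- n k) (log (- 1 p))))))

(defun sketch-binomial-llr (k1 n1 k2 n2)
  (let ((p1 (/ k1 n1))                    ; success rate in process 1
        (p2 (/ k2 n2))                    ; success rate in process 2
        (p (/ (+ k1 k2) (+ n1 n2))))      ; pooled success rate
    (* 2 (- (+ (sketch-binomial-log-likelihood p1 k1 n1)
               (sketch-binomial-log-likelihood p2 k2 n2))
            (+ (sketch-binomial-log-likelihood p k1 n1)
               (sketch-binomial-log-likelihood p k2 n2))))))

(defun sketch-smoothed-binomial-llr (k1 n1 k2 n2)
  "Apply the suggested prior: add 1 to the K counts and 2 to the N counts."
  (sketch-binomial-llr (+ k1 1) (+ n1 2) (+ k2 1) (+ n2 2)))
```

The smoothing wrapper guarantees that every class count is positive, so neither logarithm receives zero.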

BREAK-SEQ (FRACTIONS SEQ)

Split SEQ into a number of subsequences. FRACTIONS is either a positive integer or a list of non-negative real numbers. If FRACTIONS is a positive integer then return a list of that many subsequences of equal size (bar rounding errors), else split SEQ into subsequences, where the length of subsequence I is proportional to element I of FRACTIONS: (BREAK-SEQ '(2 3) '(0 1 2 3 4 5 6 7 8 9)) => ((0 1 2 3) (4 5 6 7 8 9))
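One way to obtain the proportional split shown in the example is to round cumulative boundaries. The sketch below uses the hypothetical name SKETCH-BREAK-SEQ and assumes the fractions do not sum to zero; the package's handling of rounding may differ.

```lisp
;; Hypothetical sketch: split SEQ at rounded cumulative boundaries.
(defun sketch-break-seq (fractions seq)
  (let* ((fractions (if (integerp fractions)
                        ;; An integer K means K equal shares.
                        (make-list fractions :initial-element 1)
                        fractions))
         (total (reduce #'+ fractions))
         (n (length seq))
         (start 0)
         (cumulative 0))
    (loop for fraction in fractions
          collect (let ((end (round (* n (incf cumulative fraction)) total)))
                    (prog1 (subseq seq start end)
                      (setf start end))))))
```

Rounding the cumulative boundary, rather than each share separately, ensures that the subsequences always cover SEQ exactly.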

COMPUTE-FEATURE-DISAMBIGUITIES (DOCUMENTS MAPPER CLASS-FN &KEY (CLASSES (ALL-DOCUMENT-CLASSES DOCUMENTS CLASS-FN)))

Return scored features as an EQUAL hash table whose keys are features of DOCUMENTS and values are their log likelihood ratios. MAPPER takes a function and a document and calls function with features of the document.

COMPUTE-FEATURE-LLRS (DOCUMENTS MAPPER CLASS-FN &KEY (CLASSES (ALL-DOCUMENT-CLASSES DOCUMENTS CLASS-FN)))

Return scored features as an EQUAL hash table whose keys are features of DOCUMENTS and values are their log likelihood ratios. MAPPER takes a function and a document and calls function with features of the document.

CONFUSION-MATRIX-ACCURACY (MATRIX &KEY FILTER)

Return the overall accuracy of the results in MATRIX. It's computed as the number of correctly classified cases (hits) divided by the number of cases. Return the number of hits and the number of cases as the second and third values. If the FILTER function is given, then call it with the target and the prediction of each cell, and disregard cells for which FILTER returns NIL. Precision and recall can be easily computed by giving the right filter, although those are provided in separate convenience functions.

CONFUSION-MATRIX-PRECISION (MATRIX PREDICTION)

Return the accuracy over the cases when the classifier said PREDICTION.

CONFUSION-MATRIX-RECALL (MATRIX TARGET)

Return the accuracy over the cases when the correct class is TARGET.
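The relationship between accuracy, precision, recall and the FILTER argument can be sketched without the CONFUSION-MATRIX class. In this hypothetical example the matrix is just a list of (target prediction count) cells, and SKETCH-ACCURACY is an invented name.

```lisp
;; Hypothetical sketch: CELLS is a list of (target prediction count) entries.
(defun sketch-accuracy (cells &key filter)
  "Return accuracy, hits and cases over the cells accepted by FILTER."
  (let ((hits 0)
        (cases 0))
    (loop for (target prediction n) in cells
          when (or (null filter) (funcall filter target prediction))
            do (incf cases n)
               (when (equal target prediction)
                 (incf hits n)))
    (values (if (zerop cases) nil (/ hits cases)) hits cases)))

(defparameter *sketch-cells*
  '((:spam :spam 8) (:spam :ham 2) (:ham :spam 1) (:ham :ham 9)))

;; Precision for :SPAM keeps only cells where the classifier said :SPAM.
(defun sketch-precision (cells prediction)
  (sketch-accuracy cells :filter (lambda (target p)
                                   (declare (ignore target))
                                   (equal p prediction))))

;; Recall for :SPAM keeps only cells whose correct class is :SPAM.
(defun sketch-recall (cells target)
  (sketch-accuracy cells :filter (lambda (tg prediction)
                                   (declare (ignore prediction))
                                   (equal tg target))))
```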

COUNT-FEATURES (DOCUMENTS MAPPER)

Return scored features as an EQUAL hash table whose keys are features of DOCUMENTS and values are counts of occurrences of features. MAPPER takes a function and a document and calls function with features of the document.

ENCODE/BAG-OF-WORDS (DOCUMENT MAPPER FEATURE->INDEX &KEY (KIND :BINARY))

Return a sparse vector that represents the encoded DOCUMENT. Get the features of DOCUMENT from MAPPER and convert each feature to an index with FEATURE->INDEX. FEATURE->INDEX may return NIL if the feature is not used. The result is a vector of index/value conses. Indexes are unique within the vector and are in increasing order. Depending on KIND, the value is calculated differently: for :FREQUENCY it is the number of times the corresponding feature was found in DOCUMENT, for :BINARY it is always 1. :NORMALIZED-FREQUENCY and :NORMALIZED-BINARY are like their unnormalized counterparts except that, as the final step, values in the assembled sparse vector are normalized to sum to 1.
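The encoding steps described above can be sketched as follows. To stay self-contained, this hypothetical SKETCH-ENCODE-BAG-OF-WORDS takes a plain list of features instead of a DOCUMENT and MAPPER pair; that simplification is an assumption of the example.

```lisp
;; Hypothetical sketch: FEATURES is a list, FEATURE->INDEX a function that
;; returns an integer index or NIL for unused features.
(defun sketch-encode-bag-of-words (features feature->index &key (kind :binary))
  (let ((counts (make-hash-table)))
    ;; Count occurrences per index, skipping unused features.
    (dolist (feature features)
      (let ((index (funcall feature->index feature)))
        (when index
          (incf (gethash index counts 0)))))
    (let* ((pairs (loop for index being the hash-keys of counts
                          using (hash-value occurrences)
                        collect (cons index
                                      (ecase kind
                                        ((:binary :normalized-binary) 1)
                                        ((:frequency :normalized-frequency)
                                         occurrences)))))
           (sorted (sort pairs #'< :key #'car)))
      ;; Final step for the normalized kinds: make the values sum to 1.
      (when (member kind '(:normalized-binary :normalized-frequency))
        (let ((sum (reduce #'+ sorted :key #'cdr)))
          (dolist (pair sorted)
            (setf (cdr pair) (/ (cdr pair) sum)))))
      (coerce sorted 'vector))))
```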

GAUSSIAN-RANDOM-1

Return a single float of zero mean and unit variance.

INDEX-SCORED-FEATURES (FEATURE-SCORES N &KEY (START 0))

Take scored features as a feature -> score hash table (returned by COUNT-FEATURES or COMPUTE-FEATURE-LLRS, for instance) and return a feature -> index hash table that maps the first N (or fewer) features with the highest scores to distinct dense indices starting from START.
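A minimal sketch of the selection and indexing step, using the hypothetical name SKETCH-INDEX-SCORED-FEATURES. How ties between equal scores are broken is not specified by the docstring and is arbitrary here.

```lisp
;; Hypothetical sketch: keep the N best-scoring features and number them.
(defun sketch-index-scored-features (feature-scores n &key (start 0))
  (let ((scored '())
        (feature->index (make-hash-table :test #'equal)))
    (maphash (lambda (feature score)
               (push (cons feature score) scored))
             feature-scores)
    ;; Highest scores first; stop after N entries or when the list runs out.
    (loop for entry in (sort scored #'> :key #'cdr)
          for index from start
          repeat n
          do (setf (gethash (car entry) feature->index) index))
    feature->index))
```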

MAKE-N-GRAM-MAPPEE (FUNCTION N)

Make a function of a single argument that's suitable as the function argument to a mapper function. It calls FUNCTION with every N consecutive elements.
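One plausible reading of this docstring is a sliding window: the returned closure remembers the last N elements it has seen and passes each complete window to FUNCTION. The sketch below follows that reading; the name SKETCH-N-GRAM-MAPPEE and passing the n-gram as a list are assumptions.

```lisp
;; Hypothetical sketch of a sliding-window n-gram mappee.
(defun sketch-n-gram-mappee (function n)
  (let ((window '()))                     ; most recent element first
    (lambda (element)
      (push element window)
      ;; Drop the oldest element once the window exceeds N.
      (when (> (length window) n)
        (setf window (subseq window 0 n)))
      ;; Report every complete window in original order.
      (when (= (length window) n)
        (funcall function (reverse window))))))
```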

MAKE-RANDOM-GENERATOR (SEQ)

Return a function that returns elements of SEQ in random order without end. When there are no more elements, start over with a different random order.

MAKE-SEQ-GENERATOR (VECTOR)

Return a function that returns elements of VECTOR in order without end. When there are no more elements, start over.
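The generator idea is a closure over a position counter. A minimal sketch for the in-order case, assuming a non-empty vector (SKETCH-SEQ-GENERATOR is a hypothetical name):

```lisp
;; Hypothetical sketch: cycle through VECTOR forever.
(defun sketch-seq-generator (vector)
  (let ((position 0)
        (length (length vector)))
    (lambda ()
      (prog1 (aref vector position)
        ;; Wrap around to the first element after the last one.
        (setf position (mod (1+ position) length))))))
```

The random variant would follow the same pattern, reshuffling a private copy of the elements each time the position wraps around.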

MULTINOMIAL-LOG-LIKELIHOOD-RATIO (K1 K2)

See "Accurate Methods for the Statistics of Surprise and Coincidence" by Ted Dunning (http://citeseer.ist.psu.edu/29096.html). K1 is the number of outcomes in each class. K2 is the same in a possibly different process. All elements in K1 and K2 are positive integers. To ensure this, and also as a kind of prior, add a small number such as 1 to each element in K1 and K2 before calling.
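The multinomial form of the statistic sums, over both processes and all classes, each count times the log of the ratio between its observed rate and the pooled rate. The sketch below assumes K1 and K2 are lists of equal length with positive entries; SKETCH-MULTINOMIAL-LLR is a hypothetical name, and the package may use a different sequence type or scaling.

```lisp
;; Hypothetical sketch of the multinomial log likelihood ratio.
(defun sketch-multinomial-llr (k1 k2)
  (let* ((n1 (reduce #'+ k1))             ; total outcomes in process 1
         (n2 (reduce #'+ k2))             ; total outcomes in process 2
         (n (+ n1 n2)))
    (flet ((term (k process-total class-total)
             ;; k * log((k / process-total) / (class-total / n))
             (* k (log (coerce (/ (* k n) (* process-total class-total))
                               'double-float)))))
      (* 2 (loop for a in k1
                 for b in k2
                 sum (+ (term a n1 (+ a b))
                        (term b n2 (+ a b))))))))
```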

MV-GAUSSIAN-RANDOM (&KEY MEANS COVARIANCES (COVARIANCES-LEFT-SQUARE-ROOT (CHOLESKY (HERMITIAN-MATRIX COVARIANCES))))

Return a column vector of samples from the multivariate normal distribution defined by MEANS (Nx1) and COVARIANCES (NxN). For multiple calls with the same parameters, one can pass in COVARIANCES-LEFT-SQUARE-ROOT instead of COVARIANCES, so that the Cholesky factor is computed only once.
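The role of the left square root can be sketched as follows: if L is lower triangular with L L^T equal to the covariance matrix and z is a vector of independent standard normal draws, then MEANS + L z has the desired distribution. The sketch uses plain Lisp arrays and a Box-Muller draw; both are assumptions of this example, since the package works with its own matrix types and does not state which sampling algorithm it uses.

```lisp
;; Hypothetical sketch using plain arrays and the Box-Muller transform.
(defun sketch-standard-normal ()
  (let ((u1 (- 1d0 (random 1d0)))         ; in (0, 1], so LOG is safe
        (u2 (random 1d0)))
    (* (sqrt (* 2 (log (/ u1))))
       (cos (* 2 pi u2)))))

(defun sketch-mv-gaussian (means left-square-root)
  "MEANS is a vector of length N; LEFT-SQUARE-ROOT is an NxN lower
triangular array L such that L L^T is the covariance matrix."
  (let* ((n (length means))
         (z (make-array n))
         (sample (make-array n)))
    (dotimes (i n)
      (setf (aref z i) (sketch-standard-normal)))
    ;; sample = means + L z, using only the lower triangle of L.
    (dotimes (i n sample)
      (setf (aref sample i)
            (+ (aref means i)
               (loop for j from 0 to i
                     sum (* (aref left-square-root i j) (aref z j))))))))
```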

NSHUFFLE-VECTOR (VECTOR)

Shuffle a vector in place using the Fisher-Yates algorithm.
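For reference, the Fisher-Yates algorithm walks the vector from the end and swaps each element with a randomly chosen element at or before it. A minimal sketch under the hypothetical name SKETCH-NSHUFFLE-VECTOR:

```lisp
;; Hypothetical sketch of an in-place Fisher-Yates shuffle.
(defun sketch-nshuffle-vector (vector)
  (loop for i from (1- (length vector)) downto 1
        ;; Pick J uniformly from 0..I and swap elements I and J.
        do (rotatef (aref vector i) (aref vector (random (1+ i)))))
  vector)
```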

REVERSE-HASH-TABLE (HASH-TABLE &KEY (TEST #'EQL))

Return a hash table that maps from the values of HASH-TABLE back to its keys. HASH-TABLE had better be a bijection.
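A minimal sketch of the inversion, with the hypothetical name SKETCH-REVERSE-HASH-TABLE. It also shows why the mapping should be a bijection: if two keys share a value, only one of them survives in the result.

```lisp
;; Hypothetical sketch: swap keys and values.
(defun sketch-reverse-hash-table (hash-table &key (test #'eql))
  (let ((reversed (make-hash-table :test test)))
    (maphash (lambda (key value)
               ;; A repeated VALUE silently overwrites the earlier entry.
               (setf (gethash value reversed) key))
             hash-table)
    reversed))
```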

SPLIT-BODY (BODY)

Return a list of declarations and the rest of BODY.
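Macro writers typically need this to place declarations correctly inside a generated form. A minimal sketch, assuming the two parts are returned as two values and ignoring docstrings (SKETCH-SPLIT-BODY is a hypothetical name):

```lisp
;; Hypothetical sketch: separate leading DECLARE forms from the rest.
(defun sketch-split-body (body)
  (let ((declarations
          (loop for form in body
                while (and (consp form) (eq (car form) 'declare))
                collect form)))
    (values declarations
            (nthcdr (length declarations) body))))
```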

STRATIFIED-SPLIT (FRACTIONS SEQ &KEY (KEY #'IDENTITY) (TEST #'EQL) RANDOMIZEP)

Similar to BREAK-SEQ, but also makes sure that keys are equally distributed among the partitions. It can be useful for classification tasks to partition the data set while keeping the distribution of classes the same.
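The idea can be sketched by grouping elements by key, splitting each group proportionally, and concatenating the corresponding pieces. The sketch below works on lists, takes FRACTIONS as a list only, and omits RANDOMIZEP; all names are hypothetical and the package's implementation may differ.

```lisp
;; Hypothetical helper: split SEQ at rounded cumulative boundaries.
(defun sketch-proportional-split (fractions seq)
  (let ((total (reduce #'+ fractions))
        (n (length seq))
        (start 0)
        (cumulative 0))
    (loop for fraction in fractions
          collect (let ((end (round (* n (incf cumulative fraction)) total)))
                    (prog1 (subseq seq start end)
                      (setf start end))))))

;; Hypothetical sketch: split each stratum separately, then merge.
(defun sketch-stratified-split (fractions seq &key (key #'identity) (test #'eql))
  (let ((strata '()))                     ; alist: key -> elements, reversed
    (dolist (element seq)
      (let ((stratum (assoc (funcall key element) strata :test test)))
        (if stratum
            (push element (cdr stratum))
            (push (list (funcall key element) element) strata))))
    (let ((partitions (make-list (length fractions) :initial-element '())))
      (dolist (stratum (reverse strata) partitions)
        (setf partitions
              (mapcar #'append
                      partitions
                      (sketch-proportional-split fractions
                                                 (reverse (cdr stratum)))))))))
```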

Undocumented

ADD-TO-RUNNING-STAT (X STAT)

ALIST->HASH-TABLE (ALIST &REST ARGS)

APPEND1 (LIST OBJ)

AS-COLUMN-VECTOR (A)

ASDF-SYSTEM-RELATIVE-PATHNAME (PATHNAME)

CALL-PERIODIC-FN (N FN &REST ARGS)

CALL-PERIODIC-FN! (N FN &REST ARGS)

CLEAR-RUNNING-STAT (STAT)

FILL! (ALPHA X)

FLT (X)

FLT-VECTOR (&REST ARGS)

GROUP (SEQ N)

HASH-TABLE->ALIST (HASH-TABLE)

HASH-TABLE->VECTOR (HASH-TABLE)

LAST1 (SEQ)

MAKE-FLT-ARRAY (DIMENSIONS &KEY (INITIAL-ELEMENT 0.0d0))

MAX-POSITION (SEQ START END)

POISSON-RANDOM (MEAN)

READ-DOUBLE-FLOAT-ARRAY (ARRAY STREAM)

READ-INDEXED-FEATURES (STREAM)

RUNNING-STAT-MEAN (STAT)

RUNNING-STAT-VARIANCE (STAT)

SCALED-TANH (X)

SECH (X)

SELECT-RANDOM-ELEMENT (SEQ)

SHUFFLE-VECTOR (VECTOR)

SIGMOID (X)

SIGN (X)

SPLIT-PLIST (LIST KEYS)

SUBSEQ* (SEQUENCE START &OPTIONAL END)

SUFFIX-SYMBOL (SYMBOL &REST SUFFIXES)

TO-SCALAR (MATRIX)

TRY-CHANCE (CHANCE)

WRITE-DOUBLE-FLOAT-ARRAY (ARRAY STREAM)

WRITE-INDEXED-FEATURES (FEATURES->INDICES STREAM)

Private

Undocumented

->DESCRIPTION (OBJECT DESCRIPTION)

ALL-DOCUMENT-CLASSES (DOCUMENTS CLASS-FN)

COLLECT-DISTINCT (SEQ &KEY (KEY #'IDENTITY) (TEST #'EQL))

COMPACT-BINARY-FEATURE-VECTOR (FEATURE-VECTOR)

DOCUMENT-FEATURES (DOCUMENT MAPPER)

FORMAT-DESCRIPTION (DESCRIPTION STREAM)

PPRINT-DESCRIPTIONS (CLASS DESCRIPTIONS STREAM)

READ-AS-BYTES (N STREAM)

WRITE-AS-BYTES (INTEGER N STREAM)

MACRO

Public

REPEATEDLY (&BODY BODY)

Like CONSTANTLY but evaluates BODY each time the returned function is called.
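The contrast with CONSTANTLY can be sketched with a macro that wraps BODY in a function accepting and ignoring any arguments. SKETCH-REPEATEDLY is a hypothetical name, and the expansion is an assumption.

```lisp
;; Hypothetical sketch: a function that re-evaluates BODY on every call.
(defmacro sketch-repeatedly (&body body)
  (let ((args (gensym "ARGS")))
    `(lambda (&rest ,args)
       (declare (ignore ,args))
       ,@body)))
```

Whereas (constantly (random 10)) evaluates (random 10) once and returns that number forever, (sketch-repeatedly (random 10)) draws a new number on each call.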

SPECIAL-CASE (TEST &BODY BODY)

Let the compiler compile BODY for the case when TEST is true and also when it's false. The purpose is to allow different constraints to propagate to the two branches allowing them to be more optimized.
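A macro with this effect can be as small as an IF whose two branches both contain BODY. The sketch below shows that idea under the hypothetical name SKETCH-SPECIAL-CASE; it is an assumption about the expansion, not a quotation of it.

```lisp
;; Hypothetical sketch: duplicate BODY in both branches of an IF so the
;; compiler can optimize each copy knowing whether TEST held.
(defmacro sketch-special-case (test &body body)
  `(if ,test
       (progn ,@body)
       (progn ,@body)))
```

For example, wrapping a loop in (sketch-special-case (typep x 'fixnum) ...) lets a type-inferring compiler specialize arithmetic on X in the first copy while keeping a generic version in the second.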

Undocumented

DEFINE-DESCRIPTIONS ((OBJECT CLASS &KEY INHERITP) &BODY DESCRIPTIONS)

DEFINE-SLOTS-NOT-TO-BE-COPIED (CONTEXT CLASS &BODY SLOT-NAMES)

DEFINE-SLOTS-TO-BE-SHALLOW-COPIED (CONTEXT CLASS &BODY SLOT-NAMES)

PUSH-ALL (LIST PLACE)

THE! (&REST ARGS)

WHILE (TEST &BODY BODY)

WITH-COPYING (&BODY BODY)

WITH-GENSYMS (VARS &BODY BODY)

Private

Undocumented

DEFINE-SLOT-NOT-TO-BE-COPIED (CONTEXT CLASS SLOT-NAME)

DEFINE-SLOT-TO-BE-SHALLOW-COPIED (CONTEXT CLASS SLOT-NAME)

WITH-SAFE-PRINTING (&BODY BODY)

WITH-ZERO-ON-UNDERFLOW (&BODY BODY)

GENERIC-FUNCTION

Public

CONFUSION-CLASS-NAME (MATRIX CLASS)

Name of CLASS for presentation purposes.

CONFUSION-MATRIX-CLASSES (MATRIX)

A list of all classes. The default is to collect classes from the counts. This can be overridden if, for instance, some classes are not present in the results.

COPY (CONTEXT OBJECT)

Make a deepish copy of OBJECT in CONTEXT.

COPY-OBJECT-EXTRA-INITARGS (CONTEXT ORIGINAL-OBJECT)

Return a list of

COPY-OBJECT-SLOT (CONTEXT ORIGINAL-OBJECT SLOT-NAME VALUE)

Return the value of the slot in the copied object and T, or NIL as the second value if the slot need not be initialized. The default implementation of COPY-FOR-PCD for classes calls COPY-SLOT-FOR-PCD.

MAP-CONFUSION-MATRIX (FN MATRIX)

Call FN with TARGET, PREDICTION, COUNT parameters for each cell in the confusion matrix. Cells with a zero count may be omitted.

READ-WEIGHTS (OBJECT STREAM)

Read the weights of OBJECT from STREAM.

SORT-CONFUSION-CLASSES (MATRIX CLASSES)

Return a list of CLASSES sorted for presentation purposes.

WRITE-WEIGHTS (OBJECT STREAM)

Write the weights of OBJECT to STREAM.

Undocumented

CONFUSION-COUNT (MATRIX TARGET PREDICTION)

(SETF CONFUSION-COUNT) (COUNT MATRIX TARGET PREDICTION)

SLOT-ACCESSOR

Public

Undocumented

INDEX (OBJECT)

LAST-EVAL (OBJECT)

(SETF LAST-EVAL) (NEW-VALUE OBJECT)

Private

Undocumented

COUNTS (OBJECT)

FN (OBJECT)

PERIOD (OBJECT)

VARIABLE

Public

Undocumented

*NO-ARRAY-BOUNDS-CHECK*

Private

Undocumented

*MGL-DIR*

CLASS

Public

CONFUSION-MATRIX

A confusion matrix keeps count of classification results. The correct class is called `target' and the output of the classifier is called `prediction'. Classes are compared with EQUAL.

Undocumented

PERIODIC-FN

RUNNING-STAT

CONSTANT

Public

Undocumented

LEAST-NEGATIVE-FLT

LEAST-POSITIVE-FLT

MOST-NEGATIVE-FLT

MOST-POSITIVE-FLT