Common Lisp Package: MGL-TRAIN

Generic training-related interfaces and basic definitions. The three most important concepts are SAMPLERs, TRAINERs and LEARNERs.

FUNCTION

Public

APPLY-COUNTERS-AND-MEASURERS (COUNTERS-AND-MEASURERS &REST ARGS)

Add the errors measured by the measurers to the counters.

COLLECT-BATCH-ERRORS (FN SAMPLER LEARNER COUNTERS-AND-MEASURERS)

Sample from SAMPLER until it runs out. Call FN with each batch of samples. COUNTERS-AND-MEASURERS is a sequence of conses of a counter and a measurer function. The measurer takes one parameter, a sequence of samples, and is called after each call to FN. Measurers return two values: the cumulative error and the count, suitable as the second and third arguments to ADD-ERROR. Finally, return the list of counters from COUNTERS-AND-MEASURERS.
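
The protocol described above can be sketched in plain Common Lisp. This is only an illustration and not MGL's code: SKETCH-COUNTER, SKETCH-COLLECT-BATCH-ERRORS and SKETCH-COUNT-ODD are hypothetical names, the batches are given as a ready-made list instead of being drawn from a sampler, and the LEARNER argument is left out.

```lisp
;; A counter is assumed to hold just a running sum and a running count.
(defstruct sketch-counter
  (sum-errors 0)
  (n-sum-errors 0))

(defun sketch-collect-batch-errors (fn batches counters-and-measurers)
  "Call FN on each batch, then let every measurer add to its counter.
Return the list of counters."
  (dolist (batch batches)
    (funcall fn batch)
    (dolist (counter-and-measurer counters-and-measurers)
      (destructuring-bind (counter . measurer) counter-and-measurer
        ;; A measurer returns two values: the summed error and the count.
        (multiple-value-bind (err n) (funcall measurer batch)
          (incf (sketch-counter-sum-errors counter) err)
          (incf (sketch-counter-n-sum-errors counter) n)))))
  (mapcar #'car counters-and-measurers))

;; A toy measurer: treat odd numbers as "errors".
(defun sketch-count-odd (batch)
  (values (count-if #'oddp batch) (length batch)))

;; Two batches, three odd numbers out of five samples in total.
(sketch-collect-batch-errors
 #'identity
 '((1 2 3) (4 5))
 (list (cons (make-sketch-counter) #'sketch-count-odd)))
```

After the call, the single returned counter holds a sum of 3 and a count of 5.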

MAP-BATCHES-FOR-LEARNER (FN SAMPLER LEARNER)

Call FN with batches of samples suitable for LEARNER. The number of samples in a batch is the MAX-N-STRIPES of LEARNER, or fewer if SAMPLER runs out.

ROC-AUC (SEQ PRED &KEY (KEY #'IDENTITY))

Return the area under the ROC curve for the dataset represented by SEQ. PRED is a predicate function for deciding whether an element of SEQ belongs to the class in question. KEY returns a number for each element that is the predictor's idea of how likely that element is to belong to the class; it is not necessarily a probability.
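
To make the quantity concrete, here is a minimal sketch that computes the area under the ROC curve as the fraction of (positive, negative) pairs in which the positive element gets the higher score, with ties counted as one half. SKETCH-ROC-AUC is a hypothetical name; how the real ROC-AUC treats ties and empty classes is not stated in this reference, so those choices are assumptions, and the quadratic pairwise loop is chosen for clarity rather than speed.

```lisp
(defun sketch-roc-auc (seq pred &key (key #'identity))
  "Pairwise estimate of the area under the ROC curve.
Return NIL if SEQ has no positives or no negatives (an assumption)."
  (let ((positives (remove-if-not pred seq))
        (negatives (remove-if pred seq))
        (wins 0))
    (when (and (plusp (length positives)) (plusp (length negatives)))
      (map nil
           (lambda (positive)
             (map nil
                  (lambda (negative)
                    (let ((p (funcall key positive))
                          (n (funcall key negative)))
                      ;; A correctly ordered pair scores 1, a tie 1/2.
                      (cond ((> p n) (incf wins 1))
                            ((= p n) (incf wins 1/2)))))
                  negatives))
           positives)
      (/ wins (* (length positives) (length negatives))))))

;; Each element is (score . positivep).
(sketch-roc-auc '((0.9 . t) (0.8 . nil) (0.7 . t) (0.1 . nil))
                #'cdr :key #'car)
;; => 3/4
```

Of the four positive/negative pairs, only (0.7, 0.8) is ordered the wrong way, hence 3/4.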

SAMPLE-BATCH (SAMPLER MAX-SIZE)

Return a sequence of at most MAX-SIZE samples, or fewer if SAMPLER runs out.
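
As an illustration of this batching behaviour, the sketch below draws from a toy list-backed sampler. All names (SKETCH-LIST-SAMPLER, SKETCH-FINISHEDP, SKETCH-SAMPLE, SKETCH-SAMPLE-BATCH) are hypothetical stand-ins for the SAMPLER protocol and are not part of MGL.

```lisp
;; A toy sampler that hands out the elements of a list one by one.
(defstruct sketch-list-sampler
  (items '()))

(defun sketch-finishedp (sampler)
  (endp (sketch-list-sampler-items sampler)))

(defun sketch-sample (sampler)
  (pop (sketch-list-sampler-items sampler)))

(defun sketch-sample-batch (sampler max-size)
  "Collect up to MAX-SIZE samples, stopping early when SAMPLER runs out."
  (loop repeat max-size
        until (sketch-finishedp sampler)
        collect (sketch-sample sampler)))

(let ((sampler (make-sketch-list-sampler :items (list 1 2 3 4 5))))
  (list (sketch-sample-batch sampler 2)
        (sketch-sample-batch sampler 2)
        (sketch-sample-batch sampler 2)))
;; => ((1 2) (3 4) (5))
```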

SEGMENT-SET->WEIGHTS (SEGMENT-SET WEIGHTS)

Copy the values from SEGMENT-SET to WEIGHTS.

SEGMENT-SET<-WEIGHTS (SEGMENT-SET WEIGHTS)

Copy the values of WEIGHTS to SEGMENT-SET.

Undocumented

INSERT-INTO-EXECUTOR-CACHE (KEY CACHE VALUE)

LABELEDP (OBJECT)

LIST-SEGMENTS (SEGMENTABLE)

LOOKUP-EXECUTOR-CACHE (KEY CACHE)

SEGMENT-SIZE (SEGMENT)

TRIVIALLY-MAP-OVER-EXECUTORS (FN SAMPLES OBJ)

Private

COLLECT-CLASSIFICATION-CONFIDENCES (EXAMPLES STRIPED &KEY (CONFIDENCE-FN #'CLASSIFICATION-CONFIDENCES))

Return the sequence of prediction confidences for EXAMPLES as measured on STRIPED. The length of EXAMPLES must be equal to the number of stripes in STRIPED. This is a measurer function.

COUNT-MISCLASSIFICATIONS (EXAMPLES STRIPED &KEY (LABEL-FN #'LABEL) (STRIPE-LABEL-FN #'STRIPE-LABEL))

Return the number of classification errors and the number of examples. The length of EXAMPLES must be equal to the number of stripes in STRIPED. LABEL-FN takes an example and returns its label, which is compared with EQL to what STRIPE-LABEL-FN returns for STRIPED and the index of the stripe. This is a measurer function.
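
The comparison can be sketched without any striped object. In the following illustration SKETCH-COUNT-MISCLASSIFICATIONS is a hypothetical name, and a plain list of predicted labels stands in for what STRIPE-LABEL-FN would return for each stripe of STRIPED.

```lisp
(defun sketch-count-misclassifications (examples predicted-labels
                                        &key (label-fn #'identity))
  "Return two values: the number of examples whose label is not EQL
to the corresponding predicted label, and the number of examples."
  (assert (= (length examples) (length predicted-labels)))
  (values (loop for example in examples
                for predicted in predicted-labels
                count (not (eql (funcall label-fn example) predicted)))
          (length examples)))

;; Examples are (name . label); the second prediction is wrong.
(sketch-count-misclassifications '((:a . 0) (:b . 1) (:c . 2))
                                 '(0 2 2)
                                 :label-fn #'cdr)
;; => 1, 3
```

The two return values have the shape a measurer is expected to produce: an error sum followed by a count.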

MEASURE-CROSS-ENTROPY (EXAMPLES STRIPED &KEY (LABEL-FN #'LABEL) (LABEL-DISTRIBUTION-FN #'LABEL-DISTRIBUTION) (CONFIDENCE-FN #'CLASSIFICATION-CONFIDENCES))

Return two values: the sum of the cross entropies between the confidences and the label distributions (1 at the label of the class), and the number of examples. The length of EXAMPLES must be equal to the number of stripes in STRIPED. LABEL-FN takes an example and returns its label. This is a measurer function.
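
When the distribution puts all of its mass on the true label, the cross entropy for one example reduces to the negative log of the confidence assigned to that label. The sketch below assumes exactly that case; SKETCH-MEASURE-CROSS-ENTROPY is a hypothetical name, labels are given directly as class indices, and confidences as a list of probability vectors rather than being read from a striped object.

```lisp
(defun sketch-measure-cross-entropy (true-labels confidences)
  "TRUE-LABELS is a list of class indices and CONFIDENCES a list of
vectors of predicted probabilities. Return two values: the summed
cross entropy (natural log) and the number of examples."
  (values (loop for label in true-labels
                for confidence in confidences
                ;; One-hot target: only the true label's term survives.
                sum (- (log (aref confidence label))))
          (length true-labels)))

(sketch-measure-cross-entropy '(0 1)
                              (list #(0.5 0.5) #(0.25 0.75)))
```

For this input the first value is -ln(0.5) - ln(0.75), roughly 0.98, and the second value is 2.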

Undocumented

ADD-MEASURED-ERROR (COUNTER-AND-MEASURER &REST ARGS)

MAKE-CHUNK-RECONSTRUCTION-ROC-AUC-COUNTERS-AND-MEASURERS (CHUNKS &KEY CHUNK-FILTER CLASS-LABEL CLASS-INDEX)

STRIPE-BINDING (STRIPE OBJECT START &OPTIONAL END)

TEST-ROC-AUC

MACRO

Public

DO-BATCHES-FOR-LEARNER ((SAMPLES (SAMPLER LEARNER)) &BODY BODY)

Convenience macro over MAP-BATCHES-FOR-LEARNER.

DO-SEGMENT-SET ((SEGMENT &KEY START-IN-SEGMENT-SET) SEGMENT-SET &BODY BODY)

Iterate over SEGMENTS in SEGMENT-SET ....

WITH-STRIPES (SPECS &BODY BODY)

Bind start and optionally end indices belonging to stripes in striped objects. Usage: (WITH-STRIPES ((STRIPE1 OBJECT1 START1 END1) (STRIPE2 OBJECT2 START2 END2) ...) ...)

Undocumented

DO-EXECUTORS ((SAMPLES OBJECT) &BODY BODY)

WITH-SEGMENT-WEIGHTS (((WEIGHTS START END) SEGMENT) &BODY BODY)

GENERIC-FUNCTION

Public

ADD-ERROR (COUNTER ERR N)

Add ERR to SUM-ERRORS and N to N-SUM-ERRORS.

FIND-ONE-EXECUTOR (SAMPLE OBJ)

Called by TRIVIALLY-MAP-OVER-EXECUTORS.

FINISHEDP (SAMPLER)

See if SAMPLER has run out of examples.

INITIALIZE-TRAINER (TRAINER LEARNER)

To be called before training starts. This function sets up TRAINER to be suitable for LEARNER. It is normally called automatically from a :BEFORE method on TRAIN.

LABEL (OBJECT)

Return the label of OBJECT as an index. This is a special case of LABEL-DISTRIBUTION.

LABEL-DISTRIBUTION (STRIPED STRIPE OBJECT)

Return an FLT-VECTOR that represents our knowledge of the distribution of the true label of OBJECT.

MAP-OVER-EXECUTORS (FN SAMPLES OBJECT)

Divide SAMPLES between executors and call FN with each group of samples and the executor responsible for them. Some objects conflate function and call: the forward pass of a bpn computes output from inputs, so it is like a function, but it also doubles as a function call in the sense that the bpn (function) object changes state during the computation of the output. Hence not even the forward pass of a bpn is thread safe. There is also the restriction that all inputs must be of the same size. For example, if we have a function that builds a bpn for an input of a certain size, then we can create a factory that creates bpns for a particular call. The factory probably wants to keep the weights the same, though. Another possibility MAP-OVER-EXECUTORS allows is to parallelize execution. The default implementation simply calls FN with SAMPLES and OBJECT.

MAP-SEGMENT-RUNS (FN SEGMENT)

Call FN with start and end of intervals of consecutive indices that are not missing in SEGMENT. Called by trainers that support partial updates.
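
The idea of visiting runs of consecutive indices can be sketched on a plain sorted list of the indices that are present. SKETCH-MAP-INDEX-RUNS is a hypothetical name, and passing an exclusive end index is an assumption; this reference does not say how MAP-SEGMENT-RUNS represents missing indices or whether its end is inclusive.

```lisp
(defun sketch-map-index-runs (fn present-indices)
  "PRESENT-INDICES is a sorted list of distinct integers. Call FN with
the inclusive start and exclusive end of every run of consecutive
indices."
  (let ((start nil)
        (previous nil))
    (dolist (index present-indices)
      (cond ((null start)
             ;; The first index opens the first run.
             (setf start index))
            ((/= index (1+ previous))
             ;; A gap closes the current run and opens a new one.
             (funcall fn start (1+ previous))
             (setf start index)))
      (setf previous index))
    ;; Close the last run, if there was any index at all.
    (when start
      (funcall fn start (1+ previous)))))

(let ((runs '()))
  (sketch-map-index-runs (lambda (start end) (push (list start end) runs))
                         '(0 1 2 5 6 9))
  (reverse runs))
;; => ((0 3) (5 7) (9 10))
```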

MAP-SEGMENTS (FN SEGMENTABLE)

Apply FN to each segment of SEGMENTABLE.

MAYBE-MAKE-CROSS-ENTROPY-MEASURER (OBJ)

Return a function of one parameter that is invoked once OBJ has its predicted label(s) computed and that measures the cross entropy error. Return NIL if OBJ contains no labels.

MAYBE-MAKE-MISCLASSIFICATION-MEASURER (OBJ)

Return a function of one parameter that is invoked once OBJ has its predicted label(s) computed and that counts misclassifications. Return NIL if OBJ contains no labels.

N-INPUTS-UNTIL-UPDATE (TRAINER)

Return the largest number of inputs guaranteed not to cause a change in the learner being trained.

SAMPLE (SAMPLER)

The SAMPLER, if not FINISHEDP, returns an object that represents a sample from the world to be experienced or, in other words, simply something that can be used as input for the learning.

SEGMENT-WEIGHTS (SEGMENT)

Return the weight array and start, end indices of SEGMENT.

SET-INPUT (SAMPLES LEARNER)

Set SAMPLES as inputs in LEARNER. SAMPLES is always a sequence of examples even for learners not capable of batch operation.

SET-MAX-N-STRIPES (MAX-N-STRIPES OBJECT)

Allocate whatever is necessary to allow MAX-N-STRIPES stripes to be worked with simultaneously in OBJECT.

SET-N-STRIPES (N-STRIPES OBJECT)

Set the number of stripes (out of MAX-N-STRIPES) that are in use in OBJECT.

STRIPE-END (STRIPE OBJECT)

Return the end of STRIPE in OBJECT, which is usually an index into some kind of storage that backs OBJECT.

STRIPE-LABEL (STRIPED STRIPE)

Return the label of STRIPE in STRIPED. Typically computed by finding the label with the maximum probability.
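
Finding the label with the maximum probability amounts to an argmax over the confidences of one stripe. A minimal sketch follows, with the hypothetical name SKETCH-ARGMAX and a plain vector standing in for the stripe's confidences; letting the first maximum win ties is an assumption.

```lisp
(defun sketch-argmax (confidences)
  "Return the index of the largest element of the non-empty vector
CONFIDENCES. The first maximum wins ties."
  (let ((best 0))
    (loop for i from 1 below (length confidences)
          when (> (aref confidences i) (aref confidences best))
            do (setf best i))
    best))

(sketch-argmax #(0.1 0.7 0.2))
;; => 1
```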

STRIPE-START (STRIPE OBJECT)

Return the start of STRIPE in OBJECT, which is usually an index into some kind of storage that backs OBJECT.

TRAIN (SAMPLER TRAINER LEARNER)

Train LEARNER with TRAINER on the examples from SAMPLER. Before that, TRAINER is initialized for LEARNER with INITIALIZE-TRAINER. Training continues until SAMPLER is finished.
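
The control flow of such a training loop can be sketched with closures standing in for the sampler and the trainer. SKETCH-TRAIN and its arguments are hypothetical; the sketch leaves out the INITIALIZE-TRAINER step and takes the batch size as an explicit argument, whereas how TRAIN actually sizes its batches is not spelled out in this reference.

```lisp
(defun sketch-train (sample-fn finishedp-fn train-batch-fn batch-size)
  "Draw batches of at most BATCH-SIZE samples until FINISHEDP-FN
returns true, passing each batch to TRAIN-BATCH-FN. Return the
number of batches processed."
  (let ((n-batches 0))
    (loop until (funcall finishedp-fn)
          do (let ((batch (loop repeat batch-size
                                until (funcall finishedp-fn)
                                collect (funcall sample-fn))))
               (funcall train-batch-fn batch)
               (incf n-batches)))
    n-batches))

;; Record the batches that a pretend trainer gets to see.
(let ((data (list 1 2 3 4 5))
      (seen '()))
  (sketch-train (lambda () (pop data))
                (lambda () (endp data))
                (lambda (batch) (push batch seen))
                2)
  (reverse seen))
;; => ((1 2) (3 4) (5))
```

Wrapping the TRAIN-BATCH-FN closure here plays the role that a method around TRAIN-BATCH would play for monitoring progress.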

TRAIN-BATCH (BATCH TRAINER LEARNER)

Called by TRAIN. A convenient place to hang an :AROUND method for monitoring progress.

Undocumented

CLASSIFICATION-CONFIDENCES (STRIPED STRIPE)

GET-ERROR (COUNTER)

RESET-COUNTER (COUNTER)

SAMPLE-TO-EXECUTOR-CACHE-KEY (SAMPLE OBJECT)

Private

MAYBE-MAKE-CLASSIFICATION-CONFIDENCE-COLLECTOR (OBJ)

Return a collector function of (examples learner) args that is invoked when OBJ has the predicted label(s) computed and it collects a sequence of confidences (one confidence list per example). Return NIL if OBJ contains no labels.

SLOT-ACCESSOR

Public

BATCH-SIZE (OBJECT)

After having gone through BATCH-SIZE inputs, the weights are updated.

(SETF BATCH-SIZE) (NEW-VALUE OBJECT)

After having gone through BATCH-SIZE inputs, the weights are updated.

COST (OBJECT)

Return the sum of costs for all active stripes. The cost of a stripe is the sum of the error nodes. The second value is the number of stripes.

DEFAULT-VALUE (OBJECT)

Upon creation or resize the lump's nodes get filled with this value.

MAX-N-STRIPES (OBJECT)

The number of stripes with which the OBJECT is capable of dealing simultaneously.

N-STRIPES (OBJECT)

The number of stripes currently present in OBJECT. This is at most MAX-N-STRIPES.

N-SUM-ERRORS (OBJECT)

The total number of observations whose errors contributed to SUM-ERRORS.

NODES (OBJECT)

The values of the nodes. All nodes have values. It is conceptually an N-STRIPES x SIZE matrix that can be enlarged to MAX-N-STRIPES x SIZE by setting N-STRIPES.

SEGMENT-SET (OBJECT)

Segments to train.

SEGMENTS (OBJECT)

A list of segments associated with OBJECT. Trainers must implement this. It is also defined on SEGMENT-SETs.

SUM-ERRORS (OBJECT)

The sum of errors.

TARGET (OBJECT)

A lump of the same size as INPUT-LUMP that is the target in -sum_{k}target_k*ln(x_k), which is the cross entropy error.

Undocumented

GROUP-SIZE (OBJECT)

MAX-N-SAMPLES (OBJECT)

(SETF MAX-N-SAMPLES) (NEW-VALUE OBJECT)

N-INPUTS (OBJECT)

(SETF N-INPUTS) (NEW-VALUE OBJECT)

N-SAMPLES (OBJECT)

(SETF N-SAMPLES) (NEW-VALUE OBJECT)

NAME (OBJECT)

SAMPLER (OBJECT)

(SETF SAMPLER) (NEW-VALUE OBJECT)

SEGMENT-SET-SIZE (OBJECT)

SIZE (OBJECT)

START-INDICES (OBJECT)

Private

PER-LABEL-COUNTERS (OBJECT)

A hash table mapping labels to the cross entropy counters for samples with that label.

Undocumented

CLASS-INDEX (OBJECT)

CLASS-LABEL (OBJECT)

CONFIDENCES-FN (OBJECT)

ELEMENTS (OBJECT)

(SETF ELEMENTS) (NEW-VALUE OBJECT)

EXAMPLE-FN (OBJECT)

EXECUTOR-CACHE (OBJECT)

CLASS

Public

COUNTING-SAMPLER

Keep track of how many samples have been generated and report FINISHEDP once that count is not less than MAX-N-INPUTS, which is optional.
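
The counting behaviour can be sketched with a pair of closures instead of a class. MAKE-SKETCH-COUNTING-SAMPLER is a hypothetical name, and treating a missing limit as "never finished" is an assumption about what the optional limit means.

```lisp
(defun make-sketch-counting-sampler (generator &optional max-n)
  "Return two closures as two values: one draws a sample from
GENERATOR and counts it, the other reports whether MAX-N samples
have been drawn. Without MAX-N the sampler never finishes."
  (let ((n 0))
    (values (lambda ()
              (incf n)
              (funcall generator))
            (lambda ()
              (and max-n (>= n max-n) t)))))

(multiple-value-bind (sample finishedp)
    (make-sketch-counting-sampler (lambda () (random 1.0)) 3)
  ;; Draw until the limit of three samples is reached.
  (loop until (funcall finishedp)
        collect (funcall sample)))
```

The final form returns a list of three random floats, because the sampler reports that it is finished as soon as three samples have been drawn.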

LABELED

Mixin for chunks/whatever that hold labels. In the simplest case you need to make sure that LABEL and STRIPE-LABEL work on examples and striped things of interest. For instance in a BM, SOFTMAX-LABEL-CHUNK inherits from LABELED and SOFTMAX-CHUNK. STRIPE-LABEL is implemented as taking the index of the prediction with the maximum probability. Thus, only LABEL is left to be implemented on the training examples. Once set up, COUNT-MISCLASSIFICATIONS can be called directly or one can work with counters and measurers.

SEGMENT-SET (OBJECT)

It's like a concatenation of segments.

TRIVIAL-CACHED-EXECUTOR-MIXIN

Undocumented

COUNTER

COUNTING-FUNCTION-SAMPLER

CROSS-ENTROPY-COUNTER

ERROR-COUNTER

FUNCTION-SAMPLER

MISCLASSIFICATION-COUNTER

RMSE-COUNTER

ROC-AUC-COUNTER

Private

Undocumented

COLLECTING-COUNTER