Common Lisp Package: MGL-GD

Generic interface for gradient-based optimization, and simple gradient-descent-based trainers.

FUNCTION

Public

Undocumented

FIND-TRAINER-FOR-SEGMENT (SEGMENT TRAINER)

Private

Undocumented

SET-UP-N-WEIGHT-USES (TRAINER)

MACRO

Public

Undocumented

DO-SEGMENT-GRADIENT-ACCUMULATORS (((SEGMENT ACC-START ACCUMULATOR) TRAINER) &BODY BODY)

WITH-SEGMENT-GRADIENT-ACCUMULATOR (((START ACCUMULATOR) (SEGMENT TRAINER)) &BODY BODY)
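
Both macros are undocumented in the source. Judging by the lambda lists and by MAP-SEGMENT-GRADIENT-ACCUMULATORS below, DO-SEGMENT-GRADIENT-ACCUMULATORS presumably iterates over the accumulators of all segments of TRAINER, while WITH-SEGMENT-GRADIENT-ACCUMULATOR binds the accumulator of a single segment. A hedged sketch (MY-SEGMENT, MY-TRAINER, I and DELTA are illustrative, not part of the API):

  ;; Add a derivative to one slot of a segment's accumulator.
  ;; ACCUMULATOR is presumably the FLT vector described under
  ;; ACCUMULATOR (OBJECT) below; START is the segment's offset into it.
  (with-segment-gradient-accumulator ((start accumulator)
                                      (my-segment my-trainer))
    (incf (aref accumulator (+ start i)) delta))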

GENERIC-FUNCTION

Public

FIND-SEGMENT-GRADIENT-ACCUMULATOR (SEGMENT TRAINER)

Return the start index and the accumulator belonging to SEGMENT in TRAINER or NIL if it is not found.

MAP-SEGMENT-GRADIENT-ACCUMULATORS (FN TRAINER)

Call FN, whose lambda list is (SEGMENT ACC-START ACCUMULATOR), on each segment trained by TRAINER.
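
For instance, one could inspect where each segment's gradients accumulate (a sketch; TRAINER is assumed to be a live trainer instance):

  (map-segment-gradient-accumulators
   (lambda (segment acc-start accumulator)
     (format t "~S: offset ~D in a vector of length ~D~%"
             segment acc-start (length accumulator)))
   trainer)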

MAYBE-UPDATE-WEIGHTS (TRAINER N-NEW-INPUTS)

Update the weights being trained. N-NEW-INPUTS have been seen since the last time this was called.
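
In a typical loop, derivatives are first accumulated (see ACCUMULATOR below), then the trainer is told how many inputs were processed; it updates the weights whenever a batch boundary is crossed. A hedged sketch, where ACCUMULATE-DERIVATIVES! stands for the learner-specific code that leaves derivatives in the accumulators:

  (dolist (batch batches)
    (accumulate-derivatives! learner trainer batch) ; hypothetical helper
    (maybe-update-weights trainer (length batch)))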

UPDATE-WEIGHTS (TRAINER)

Called by MAYBE-UPDATE-WEIGHTS when all weights are to be updated at the same time.

SLOT-ACCESSOR

Public

ACCUMULATOR (OBJECT)

This is where COMPUTE-BATCH-COST-AND-DERIVE should leave the derivatives.

(SETF ACCUMULATOR) (NEW-VALUE OBJECT)

An FLT vector that is accessed directly by the client and is used to store the sum of the computed gradients.

AFTER-UPDATE-HOOK (OBJECT)

A list of functions with no arguments called after weights are updated.

(SETF AFTER-UPDATE-HOOK) (NEW-VALUE OBJECT)

A list of functions with no arguments called after weights are updated.

BATCH-SIZE (OBJECT)

After having gone through BATCH-SIZE number of inputs, the weights are updated.

(SETF BATCH-SIZE) (NEW-VALUE OBJECT)

After having gone through BATCH-SIZE number of inputs, the weights are updated.

BEFORE-UPDATE-HOOK (OBJECT)

A list of functions with no parameters. Each function is called just before UPDATE-WEIGHTS takes place. Convenient for hanging additional gradient-accumulating code on.

(SETF BEFORE-UPDATE-HOOK) (NEW-VALUE OBJECT)

A list of functions with no parameters. Each function is called just before UPDATE-WEIGHTS takes place. Convenient for hanging additional gradient-accumulating code on.
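
For example, to hang extra gradient-accumulating code on a trainer (ADD-SPARSITY-GRADIENTS! is a hypothetical helper, not part of MGL-GD):

  (push (lambda ()
          (add-sparsity-gradients! trainer))
        (before-update-hook trainer))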

LEARNING-RATE (OBJECT)

This is normally divided by the number of inputs in the batch or the number of uses the weight in question has seen.

(SETF LEARNING-RATE) (NEW-VALUE OBJECT)

This is normally divided by the number of inputs in the batch or the number of uses the weight in question has seen.

N-INPUTS-IN-BATCH (OBJECT)

In-batch counter of inputs.

(SETF N-INPUTS-IN-BATCH) (NEW-VALUE OBJECT)

In-batch counter of inputs.

N-WEIGHT-USES-IN-BATCH (OBJECT)

Number of uses of the weight in its current batch.

(SETF N-WEIGHT-USES-IN-BATCH) (NEW-VALUE OBJECT)

Number of uses of the weight in its current batch.

SEGMENTER (OBJECT)

When this trainer is initialized, it loops over the segments of the learner with MAP-SEGMENTS. SEGMENTER is a function that is called with each segment and returns a trainer or NIL. Several segments may be mapped to the same trainer. After the segment->trainer mappings are collected, each trainer is initialized by INITIALIZE-TRAINER with the list of segments mapped to it.

(SETF SEGMENTER) (NEW-VALUE OBJECT)

When this trainer is initialized, it loops over the segments of the learner with MAP-SEGMENTS. SEGMENTER is a function that is called with each segment and returns a trainer or NIL. Several segments may be mapped to the same trainer. After the segment->trainer mappings are collected, each trainer is initialized by INITIALIZE-TRAINER with the list of segments mapped to it.
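
As a sketch, a segmenter that maps every segment to one shared trainer (several segments may map to the same trainer):

  (let ((trainer (make-instance 'batch-gd-trainer)))
    (lambda (segment)
      (declare (ignore segment))
      trainer))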

WEIGHT-DECAY (OBJECT)

WEIGHT-DECAY * WEIGHT is added to the gradient to penalize large weights. It's as if the function whose minimum is sought had sum_i{0.5 * WEIGHT-DECAY * WEIGHT_i^2} added to it.

(SETF WEIGHT-DECAY) (NEW-VALUE OBJECT)

WEIGHT-DECAY * WEIGHT is added to the gradient to penalize large weights. It's as if the function whose minimum is sought had sum_i{0.5 * WEIGHT-DECAY * WEIGHT_i^2} added to it.

WEIGHT-PENALTY (OBJECT)

WEIGHT-PENALTY is added to the gradient, pushing the weight towards negative infinity. It's as if the function whose minimum is sought had sum_i{WEIGHT-PENALTY * WEIGHT_i} added to it. Putting it on feature biases constitutes a sparsity constraint on the features.

(SETF WEIGHT-PENALTY) (NEW-VALUE OBJECT)

WEIGHT-PENALTY is added to the gradient, pushing the weight towards negative infinity. It's as if the function whose minimum is sought had sum_i{WEIGHT-PENALTY * WEIGHT_i} added to it. Putting it on feature biases constitutes a sparsity constraint on the features.
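
Conceptually, the two penalties change the per-weight gradient like this (a sketch of the effect, not of MGL-GD internals):

  (+ gradient
     (* weight-decay weight) ; derivative of 0.5 * WEIGHT-DECAY * WEIGHT^2
     weight-penalty)         ; derivative of WEIGHT-PENALTY * WEIGHT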

Undocumented

MOMENTUM (OBJECT)

(SETF MOMENTUM) (NEW-VALUE OBJECT)

N-INPUTS (OBJECT)

(SETF N-INPUTS) (NEW-VALUE OBJECT)

TRAINERS (OBJECT)

Private

Undocumented

WEIGHT-DELTAS (OBJECT)

(SETF WEIGHT-DELTAS) (NEW-VALUE OBJECT)

CLASS

Public

BATCH-GD-TRAINER

Updates all weights simultaneously after chewing through BATCH-SIZE inputs. PER-WEIGHT-BATCH-GD-TRAINER may be a better choice when some weights can go unused, for instance due to missing input values.
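
A minimal sketch of creating one, assuming the initargs mirror the slot accessors documented above:

  (make-instance 'batch-gd-trainer
                 :learning-rate 0.1d0 ; divided by the batch size, see LEARNING-RATE
                 :momentum 0.9d0      ; FLT (double float) values assumed
                 :batch-size 100)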

GD-TRAINER

This is the common base class of gradient-descent-based trainers with momentum and weight decay.

NORMALIZED-BATCH-GD-TRAINER

Like BATCH-GD-TRAINER but keeps count of how many times each weight was used in the batch and divides the accumulated gradient by this count instead of dividing by N-INPUTS-IN-BATCH. This only makes a difference if there are missing values in the learner being trained. The main feature that distinguishes this class from PER-WEIGHT-BATCH-GD-TRAINER is that batches end at the same time for all weights.

PER-WEIGHT-BATCH-GD-TRAINER

This is much like BATCH-GD-TRAINER, but it is more clever about when to update weights: every weight has its own batch, independent of the batches of other weights. This has desirable properties. One can, for example, put two neural networks together without adding any connections between them, and training will produce results equivalent to training them separately. Also, adding inputs with only missing values does not change anything.

SEGMENTED-GD-TRAINER

A trainer that delegates training of segments to other trainers. Useful for delegating training of different segments to different trainers (capable of working with segmentables), or simply for not training all segments.
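
A hedged sketch, assuming :SEGMENTER is the initarg behind the SEGMENTER accessor documented above. Returning NIL from the segmenter leaves a segment untrained (FROZEN-P is a hypothetical predicate):

  (make-instance 'segmented-gd-trainer
                 :segmenter (lambda (segment)
                              (unless (frozen-p segment)
                                (make-instance 'batch-gd-trainer))))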