Common Lisp Package: MGL-CG

Conjugate gradient based trainer.

FUNCTION

Public

CG (FN W &KEY (MAX-N-LINE-SEARCHES *DEFAULT-MAX-N-LINE-SEARCHES*) (MAX-N-EVALUATIONS-PER-LINE-SEARCH *DEFAULT-MAX-N-EVALUATIONS-PER-LINE-SEARCH*) (MAX-N-EVALUATIONS *DEFAULT-MAX-N-EVALUATIONS*) (SIG *DEFAULT-SIG*) (RHO *DEFAULT-RHO*) (INT *DEFAULT-INT*) (EXT *DEFAULT-EXT*) (RATIO *DEFAULT-RATIO*) SPARE-VECTORS)

Minimize a differentiable multivariate function with conjugate gradient. The Polack-Ribiere flavour of conjugate gradients is used to compute search directions, and a line search using quadratic and cubic polynomial approximations and the Wolfe-Powell stopping criteria is used together with the slope ratio method for guessing initial step sizes. Additionally, a number of checks are made to ensure that exploration is taking place and that extrapolation will not become unboundedly large.

FN is a function of two parameters: WEIGHTS and DERIVATIVES. WEIGHTS is an FLT-VECTOR of the same size as W from which the search starts. DERIVATIVES is also an FLT-VECTOR of that size, and it is where FN shall place the partial derivatives. FN returns the value (of type FLT) of the function being minimized.

CG performs a number of line searches and invokes FN at each step. A line search invokes FN at most MAX-N-EVALUATIONS-PER-LINE-SEARCH times; it either succeeds in improving the minimum by a sufficient margin, or it fails. Note that even a failed line search may improve the minimum further and hence change the weights; it is just that the improvement was deemed too small. CG stops when any of the following holds:

- two line searches fail in a row
- MAX-N-LINE-SEARCHES is reached
- MAX-N-EVALUATIONS is reached

CG returns an FLT-VECTOR that contains the best weights, the minimum (of type FLT), the number of line searches performed, the number of successful line searches, and the number of evaluations. When using MAX-N-EVALUATIONS, remember that there is an extra evaluation of FN before the first line search.

SPARE-VECTORS is a list of preallocated FLT-VECTORs of the same size as W. Passing 6 of them covers the current needs of the algorithm, in which case it will not cons up vectors of the size of W at all.

NOTE: If the function terminates within a few iterations, it could be an indication that the function values and derivatives are not consistent (i.e., there may be a bug in the implementation of FN).
SIG and RHO are the constants controlling the Wolfe-Powell conditions. SIG is the maximum allowed absolute ratio between previous and new slopes (derivatives in the search direction); thus setting SIG to low (positive) values forces higher precision in the line searches. RHO is the minimum allowed fraction of the expected improvement (computed from the slope at the initial point of the line search). The constants must satisfy 0 < RHO < SIG < 1. Tuning SIG (depending on the nature of the function to be optimized) may speed up the minimization; it is probably not worth playing much with RHO.
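The FN protocol and return values described above can be sketched as follows; this is a minimal example, assuming MGL-CG is loaded and that FLT is DOUBLE-FLOAT (the usual MGL setting). It minimizes f(x) = (x - 3)^2 starting from x = 0.

```lisp
(defun quadratic-fn (weights derivatives)
  "FN protocol: place the partial derivatives at WEIGHTS into
DERIVATIVES and return the function value (of type FLT)."
  (let ((x (aref weights 0)))
    ;; df/dx = 2 * (x - 3)
    (setf (aref derivatives 0) (* 2d0 (- x 3d0)))
    ;; f(x) = (x - 3)^2
    (expt (- x 3d0) 2)))

(multiple-value-bind (best-weights minimum n-line-searches
                      n-successful-line-searches n-evaluations)
    (mgl-cg:cg #'quadratic-fn
               ;; W: the starting point, an FLT-VECTOR of the weights.
               (make-array 1 :element-type 'double-float
                             :initial-element 0d0)
               :max-n-line-searches 10)
  (declare (ignore n-line-searches n-successful-line-searches
                   n-evaluations))
  (values best-weights minimum))
```

For such a well-behaved quadratic, the returned weights converge to 3 and the minimum to 0 in very few line searches.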

Private

Undocumented

CHECK-LIMIT (VALUE LIMIT)

INNER* (V1 V2)

NEGATE-VECTOR (V &KEY RESULT)

POLACK-RIBIERE (OLD-DF NEW-DF)

PROCESS-BATCH (TRAINER LEARNER BATCH WEIGHTS DERIVATIVES)

UPDATE-DIRECTION (S OLD-DF NEW-DF)

V1=V2+C*V3 (V1 V2 C V3)

MACRO

Private

Undocumented

WITH-NIL-ON-ARITHMETIC-ERROR (&BODY BODY)

GENERIC-FUNCTION

Public

COMPUTE-BATCH-COST-AND-DERIVE (BATCH TRAINER LEARNER)

Return the total cost (that LEARNER is trying to minimize) of all samples in BATCH and add the derivatives to the ACCUMULATOR of TRAINER.
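A hypothetical sketch of specializing this generic function: the learner class MEAN-SQUARED-ERROR-LEARNER and its COST-AND-GRADIENT helper are invented for illustration, but the argument order and the ACCUMULATOR protocol follow the signatures documented here.

```lisp
(defmethod compute-batch-cost-and-derive (batch (trainer cg-trainer)
                                          (learner mean-squared-error-learner))
  (let ((accumulator (accumulator trainer))
        (total-cost 0d0))
    (dolist (sample batch)
      ;; COST-AND-GRADIENT is a hypothetical helper returning the cost
      ;; of SAMPLE and its gradient as an FLT-VECTOR.
      (multiple-value-bind (cost gradient)
          (cost-and-gradient learner sample)
        (incf total-cost cost)
        ;; Add (do not overwrite!) the derivatives into the accumulator.
        (map-into accumulator #'+ accumulator gradient)))
    total-cost))
```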

SLOT-ACCESSOR

Public

ACCUMULATOR (OBJECT)

This is where COMPUTE-BATCH-COST-AND-DERIVE should leave the derivatives.

(SETF ACCUMULATOR) (NEW-VALUE OBJECT)

An FLT vector that is accessed directly by the client and is used to store the sum of the computed gradients.

SEGMENT-SET (OBJECT)

Segments to train.

Undocumented

CG-ARGS (OBJECT)

(SETF CG-ARGS) (NEW-VALUE OBJECT)

N-INPUTS (OBJECT)

(SETF N-INPUTS) (NEW-VALUE OBJECT)

Private

SEGMENT-DECAY-FN (OBJECT)

If not NIL it's a designator for a function that returns a decay of type FLT for a given segment. For convenience NIL is also treated as 0 decay.

(SETF SEGMENT-DECAY-FN) (NEW-VALUE OBJECT)

If not NIL it's a designator for a function that returns a decay of type FLT for a given segment. For convenience NIL is also treated as 0 decay.

SEGMENT-FILTER (OBJECT)

A predicate function on segments that filters out uninteresting segments. Called from INITIALIZE-TRAINER.

SPARE-VECTORS (OBJECT)

Pre-allocated vectors to make CG less consy.

(SETF SPARE-VECTORS) (NEW-VALUE OBJECT)

Pre-allocated vectors to make CG less consy.

Undocumented

WEIGHTS (OBJECT)

(SETF WEIGHTS) (NEW-VALUE OBJECT)

VARIABLE

Public

*DEFAULT-EXT*

Extrapolate at most EXT times the current step size.

*DEFAULT-INT*

Don't reevaluate within INT of the limit of the current bracket.

*DEFAULT-RATIO*

Maximum allowed slope ratio.

*DEFAULT-RHO*

RHO is the minimum allowed fraction of the expected improvement (computed from the slope at the initial point of the line search). The constants must satisfy 0 < RHO < SIG < 1.

*DEFAULT-SIG*

SIG and RHO are the constants controlling the Wolfe-Powell conditions. SIG is the maximum allowed absolute ratio between previous and new slopes (derivatives in the search direction); thus setting SIG to low (positive) values forces higher precision in the line searches.

Undocumented

*DEFAULT-MAX-N-EVALUATIONS*

*DEFAULT-MAX-N-EVALUATIONS-PER-LINE-SEARCH*

*DEFAULT-MAX-N-LINE-SEARCHES*

CLASS

Public

CG-TRAINER

Updates all weights simultaneously after chewing through BATCH-SIZE inputs.

DECAYED-CG-TRAINER-MIXIN

Mix this in before a CG-based trainer class to conveniently add decay on a per-segment basis.
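Putting the mixin before CG-TRAINER might look like the sketch below; the combined class name and the :SEGMENT-DECAY-FN and :CG-ARGS initargs are assumptions inferred from the slot accessors documented in this package.

```lisp
;; Hypothetical combined trainer class: the mixin precedes the base
;; trainer in the class precedence list.
(defclass my-decayed-cg-trainer (decayed-cg-trainer-mixin cg-trainer)
  ())

;; Apply the same small decay to every segment; per-segment decay is
;; possible since SEGMENT-DECAY-FN receives the segment as argument.
(make-instance 'my-decayed-cg-trainer
               :segment-decay-fn (constantly 0.0002d0)
               :cg-args '(:max-n-line-searches 3))
```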

SEGMENT-SET

A class that behaves like a concatenation of segments.