# FUNCTION

# Public

# CG (FN W &KEY (MAX-N-LINE-SEARCHES *DEFAULT-MAX-N-LINE-SEARCHES*) (MAX-N-EVALUATIONS-PER-LINE-SEARCH *DEFAULT-MAX-N-EVALUATIONS-PER-LINE-SEARCH*) (MAX-N-EVALUATIONS *DEFAULT-MAX-N-EVALUATIONS*) (SIG *DEFAULT-SIG*) (RHO *DEFAULT-RHO*) (INT *DEFAULT-INT*) (EXT *DEFAULT-EXT*) (RATIO *DEFAULT-RATIO*) SPARE-VECTORS)

Minimize a differentiable multivariate function with conjugate
gradient. The Polack-Ribiere flavour of conjugate gradients is used to
compute search directions, and a line search using quadratic and cubic
polynomial approximations and the Wolfe-Powell stopping criteria is
used together with the slope ratio method for guessing initial step
sizes. Additionally a bunch of checks are made to make sure that
exploration is taking place and that extrapolation will not be
unboundedly large.

FN is a function of two parameters: WEIGHTS and DERIVATIVES.
WEIGHTS is an FLT-VECTOR of the same size as W and is where the
search starts from. DERIVATIVES is also an FLT-VECTOR of that size,
and it is where FN shall place the partial derivatives. FN returns
the value (of type FLT) of the function that is being minimized.

CG performs a number of line searches and invokes FN at each step. A
line search invokes FN at most MAX-N-EVALUATIONS-PER-LINE-SEARCH
times; it can succeed in improving the minimum by a sufficient
margin, or it can fail. Note that even a failed line search may
improve the minimum further and hence change the weights; it's just
that the improvement was deemed too small. CG stops when either:
- two line searches fail in a row
- MAX-N-LINE-SEARCHES is reached
- MAX-N-EVALUATIONS is reached

CG returns an FLT-VECTOR that contains the best weights, the minimum
(of type FLT), the number of line searches performed, the number of
successful line searches, and the number of evaluations.

When using MAX-N-EVALUATIONS, remember that there is an extra
evaluation of FN before the first line search.

SPARE-VECTORS is a list of preallocated FLT-VECTORs of the same size
as W. Passing 6 of them covers the current needs of the algorithm,
so it will not cons up vectors of the size of W at all.

NOTE: If the function terminates within a few iterations, it could
be an indication that the function values and derivatives are not
consistent (i.e., there may be a bug in the implementation of FN).

SIG and RHO are the constants controlling the Wolfe-Powell
conditions. SIG is the maximum allowed absolute ratio between
previous and new slopes (derivatives in the search direction); thus
setting SIG to low (positive) values forces higher precision in the
line searches. RHO is the minimum allowed fraction of the expected
improvement (estimated from the slope at the initial point of the
line search). The constants must satisfy 0 < RHO < SIG < 1. Tuning
SIG (depending on the nature of the function to be optimized) may
speed up the minimization; it is probably not worth playing much
with RHO.
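
As a minimal usage sketch (hypothetical: SUM-OF-SQUARES is an
illustrative function, and DOUBLE-FLOAT is assumed to be the element
type behind FLT-VECTOR; adapt to the library's actual FLT type):

```lisp
;; FN takes WEIGHTS and DERIVATIVES, fills DERIVATIVES with the
;; partial derivatives, and returns the value: here f(w) = sum(w_i^2).
(defun sum-of-squares (weights derivatives)
  (let ((value 0d0))
    (dotimes (i (length weights))
      (let ((wi (aref weights i)))
        (incf value (* wi wi))
        (setf (aref derivatives i) (* 2 wi))))
    value))

;; Minimize starting from #(1 -2 3). CG returns the best weights,
;; the minimum, and the line search / evaluation counts.
(cg #'sum-of-squares
    (make-array 3 :element-type 'double-float
                  :initial-contents '(1d0 -2d0 3d0))
    :max-n-line-searches 20)
```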

# Private

# Undocumented

# CHECK-LIMIT (VALUE LIMIT)

# INNER* (V1 V2)

# NEGATE-VECTOR (V &KEY RESULT)

# POLACK-RIBIERE (OLD-DF NEW-DF)

# PROCESS-BATCH (TRAINER LEARNER BATCH WEIGHTS DERIVATIVES)

# UPDATE-DIRECTION (S OLD-DF NEW-DF)

# V1=V2+C*V3 (V1 V2 C V3)

# MACRO

# Private

# Undocumented

# WITH-NIL-ON-ARITHMETIC-ERROR (&BODY BODY)

# GENERIC-FUNCTION

# Public

# COMPUTE-BATCH-COST-AND-DERIVE (BATCH TRAINER LEARNER)

Return the total cost (that LEARNER is trying to
minimize) over all samples in BATCH and add the derivatives to the
ACCUMULATOR of TRAINER.
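
A hedged sketch of a method on this generic function (MY-LEARNER and
ADD-SAMPLE-COST-AND-DERIVATIVES are hypothetical names standing in
for learner-specific code, not part of this library):

```lisp
;; Sum per-sample costs and accumulate each sample's gradient into
;; the trainer's ACCUMULATOR vector, which CG reads afterwards.
(defmethod compute-batch-cost-and-derive (batch (trainer cg-trainer)
                                          (learner my-learner))
  (let ((cost 0d0))
    (dolist (sample batch)
      (incf cost (add-sample-cost-and-derivatives
                  learner sample (accumulator trainer))))
    cost))
```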

# SLOT-ACCESSOR

# Public

# ACCUMULATOR (OBJECT)

This is where COMPUTE-BATCH-COST-AND-DERIVE should
leave the derivatives.

# SETFACCUMULATOR (NEW-VALUE OBJECT)

An FLT vector that is accessed directly by the
client and is used to store the sum of the computed gradients.

# SEGMENT-SET (OBJECT)

Segments to train.

# Undocumented

# CG-ARGS (OBJECT)

# SETFCG-ARGS (NEW-VALUE OBJECT)

# N-INPUTS (OBJECT)

# SETFN-INPUTS (NEW-VALUE OBJECT)

# Private

# SEGMENT-DECAY-FN (OBJECT)

If not NIL, it's a designator for a function that
returns a decay of type FLT for a given segment. For convenience,
NIL is also treated as 0 decay.

# SETFSEGMENT-DECAY-FN (NEW-VALUE OBJECT)

If not NIL, it's a designator for a function that
returns a decay of type FLT for a given segment. For convenience,
NIL is also treated as 0 decay.

# SEGMENT-FILTER (OBJECT)

A predicate function on segments that filters out
uninteresting segments. Called from INITIALIZE-TRAINER.

# SPARE-VECTORS (OBJECT)

Pre-allocated vectors to make CG less consy.

# SETFSPARE-VECTORS (NEW-VALUE OBJECT)

Pre-allocated vectors to make CG less consy.

# Undocumented

# WEIGHTS (OBJECT)

# SETFWEIGHTS (NEW-VALUE OBJECT)

# VARIABLE

# Public

# *DEFAULT-EXT*

Extrapolate a maximum of EXT times the current step size.

# *DEFAULT-INT*

Don't reevaluate within INT of the limit of the current bracket.

# *DEFAULT-RATIO*

Maximum allowed slope ratio.

# *DEFAULT-RHO*

RHO is the minimum allowed fraction of the expected improvement
(estimated from the slope at the initial point of the line search).
The constants must satisfy 0 < RHO < SIG < 1.

# *DEFAULT-SIG*

SIG and RHO are the constants controlling the Wolfe-Powell
conditions. SIG is the maximum allowed absolute ratio between previous
and new slopes (derivatives in the search direction); thus setting SIG
to low (positive) values forces higher precision in the line
searches.
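
In standard notation, writing phi(alpha) = f(w + alpha d) for the
objective along the search direction d, the conditions that SIG and
RHO control take the usual strong Wolfe form (a textbook
formulation, not copied from this library's source):

```
\phi(\alpha) \le \phi(0) + \mathrm{RHO}\cdot\alpha\,\phi'(0)
\qquad \text{(sufficient decrease)}

|\phi'(\alpha)| \le \mathrm{SIG}\cdot|\phi'(0)|
\qquad \text{(curvature)}
```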

# Undocumented

# *DEFAULT-MAX-N-EVALUATIONS*

# *DEFAULT-MAX-N-EVALUATIONS-PER-LINE-SEARCH*

# *DEFAULT-MAX-N-LINE-SEARCHES*

# CLASS

# Public

# CG-TRAINER

Updates all weights simultaneously after chewing
through BATCH-SIZE inputs.

# DECAYED-CG-TRAINER-MIXIN

Mix this before a CG based trainer to conveniently
add decay on a per-segment basis.
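
A hypothetical use of the mixin (MY-DECAYED-CG-TRAINER and the
:SEGMENT-DECAY-FN initarg are illustrative assumptions):

```lisp
;; Put the mixin before CG-TRAINER in the class precedence list so
;; its decay behavior takes effect.
(defclass my-decayed-cg-trainer (decayed-cg-trainer-mixin cg-trainer)
  ())

;; A constant decay for every segment.
(make-instance 'my-decayed-cg-trainer
               :segment-decay-fn (lambda (segment)
                                   (declare (ignore segment))
                                   2d-4))
```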

# SEGMENT-SET (OBJECT)

A SEGMENT-SET behaves like a concatenation of its segments.