# FUNCTION

# Public

# ANOVA-ONE-WAY-VARIABLES (&REST ARGS)

ANOVA-ONE-WAY-VARIABLES (IV DV &OPTIONAL (SCHEFFE-TESTS-P T) CONFIDENCE-INTERVALS)
Performs a one-way analysis of variance (ANOVA) on the input data, which
should be two equal-length sequences: `iv' is the independent variable,
represented as a sequence of categories or group identifiers, and `dv' is the
dependent variable, represented as a sequence of numbers. The `iv' variable
must be ``sorted,'' meaning that AAABBCCCCCDDDD is okay but ABCDABCDABDCDC is
not, where A, B, C and D are group identifiers. Furthermore, each group should
consist of at least 2 elements.
The significance of the result indicates that the group means are not all equal;
that is, at least two of the groups have significantly different means. If
there were only two groups, this would be semantically equivalent to an
unmatched, two-tailed t-test, so you can think of the one-way ANOVA as a
multi-group, two-tailed t-test.
This function returns five values: 1. an ANOVA table; 2. a list of group
means; 3. either a Scheffe table or nil, depending on `scheffe-tests-p'; 4. an
alternate value for SST; and 5. a list of confidence intervals of the form
`(,mean ,lower ,upper), one for each group, if `confidence-intervals' is a
number between zero and one giving the confidence level, such as 0.9. The
fourth value is only interesting if you think there are numerical accuracy
problems; it should be approximately equal to the SST value in the ANOVA
table. This function differs from `anova-one-way-groups' only in its input
representation. See the manual for more information.
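The core computation can be sketched in Python. This is a hypothetical stand-in for the Lisp implementation, computing only the F statistic and its degrees of freedom rather than the full five-value result:

```python
from itertools import groupby

def anova_one_way(iv, dv):
    """One-way ANOVA on two equal-length sequences: `iv' holds sorted
    group labels, `dv' the numeric observations.  Returns the F
    statistic and its two degrees of freedom."""
    pairs = list(zip(iv, dv))
    groups = [[y for _, y in g] for _, g in groupby(pairs, key=lambda p: p[0])]
    n, k = len(pairs), len(groups)
    grand_mean = sum(dv) / n
    means = [sum(g) / len(g) for g in groups]
    # between-group and within-group sums of squares
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_between, df_within = k - 1, n - k
    f = (ssb / df_between) / (ssw / df_within)
    return f, df_between, df_within
```

Note that `groupby` relies on the same "sorted" requirement the documentation states: adjacent equal labels form a group.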

# ANOVA-TWO-WAY-VARIABLES (&REST ARGS)

ANOVA-TWO-WAY-VARIABLES (DV IV1 IV2)
Calculates the analysis of variance when there are two factors that may
affect the dependent variable, specifically `iv1' and `iv2.' Unlike the one-way
ANOVA, there are mathematical difficulties with the two-way ANOVA if there are
unequal cell sizes; therefore, we require all cells to be the same size; that
is, the same number of values (of the dependent variable) for each combination
of the independent factors.
The result of the analysis is an anova-table, as described in the manual. This
function differs from `anova-two-way-groups' only in its input representation.
See the manual for further discussion of analysis of variance.
The row effect is `iv1' and the column effect is `iv2.'

# ANOVA-TWO-WAY-VARIABLES-UNEQUAL-CELL-SIZES (&REST ARGS)

ANOVA-TWO-WAY-VARIABLES-UNEQUAL-CELL-SIZES (IV1 IV2 DV)
Calculates the analysis of variance when there are two factors that may
affect the dependent variable, specifically `iv1' and `iv2.'
Unlike the one-way ANOVA, there are mathematical difficulties with the two-way
ANOVA if there are unequal cell sizes. This function differs fron the standard
two-anova by (1) the use of cell means as single scores, (2) the division of
squared quantities by the number of cell means contributing to the quantity
that is squared and (3) the multiplication of a "sum of squares" by the harmonic
mean of the sample sizes.
The result of the analysis is an anova-table, as described in the manual.
See the manual for further discussion of analysis of variance. The row effect
is `iv1' and the column effect is `iv2.'

# AUTOCORRELATION (&REST ARGS)

AUTOCORRELATION (SAMPLE MAX-LAG &OPTIONAL (MIN-LAG 0))
Autocorrelation is merely a cross-correlation between a sample and itself.
This function returns a list of correlations, where the element for lag `i' is
the correlation of the sample with itself shifted by `i' positions.

# BETA (Z W)

Returns the value of the Beta function, defined in terms of the complete
gamma function, G, as: G(z)G(w)/G(z+w). The implementation follows Numerical
Recipes in C, section 6.1.

# BETA-INCOMPLETE (A B X)

This function is useful in defining the cumulative distributions for
Student's t and the F distribution.
All arguments must be floating-point numbers; `a' and `b' must be positive and
`x' must be between 0.0 and 1.0, inclusive.

# BINOMIAL-CDF (P N K)

Suppose an event occurs with probability `p' per trial. This function
computes the probability of `k' or more events occurring in `n' trials. Note
that this is the complement of the usual definition of cdf. This function
approximates the actual computation using the incomplete beta function, but is
preferable for large `n' (greater than a dozen or so) because it avoids
summing many tiny floating-point numbers.
The implementation follows Numerical Recipes in C, section 6.3.

# BINOMIAL-CDF-EXACT (P N K)

This is an exact but computationally intensive form of the preferred
function, `binomial-cdf.'
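The exact computation is a direct summation over the binomial probability mass function. A Python sketch (not the library's code):

```python
from math import comb

def binomial_cdf_exact(p, n, k):
    """Probability of `k' or more successes in `n' trials with success
    probability `p', by direct summation (the complement of the usual
    CDF, as noted for `binomial-cdf')."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(k, n + 1))
```

For large `n' this sums many tiny floating-point terms, which is exactly why the incomplete-beta approach of `binomial-cdf' is preferred.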

# BINOMIAL-COEFFICIENT (N K)

Returns the binomial coefficient, `n' choose `k,' as an integer. Because the
computation is done with logarithms, the result may not be exactly correct; it
is rounded to the nearest integer. The implementation follows Numerical Recipes
in C, section 6.1.

# BINOMIAL-COEFFICIENT-EXACT (N K)

This is an exact but computationally intensive form of the preferred
function, `binomial-coefficient.'

# BINOMIAL-PROBABILITY (P N K)

Returns the probability of `k' successes in `n' trials, where at each trial
the probability of success is `p.' This function uses floating-point
approximations, and so is computationally efficient but not necessarily exact.

# BINOMIAL-PROBABILITY-EXACT (P N K)

This is an exact but computationally intensive form of the preferred
function, `binomial-probability.'

# CHI-SQUARE-SIGNIFICANCE (X DOF)

Computes the complement of the cumulative distribution function for a
Chi-square random variable with `dof' degrees of freedom evaluated at `x.' The
result is the probability that the observed chi-square for a correct model
should be greater than `x.' The implementation follows Numerical Recipes in C,
section 6.2. Small values suggest that the null hypothesis should be rejected;
in other words, this computes the significance of `x.'

# COMBINATION-COUNT (N K)

Returns the number of combinations of n elements taken k at a time. Assumes valid
input.

# CONFIDENCE-INTERVAL (&REST ARGS)

CONFIDENCE-INTERVAL ()
No documentation available.

# CONFIDENCE-INTERVAL-PROPORTION (&REST ARGS)

CONFIDENCE-INTERVAL-PROPORTION (X N CONFIDENCE)
Suppose we have a sample of `n' things and `x' of them are ``successes.'' We
can estimate the population proportion of successes as x/n; call it `p-hat.'
This function computes the estimate and a confidence interval on it. This
function is not appropriate for small samples with p-hat far from 1/2: `x'
should be at least 5, and so should `n'-`x.' This function returns three values:
p-hat, and the lower and upper bounds of the confidence interval. `Confidence'
should be a number between 0 and 1, exclusive.
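The normal-approximation interval described above can be sketched in Python (an illustration, not the Lisp implementation; `statistics.NormalDist` supplies the critical value):

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval_proportion(x, n, confidence):
    """Estimate p-hat = x/n and a normal-approximation confidence
    interval; returns (p-hat, lower, upper).  Assumes x >= 5 and
    n - x >= 5, per the caveat above."""
    p_hat = x / n
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # two-sided critical value
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - half_width, p_hat + half_width
```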

# CONFIDENCE-INTERVAL-T (&REST ARGS)

CONFIDENCE-INTERVAL-T (DATA CONFIDENCE)
Suppose you have a sample of 10 numbers and you want to compute a 90 percent
confidence interval on the population mean. This function is the one to use.
This function uses the t-distribution, and so it is appropriate for small sample
sizes. It can also be used for large sample sizes, but the function
`confidence-interval-z' may be computationally faster. It returns three values:
the mean and the lower and upper bound of the confidence interval. True, only
two numbers are necessary, but the confidence intervals of other statistics may
be asymmetrical and these values would be consistent with those confidence
intervals. `Data' should be a sequence of numbers. `Confidence' should be a
number between 0 and 1, exclusive.

# CONFIDENCE-INTERVAL-T-SUMMARIES (MEAN DOF STANDARD-ERROR CONFIDENCE)

This function is just like `confidence-interval-t,' except that instead of
its arguments being the actual data, it takes the following summary statistics:
`mean,' which is the estimator of some t-distributed parameter; `dof,' which is
the number of degrees of freedom in estimating the mean; and the
`standard-error' of the estimator. In general, `mean' is a point estimator of
the mean of a t-distribution, which may be the slope parameter of a regression,
the difference between two means, or other practical t-distributions.
`Confidence' should be a number between 0 and 1, exclusive.

# CONFIDENCE-INTERVAL-Z (&REST ARGS)

CONFIDENCE-INTERVAL-Z (DATA CONFIDENCE)
Suppose you have a sample of 50 numbers and you want to compute a 90 percent
confidence interval on the population mean. This function is the one to use.
Note that it makes the assumption that the sampling distribution is normal, so
it's inappropriate for small sample sizes. Use confidence-interval-t instead.
It returns three values: the mean and the lower and upper bound of the
confidence interval. True, only two numbers are necessary, but the confidence
intervals of other statistics may be asymmetrical and these values would be
consistent with those confidence intervals. This function handles 90, 95 and 99
percent confidence intervals as special cases, so those will be quite fast.
`Data' should be a sequence of numbers. `Confidence' should be a number
between 0 and 1, exclusive.

# CORRELATION (&REST ARGS)

CORRELATION (SAMPLE1 SAMPLE2 &KEY START1 END1 START2 END2)
Computes the correlation coefficient of two samples, which should be
equal-length sequences of numbers.

# CORRELATION-FROM-SUMMARIES (N X X2 Y Y2 XY)

Computes the correlation of two variables given summary statistics of the
variables. All of these arguments are summed over the variable: `x' is the sum
of the x's, `x2' is the sum of the squares of the x's, and `xy' is the sum of
the cross-products, which is also known as the inner product of the variables x
and y. Of course, `n' is the number of data values in each variable.
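The computation from summaries follows the standard Pearson formula; a Python sketch under the same argument conventions:

```python
from math import sqrt

def correlation_from_summaries(n, x, x2, y, y2, xy):
    """Pearson correlation from summary sums: x = sum of the x's,
    x2 = sum of squared x's, xy = sum of cross-products, etc."""
    numerator = n * xy - x * y
    denominator = sqrt((n * x2 - x * x) * (n * y2 - y * y))
    return numerator / denominator
```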

# CORRELATION-MATRIX (DV IVS)

Returns a matrix of all the correlations of all the variables. The dependent
variable is row and column zero.

# COVARIANCE (&REST ARGS)

COVARIANCE (SAMPLE1 SAMPLE2 &KEY START1 END1 START2 END2)
Computes the covariance of two samples, which should be equal-length
sequences of numbers. Covariance is the inner product of differences between
sample elements and their sample means. For more information, see the manual.

# CROSS-CORRELATION (&REST ARGS)

CROSS-CORRELATION (SEQUENCE1 SEQUENCE2 MAX-LAG &OPTIONAL (MIN-LAG 0))
Returns a list of the correlation coefficients for all lags from
`min-lag' to `max-lag,' inclusive, where the element for lag `i' is the
correlation of the first (length-of-sequence1 - i) elements of
sequence1 with the last (length-of-sequence2 - i) elements of sequence2.
Both sequences should be sequences of numbers and of equal length.
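The pairing rule can be made concrete in Python (a sketch, not the Lisp code; element n of sequence1 is paired with element n + lag of sequence2):

```python
def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) *
           sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def cross_correlation(seq1, seq2, max_lag, min_lag=0):
    """For each lag, correlate seq1 shortened at the end with seq2
    shortened at the front, pairing seq1[n] with seq2[n + lag]."""
    return [correlation(seq1[:len(seq1) - lag], seq2[lag:])
            for lag in range(min_lag, max_lag + 1)]
```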

# D-TEST (&REST ARGS)

D-TEST (SAMPLE-1 SAMPLE-2 TAILS &KEY (TIMES 1000) (H0MEAN 0))
Two-sample test for difference in means. Competes with the unmatched,
two-sample t-test. Each sample should be a sequence of numbers. We calculate
the mean of `sample-1' minus the mean of `sample-2'; call that D. Under the null
hypothesis, D is zero. There are three possible alternative hypotheses: D is
positive, D is negative, and D is either, and they are selected by the `tails'
parameter, which must be :positive, :negative, or :both, respectively. We count
the number of chance occurrences of D in the desired rejection region, and
return the estimated probability.
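The randomization idea can be sketched in Python. This is an illustrative stand-in, not the library's code: `tails' is a plain string rather than a Lisp keyword, the `h0mean' argument is omitted for brevity, and a `seed' parameter is added for reproducibility:

```python
import random

def d_test(sample1, sample2, tails, times=1000, seed=0):
    """Randomization test on D, the difference of sample means.
    Shuffles the pooled data `times' times and counts how often the
    shuffled D falls in the rejection region; returns the estimate."""
    rng = random.Random(seed)
    n1 = len(sample1)
    observed = sum(sample1) / n1 - sum(sample2) / len(sample2)
    pooled = list(sample1) + list(sample2)
    hits = 0
    for _ in range(times):
        rng.shuffle(pooled)
        d = (sum(pooled[:n1]) / n1 -
             sum(pooled[n1:]) / (len(pooled) - n1))
        if tails == 'positive':
            hits += d >= observed
        elif tails == 'negative':
            hits += d <= observed
        else:  # 'both'
            hits += abs(d) >= abs(observed)
    return hits / times
```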

# DATA-LENGTH (&REST ARGS)

DATA-LENGTH (DATA &KEY START END KEY)
Returns the number of data values in `data.' Essentially, this is the Common
Lisp `length' function, except it handles sequences where there is a `start' or
`end' parameter. The `key' parameter is ignored.

# DEGREES->RADIANS (DEGREES)

Convert degrees to radians.

# DIV2 (I &OPTIONAL (POWER 1))

Divide positive fixnum `i' by 2 or a power of 2, yielding an integer result.
For example, (div2 35 5) => 1.

# ERROR-FUNCTION (X)

Computes the error function, which is typically used to compute areas under
the Gaussian probability distribution. See the manual for more information.
Also see the function `gaussian-cdf.'
This implementation follows Numerical Recipes in C, section 6.2.

# ERROR-FUNCTION-COMPLEMENT (X)

This function computes the complement of the error function, ``erfc(x),''
defined as 1-erf(x). See the documentation for `error-function' for a more
complete definition and description. Essentially, applying this function to
z/sqrt(2) gives the two-tailed significance of z in a standard Gaussian
distribution. This function implements what Numerical Recipes in C calls erfcc
(see section 6.3); that is, the version using the Chebyshev approximation,
since that is the one they call from their statistical functions. It is quick
to compute and has fractional error everywhere less than 1.2x10^{-7}.

# EXP2 (N)

Returns 2^n.

# EXTRACT-UNIQUE-VALUES (SEQUENCE)

A faster version of `remove-duplicates'. Note you cannot specify a :TEST (it is always #'eq).

# F-MEASURE (PRECISION RECALL &OPTIONAL (BETA 0.5))

Returns the f-measure, a combination of precision and recall controlled by
the parameter `beta.' The default of 0.5 weights precision and recall equally;
beta = 1 considers only precision, and beta = 0 considers only recall. From the
statistics text All of Statistics (Springer):
http://www2.springeronline.com/sgw/cda/frontpage/0,,4-10128-22-13887455-0,00.html
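One plausible reading of the weighting described above is a weighted harmonic mean with `beta' as the weight on precision; this Python sketch is an assumption about the formula, not a transcription of the Lisp code:

```python
def f_measure(precision, recall, beta=0.5):
    """Weighted harmonic mean of precision and recall, with `beta' the
    weight on precision: 0.5 weights them equally, 1 considers only
    precision, 0 only recall (matching the description above)."""
    return 1.0 / (beta / precision + (1.0 - beta) / recall)
```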

# F-SIGNIFICANCE (F-STATISTIC NUMERATOR-DOF DENOMINATOR-DOF &OPTIONAL ONE-TAILED-P)

This function occurs in the statistical test of whether two observed samples
have the same variance. A certain statistic, F, essentially the ratio of the
observed dispersion of the first sample to that of the second one, is
calculated. This function computes the tail areas of the null hypothesis: that
the variances of the numerator and denominator are equal. It can be used for
either a one-tailed or two-tailed test. The default is two-tailed, but
one-tailed can be computed by setting the optional argument `one-tailed-p' to
true.
For a two-tailed test, this function computes the probability that F would be as
different from 1.0 (larger or smaller) as it is, if the null hypothesis is
true.
For a one-tailed test, this function computes the probability that F would be as
LARGE as it is if the first sample's underlying distribution actually has
SMALLER variance than the second's, where `numerator-dof' and `denominator-dof'
are the degrees of freedom of the numerator and denominator samples,
respectively. In other words, this computes the significance level at which the
hypothesis ``the numerator sample has smaller variance than the denominator
sample'' can be rejected.
A small numerical value implies a very significant rejection.
The `f-statistic' must be a non-negative floating-point number. The degrees of
freedom arguments must be positive integers. The `one-tailed-p' argument is
treated as a boolean.
This implementation follows Numerical Recipes in C, section 6.3 and the `ftest'
function in section 13.4. Some of the documentation is also drawn from the
section 6.3, since I couldn't improve on their explanation.

# FACTORIAL (N)

Returns the factorial of `n,' which should be a non-negative integer. The
result will be returned as a floating-point number, single-float if possible,
otherwise double-float. If it is returned as a double-float, it won't
necessarily be integral, since the actual computation is
(exp (gamma-ln (1+ n)))
Implementation is loosely based on Numerical Recipes in C, section 6.1. On the
TI Explorer, the largest argument that won't cause a floating overflow is 170.
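The gamma-function route described above is easy to mirror in Python (`lgamma` plays the role of `gamma-ln'):

```python
from math import exp, lgamma

def factorial(n):
    """n! computed as exp(ln Gamma(n + 1)); fast but floating-point,
    so large results are only approximate, as noted above."""
    return exp(lgamma(n + 1))
```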

# FACTORIAL-EXACT (N)

Returns the factorial of `n,' which should be an integer. The result will be
returned as an integer or bignum. This implementation is exact, but is more
computationally expensive than `factorial,' which is to be preferred.

# FACTORIAL-LN (N)

Returns the natural logarithm of n!; `n' should be an integer. The result
will be a single-precision, floating point number. The implementation follows
Numerical Recipes in C, section 6.1.

# GAMMA-INCOMPLETE (A X)

This is an incomplete gamma function, what Numerical Recipes in C calls
``gammp.'' This function also returns, as the second value, g(a,x). See the
manual for more information.
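A sketch of the series expansion for the regularized lower incomplete gamma function P(a, x), which is what Numerical Recipes calls `gser'. This is adequate when x < a + 1; a full implementation (like the library's, presumably) switches to a continued-fraction form for larger x, and this sketch omits the second return value g(a, x):

```python
from math import exp, log, lgamma

def gamma_incomplete(a, x, eps=1e-12):
    """Regularized lower incomplete gamma P(a, x) by series expansion.
    Suitable for x < a + 1; `a' must be positive, `x' non-negative."""
    if x == 0.0:
        return 0.0
    term = 1.0 / a          # first series term, Gamma(a)/Gamma(a+1)
    total = term
    ap = a
    for _ in range(1000):
        ap += 1.0
        term *= x / ap      # next term of sum x^n Gamma(a)/Gamma(a+1+n)
        total += term
        if abs(term) < abs(total) * eps:
            break
    return total * exp(-x + a * log(x) - lgamma(a))
```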

# GAMMA-LN (X)

Returns the natural logarithm of the Gamma function evaluated at `x.'
Mathematically, the Gamma function is defined to be the integral from 0 to
Infinity of t^x exp(-t) dt. The implementation is copied, with extensions for
the reflection formula, from Numerical Recipes in C, section 6.1. The argument
`x' must be positive. Full accuracy is obtained for x>1. For x<1, the
reflection formula is used. The computation is done using double-floats, and
the result is a double-float.

# GAUSSIAN-CDF (X &OPTIONAL (MEAN 0.0) (SD 1.0))

Computes the cumulative distribution function for a Gaussian random variable
(defaults: mean=0.0, s.d.=1.0) evaluated at `x.' The result is the probability
of getting a random number less than or equal to `x,' from the given Gaussian
distribution.
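The Gaussian CDF reduces to the error function, as the `error-function' entry above suggests; a self-contained Python sketch:

```python
from math import erf, sqrt

def gaussian_cdf(x, mean=0.0, sd=1.0):
    """Gaussian CDF via the error function:
    Phi(z) = (1 + erf(z / sqrt(2))) / 2, with z standardized."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))
```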

# GAUSSIAN-SIGNIFICANCE (X TAILS &OPTIONAL MEAN SD)

Computes the significance of `x' in a Gaussian distribution with mean=`mean'
(default 0.0) and standard deviation=`sd' (default 1.0); that is, it returns
the area of the distribution that is farther from the mean than `x' is.
The null hypothesis is roughly that `x' is zero; you must specify your
alternative hypothesis (H1) via the `tails' parameter, which must be :both,
:positive or :negative. The first corresponds to a two-tailed test: H1 is that
`x' is not zero, but you are not specifying a direction. If the parameter is
:positive, H1 is that `x' is positive, and similarly for :negative.

# INTERQUARTILE-RANGE (&REST ARGS)

INTERQUARTILE-RANGE (DATA)
The interquartile range is similar to the variance of a sample in that both
are statistics that measure how ``spread out'' a sample is. The interquartile
range is the difference between the 3/4 quantile (the upper quartile) and the
1/4 quantile (the lower quartile).

# LAGGED-CORRELATION (SEQUENCE1 SEQUENCE2 LAG)

Returns the correlations of `sequence1' with `sequence2' after
shifting `sequence1' by `lag'. This means that for all n, element n
of `sequence1' is paired with element n+`lag' of `sequence2', where
both of those elements exist.

# LINEAR-REGRESSION-BRIEF (DV IV)

Calculates the main statistics of a linear regression: the slope and
intercept of the line, the coefficient of determination, also known as r-square,
the standard error of the slope, and the p-value for the regression. This
function takes two equal-length sequences of raw data. Note that the dependent
variable, as always, comes first in the argument list.
You should first look at your data with a scatter plot to see if a linear model
is plausible. See the manual for a fuller explanation of linear regression
statistics.

# LINEAR-REGRESSION-BRIEF-SUMMARIES (N X Y X2 Y2 XY)

Calculates the main statistics of a linear regression: the slope and
intercept of the line, the coefficient of determination, also known as r-square,
the standard error of the slope, and the p-value for the regression. This
function differs from `linear-regression-brief' in that it takes summary
variables: `x' and `y' are the sums of the independent variable and dependent
variables, respectively; `x2' and `y2' are the sums of the squares of the
independent variable and dependent variables, respectively; and `xy' is the sum
of the products of the independent and dependent variables.
You should first look at your data with a scatter plot to see if a linear model
is plausible. See the manual for a fuller explanation of linear regression
statistics.

# LINEAR-REGRESSION-MINIMAL (DV IV)

Calculates the slope and intercept of the regression line. This function
takes two equal-length sequences of raw data. Note that the dependent variable,
as always, comes first in the argument list.
You should first look at your data with a scatter plot to see if a linear model
is plausible. See the manual for a fuller explanation of linear regression
statistics.
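The least-squares slope and intercept can be sketched in Python (an illustration of the computation, not the Lisp implementation; note the dependent variable comes first, matching the argument order above):

```python
def linear_regression_minimal(dv, iv):
    """Least-squares slope and intercept for dv ~ slope * iv + intercept."""
    n = len(dv)
    mx, my = sum(iv) / n, sum(dv) / n
    # centered sums of squares and cross-products
    sxx = sum((x - mx) ** 2 for x in iv)
    sxy = sum((x - mx) * (y - my) for x, y in zip(iv, dv))
    slope = sxy / sxx
    return slope, my - slope * mx
```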

# LINEAR-REGRESSION-MINIMAL-SUMMARIES (N X Y X2 Y2 XY)

Calculates the slope and intercept of the regression line. This function
differs from `linear-regression-minimal' in that it takes summary statistics:
`x' and `y' are the sums of the independent variable and dependent variables,
respectively; `x2' and `y2' are the sums of the squares of the independent
variable and dependent variables, respectively; and `xy' is the sum of the
products of the independent and dependent variables.
You should first look at your data with a scatter plot to see if a linear model
is plausible. See the manual for a fuller explanation of linear regression
statistics.

# LINEAR-REGRESSION-VERBOSE (DV IV)

Calculates almost every statistic of a linear regression: the slope and
intercept of the line, the standard error on each, the correlation coefficient,
the coefficient of determination, also known as r-square, and an ANOVA table as
described in the manual.
This function takes two equal-length sequences of raw data. Note that the
dependent variable, as always, comes first in the argument list. If you don't
need all this information, consider using the ``-brief,'' or ``-minimal''
functions, which do less computation.
You should first look at your data with a scatter plot to see if a linear model
is plausible. See the manual for a fuller explanation of linear regression
statistics.

# LINEAR-REGRESSION-VERBOSE-SUMMARIES (N X Y X2 Y2 XY)

Calculates almost every statistic of a linear regression: the slope and
intercept of the line, the standard error on each, the correlation coefficient,
the coefficient of determination, also known as r-square, and an ANOVA table as
described in the manual.
If you don't need all this information, consider using the ``-brief'' or
``-minimal'' functions, which do less computation.
This function differs from `linear-regression-verbose' in that it takes summary
variables: `x' and `y' are the sums of the independent variable and dependent
variables, respectively; `x2' and `y2' are the sums of the squares of the
independent variable and dependent variables, respectively; and `xy' is the sum
of the products of the independent and dependent variables.
You should first look at your data with a scatter plot to see if a linear model
is plausible. See the manual for a fuller explanation of linear regression
statistics.

# LINEAR-SCALE (VALUE OLD-MIN OLD-MAX NEW-MIN NEW-MAX)

Rescales value linearly from the old-min/old-max scale to the new-min/new-max one.
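The rescaling is a one-line affine map; a Python sketch:

```python
def linear_scale(value, old_min, old_max, new_min, new_max):
    """Map `value' from [old-min, old-max] onto [new-min, new-max]."""
    fraction = (value - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)
```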

# LOG2 (N)

Log of `n' to base 2.

# MATRIX-MULTIPLY (&REST ARGS)

Performs successive multiplications of the elements of `args'. If two
elements are scalars, their product is the ordinary product i * j; if a scalar
is multiplied by a matrix, each element of the matrix is multiplied by the
scalar; and if two matrices are multiplied, standard matrix multiplication is
applied, so the dimensions must conform: if ARGi is rank a x b and ARGj is
rank c x d, then b must equal c.

# MAXIMUM (&REST ARGS)

MAXIMUM (DATA &KEY START END KEY)
Returns the element of the sequence `data' whose `key' is maximum. Signals
`no-data' if there is no data. If there is only one element in the data
sequence, that element will be returned, regardless of whether it is valid (a
number).

# MEAN (&REST ARGS)

MEAN (DATA &KEY START END KEY)
Returns the arithmetic mean of `data,' which should be a sequence.
Signals `no-data' if there is no data.

# MEDIAN (&REST ARGS)

MEDIAN (DATA &KEY START END KEY)
Returns the median of the subsequence of `data' from `start' to `end', using
`key'. The median is just the 0.5 quantile, and so this function returns the
same values as the `quantile' function.

# MINIMUM (&REST ARGS)

MINIMUM (DATA &KEY START END KEY)
Returns the element of the sequence `data' whose `key' is minimum. Signals
`no-data' if there is no data. If there is only one element in the data
sequence, that element will be returned, regardless of whether it is valid (a
number).

# MOD2 (N POWER)

Find `n' mod a power of 2.

# MODE (&REST ARGS)

MODE (DATA &KEY START END KEY)
Returns the most frequent element of `data,' which should be a sequence. The
algorithm involves sorting, and so the data must be numbers or the `key'
function must produce numbers. Consider `sxhash' if no better function is
available. Also returns the number of occurrences of the mode. If there is
more than one mode, this returns the first mode, as determined by the sorting of
the numbers.

# MULTIPLE-LINEAR-REGRESSION-ARRAYS (DV &REST IVS)

This is an internal function for the use of the multiple-linear-regression
functions. It takes the lists of values given by CLASP and puts them into a
pair of arrays, A and b, suitable for solving the matrix equation Ax=b, to find
the regression equation. The values are A and b. The first column of A is the
constant 1, so that an intercept will be included in the regression model.

# MULTIPLE-LINEAR-REGRESSION-BRIEF (DV &REST IVS)

Let m be the number of independent variables, `ivs.' This function returns a
vector of length m containing the coefficients of a linear equation that best
predicts the dependent variable, `dv,' in the least squares sense. It also
returns, as the second value, the sum of squared deviations of the data from the
fitted model, aka SSE, aka chi-square. The third value is the number of degrees
of freedom for the chi-square, if you want to test the fit.
This function returns an intermediate amount of information. Consider using the
sibling functions -minimal and -verbose if you want less or more information.

# MULTIPLE-LINEAR-REGRESSION-MINIMAL (DV &REST IVS)

Let m be the number of independent variables, `ivs.' This function returns
a vector of length m containing the coefficients of a linear equation that best
predicts the dependent variable, `dv,' in the least squares sense.
This function returns the minimal information for a least squares regression
model, namely a list of the coefficients of the ivs, with the constant term
first. Consider using the sibling functions -brief and -verbose if you want
more information.

# MULTIPLE-LINEAR-REGRESSION-NORMAL (DV &REST IVS)

Performs linear regression of the dependent variable, `dv,' on multiple
independent variables, `ivs' (Y on multiple X's). Calculates the intercept,
the regression coefficients, the F statistic, and the correlation coefficient
for Y on the X's.

# MULTIPLE-LINEAR-REGRESSION-VERBOSE (DV &REST IVS)

Let m be the number of independent variables, `ivs.' This function returns
fourteen values:
1. the intercept
2. a list of coefficients
3. a list of correlations of each iv to the dv and to each iv
4. a list of the t-statistic for each coefficient
5. a list of the standardized coefficients (betas)
6. the fraction of variance accounted for, aka r-square
7. the ratio of MSR (see #12) to MSE (see #13), aka F
8. a list of the portion of the SSR due to each iv
9. a list of the fraction of variance accounted for by each iv
10. the sum of squares of the regression, aka SSR
11. the sum of squares of the residuals, aka SSE, aka chi-square
12. the mean squared error of the regression, aka MSR
13. the mean squared error of the residuals, aka MSE
14. a list of indices of ``zeroed'' independent variables
This function returns a lot of information about the regression. Consider using
the sibling functions -minimal and -brief if you need less information.

# MULTIPLE-MODES (&REST ARGS)

MULTIPLE-MODES (DATA K &KEY START END KEY)
Returns the `k' most frequent elements of `data,' which should be a sequence.
The algorithm involves sorting, and so the data must be numbers or the `key'
function must produce numbers. Consider #'sxhash if no better function is
available. Also returns the number of occurrences of each mode. The value is
an association list of modes and their counts. This function is a little more
computationally expensive than `mode,' so only use it if you really need
multiple modes.

# NORMALIZE-MATRIX (M)

Returns a new matrix such that the sum of its elements is 1.0.

# ON-INTERVAL (X LOWER-BOUND UPPER-BOUND &KEY (LOWER-INCLUSIVE? T) (UPPER-INCLUSIVE? T))

Returns t iff `x' is in the interval.

# PERMUTATION-COUNT (N K)

Returns the number of possible ways of taking k elements out of n total.

# POISSON-CDF (K X)

Computes the cumulative distribution function for a Poisson random variable
with mean `x' evaluated at `k.' The result is the probability that the number of
Poisson random events occurring will be between 0 and k-1 inclusive, if the
expected number is `x.' The argument `k' should be an integer, while `x' should
be a float. The implementation follows Numerical Recipes in C, section 6.2.

# QUANTILE (&REST ARGS)

QUANTILE (DATA Q &KEY START END KEY)
Returns the element which is the `q'th quantile of the data when accessed by
`key.' That is, it returns the element such that a fraction `q' of the data is
smaller than it and 1-`q' is above it, where `q' is between zero and one,
inclusive.
For example, if `q' is .5, this returns the median; if `q' is 0, this returns
the minimum (although the `minimum' function is more efficient).
This function uses the bisection method, doing linear interpolation between
elements i and i+1, where i=floor(q(n-1)). See the manual for more information.
The function returns three values: the interpolated quantile and the two
elements that determine the interval it was interpolated in. If the quantile
was exact, the second two values are the same element of the data.
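The interpolation rule and three-value return can be sketched in Python. This sketch sorts the data outright rather than using the bisection method mentioned above, and ignores the `key', `start', and `end' parameters:

```python
def quantile(data, q):
    """Interpolated q-th quantile; returns the quantile and the two
    order statistics bounding the interpolation interval."""
    xs = sorted(data)
    position = q * (len(xs) - 1)
    i = int(position)        # i = floor(q * (n - 1)) for q in [0, 1]
    fraction = position - i
    if fraction == 0.0:
        # exact quantile: all three values are the same element
        return xs[i], xs[i], xs[i]
    value = xs[i] + fraction * (xs[i + 1] - xs[i])
    return value, xs[i], xs[i + 1]
```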

# R-SCORE (NUMBER-LIST-1 NUMBER-LIST-2)

Takes two sequences and returns the correlation coefficient. Formula:
Sum (Cross-product (Difference-list (number-list-1),
                    Difference-list (number-list-2)))
/ Sqrt (Sum-of-Squares (number-list-1) *
        Sum-of-Squares (number-list-2))

# RADIANS->DEGREES (RADIANS)

Convert radians to degrees. Does not round the result.

# RANGE (&REST ARGS)

RANGE (DATA &KEY START END KEY)
Returns the range of the sequence `data.' Signals `no-data' if there is no
data. The range is given by max - min.

# ROUND-TO-FACTOR (N FACTOR)

Equivalent to (* factor (round n factor)). For example, `round-to-factor' of
65 and 60 is 60. Useful for converting to certain units, say when converting
minutes to the nearest hours. See also `truncate-to-factor.'

# SAFE-EXP (X)

Eliminates floating-point underflow for the exponential function: instead of
underflowing, it simply returns 0.0d0.

# SCHEFFE-TESTS (GROUP-MEANS GROUP-SIZES MS-ERROR DF-ERROR)

Performs all pairwise comparisons between group means, testing for
significance using Scheffe's F-test. Returns an upper-triangular table in a
format described in the manual. Also see the function `print-scheffe-table.'
`Group-means' and `group-sizes' should be sequences. The arguments `ms-error'
and `df-error' are the mean square error within groups and its degrees of
freedom, both of which are computed by the analysis of variance. An ANOVA test
should always be run first, to see if there are any significant differences.

# SIGNIFICANCE (&REST ARGS)

SIGNIFICANCE ()
No documentation available.

# SKEWNESS (&REST ARGS)

SKEWNESS (DATA &KEY START END KEY)
Returns the skewness of `data', which is the sum of the cubed standardized
distances from the mean (each distance divided by the standard deviation),
divided by N.

# SMOOTH-4253H (DATA)

Smooths `data' by successive smoothing: 4,median; then 2,median; then
5,median; then 3,median; then hanning. The ends are handled by duplicating the
end elements. This function is not destructive; it returns a list the same
length as `data,' which should be a list of numbers.

# SMOOTH-HANNING (DATA)

Smooths `data' by replacing each element with the weighted mean of it and its
two neighbors. The weights are 1/2 for itself and 1/4 for each neighbor. The
ends are handled by duplicating the end elements. This function is not
destructive; it returns a list the same length as `data,' which should be a
sequence of numbers.
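The weighting scheme is simple enough to sketch in Python (an illustration; the library returns a list, as does this):

```python
def smooth_hanning(data):
    """Hanning smooth: weights 1/4, 1/2, 1/4, with the ends handled by
    duplicating the end elements, as described above."""
    padded = [data[0]] + list(data) + [data[-1]]
    return [0.25 * padded[i - 1] + 0.5 * padded[i] + 0.25 * padded[i + 1]
            for i in range(1, len(padded) - 1)]
```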

# SMOOTH-MEAN-2 (DATA)

With a window of size two, the median and mean smooth functions are the
same.

# SMOOTH-MEAN-3 (DATA)

Smooths `data' by replacing each element with the mean of it and its two
neighbors. The ends are handled by duplicating the end elements. This function
is not destructive; it returns a list the same length as `data,' which should be
a sequence of numbers.

# SMOOTH-MEAN-4 (DATA)

Smooths `data' by replacing each element with the mean of it, its left
neighbor, and its two right neighbors. The ends are handled by duplicating the
end elements. This function is not destructive; it returns a list the same
length as `data,' which should be a sequence of numbers.

# SMOOTH-MEAN-5 (DATA)

Smooths `data' by replacing each element with the mean of it, its two left
neighbors and its two right neighbors. The ends are handled by duplicating the
end elements. This function is not destructive; it returns a list the same
length as `data,' which should be a sequence of numbers.

# SMOOTH-MEDIAN-2 (DATA)

Smooths `data' by replacing each element with the median of it and its
neighbor on the left. A median of two elements is the same as their mean. The
end is handled by duplicating the end element. This function is not
destructive; it returns a list the same length as `data,' which should be a
sequence of numbers.

# SMOOTH-MEDIAN-3 (DATA)

Smooths `data' by replacing each element with the median of it and its two
neighbors. The ends are handled by duplicating the end elements. This function
is not destructive; it returns a list the same length as `data,' which should be
a sequence of numbers.
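
A Python sketch of this rule; unlike hanning smoothing, a running median
removes an isolated spike entirely:

```python
def smooth_median_3(data):
    # Median of each element and its two neighbors; ends are duplicated.
    padded = [data[0]] + list(data) + [data[-1]]
    return [sorted(padded[i - 1:i + 2])[1] for i in range(1, len(padded) - 1)]

print(smooth_median_3([1, 1, 9, 1, 1]))  # → [1, 1, 1, 1, 1]
```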

# SMOOTH-MEDIAN-4 (DATA)

Smooths `data' by replacing each element with the median of it, its left
neighbor, and its two right neighbors. The ends are handled by duplicating the
end elements. This function is not destructive; it returns a list the same
length as `data,' which should be a sequence of numbers.

# SMOOTH-MEDIAN-5 (DATA)

Smooths `data' by replacing each element with the median of it, its two left
neighbors and its two right neighbors. The ends are handled by duplicating the
end elements. This function is not destructive; it returns a list the same
length as `data,' which should be a sequence of numbers.

# STANDARD-DEVIATION (&REST ARGS)

STANDARD-DEVIATION (DATA &KEY START END KEY)
Returns the standard deviation of `data,' which is just the square root of
the variance.
Signals `no-data' if there is no data. Signals `insufficient-data' if there is
only one datum.

# STATISTICAL-SUMMARY (&REST ARGS)

STATISTICAL-SUMMARY (DATA &KEY START END KEY)
Computes the length, minimum, maximum, range, median, mode, mean, variance,
standard deviation, and interquartile-range of `data' from `start' to `end',
accessed by `key'.

# STUDENTS-T-SIGNIFICANCE (T-STATISTIC DOF TAILS)

Student's distribution is much like the Gaussian distribution except with
heavier tails, depending on the number of degrees of freedom, `dof.' As `dof'
goes to infinity, Student's distribution approaches the Gaussian. This function
computes the significance of `t-statistic.' Values range from 0.0 to 1.0: small
values suggest that the null hypothesis---that `t-statistic' is drawn from a t
distribution---should be rejected. The `t-statistic' parameter should be a
float, while `dof' should be an integer.
The null hypothesis is roughly that `t-statistic' is zero; you must specify your
alternative hypothesis (H1) via the `tails' parameter, which must be :both,
:positive or :negative. The first corresponds to a two-tailed test: H1 is that
`t-statistic' is not zero, but you are not specifying a direction. If the
parameter is :positive, H1 is that `t-statistic' is positive, and similarly for
:negative.
This implementation follows Numerical Recipes in C, section 6.3.
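
The library computes this via the incomplete beta function, per Numerical
Recipes section 6.3; the Python sketch below instead integrates the t density
numerically, which is enough to illustrate the tail semantics:

```python
import math

def t_density(x, dof):
    # Density of Student's t distribution with `dof' degrees of freedom.
    c = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return c * (1 + x * x / dof) ** (-(dof + 1) / 2)

def t_significance(t_statistic, dof, tails=":both"):
    # Upper-tail probability by the trapezoid rule, truncated at 60.
    a, b, n = abs(t_statistic), 60.0, 20000
    h = (b - a) / n
    upper = h * (sum(t_density(a + i * h, dof) for i in range(1, n))
                 + (t_density(a, dof) + t_density(b, dof)) / 2)
    if tails == ":both":
        return 2 * upper
    one = upper if t_statistic >= 0 else 1 - upper   # P(T >= t)
    return one if tails == ":positive" else 1 - one  # :negative is P(T <= t)

# t = 2.228 with 10 degrees of freedom is the classic 5% two-tailed value.
print(round(t_significance(2.228, 10), 3))  # → 0.05
```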

# T-SIGNIFICANCE (&REST ARGS)

No argument list or documentation is available for this function.

# T-TEST (&REST ARGS)

T-TEST (SAMPLE-1 SAMPLE-2 &OPTIONAL (TAILS BOTH) (H0MEAN 0))
Returns the t-statistic for the difference in the means of two samples, which
should each be a sequence of numbers. Let D=mean1-mean2. The null hypothesis
is that D=0. The alternative hypothesis is specified by `tails': `:both' means
D/=0, `:positive' means D>0, and `:negative' means D<0. Unless you're using
:both tails, be careful what order the two samples are in: it matters!
The function also returns the significance, the standard error, and the degrees
of freedom. Signals `standard-error-is-zero' if that condition occurs. Signals
`insufficient-data' unless there are at least two elements in each sample.
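
The statistic itself can be sketched in Python (a pooled-variance, unpaired
form; the `h0mean' offset and the significance lookup are omitted):

```python
import math

def t_test(sample_1, sample_2):
    # Pooled-variance t statistic for the difference in two means.
    n1, n2 = len(sample_1), len(sample_2)
    m1 = sum(sample_1) / n1
    m2 = sum(sample_2) / n2
    ss1 = sum((x - m1) ** 2 for x in sample_1)  # within-sample squared devs
    ss2 = sum((x - m2) ** 2 for x in sample_2)
    dof = n1 + n2 - 2
    se = math.sqrt((ss1 + ss2) / dof * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, se, dof

t, se, dof = t_test([1, 2, 3], [4, 5, 6])
print(round(t, 3), dof)  # → -3.674 4
```

Swapping the samples negates t, which is why the order matters for one-tailed
tests.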

# T-TEST-MATCHED (&REST ARGS)

T-TEST-MATCHED (SAMPLE1 SAMPLE2 &OPTIONAL (TAILS BOTH))
Returns the t-statistic for two matched samples, which should be equal-length
sequences of numbers. Let D=mean1-mean2. The null hypothesis is that D=0. The
alternative hypothesis is specified by `tails': `:both' means D/=0, `:positive'
means D>0, and `:negative' means D<0. Unless you're using :both tails, be
careful what order the two samples are in: it matters!
The function also returns the significance, the standard error, and the degrees
of freedom. Signals `standard-error-is-zero' if that condition occurs. Signals
`insufficient-data' unless there are at least two elements in each sample.

# T-TEST-ONE-SAMPLE (&REST ARGS)

T-TEST-ONE-SAMPLE (DATA TAILS &OPTIONAL (H0-MEAN 0) &KEY START END KEY)
Returns the t-statistic for the mean of the data, which should be a sequence
of numbers. Let D be the sample mean. The null hypothesis is that D equals the
`H0-mean.' The alternative hypothesis is specified by `tails': `:both' means D
/= H0-mean, `:positive' means D > H0-mean, and `:negative' means D < H0-mean.
The function also returns the significance, the standard error, and the degrees
of freedom. Signals `zero-variance' if that condition occurs. Signals
`insufficient-data' unless there are at least two elements in the sample.
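
The core computation is t = (mean - h0-mean) / (s / sqrt(n)), sketched here in
Python (start/end/key handling and the significance lookup omitted):

```python
import math

def t_test_one_sample(data, h0_mean=0):
    # t statistic for the sample mean against `h0_mean'.
    n = len(data)
    mean = sum(data) / n
    # Standard error: sample standard deviation over sqrt(n).
    se = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1) / n)
    return (mean - h0_mean) / se, se, n - 1

t, se, dof = t_test_one_sample([1, 2, 3, 4, 5])
print(round(t, 3), dof)  # → 4.243 4
```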

# TIMES2 (I &OPTIONAL (POWER 1))

Multiply `i' by a power of 2.

# TRIMMED-MEAN (&REST ARGS)

TRIMMED-MEAN (DATA PERCENTAGE &KEY START END KEY)
Returns a trimmed mean of `data.' A trimmed mean is an ordinary, arithmetic
mean of the data, except that an outlying percentage has been discarded. For
example, suppose there are ten elements in `data,' and `percentage' is 0.1: the
result would be the mean of the middle eight elements, having discarded the
biggest and smallest elements. If `percentage' doesn't result in a whole number
of elements being discarded, then a fraction of the remaining biggest and
smallest is discarded. For example, suppose `data' is '(1 2 3 4 5) and
`percentage' is 0.25: the result is (.75(2) + 3 + .75(4))/(.75+1+.75) or 3. By
convention, the 0.5 trimmed mean is the median, which is always returned as a
number.
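
The fractional-discard rule can be sketched in Python (the 0.5 = median
special case is not handled here):

```python
def trimmed_mean(data, percentage):
    # Discard `percentage' of the sorted data from each end; when the trim
    # count is not whole, the next element in gets a fractional weight.
    xs = sorted(data)
    n = len(xs)
    trim = percentage * n          # elements to discard per side
    k = int(trim)                  # whole elements discarded per side
    frac = 1 - (trim - k)          # weight on the next element in
    middle = xs[k + 1:n - k - 1]
    total = sum(middle) + frac * (xs[k] + xs[n - k - 1])
    return total / (n - 2 * trim)

# The worked example from the documentation above:
print(trimmed_mean([1, 2, 3, 4, 5], 0.25))  # → 3.0
```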

# TRUNC2 (N POWER)

Truncate `n' to a power of 2.

# TRUNCATE-TO-FACTOR (N FACTOR)

Equivalent to (* factor (truncate n factor)). For example,
`truncate-to-factor' of 65 and 60 is 60. Useful for converting to certain
units, say when converting minutes to hours and minutes. See also
`round-to-factor.'

# TUKEY-SUMMARY (&REST ARGS)

TUKEY-SUMMARY (DATA &KEY START END KEY)
Computes a Tukey five-number summary of the data. That is, it returns, in
increasing order, the extremes and the quartiles: the minimum, the 1/4 quartile,
the median, the 3/4 quartile, and the maximum.
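
In Python, using the standard library (quartile conventions vary; the
stdlib's default `exclusive' method is an assumption and may differ from the
library's):

```python
from statistics import median, quantiles

def tukey_summary(data):
    # Five numbers in increasing order: min, Q1, median, Q3, max.
    q1, _, q3 = quantiles(data, n=4)
    return [min(data), q1, median(data), q3, max(data)]

print(tukey_summary([1, 2, 3, 4, 5, 6, 7]))  # → [1, 2.0, 4, 6.0, 7]
```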

# VARIANCE (&REST ARGS)

VARIANCE (DATA &KEY START END KEY)
Returns the variance of `data,' that is, the `sum-of-squares' divided by
n-1. Signals `no-data' if there is no data. Signals `insufficient-data' if
there is only one datum.

# Z-TEST-ONE-SAMPLE (&REST ARGS)

Z-TEST-ONE-SAMPLE (DATA TAILS &OPTIONAL (H0-MEAN 0) (H0-STD-DEV 1) &KEY START
END KEY)
No documentation is available for this function.

# Undocumented

# ENSURE-FLOAT (NUMBER)

# MATRIX-TRACE (MATRIX)

# PARTIALS-FROM-PARENTS (FROM TO PARENTS-LIST)

# SQUARE (X)

# SUM-OF-ARRAY-ELEMENTS (ARRAY)

# TRANSPOSE-MATRIX (MATRIX &OPTIONAL INTO-MATRIX &AUX DIM-1 DIM-2)

# Private

# ANOVA-ONE-WAY-GROUPS (DATA &OPTIONAL (SCHEFFE-TESTS-P T) CONFIDENCE-INTERVALS)

Performs a one-way analysis of variance (ANOVA) on the `data,' which should
be a sequence of sequences, where each interior sequence is the data for a
particular group. Furthermore, each sequence should consist entirely of
numbers, and each should have at least 2 elements.
The significance of the result indicates that the group means are not all equal;
that is, at least two of the groups have significantly different means. If
there were only two groups, this would be semantically equivalent to an
unmatched, two-tailed t-test, so you can think of the one-way ANOVA as a
multi-group, two-tailed t-test.
This function returns five values: 1. an ANOVA table; 2. a list of group means;
3. either a Scheffe table or nil, depending on `scheffe-tests-p'; 4. an
alternate value for SST; and 5. a list of confidence intervals in the form
`(,mean ,lower ,upper)' for each group, if `confidence-intervals' is a number
between zero and one giving the kind of confidence interval, such as 0.9. The fourth
value is only interesting if you think there are numerical accuracy problems; it
should be approximately equal to the SST value in the ANOVA table. This
function differs from `anova-one-way-variables' only in its input
representation. See the manual for more information.
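
The sums-of-squares arithmetic can be sketched in Python (just the F statistic
and group means; the full table, Scheffe tests, and confidence intervals are
omitted):

```python
def anova_one_way_groups(groups):
    # Partition total variation into between-group and within-group parts.
    all_data = [x for g in groups for x in g]
    grand = sum(all_data) / len(all_data)
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_data) - len(groups)
    f = (ssb / df_between) / (ssw / df_within)
    return f, means

# With two groups, F is the square of the unmatched t statistic (3.674^2).
f, means = anova_one_way_groups([[1, 2, 3], [4, 5, 6]])
print(round(f, 3), means)  # → 13.5 [2.0, 5.0]
```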

# ANOVA-ONE-WAY-VARIABLES-INTERNAL (IV DV &OPTIONAL (SCHEFFE-TESTS-P T) CONFIDENCE-INTERVALS)

See ANOVA-ONE-WAY-VARIABLES

# ANOVA-TWO-WAY-GROUPS (DATA-ARRAY)

Calculates the analysis of variance when there are two factors that may
affect the dependent variable. Because the input is represented as an array, we
can refer to these two factors as the row-effect and the column effect. Unlike
the one-way ANOVA, there are mathematical difficulties with the two-way ANOVA if
there are unequal cell sizes; therefore, we require all cells to be the same
size, and so the input is a three-dimensional array.
The result of the analysis is an anova-table, as described in the manual. This
function differs from `anova-two-way-variables' only in its input
representation. See the manual for further discussion of analysis of variance.

# ANOVA-TWO-WAY-VARIABLES-INTERNAL (DV IV1 IV2)

See ANOVA-TWO-WAY-VARIABLES

# ANOVA-TWO-WAY-VARIABLES-UNEQUAL-CELL-SIZES-INTERNAL (IV1 IV2 DV)

See ANOVA-TWO-WAY-VARIABLES-UNEQUAL-CELL-SIZES

# AUTOCORRELATION-INTERNAL (SAMPLE MAX-LAG &OPTIONAL (MIN-LAG 0))

See AUTOCORRELATION

# CHI-SQUARE-2X2 (V1 V2)

Performs a chi-square test for independence of the two variables, `v1' and
`v2.' These should be categorical variables with only two values; the function
will construct a 2x2 contingency table by counting the number of occurrences of
each combination of the variables. See the manual for more details.

# CHI-SQUARE-2X2-COUNTS (A B C D &OPTIONAL (YATES T))

Runs a chi-square test for association on a simple 2 x 2 table. If `yates'
is nil, the correction for continuity is not done; default is t.
Returns the chi-square statistic and the significance of the value.
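
The 2x2 shortcut formula can be sketched in Python (statistic only; the
significance lookup against a chi-square distribution with one degree of
freedom is omitted):

```python
def chi_square_2x2_counts(a, b, c, d, yates=True):
    # Chi-square statistic for the 2x2 table [[a, b], [c, d]].
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates continuity correction: shrink |ad - bc| by n/2.
        diff = max(diff - n / 2, 0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

print(round(chi_square_2x2_counts(10, 20, 30, 40), 4))         # → 0.4464
print(round(chi_square_2x2_counts(10, 20, 30, 40, False), 4))  # → 0.7937
```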

# CHI-SQUARE-RXC (V1 V2)

Performs a chi-square test for independence of the two variables, `v1' and
`v2.' These should be categorical variables; the function will construct a
contingency table by counting the number of occurrences of each combination of
the variables. See the manual for more details.

# CHI-SQUARE-RXC-COUNTS (CONTINGENCY-TABLE)

Calculates the chi-square statistic and corresponding p-value for the given
contingency table. The result says whether the row factor is independent of the
column factor. Does not apply Yates's correction.

# CONFIDENCE-INTERVAL-PROPORTION-INTERNAL (X N CONFIDENCE)

See CONFIDENCE-INTERVAL-PROPORTION

# CONFIDENCE-INTERVAL-T-INTERNAL (DATA CONFIDENCE)

See CONFIDENCE-INTERVAL-T

# CONFIDENCE-INTERVAL-Z-INTERNAL (DATA CONFIDENCE)

See CONFIDENCE-INTERVAL-Z

# CONFIDENCE-INTERVAL-Z-SUMMARIES (MEAN STANDARD-ERROR CONFIDENCE)

This function is just like `confidence-interval-z,' except that instead of
its arguments being the actual data, it takes the following summary statistics:
`mean', a point estimator of the mean of some normally distributed population;
and the `standard-error' of the estimator, that is, the estimated standard
deviation of the estimator's sampling distribution. `Confidence' should be a
number between 0 and 1, exclusive.
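
A Python sketch of this computation, returning the same `(mean lower upper)'
shape:

```python
from statistics import NormalDist

def confidence_interval_z_summaries(mean, standard_error, confidence):
    # Normal-theory interval: mean plus or minus z * standard-error,
    # where z is the two-sided critical value for `confidence'.
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return mean, mean - z * standard_error, mean + z * standard_error

m, lower, upper = confidence_interval_z_summaries(10.0, 2.0, 0.95)
print(round(lower, 2), round(upper, 2))  # → 6.08 13.92
```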

# CORRELATION-INTERNAL (SAMPLE1 SAMPLE2 &REST ARGS &KEY START1 END1 START2 END2)

See CORRELATION

# COVARIANCE-INTERNAL (SAMPLE1 SAMPLE2 &REST ARGS &KEY START1 END1 START2 END2)

See COVARIANCE

# CROSS-CORRELATION-INTERNAL (SEQUENCE1 SEQUENCE2 MAX-LAG &OPTIONAL (MIN-LAG 0))

See CROSS-CORRELATION

# D-TEST-INTERNAL (SAMPLE-1 SAMPLE-2 TAILS &KEY (TIMES 1000) (H0MEAN 0))

See D-TEST

# DATA-LENGTH-INTERNAL (DATA &KEY START END KEY)

See DATA-LENGTH

# DIFFERENCE-LIST (NUMBER-LIST)

Takes a sequence of numbers and returns a sequence of differences
from the mean.
Formula: xi = Xi - Mean (X).

# FIND-CRITICAL-VALUE (P-FUNCTION P-VALUE &OPTIONAL (X-TOLERANCE 1.e-5) (Y-TOLERANCE 1.e-5))

Returns the critical value of some statistic. The function `p-function'
should be a unary function mapping statistics---x values---to their
significance---p values. The function will find the value of x such that the
p-value is `p-value.' The function works by binary search. A secant method
might be better, but this seems to be acceptably fast. Only positive values of
x are considered, and `p-function' should be monotonically decreasing from its
value at x=0. The binary search ends when either the function value is within
`y-tolerance' of `p-value' or the size of the search region shrinks to less than
`x-tolerance.'
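
The search described above can be sketched in Python (the initial bracketing
step here is an assumption; the original may seed the search differently):

```python
import math

def find_critical_value(p_function, p_value, x_tol=1e-5, y_tol=1e-5):
    # Binary search for x >= 0 with p_function(x) == p_value, assuming
    # p_function is monotonically decreasing from its value at x = 0.
    lo, hi = 0.0, 1.0
    while p_function(hi) > p_value:  # bracket the root first
        lo, hi = hi, hi * 2
    while hi - lo > x_tol:
        mid = (lo + hi) / 2
        p = p_function(mid)
        if abs(p - p_value) < y_tol:
            return mid
        if p > p_value:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# exp(-x) = 0.5 at x = ln 2.
print(round(find_critical_value(lambda x: math.exp(-x), 0.5), 4))  # → 0.6931
```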

# G-TEST (CONTINGENCY-TABLE &OPTIONAL EXPECTED-VALUE-MATRIX (ERROR-P T))

Calculates the G-test for a contingency table. The formula for the
G-test statistic is
2 * sum[f_ij log [f_ij/f-hat_ij]]
where f_ij is the ith by jth cell in the table and f-hat_ij is the
expected value of that cell. If an expected-value-matrix is supplied,
it must be the same size as table and it is used for expected values,
in which case the G-test is a test of goodness-of-fit. If the
expected value matrix is unsupplied, it is calculated using the formula
e_ij = [f_i* * f_*j] / f_**
where f_i*, f_*j and f_** are the row, column and grand totals
respectively. In this case, the G-test is a test of independence. The degrees
of freedom are the same as for the chi-square statistic, and the significance
is obtained by comparing the G statistic to a chi-square distribution with
that many degrees of freedom.
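
Both the expected values and the statistic are easy to sketch in Python
(independence case only; the goodness-of-fit variant and the significance
lookup are omitted):

```python
import math

def g_test(table):
    # G = 2 * sum f_ij * log(f_ij / e_ij), with e_ij = row * col / grand.
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    grand = sum(rows)
    return 2 * sum(f * math.log(f * grand / (rows[i] * cols[j]))
                   for i, row in enumerate(table)
                   for j, f in enumerate(row))

print(round(g_test([[10, 20], [30, 40]]), 4))  # → 0.8043
```

For this table the chi-square statistic is 0.7937, close to G, as expected for
moderate counts.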

# INNER-PRODUCT (SAMPLE1 SAMPLE2 &KEY START1 END1 START2 END2)

Returns the inner product of the two samples, which should be sequences of
numbers. The inner product, also called the dot product or vector product, is
the sum of the pairwise multiplication of the numbers. Stops when either sample
runs out; it doesn't check that they have the same length.

# INTERQUARTILE-RANGE-INTERNAL (DATA &REST STANDARD-ARGS)

See INTERQUARTILE-RANGE

# INVERT-MATRIX (MATRIX &OPTIONAL INTO-MATRIX)

If `matrix' is singular, returns nil; otherwise returns its inverse. If
`into-matrix' is supplied, the inverse is returned in it; otherwise a new
array is created.
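
The documented behavior can be sketched in Python with Gauss-Jordan
elimination and partial pivoting (the singularity threshold here is an
arbitrary choice, and `into-matrix' reuse is omitted):

```python
def invert_matrix(matrix):
    # Augment with the identity, then reduce the left half to the identity;
    # the right half then holds the inverse. Returns None if singular.
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:  # crude absolute threshold
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

print(invert_matrix([[2.0, 0.0], [0.0, 4.0]]))  # → [[0.5, 0.0], [0.0, 0.25]]
print(invert_matrix([[1.0, 2.0], [2.0, 4.0]]))  # → None
```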

# INVERT-MATRIX-ITERATE (MATRIX &OPTIONAL INTO-MATRIX)

If `matrix' is singular, returns nil; otherwise returns its inverse.
Uses iterative improvement until no further improvement is possible.

# MAKE-3D-TABLE (DV IV1 IV2)

Collects the `dv' values for each unique combination of an element of `iv1'
and an element of `iv2.' Returns a three-dimensional table of dv values.

# MAKE-CONTINGENCY-TABLE (V1 V2)

Counts each unique combination of an element of `v1' and an element of `v2.'
Returns a two-dimensional table of integers.

# MATRIX-ADDITION (&REST ARGS)

# MATRIX-NORM (MATRIX)

Returns the norm of matrix.
The norm is the maximum over the rows of the sum of the abs of the columns.
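
This is the infinity (maximum absolute row sum) norm; a one-line Python
sketch:

```python
def matrix_norm(matrix):
    # Maximum over the rows of the sum of absolute values.
    return max(sum(abs(x) for x in row) for row in matrix)

print(matrix_norm([[1, -2], [3, 4]]))  # → 7
```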

# MATRIX-PLUS-MATRIX (MAT1 MAT2)

Adds two matrices together

# MATRIX-PLUS-SCALAR (MATRIX SCALAR)

Add a scalar value to a matrix

# MATRIX-TIMES-MATRIX (MAT1 MAT2)

Multiplies two matrices together

# MATRIX-TIMES-SCALAR (MATRIX SCALAR)

Multiply a matrix by a scalar value

# MATRIX-TIMES-SCALAR! (MATRIX SCALAR)

Multiply a matrix by a scalar value

# MAXIMUM-INTERNAL (DATA &REST STANDARD-ARGS &KEY START END KEY)

See MAXIMUM

# MEAN-INTERNAL (DATA &REST STANDARD-ARGS &KEY START END KEY)

See MEAN

# MEDIAN-INTERNAL (DATA &REST STANDARD-ARGS &KEY START END KEY)

See MEDIAN

# MINIMUM-INTERNAL (DATA &REST STANDARD-ARGS &KEY START END KEY)

See MINIMUM

# MODE-FOR-CONTINUOUS-DATA (DATA &REST STANDARD-ARGS &KEY START END KEY WINDOW)

Returns the most frequent element of `data,' which should be a sequence. The
algorithm involves sorting, and so the data must be numbers or the `key'
function must produce numbers. Consider `sxhash' if no better function is
available. Also returns the number of occurrences of the mode. If there is
more than one mode, this returns the first mode, as determined by the sorting of
the numbers.
Keep in mind that if the data has multiple runs of like values that are bigger
than the window size (which currently defaults to 10% of the size of the data),
this function will blindly pick the first one. If this is the case, you
probably should be calling `mode' instead of this function.

# MODE-INTERNAL (DATA &REST STANDARD-ARGS &KEY START END KEY)

See MODE

# MULTIPLE-MODES-INTERNAL (DATA K &REST STANDARD-ARGS &KEY START END KEY)

See MULTIPLE-MODES

# MULTIPLY-MATRICES (MATRIX-1 MATRIX-2 &OPTIONAL MATRIX-3 &AUX SAVED-MATRIX-3)

Multiply matrices MATRIX-1 and MATRIX-2, storing into MATRIX-3 if supplied.
If MATRIX-3 is not supplied, then a new (ART-Q type) array is returned, else
MATRIX-3 must have exactly the right dimensions for holding the result of the multiplication.
Both MATRIX-1 and MATRIX-2 must be either one- or two-dimensional.
The first dimension of MATRIX-2 must equal the second dimension of MATRIX-1,
unless MATRIX-1 is one-dimensional, in which case the first dimensions must
match (thus allowing multiplications of the form VECTOR x MATRIX).

# PRINT-ANOVA-TABLE (ANOVA-TABLE &OPTIONAL (STREAM *STANDARD-OUTPUT*))

Prints `anova-table' on `stream.'

# PRINT-SCHEFFE-TABLE (SCHEFFE-TABLE &OPTIONAL GROUP-MEANS (STREAM *STANDARD-OUTPUT*))

Prints `scheffe-table' on `stream.' If the original one-way anova data had N
groups, the Scheffe table prints as an n-1 x n-1 upper-triangular table. If
`group-means' is given, it should be a list of the group means, which will be
printed along with the table.

# PYTHAG-DF (A B)

Computes square root of a*a + b*b without destructive overflow or underflow.
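
The trick (shared by `pythag-sf') is to factor out the larger magnitude so the
squared ratio stays in range; a Python sketch:

```python
import math

def pythag(a, b):
    # sqrt(a*a + b*b) without overflow or underflow: factor out the larger
    # magnitude, so only the ratio (at most 1) is squared.
    absa, absb = abs(a), abs(b)
    if absa > absb:
        return absa * math.sqrt(1 + (absb / absa) ** 2)
    if absb == 0:
        return 0.0
    return absb * math.sqrt(1 + (absa / absb) ** 2)

print(pythag(3.0, 4.0))  # → 5.0
# The naive form would overflow for these inputs; the factored form does not.
print(math.isclose(pythag(3e200, 4e200), 5e200))  # → True
```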

# PYTHAG-SF (A B)

Computes square root of a*a + b*b without destructive overflow or underflow.

# QUANTILE-INTERNAL (DATA Q &REST STANDARD-ARGS &KEY START END KEY)

See QUANTILE

# RANGE-INTERNAL (DATA &REST STANDARD-ARGS &KEY START END KEY)

See RANGE

# REDUCE-MATRIX (MAT)

Uses the Gauss-Jordan reduction method to reduce a matrix.

# REMOVE-&REST (LIST)

Removes the '&rest arg' part from a lambda-list (strictly for documentation purposes).

# SCALAR-MATRIX-MULTIPLY (SCALAR MATRIX)

Multiplies a matrix by a scalar value in the form M[i,j] = s*M[i,j].

# SINGULAR-VALUE-DECOMPOSITION (MATRIX)

Returns as three values the U W and V of singular value decomposition. If
you have already consed up these matrices, you should call `svdcmp-sf' or
`svdcmp-df' directly. The input matrix is preserved.

# SKEWNESS-INTERNAL (DATA &REST STANDARD-ARGS &KEY START END KEY)

See SKEWNESS

# STANDARD-DEVIATION-INTERNAL (DATA &REST STANDARD-ARGS &KEY START END KEY)

See STANDARD-DEVIATION

# STATISTICAL-SUMMARY-INTERNAL (DATA &REST STANDARD-ARGS &KEY START END KEY)

See STATISTICAL-SUMMARY

# SUM-LIST (NUMBER-LIST)

Takes a sequence of numbers and returns their sum.
Formula: Sum(X).

# SUM-OF-SQUARES (DATA &REST STANDARD-ARGS &KEY START END KEY)

Returns the sum of squared distances from the mean of `data'.
Signals `no-data' if there is no data.

# SVBKSB-DF (U W V M N B X &OPTIONAL (TMP (MAKE-ARRAY N ELEMENT-TYPE 'DOUBLE-FLOAT)))

Solves A X = B for a vector `X,' where A is specified by the mxn array U, `n'
vector W, and nxn matrix V as returned by svdcmp. `m' and `n' are the
dimensions of `A,' and will be equal for square matrices. `B' is the 1xm input
vector for the right-hand side. `X' is the 1xn output solution vector. All
arrays are of double-floats. No input quantities are destroyed, so the routine
may be called sequentially with different B's. See the discussion in Numerical
Recipes in C, section 2.6.
This routine assumes that near zero singular values have already been zeroed.
It returns no values, storing the result in `X.' It does use some auxiliary
storage, which can be passed in as `tmp,' a double-float array of length `n,' if
you want to avoid consing.

# SVBKSB-SF (U W V M N B X &OPTIONAL (TMP (MAKE-ARRAY N ELEMENT-TYPE 'SINGLE-FLOAT)))

Solves A X = B for a vector `X,' where A is specified by the mxn array U, `n'
vector W, and nxn matrix V as returned by svdcmp. `m' and `n' are the
dimensions of `A,' and will be equal for square matrices. `B' is the 1xm input
vector for the right-hand side. `X' is the 1xn output solution vector. All
arrays are of single-floats. No input quantities are destroyed, so the routine
may be called sequentially with different B's. See the discussion in Numerical
Recipes in C, section 2.6.
This routine assumes that near zero singular values have already been zeroed.
It returns no values, storing the result in `X.' It does use some auxiliary
storage, which can be passed in as `tmp,' a single-float array of length `n,' if
you want to avoid consing.

# SVD-BACK-SUBSTITUTE (U W V B)

Returns the solution vector to the Ax=b, where A has been decomposed into
`u,' `w' and `v' by `singular-value-decomposition.' This function is just a
minor wrapping of `svbksb-sf' and `svbksb-df.'

# SVD-INVERSE-FAST-DF (U W V &OPTIONAL (A-1 (MAKE-ARRAY (LIST (LENGTH W) (LENGTH W)) ELEMENT-TYPE 'DOUBLE-FLOAT)) (TMP (MAKE-ARRAY (LENGTH W) ELEMENT-TYPE 'DOUBLE-FLOAT)))

Computes the inverse of a matrix that has been decomposed into `u,' `w' and
`v' by singular value decomposition. It assumes the ``small'' elements of `w'
have already been zeroed. It computes the inverse by taking advantage of the
known zeros in the full 2-dimensional `w' matrix. It uses the backsubstitution
algorithm, only with the B vectors fixed at the columns of the identity matrix,
which lets us take advantage of its zeros. It's about twice as fast as the slow
version and conses a lot less. Note that if you are computing the inverse
merely to solve one or more systems of equations, you are better off using the
decomposition and backsubstitution routines directly.

# SVD-INVERSE-FAST-SF (U W V &OPTIONAL (A-1 (MAKE-ARRAY (LIST (LENGTH W) (LENGTH W)) ELEMENT-TYPE 'SINGLE-FLOAT)) (TMP (MAKE-ARRAY (LENGTH W) ELEMENT-TYPE 'SINGLE-FLOAT)))

Computes the inverse of a matrix that has been decomposed into `u,' `w' and
`v' by singular value decomposition. It assumes the ``small'' elements of `w'
have already been zeroed. It computes the inverse by taking advantage of the
known zeros in the full 2-dimensional `w' matrix. It uses the backsubstitution
algorithm, only with the B vectors fixed at the columns of the identity matrix,
which lets us take advantage of its zeros. It's about twice as fast as the slow
version and conses a lot less. Note that if you are computing the inverse
merely to solve one or more systems of equations, you are better off using the
decomposition and backsubstitution routines directly.

# SVD-INVERSE-SLOW-DF (U W V &OPTIONAL (A-1 (MAKE-ARRAY (LIST (LENGTH W) (LENGTH W)) ELEMENT-TYPE 'DOUBLE-FLOAT)))

Computes the inverse of a matrix that has been decomposed into `u,' `w' and
`v' by singular value decomposition. It assumes the ``small'' elements of `w'
have already been zeroed. It computes the inverse by constructing a diagonal
matrix `w2' from `w' (which is just a vector of the diagonal elements), and then
explicitly multiplying u^t w2 and v. Note that if you are computing the inverse
merely to solve one or more systems of equations, you are better off using the
decomposition and backsubstitution routines directly.

# SVD-INVERSE-SLOW-SF (U W V &OPTIONAL (A-1 (MAKE-ARRAY (LIST (LENGTH W) (LENGTH W)) ELEMENT-TYPE 'SINGLE-FLOAT)))

Computes the inverse of a matrix that has been decomposed into `u,' `w' and
`v' by singular value decomposition. It assumes the ``small'' elements of `w'
have already been zeroed. It computes the inverse by constructing a diagonal
matrix `w2' from `w' (which is just a vector of the diagonal elements), and then
explicitly multiplying u^t w2 and v. Note that if you are computing the inverse
merely to solve one or more systems of equations, you are better off using the
decomposition and backsubstitution routines directly.

# SVD-MATRIX-INVERSE (A &OPTIONAL (SINGULARITY-THRESHOLD 1.d-10))

Use singular value decomposition to compute the inverse of `A.' If an exact
inverse is not possible, then zero the otherwise infinite inverted singular
value and compute the inverse. The inverse is returned; `A' is not destroyed.
If you're using this to solve several systems of equations, you're better off
computing the singular value decomposition and using it several times, because
this function computes it anew each time.

# SVD-SOLVE-LINEAR-SYSTEM (MATRIX B-VECTOR &OPTIONAL (REPORT? T) (THRESHOLD 1.e-6))

Returns solution of linear system matrix * solution = b-vector. Employs the
singular value decomposition method. See the discussion in Numerical Recipes in
C, section 2.6, especially as to the semantics of `threshold.'

# SVD-ZERO (W &OPTIONAL (THRESHOLD 1.e-6) (REPORT? T))

If the relative magnitude of an element in `w' compared to the largest
element is less than `threshold,' then zero that element. Returns a list of
indices of the zeroed elements. This function is just a convenient wrapper for
`svzero-sf' and `svzero-df.'

# SVDCMP-DF (A M N W V &OPTIONAL (RV1 (MAKE-ARRAY N ELEMENT-TYPE 'DOUBLE-FLOAT)))

Given an `m'x`n' matrix `A,' this routine computes its singular value
decomposition, A = U W V^T. The matrix U replaces `A' on output. The diagonal
matrix of singular values W is output as a vector `W' of length `n.' The matrix
`V' -- not the transpose V^T -- is output as an `n'x`n' matrix `V.' The row
dimension `m' must be greater or equal to `n'; if it is smaller, then `A' should
be filled up to square with zero rows. See the discussion in Numerical Recipes
in C, section 2.6.
This routine returns no values, storing the results in `A,' `W,' and `V.' It
does use some auxiliary storage, which can be passed in as `rv1,' a double-float
array of length `n,' if you want to avoid consing.

# SVDCMP-SF (A M N W V &OPTIONAL (RV1 (MAKE-ARRAY N ELEMENT-TYPE 'SINGLE-FLOAT)))

Given an `m'x`n' matrix `A,' this routine computes its singular value
decomposition, A = U W V^T. The matrix U replaces `A' on output. The diagonal
matrix of singular values W is output as a vector `W' of length `n.' The matrix
`V' -- not the transpose V^T -- is output as an `n'x`n' matrix `V.' The row
dimension `m' must be greater or equal to `n'; if it is smaller, then `A' should
be filled up to square with zero rows. See the discussion in Numerical Recipes
in C, section 2.6.
This routine returns no values, storing the results in `A,' `W,' and `V.' It
does use some auxiliary storage, which can be passed in as `rv1,' a single-float
array of length `n,' if you want to avoid consing. All input arrays should be
of single-floats.

# SVDVAR (V W &OPTIONAL CVM)

Given `v' and `w' as computed by singular value decomposition, computes the
covariance matrix among the predictors. Based on Numerical Recipes in C,
section 15.4, algorithm `svdvar.' The covariance matrix is returned. It can be
supplied as the third argument.

# SVZERO-DF (W N THRESHOLD &OPTIONAL (REPORT? T))

If the relative magnitude of an element in `w' compared to the largest
element is less than `threshold,' then zero that element. If `report?' is true,
the indices of zeroed elements are printed. Returns a list of the indices of
zeroed elements. This routine uses double-floats.

# SVZERO-SF (W N THRESHOLD &OPTIONAL (REPORT? T))

If the relative magnitude of an element in `w' compared to the largest
element is less than `threshold,' then zero that element. If `report?' is true,
the indices of zeroed elements are printed. Returns a list of indices of the
zeroed elements. This routine uses single-floats.

# T-TEST-INTERNAL (SAMPLE-1 SAMPLE-2 &OPTIONAL (TAILS BOTH) (H0MEAN 0))

See T-TEST

# T-TEST-MATCHED-INTERNAL (SAMPLE1 SAMPLE2 &OPTIONAL (TAILS BOTH))

See T-TEST-MATCHED

# T-TEST-ONE-SAMPLE-INTERNAL (DATA TAILS &OPTIONAL (H0-MEAN 0) &REST STANDARD-ARGS &KEY START END KEY)

See T-TEST-ONE-SAMPLE

# TRIMMED-MEAN-INTERNAL (DATA PERCENTAGE &REST STANDARD-ARGS &KEY START END KEY)

See TRIMMED-MEAN

# TUKEY-SUMMARY-INTERNAL (DATA &REST STANDARD-ARGS &KEY START END KEY)

See TUKEY-SUMMARY

# VARIANCE-INTERNAL (DATA &REST STANDARD-ARGS &KEY START END KEY)

See VARIANCE

# Z-TEST-ONE-SAMPLE-INTERNAL (DATA TAILS &OPTIONAL (H0-MEAN 0) (H0-STD-DEV 1) &REST STANDARD-ARGS &KEY START END KEY)

See Z-TEST-ONE-SAMPLE

# Undocumented

# 1-OR-2D-ARRAYP (ARRAY)

# CONFIDENCE-INTERVAL-INTERNAL

# DATA-CONTINUOUS-P (SEQUENCE)

# ERROR-FUNCTION-COMPLEMENT-SHORT-1 (Y Z)

# ERROR-FUNCTION-COMPLEMENT-SHORT-2 (Y)

# FILL-2D-ARRAY (ARRAY LIST)

# LIST-2D-ARRAY (ARRAY)

# SIGNIFICANCE-INTERNAL

# SMART-MODE (SEQUENCE &REST ARGS)

# T-SIGNIFICANCE-INTERNAL

# MACRO

# Public

# UNDERFLOW-GOES-TO-ZERO (&BODY BODY)

Protects against floating point underflow errors and sets the value to 0.0 instead.

# WITH-TEMP-TABLE ((TEMP) &BODY FORMS)

Binds `temp' to a hash table.

# WITH-TEMP-VECTOR ((TEMP MIN-SIZE) &BODY FORMS)

Binds `temp' to a vector of length at least `min-size.' It's a vector of
pointers and has a fill-pointer, initialized to `min-size.'

# Private

# CHECK-TYPE-OF-ARG (ARG-NAME PREDICATE TYPE-STRING &OPTIONAL ERROR-TYPE-NAME)

Generates an error if the value of ARG-NAME doesn't satisfy PREDICATE.
PREDICATE is a function name (a symbol) or an expression to compute.
TYPE-STRING is a string to use in the error message, such as "a list".
ERROR-TYPE-NAME is a keyword that tells condition handlers what type was desired.

# DEFINE-STATISTIC (NAME &OPTIONAL SUPERCLASSES SLOTS VALUES ARGUMENT-TYPES LAMBDA-LIST &BODY BODY)

In clasp, statistical objects have two parts, a class which stores the
various parts of the object and a computing function which computes the value
of the object from arguments. The define-statistic macro allows the
definition of new statistical types. The define-statistic macro must be
provided with all the information necessary to create a statistical object,
that is, everything required to create a new class, everything required to
create a computing function and some information to connect the two. This
last part consists of a list of arguments and their types and a list which
determines how the values of a statistical function should be used to fill the
slots of a statistical object.
When define-statistic is invoked, two things happen. First, a class is defined
which is a subclass of 'statistic and any other named `superclasses'. Second,
a pair of functions is defined. `clasp-statistics::name' is an internal
function which has the supplied `body' and `lambda-list' and must return as
many values as there are slots in the class `name'. The function `name' is
also defined; it is basically a wrapper function which converts its arguments
to those which are accepted by `body' and then calls `clasp-statistics::name'.
The parameter clasp:*create-statistical-objects* determines whether the
wrapper function packages the values returned by the internal function into a
statistical object or just returns them as multiple values.
The `argument-types' argument must be an alist in which the keys are the
names of arguments as given in `lambda-list' and the values are lisp types
which those arguments will be converted to before calling the internal
statistical function. The primary purpose of this is to allow for coercion of
clasp variables to sequences, but any coercion which is allowed by lisp is
acceptable. The `values' argument is intended to allow the programmer to
specify which slots in the statistical object are filled by which of the
values returned by the statistical function. By default, the order of the
values is assumed to be direct slots in order of specification, inherited
slots in order of specification in the superclasses which are also statistics.
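The pieces described above might fit together roughly as follows. This is a
hypothetical sketch only; the exact define-statistic syntax is not shown in
this documentation, and the statistic name `my-range', its slot names, and
the argument layout are all assumptions used purely for illustration.

```lisp
;; HYPOTHETICAL sketch of DEFINE-STATISTIC usage -- the real macro's
;; argument order may differ.  It shows the parts the text describes:
;; a class name, superclasses, direct slots, an argument-types alist,
;; a values list mapping returned values to slots, and the computing
;; function's lambda-list and body.
(define-statistic my-range ()           ; class name and superclasses
  ((min-value) (max-value))             ; direct slots of the new class
  ((data . sequence))                   ; argument-types: coerce DATA first
  (min-value max-value)                 ; values: which slots get filled
  (data)                                ; lambda-list of the internal function
  ;; body: must return as many values as there are slots in MY-RANGE
  (values (reduce #'min data)
          (reduce #'max data)))
```

With clasp:*create-statistical-objects* true, calling `(my-range some-data)'
would return a `my-range' object with both slots filled; otherwise it would
return the minimum and maximum as two values.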

# Undocumented

# AREF1 (A I)

# AREF11 (A I J)

# SIGN-DF (A B)

# SIGN-SF (A B)

# START/END (CALL-FORM START-N END-N)

# WITH-ROUTINE-ERROR-HANDLING (&BODY BODY)

# GENERIC-FUNCTION

# Public

# DOT-PRODUCT (SEQUENCE-1 SEQUENCE-2)

http://en.wikipedia.org/wiki/Dot_product
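A hedged usage sketch: judging by the lambda list, dot-product takes two
equal-length number sequences and returns the sum of their element-wise
products; the available method specializations are not documented here.

```lisp
;; Assumed usage of DOT-PRODUCT on two equal-length sequences.
;; Mathematically, the dot product of (1 2 3) and (4 5 6) is
;; 1*4 + 2*5 + 3*6 = 32.
(dot-product '(1 2 3) '(4 5 6))
```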

# Undocumented

# CONVERT (OBJECT TYPE)

# CROSS-PRODUCT (NUMBER-LIST-1 NUMBER-LIST-2)

# Private

# Undocumented

# COMPOSITE-STATISTIC-P (IT)

# MAKE-STATISTIC (TYPE &REST ARGS)

# SIMPLE-STATISTIC-P (IT)

# STATISTICP (IT)

# VARIABLE

# Private

# *TEMPORARY-TABLE*

A temporary table. This avoids consing.

# *TEMPORARY-VECTOR*

A temporary vector for use by statistical functions such as `quantile,' which
uses it for sorting data. This avoids consing or rearranging the user's data.

# Undocumented

# *CONTINOUS-DATA-WINDOW-DIVISOR*

# *CONTINUOUS-VARIABLE-UNIQUENESS-FACTOR*

# *CREATE-STATISTICAL-OBJECTS*

# *GAUSSIAN-CDF-SIGNALS-ZERO-STANDARD-DEVIATION-ERROR*

# *WAY-TOO-BIG-CONTINGENCY-TABLE-DIMENSION*

# CLASS

# Public

# Undocumented

# ANOVA-ONE-WAY-VARIABLES (&REST ARGS)

# ANOVA-TWO-WAY-VARIABLES (&REST ARGS)

# ANOVA-TWO-WAY-VARIABLES-UNEQUAL-CELL-SIZES (&REST ARGS)

# AUTOCORRELATION (&REST ARGS)

# CONFIDENCE-INTERVAL (&REST ARGS)

# CONFIDENCE-INTERVAL-PROPORTION (&REST ARGS)

# CONFIDENCE-INTERVAL-T (&REST ARGS)

# CONFIDENCE-INTERVAL-Z (&REST ARGS)

# CORRELATION (&REST ARGS)

# COVARIANCE (&REST ARGS)

# CROSS-CORRELATION (&REST ARGS)

# D-TEST (&REST ARGS)

# DATA-LENGTH (&REST ARGS)

# INTERQUARTILE-RANGE (&REST ARGS)

# MAXIMUM (&REST ARGS)

# MEAN (&REST ARGS)

# MEDIAN (&REST ARGS)

# MINIMUM (&REST ARGS)

# MODE (&REST ARGS)

# MULTIPLE-MODES (&REST ARGS)

# QUANTILE (&REST ARGS)

# RANGE (&REST ARGS)

# SIGNIFICANCE (&REST ARGS)

# SKEWNESS (&REST ARGS)

# STANDARD-DEVIATION (&REST ARGS)

# STATISTICAL-SUMMARY (&REST ARGS)

# T-SIGNIFICANCE (&REST ARGS)

# T-TEST (&REST ARGS)

# T-TEST-MATCHED (&REST ARGS)

# T-TEST-ONE-SAMPLE (&REST ARGS)

# TRIMMED-MEAN (&REST ARGS)

# TUKEY-SUMMARY (&REST ARGS)

# VARIANCE (&REST ARGS)

# Z-TEST-ONE-SAMPLE (&REST ARGS)

# Private

# Undocumented

# COMPOSITE-STATISTIC

# DATA

# SIMPLE-STATISTIC

# STATISTIC

# CONDITION

# Private

# Undocumented

# DATA-ERROR

# ENORMOUS-CONTINGENCY-TABLE

# INSUFFICIENT-DATA

# NO-DATA

# NOT-BINARY-VARIABLES

# UNMATCHED-SEQUENCES

# ZERO-STANDARD-DEVIATION

# ZERO-VARIANCE

# CONSTANT

# Public

# +E+

An approximation of the constant e (named for Euler!).

# 2FPI

The constant 2*pi, in single-float format. Using this constant avoids
run-time double-float contagion.

# FPI

The constant pi, in single-float format. Using this constant avoids
run-time double-float contagion.