Common Lisp Package: CL-MATHSTATS

README:

FUNCTION

Public

ANOVA-ONE-WAY-VARIABLES (&REST ARGS)

ANOVA-ONE-WAY-VARIABLES (IV DV &OPTIONAL (SCHEFFE-TESTS-P T) CONFIDENCE-INTERVALS) Performs a one-way analysis of variance (ANOVA) on the input data, which should be two equal-length sequences: `iv' is the independent variable, represented as a sequence of categories or group identifiers, and `dv' is the dependent variable, represented as a sequence of numbers. The `iv' variable must be ``sorted,'' meaning that AAABBCCCCCDDDD is okay but ABCDABCDABDCDC is not, where A, B, C and D are group identifiers. Furthermore, each group should consist of at least 2 elements. A significant result indicates that the group means are not all equal; that is, at least two of the groups have significantly different means. If there were only two groups, this would be semantically equivalent to an unmatched, two-tailed t-test, so you can think of the one-way ANOVA as a multi-group, two-tailed t-test. This function returns five values: 1. an ANOVA table; 2. a list of group means; 3. either a Scheffe table or nil depending on `scheffe-tests-p'; 4. an alternate value for SST; and 5. a list of confidence intervals in the form `(,mean ,lower ,upper) for each group, if `confidence-intervals' is a number between zero and one, giving the kind of confidence interval, such as 0.9. The fourth value is only interesting if you think there are numerical accuracy problems; it should be approximately equal to the SST value in the ANOVA table. This function differs from `anova-one-way-groups' only in its input representation. See the manual for more information.
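The ``sorted'' requirement on `iv' can be illustrated with a small sketch. This is not the package's code; the helper name `sketch-split-by-runs' is hypothetical, and it assumes list inputs and group identifiers comparable with `eql'. It groups `dv' by consecutive runs of equal identifiers in `iv', which is why interleaved identifiers would produce many tiny groups instead of one group per identifier.

```lisp
;; Hypothetical helper, for illustration only (not part of CL-MATHSTATS).
(defun sketch-split-by-runs (iv dv)
  "Group the values of DV by consecutive runs of EQL identifiers in IV.
Both arguments are assumed to be lists of equal length."
  (let ((groups '())
        (current '())
        (current-id nil)
        (started nil))
    (loop for id in iv
          for value in dv
          do (when (and started (not (eql id current-id)))
               ;; The identifier changed, so close the current group.
               (push (nreverse current) groups)
               (setf current '()))
             (setf current-id id
                   started t)
             (push value current))
    (when started
      (push (nreverse current) groups))
    (nreverse groups)))
```

With iv = (a a b b b) the five dv values fall into two groups; with iv = (a b a b) every value would land in its own group, violating the rule that each group needs at least 2 elements.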

ANOVA-TWO-WAY-VARIABLES (&REST ARGS)

ANOVA-TWO-WAY-VARIABLES (DV IV1 IV2) Calculates the analysis of variance when there are two factors that may affect the dependent variable, specifically `iv1' and `iv2.' Unlike the one-way ANOVA, there are mathematical difficulties with the two-way ANOVA if there are unequal cell sizes; therefore, we require all cells to be the same size; that is, the same number of values (of the dependent variable) for each combination of the independent factors. The result of the analysis is an anova-table, as described in the manual. This function differs from `anova-two-way-groups' only in its input representation. See the manual for further discussion of analysis of variance. The row effect is `iv1' and the column effect is `iv2.'

ANOVA-TWO-WAY-VARIABLES-UNEQUAL-CELL-SIZES (&REST ARGS)

ANOVA-TWO-WAY-VARIABLES-UNEQUAL-CELL-SIZES (IV1 IV2 DV) Calculates the analysis of variance when there are two factors that may affect the dependent variable, specifically `iv1' and `iv2.' Unlike the one-way ANOVA, there are mathematical difficulties with the two-way ANOVA if there are unequal cell sizes. This function differs from the standard two-way ANOVA by (1) the use of cell means as single scores, (2) the division of squared quantities by the number of cell means contributing to the quantity that is squared and (3) the multiplication of a "sum of squares" by the harmonic mean of the sample sizes. The result of the analysis is an anova-table, as described in the manual. See the manual for further discussion of analysis of variance. The row effect is `iv1' and the column effect is `iv2.'

AUTOCORRELATION (&REST ARGS)

AUTOCORRELATION (SAMPLE MAX-LAG &OPTIONAL (MIN-LAG 0)) Autocorrelation is merely a cross-correlation between a sample and itself. This function returns a list of correlations, where the i'th element is the correlation of the sample with the sample starting at `i.'

BETA (Z W)

Returns the value of the Beta function, defined in terms of the complete gamma function, G, as: G(z)G(w)/G(z+w). The implementation follows Numerical Recipes in C, section 6.1.
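As a worked instance of G(z)G(w)/G(z+w), the following sketch restricts itself to positive integer arguments, where G(n) = (n-1)!. It is an illustration only: the names are hypothetical, and the package itself works with floating-point gamma values rather than exact factorials.

```lisp
;; Hypothetical helpers, for illustration only (not part of CL-MATHSTATS).
(defun sketch-factorial (n)
  "Exact factorial of a non-negative integer N."
  (let ((result 1))
    (loop for i from 2 to n
          do (setf result (* result i)))
    result))

(defun sketch-beta-integers (z w)
  "Beta function for positive integers Z and W, using G(n) = (n-1)!.
Returns an exact rational."
  (/ (* (sketch-factorial (1- z)) (sketch-factorial (1- w)))
     (sketch-factorial (1- (+ z w)))))
```

For example, z = 2 and w = 3 give 1! * 2! / 4!, that is, 1/12.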

BETA-INCOMPLETE (A B X)

This function is useful in defining the cumulative distributions for Student's t and the F distribution. All arguments must be floating-point numbers; `a' and `b' must be positive and `x' must be between 0.0 and 1.0, inclusive.

BINOMIAL-CDF (P N K)

Suppose an event occurs with probability `p' per trial. This function computes the probability of `k' or more events occurring in `n' trials. Note that this is the complement of the usual definition of cdf. This function approximates the actual computation using the incomplete beta function, but is preferable for large `n' (greater than a dozen or so) because it avoids summing many tiny floating-point numbers. The implementation follows Numerical Recipes in C, section 6.3.

BINOMIAL-CDF-EXACT (P N K)

This is an exact but computationally intensive form of the preferred function, `binomial-cdf.'
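The quantity described above, the probability of `k' or more events in `n' trials, can be written as a direct sum. The sketch below is one way to do that, under the assumption that `p' is given as a rational so that the arithmetic stays exact; the function names are hypothetical and this is not necessarily how the package computes it.

```lisp
;; Hypothetical helpers, for illustration only (not part of CL-MATHSTATS).
(defun sketch-choose (n k)
  "Exact binomial coefficient via the multiplicative formula."
  (let ((result 1))
    (loop for i from 1 to k
          ;; After step i, RESULT equals C(n-k+i, i), always an integer.
          do (setf result (/ (* result (+ (- n k) i)) i)))
    result))

(defun sketch-binomial-upper-tail (p n k)
  "Probability of K or more successes in N trials with success probability P."
  (loop for j from k to n
        sum (* (sketch-choose n j)
               (expt p j)
               (expt (- 1 p) (- n j)))))
```

With p = 1/2, n = 2 and k = 1 the sum is 1/2 + 1/4 = 3/4. The many small terms in this sum for large `n' are what the incomplete-beta formulation avoids.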

BINOMIAL-COEFFICIENT (N K)

Returns the binomial coefficient, `n' choose `k,' as an integer. The computation is done with logarithms and the result is then rounded to an integer, so it may not be exactly correct. The implementation follows Numerical Recipes in C, section 6.1.

BINOMIAL-COEFFICIENT-EXACT (N K)

This is an exact but computationally intensive form of the preferred function, `binomial-coefficient.'

BINOMIAL-PROBABILITY (P N K)

Returns the probability of `k' successes in `n' trials, where at each trial the probability of success is `p.' This function uses floating-point approximations, and so is computationally efficient but not necessarily exact.

BINOMIAL-PROBABILITY-EXACT (P N K)

This is an exact but computationally intensive form of the preferred function, `binomial-probability.'

CHI-SQUARE-SIGNIFICANCE (X DOF)

Computes the complement of the cumulative distribution function for a Chi-square random variable with `dof' degrees of freedom evaluated at `x.' The result is the probability that the observed chi-square for a correct model should be greater than `x.' The implementation follows Numerical Recipes in C, section 6.2. Small values suggest that the null hypothesis should be rejected; in other words, this computes the significance of `x.'

COMBINATION-COUNT (N K)

Returns the number of combinations of n elements taken k at a time. Assumes valid input.

CONFIDENCE-INTERVAL (&REST ARGS)

CONFIDENCE-INTERVAL NIL NIL

CONFIDENCE-INTERVAL-PROPORTION (&REST ARGS)

CONFIDENCE-INTERVAL-PROPORTION (X N CONFIDENCE) Suppose we have a sample of `n' things and `x' of them are ``successes.'' We can estimate the population proportion of successes as x/n; call it `p-hat.' This function computes the estimate and a confidence interval on it. This function is not appropriate for small samples with p-hat far from 1/2: `x' should be at least 5, and so should `n'-`x.' This function returns three values: p-hat, and the lower and upper bounds of the confidence interval. `Confidence' should be a number between 0 and 1, exclusive.

CONFIDENCE-INTERVAL-T (&REST ARGS)

CONFIDENCE-INTERVAL-T (DATA CONFIDENCE) Suppose you have a sample of 10 numbers and you want to compute a 90 percent confidence interval on the population mean. This function is the one to use. This function uses the t-distribution, and so it is appropriate for small sample sizes. It can also be used for large sample sizes, but the function `confidence-interval-z' may be computationally faster. It returns three values: the mean and the lower and upper bound of the confidence interval. True, only two numbers are necessary, but the confidence intervals of other statistics may be asymmetrical and these values would be consistent with those confidence intervals. `Data' should be a sequence of numbers. `Confidence' should be a number between 0 and 1, exclusive.

CONFIDENCE-INTERVAL-T-SUMMARIES (MEAN DOF STANDARD-ERROR CONFIDENCE)

This function is just like `confidence-interval-t,' except that instead of its arguments being the actual data, it takes the following summary statistics: `mean,' which is the estimator of some t-distributed parameter; `dof,' which is the number of degrees of freedom in estimating the mean; and the `standard-error' of the estimator. In general, `mean' is a point estimator of the mean of a t-distribution, which may be the slope parameter of a regression, the difference between two means, or other practical t-distributions. `Confidence' should be a number between 0 and 1, exclusive.

CONFIDENCE-INTERVAL-Z (&REST ARGS)

CONFIDENCE-INTERVAL-Z (DATA CONFIDENCE) Suppose you have a sample of 50 numbers and you want to compute a 90 percent confidence interval on the population mean. This function is the one to use. Note that it makes the assumption that the sampling distribution is normal, so it's inappropriate for small sample sizes; for those, use `confidence-interval-t' instead. It returns three values: the mean and the lower and upper bound of the confidence interval. True, only two numbers are necessary, but the confidence intervals of other statistics may be asymmetrical and these values would be consistent with those confidence intervals. This function handles 90, 95 and 99 percent confidence intervals as special cases, so those will be quite fast. `Data' should be a sequence of numbers. `Confidence' should be a number between 0 and 1, exclusive.

CORRELATION (&REST ARGS)

CORRELATION (SAMPLE1 SAMPLE2 &KEY START1 END1 START2 END2) Computes the correlation coefficient of two samples, which should be equal-length sequences of numbers.

CORRELATION-FROM-SUMMARIES (N X X2 Y Y2 XY)

Computes the correlation of two variables given summary statistics of the variables. All of these arguments are summed over the variable: `x' is the sum of the x's, `x2' is the sum of the squares of the x's, and `xy' is the sum of the cross-products, which is also known as the inner product of the variables x and y. Of course, `n' is the number of data values in each variable.
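To make the role of each summary argument concrete, here is a sketch of the standard computational formula for Pearson's r from sums. It assumes that `y' and `y2' are the sum and sum of squares of the second variable, by analogy with `x' and `x2'; the function name is hypothetical and the package may arrange the arithmetic differently.

```lisp
;; Hypothetical helper, for illustration only (not part of CL-MATHSTATS).
(defun sketch-correlation-from-sums (n x x2 y y2 xy)
  "Pearson correlation from sums: N values, sums X and Y, sums of squares
X2 and Y2, and sum of cross-products XY."
  (/ (- (* n xy) (* x y))
     (sqrt (* (- (* n x2) (* x x))
              (- (* n y2) (* y y))))))
```

For x = (1 2 3) and y = (2 4 6) the summaries are n = 3, x = 6, x2 = 14, y = 12, y2 = 56 and xy = 28, and the result is 1.0 up to floating-point rounding, as expected for perfectly linear data.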

CORRELATION-MATRIX (DV IVS)

Returns a matrix of all the correlations of all the variables. The dependent variable is row and column zero.

COVARIANCE (&REST ARGS)

COVARIANCE (SAMPLE1 SAMPLE2 &KEY START1 END1 START2 END2) Computes the covariance of two samples, which should be equal-length sequences of numbers. Covariance is the inner product of differences between sample elements and their sample means. For more information, see the manual.

CROSS-CORRELATION (&REST ARGS)

CROSS-CORRELATION (SEQUENCE1 SEQUENCE2 MAX-LAG &OPTIONAL (MIN-LAG 0)) Returns a list of the correlation coefficients for all lags from `min-lag' to `max-lag,' inclusive, where the `i'th list element is the correlation of the first (length-of-sequence1 - i) elements of sequence1 with the last (length-of-sequence2 - i) elements of sequence2. Both sequences should be sequences of numbers and of equal length.

D-TEST (&REST ARGS)

D-TEST (SAMPLE-1 SAMPLE-2 TAILS &KEY (TIMES 1000) (H0MEAN 0)) Two-sample test for difference in means. Competes with the unmatched, two-sample t-test. Each sample should be a sequence of numbers. We calculate the mean of `sample-1' minus the mean of `sample-2'; call that D. Under the null hypothesis, D is zero. There are three possible alternative hypotheses: D is positive, D is negative, and D is either, and they are selected by the `tails' parameter, which must be :positive, :negative, or :both, respectively. We count the number of chance occurrences of D in the desired rejection region, and return the estimated probability.

DATA-LENGTH (&REST ARGS)

DATA-LENGTH (DATA &KEY START END KEY) Returns the number of data values in `data.' Essentially, this is the Common Lisp `length' function, except it handles sequences where there is a `start' or `end' parameter. The `key' parameter is ignored.

DEGREES->RADIANS (DEGREES)

Convert degrees to radians.

DIV2 (I &OPTIONAL (POWER 1))

Divide positive fixnum `i' by 2 or a power of 2, yielding an integer result. For example, (div2 35 5) => 1.
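Division by a power of 2 with an integer result is what an arithmetic right shift does. The sketch below shows that idea with the standard `ash' function; the name `sketch-div2' is hypothetical and the package's own definition may differ in details such as type declarations.

```lisp
;; Hypothetical helper, for illustration only (not part of CL-MATHSTATS).
(defun sketch-div2 (i &optional (power 1))
  "Divide the non-negative integer I by 2^POWER, discarding the remainder."
  ;; A negative shift count makes ASH shift to the right.
  (ash i (- power)))
```

Here 35 shifted right by 5 bits is 35 / 32 with the remainder dropped, which is 1, matching the example above.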

ERROR-FUNCTION (X)

Computes the error function, which is typically used to compute areas under the Gaussian probability distribution. See the manual for more information. Also see the function `gaussian-cdf.' This implementation follows Numerical Recipes in C, section 6.2.

ERROR-FUNCTION-COMPLEMENT (X)

This function computes the complement of the error function, ``erfc(x),'' defined as 1-erf(x). See the documentation for `error-function' for a more complete definition and description. Essentially, this function on z/sqrt2 returns the two-tailed significance of z in a standard Gaussian distribution. This function implements the function that Numerical Recipes in C calls erfcc, see section 6.3; that is, it's the one using the Chebyshev approximation, since that is the one they call from their statistical functions. It is quick to compute and has fractional error everywhere less than 1.2 x 10^-7.
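For readers without the book at hand, the following is a sketch of the Chebyshev-fit approximation published as erfcc in Numerical Recipes in C. The coefficients are transcribed from that routine; the function and variable names are hypothetical, and the package's actual code may be organized differently.

```lisp
;; Hypothetical transcription, for illustration only (not part of CL-MATHSTATS).
(defparameter *sketch-erfc-coefficients*
  '(-1.26551223d0 1.00002368d0 0.37409196d0 0.09678418d0 -0.18628806d0
    0.27886807d0 -1.13520398d0 1.48851587d0 -0.82215223d0 0.17087277d0)
  "Polynomial coefficients in increasing powers of u = 1/(1 + z/2).")

(defun sketch-erfc (x)
  "Approximate complementary error function of the real number X."
  (let* ((z (abs (float x 1d0)))
         (u (/ 1d0 (+ 1d0 (* 0.5d0 z))))
         ;; Horner evaluation, starting from the highest-order coefficient.
         (poly (reduce (lambda (acc c) (+ c (* u acc)))
                       (reverse *sketch-erfc-coefficients*)))
         (ans (* u (exp (- poly (* z z))))))
    ;; erfc(-x) = 2 - erfc(x)
    (if (minusp x) (- 2d0 ans) ans)))
```

Applied to z/sqrt2 with z = 1.96, this gives a two-tailed significance close to 0.05, in line with the description above.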

EXP2 (N)

2^n

EXTRACT-UNIQUE-VALUES (SEQUENCE)

A faster version of `remove-duplicates'. Note you cannot specify a :TEST (it is always #'eq).

F-MEASURE (PRECISION RECALL &OPTIONAL (BETA 0.5))

Returns the f-measure, a combination of precision and recall governed by the parameter `beta' (default 0.5). With beta = 0.5, precision and recall are equally weighted; beta = 1 gives all the weight to precision, and beta = 0 gives all the weight to recall. From the statistics book All of Statistics (Springer-Verlag): http://www2.springeronline.com/sgw/cda/frontpage/0,,4-10128-22-13887455-0,00.html

F-SIGNIFICANCE (F-STATISTIC NUMERATOR-DOF DENOMINATOR-DOF &OPTIONAL ONE-TAILED-P)

This function occurs in the statistical test of whether two observed samples have the same variance. A certain statistic, F, essentially the ratio of the observed dispersion of the first sample to that of the second one, is calculated. This function computes the tail areas of the null hypothesis: that the variances of the numerator and denominator are equal. It can be used for either a one-tailed or two-tailed test. The default is two-tailed, but one-tailed can be computed by setting the optional argument `one-tailed-p' to true. For a two-tailed test, this function computes the probability that F would be as different from 1.0 (larger or smaller) as it is, if the null hypothesis is true. For a one-tailed test, this function computes the probability that F would be as LARGE as it is if the first sample's underlying distribution actually has SMALLER variance than the second's, where `numerator-dof' and `denominator-dof' are the numbers of degrees of freedom in the numerator sample and the denominator sample. In other words, this computes the significance level at which the hypothesis ``the numerator sample has smaller variance than the denominator sample'' can be rejected. A small numerical value implies a very significant rejection. The `f-statistic' must be a non-negative floating-point number. The degrees of freedom arguments must be positive integers. The `one-tailed-p' argument is treated as a boolean. This implementation follows Numerical Recipes in C, section 6.3 and the `ftest' function in section 13.4. Some of the documentation is also drawn from section 6.3, since I couldn't improve on their explanation.

FACTORIAL (N)

Returns the factorial of `n,' which should be a non-negative integer. The result will be returned as a floating-point number, single-float if possible, otherwise double-float. If it is returned as a double-float, it won't necessarily be integral, since the actual computation is (exp (gamma-ln (1+ n))). Implementation is loosely based on Numerical Recipes in C, section 6.1. On the TI Explorer, the largest argument that won't cause a floating overflow is 170.

FACTORIAL-EXACT (N)

Returns the factorial of `n,' which should be an integer. The result will be returned as an integer or bignum. This implementation is exact, but is more computationally expensive than `factorial,' which is to be preferred.
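The exact variant relies on Common Lisp integers growing into bignums as needed. A minimal sketch of that idea follows; the name `sketch-factorial-exact' is hypothetical, and the package's own loop may be written differently.

```lisp
;; Hypothetical helper, for illustration only (not part of CL-MATHSTATS).
(defun sketch-factorial-exact (n)
  "Exact factorial of the non-negative integer N, as an integer."
  (let ((result 1))
    (loop for i from 2 to n
          ;; Integer multiplication never overflows; it promotes to bignum.
          do (setf result (* result i)))
    result))
```

The cost grows with the size of the intermediate bignums, which is why the floating-point `factorial' is described as the preferred function.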

FACTORIAL-LN (N)

Returns the natural logarithm of n!; `n' should be an integer. The result will be a single-precision, floating point number. The implementation follows Numerical Recipes in C, section 6.1.

GAMMA-INCOMPLETE (A X)

This is an incomplete gamma function, what Numerical Recipes in C calls ``gammp.'' This function also returns, as the second value, g(a,x). See the manual for more information.

GAMMA-LN (X)

Returns the natural logarithm of the Gamma function evaluated at `x.' Mathematically, the Gamma function is defined to be the integral from 0 to Infinity of t^(x-1) exp(-t) dt. The implementation is copied, with extensions for the reflection formula, from Numerical Recipes in C, section 6.1. The argument `x' must be positive. Full accuracy is obtained for x>1. For x<1, the reflection formula is used. The computation is done using double-floats, and the result is a double-float.

GAUSSIAN-CDF (X &OPTIONAL (MEAN 0.0) (SD 1.0))

Computes the cumulative distribution function for a Gaussian random variable (defaults: mean=0.0, s.d.=1.0) evaluated at `x.' The result is the probability of getting a random number less than or equal to `x,' from the given Gaussian distribution.

GAUSSIAN-SIGNIFICANCE (X TAILS &OPTIONAL MEAN SD)

Computes the significance of `x' in a Gaussian distribution with mean=`mean' (default 0.0) and standard deviation=`sd' (default 1.0); that is, it returns the area that is farther from the mean than `x' is. The null hypothesis is roughly that `x' is zero; you must specify your alternative hypothesis (H1) via the `tails' parameter, which must be :both, :positive or :negative. The first corresponds to a two-tailed test: H1 is that `x' is not zero, but you are not specifying a direction. If the parameter is :positive, H1 is that `x' is positive, and similarly for :negative.

INTERQUARTILE-RANGE (&REST ARGS)

INTERQUARTILE-RANGE (DATA) The interquartile range is similar to the variance of a sample because both are statistics that measure how ``spread out'' a sample is. The interquartile range is the difference between the 3/4 quantile (the upper quartile) and the 1/4 quantile (the lower quartile).

LAGGED-CORRELATION (SEQUENCE1 SEQUENCE2 LAG)

Returns the correlations of `sequence1' with `sequence2' after shifting `sequence1' by `lag'. This means that for all n, element n of `sequence1' is paired with element n+`lag' of `sequence2', where both of those elements exist.
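The pairing rule can be made concrete with a short sketch: drop the last `lag' elements of the first list and the first `lag' elements of the second, then correlate what remains. This is an illustration under stated assumptions (list inputs, a non-negative lag smaller than the length); the helper names are hypothetical and not the package's own.

```lisp
;; Hypothetical helpers, for illustration only (not part of CL-MATHSTATS).
(defun sketch-pearson (xs ys)
  "Pearson correlation of two equal-length lists of numbers."
  (let* ((n (length xs))
         (mx (/ (reduce #'+ xs) n))
         (my (/ (reduce #'+ ys) n))
         (sxy 0) (sxx 0) (syy 0))
    (loop for x in xs
          for y in ys
          do (incf sxy (* (- x mx) (- y my)))
             (incf sxx (expt (- x mx) 2))
             (incf syy (expt (- y my) 2)))
    (/ sxy (sqrt (* sxx syy)))))

(defun sketch-lagged-correlation (sequence1 sequence2 lag)
  "Pair element n of SEQUENCE1 with element n+LAG of SEQUENCE2 and correlate."
  (let ((n (min (length sequence1) (length sequence2))))
    (sketch-pearson (subseq sequence1 0 (- n lag))
                    (subseq sequence2 lag n))))
```

For example, (1 2 3 4) and (0 1 2 3) at lag 1 pair up as (1 2 3) against (1 2 3), a correlation of 1.0 up to floating-point rounding.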

LINEAR-REGRESSION-BRIEF (DV IV)

Calculates the main statistics of a linear regression: the slope and intercept of the line, the coefficient of determination, also known as r-square, the standard error of the slope, and the p-value for the regression. This function takes two equal-length sequences of raw data. Note that the dependent variable, as always, comes first in the argument list. You should first look at your data with a scatter plot to see if a linear model is plausible. See the manual for a fuller explanation of linear regression statistics.

LINEAR-REGRESSION-BRIEF-SUMMARIES (N X Y X2 Y2 XY)

Calculates the main statistics of a linear regression: the slope and intercept of the line, the coefficient of determination, also known as r-square, the standard error of the slope, and the p-value for the regression. This function differs from `linear-regression-brief' in that it takes summary variables: `x' and `y' are the sums of the independent variable and dependent variables, respectively; `x2' and `y2' are the sums of the squares of the independent variable and dependent variables, respectively; and `xy' is the sum of the products of the independent and dependent variables. You should first look at your data with a scatter plot to see if a linear model is plausible. See the manual for a fuller explanation of linear regression statistics.

LINEAR-REGRESSION-MINIMAL (DV IV)

Calculates the slope and intercept of the regression line. This function takes two equal-length sequences of raw data. Note that the dependent variable, as always, comes first in the argument list. You should first look at your data with a scatter plot to see if a linear model is plausible. See the manual for a fuller explanation of linear regression statistics.

LINEAR-REGRESSION-MINIMAL-SUMMARIES (N X Y X2 Y2 XY)

Calculates the slope and intercept of the regression line. This function differs from `linear-regression-minimal' in that it takes summary statistics: `x' and `y' are the sums of the independent variable and dependent variables, respectively; `x2' and `y2' are the sums of the squares of the independent variable and dependent variables, respectively; and `xy' is the sum of the products of the independent and dependent variables. You should first look at your data with a scatter plot to see if a linear model is plausible. See the manual for a fuller explanation of linear regression statistics.
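The summary arguments are enough to recover the least-squares line. The sketch below uses the textbook formulas and takes its arguments in the same order as the signature above; the function name and the order of the returned values (slope, then intercept) are assumptions made for illustration.

```lisp
;; Hypothetical helper, for illustration only (not part of CL-MATHSTATS).
(defun sketch-regression-from-sums (n x y x2 y2 xy)
  "Least-squares slope and intercept from sums over N data points.
Y2 is accepted for signature compatibility but is not needed here."
  (declare (ignore y2))
  (let* ((slope (/ (- (* n xy) (* x y))
                   (- (* n x2) (* x x))))
         (intercept (/ (- y (* slope x)) n)))
    (values slope intercept)))
```

For the points (1,3), (2,5), (3,7) the summaries are n = 3, x = 6, y = 15, x2 = 14, y2 = 83 and xy = 34, which yield slope 2 and intercept 1.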

LINEAR-REGRESSION-VERBOSE (DV IV)

Calculates almost every statistic of a linear regression: the slope and intercept of the line, the standard error on each, the correlation coefficient, the coefficient of determination, also known as r-square, and an ANOVA table as described in the manual. This function takes two equal-length sequences of raw data. Note that the dependent variable, as always, comes first in the argument list. If you don't need all this information, consider using the ``-brief'' or ``-minimal'' functions, which do less computation. You should first look at your data with a scatter plot to see if a linear model is plausible. See the manual for a fuller explanation of linear regression statistics.

LINEAR-REGRESSION-VERBOSE-SUMMARIES (N X Y X2 Y2 XY)

Calculates almost every statistic of a linear regression: the slope and intercept of the line, the standard error on each, the correlation coefficient, the coefficient of determination, also known as r-square, and an ANOVA table as described in the manual. If you don't need all this information, consider using the ``-brief'' or ``-minimal'' functions, which do less computation. This function differs from `linear-regression-verbose' in that it takes summary variables: `x' and `y' are the sums of the independent variable and dependent variables, respectively; `x2' and `y2' are the sums of the squares of the independent variable and dependent variables, respectively; and `xy' is the sum of the products of the independent and dependent variables. You should first look at your data with a scatter plot to see if a linear model is plausible. See the manual for a fuller explanation of linear regression statistics.

LINEAR-SCALE (VALUE OLD-MIN OLD-MAX NEW-MIN NEW-MAX)

Rescales value linearly from the old-min/old-max scale to the new-min/new-max one.
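Linear rescaling amounts to finding where the value sits in the old range and placing it at the same relative position in the new range. A sketch of that arithmetic follows; the name `sketch-linear-scale' is hypothetical, and it assumes `old-min' and `old-max' differ.

```lisp
;; Hypothetical helper, for illustration only (not part of CL-MATHSTATS).
(defun sketch-linear-scale (value old-min old-max new-min new-max)
  "Map VALUE from [OLD-MIN, OLD-MAX] onto [NEW-MIN, NEW-MAX] linearly."
  (+ new-min
     (* (- value old-min)
        (/ (- new-max new-min)
           (- old-max old-min)))))
```

For instance, 5 on a 0 to 10 scale becomes 50 on a 0 to 100 scale, and 32 on a 32 to 212 scale becomes 0 on a 0 to 100 scale.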

LOG2 (N)

Log of `n' to base 2.

MATRIX-MULTIPLY (&REST ARGS)

Does successive multiplications of each element in `args'. If two elements are scalars, their product is i * j; if a scalar is multiplied by a matrix, each element of the matrix is multiplied by the scalar; lastly, if two matrices are multiplied, standard matrix multiplication is applied, and the dimensions must be compatible: if ARGi is a x b and ARGj is c x d, then b must be equal to c.

MAXIMUM (&REST ARGS)

MAXIMUM (DATA &KEY START END KEY) Returns the element of the sequence `data' whose `key' is maximum. Signals `no-data' if there is no data. If there is only one element in the data sequence, that element will be returned, regardless of whether it is valid (a number).

MEAN (&REST ARGS)

MEAN (DATA &KEY START END KEY) Returns the arithmetic mean of `data,' which should be a sequence. Signals `no-data' if there is no data.

MEDIAN (&REST ARGS)

MEDIAN (DATA &KEY START END KEY) Returns the median of the subsequence of `data' from `start' to `end', using `key'. The median is just the 0.5 quantile, and so this function returns the same values as the `quantile' function.

MINIMUM (&REST ARGS)

MINIMUM (DATA &KEY START END KEY) Returns the element of the sequence `data' whose `key' is minimum. Signals `no-data' if there is no data. If there is only one element in the data sequence, that element will be returned, regardless of whether it is valid (a number).

MOD2 (N POWER)

Find `n' mod a power of 2.
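For a non-negative integer, the remainder modulo 2^power is just its low `power' bits, so a bit mask suffices. The sketch below shows this with the standard `logand' and `ash' functions; the name `sketch-mod2' is hypothetical.

```lisp
;; Hypothetical helper, for illustration only (not part of CL-MATHSTATS).
(defun sketch-mod2 (n power)
  "Remainder of the non-negative integer N modulo 2^POWER."
  ;; (1- (ash 1 power)) is a mask with the low POWER bits set.
  (logand n (1- (ash 1 power))))
```

For example, 35 mod 32 keeps the low five bits of 35, which is 3.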

MODE (&REST ARGS)

MODE (DATA &KEY START END KEY) Returns the most frequent element of `data,' which should be a sequence. The algorithm involves sorting, and so the data must be numbers or the `key' function must produce numbers. Consider `sxhash' if no better function is available. Also returns the number of occurrences of the mode. If there is more than one mode, this returns the first mode, as determined by the sorting of the numbers.

MULTIPLE-LINEAR-REGRESSION-ARRAYS (DV &REST IVS)

This is an internal function for the use of the multiple-linear-regression functions. It takes the lists of values given by CLASP and puts them into a pair of arrays, A and b, suitable for solving the matrix equation Ax=b, to find the regression equation. The values are A and b. The first column of A is the constant 1, so that an intercept will be included in the regression model.

MULTIPLE-LINEAR-REGRESSION-BRIEF (DV &REST IVS)

Let m be the number of independent variables, `ivs.' This function returns a vector of length m which are the coefficients of a linear equation that best predicts the dependent variable, `dv,' in the least squares sense. It also returns, as the second value, the sum of squared deviations of the data from the fitted model, aka SSE, aka chi-square. The third value is the number of degrees of freedom for the chi-square, if you want to test the fit. This function returns an intermediate amount of information. Consider using the sibling functions -minimal and -verbose if you want less or more information.

MULTIPLE-LINEAR-REGRESSION-MINIMAL (DV &REST IVS)

Let m be the number of independent variables, `ivs.' This function returns a vector of length m which are the coefficients of a linear equation that best predicts the dependent variable, `dv,' in the least squares sense. This function returns the minimal information for a least squares regression model, namely a list of the coefficients of the ivs, with the constant term first. Consider using the sibling functions -brief and -verbose if you want more information.

MULTIPLE-LINEAR-REGRESSION-NORMAL (DV &REST IVS)

Performs linear regression of the dependent variable, `dv' (Y), on multiple independent variables, `ivs' (the X's), calculating the intercept and the regression coefficients. Also calculates the F statistic and the correlation coefficient for Y on the X's.

MULTIPLE-LINEAR-REGRESSION-VERBOSE (DV &REST IVS)

Let m be the number of independent variables, `ivs.' This function returns fourteen values:
1. the intercept
2. a list of coefficients
3. a list of correlations of each iv to the dv and to each iv
4. a list of the t-statistic for each coefficient
5. a list of the standardized coefficients (betas)
6. the fraction of variance accounted for, aka r-square
7. the ratio of MSR (see #12) to MSE (see #13), aka F
8. a list of the portion of the SSR due to each iv
9. a list of the fraction of variance accounted for by each iv
10. the sum of squares of the regression, aka SSR
11. the sum of squares of the residuals, aka SSE, aka chi-square
12. the mean squared error of the regression, aka MSR
13. the mean squared error of the residuals, aka MSE
14. a list of indices of ``zeroed'' independent variables
This function returns a lot of information about the regression. Consider using the sibling functions -minimal and -brief if you need less information.

MULTIPLE-MODES (&REST ARGS)

MULTIPLE-MODES (DATA K &KEY START END KEY) Returns the `k' most frequent elements of `data,' which should be a sequence. The algorithm involves sorting, and so the data must be numbers or the `key' function must produce numbers. Consider #'sxhash if no better function is available. Also returns the number of occurrences of each mode. The value is an association list of modes and their counts. This function is a little more computationally expensive than `mode,' so only use it if you really need multiple modes.

NORMALIZE-MATRIX (M)

Returns a new matrix such that the sum of its elements is 1.0

ON-INTERVAL (X LOWER-BOUND UPPER-BOUND &KEY (LOWER-INCLUSIVE? T) (UPPER-INCLUSIVE? T))

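Returns t if and only if `x' is in the interval from `lower-bound' to `upper-bound.' The keyword arguments `lower-inclusive?' and `upper-inclusive?' control whether each endpoint counts as part of the interval; both default to t.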
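The effect of the two keyword arguments can be sketched as follows. This mirrors the signature shown above, but the body is an assumption about the obvious semantics, and the name `sketch-on-interval' is hypothetical.

```lisp
;; Hypothetical helper, for illustration only (not part of CL-MATHSTATS).
(defun sketch-on-interval (x lower-bound upper-bound
                           &key (lower-inclusive? t) (upper-inclusive? t))
  "True when X lies between LOWER-BOUND and UPPER-BOUND.
Each endpoint is included only if the matching keyword is true."
  (and (if lower-inclusive? (<= lower-bound x) (< lower-bound x))
       (if upper-inclusive? (<= x upper-bound) (< x upper-bound))))
```

With the defaults the interval is closed, so an endpoint is accepted; passing :upper-inclusive? nil turns it into a half-open interval that rejects the upper endpoint.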

PERMUTATION-COUNT (N K)

Returns the number of possible ways of taking k elements out of n total.
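Assuming the usual meaning of a permutation count, where the order of the `k' chosen elements matters, the count is the falling product n(n-1)...(n-k+1). The sketch below computes that product directly; the name is hypothetical and the package may compute it another way.

```lisp
;; Hypothetical helper, for illustration only (not part of CL-MATHSTATS).
(defun sketch-permutation-count (n k)
  "Number of ordered selections of K elements out of N: n!/(n-k)!."
  (let ((result 1))
    (loop for i from (1+ (- n k)) to n
          do (setf result (* result i)))
    result))
```

For n = 5 and k = 2 this is 5 * 4 = 20. Dividing by k! gives the corresponding number of combinations, 10.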

POISSON-CDF (K X)

Computes the cumulative distribution function for a Poisson random variable with mean `x' evaluated at `k.' The result is the probability that the number of Poisson random events occurring will be between 0 and k-1 inclusive, if the expected number is `x.' The argument `k' should be an integer, while `x' should be a float. The implementation follows Numerical Recipes in C, section 6.2.
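The probability described here can also be obtained by adding up the individual Poisson terms for 0 through k-1 events. The sketch below does that as a cross-check; it is not the incomplete-gamma method the package follows, and the function name is hypothetical.

```lisp
;; Hypothetical helper, for illustration only (not part of CL-MATHSTATS).
(defun sketch-poisson-cdf (k x)
  "Probability of between 0 and K-1 events, inclusive, when the mean is X."
  (let ((term (exp (- x)))              ; probability of exactly 0 events
        (total 0))
    (loop for j from 0 below k
          do (incf total term)
             ;; P(j+1 events) = P(j events) * x / (j+1)
             (setf term (/ (* term x) (1+ j))))
    total))
```

With mean 1.0, k = 1 gives exp(-1), about 0.368, and k = 2 gives twice that, since the probabilities of 0 and of 1 event are equal when the mean is 1.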

QUANTILE (&REST ARGS)

QUANTILE (DATA Q &KEY START END KEY) Returns the element which is the q'th percentile of the data when accessed by `key.' That is, it returns the element such that `q' of the data is smaller than it and 1-`q' is above it, where `q' is a number between zero and one, inclusive. For example, if `q' is .5, this returns the median; if `q' is 0, this returns the minimum (although the `minimum' function is more efficient). This function uses the bisection method, doing linear interpolation between elements i and i+1, where i=floor(q(n-1)). See the manual for more information. The function returns three values: the interpolated quantile and the two elements that determine the interval it was interpolated in. If the quantile was exact, the second two values are the same element of the data.
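The interpolation rule, i = floor(q(n-1)) with linear interpolation between sorted elements i and i+1, can be sketched in a few lines. This version returns only the interpolated value, sorts a copy of the whole data rather than searching by bisection, and assumes a non-empty sequence of real numbers; the name `sketch-quantile' is hypothetical.

```lisp
;; Hypothetical helper, for illustration only (not part of CL-MATHSTATS).
(defun sketch-quantile (data q)
  "Interpolated Q quantile (0 <= Q <= 1) of a non-empty sequence of reals."
  (let* ((sorted (sort (copy-seq data) #'<))
         (n (length sorted))
         (pos (* q (1- n)))             ; fractional index into SORTED
         (i (floor pos))
         (fraction (- pos i))
         (low (elt sorted i))
         ;; Clamp so that q = 1 does not index past the end.
         (high (elt sorted (min (1+ i) (1- n)))))
    (+ low (* fraction (- high low)))))
```

For the data (4 1 3 2) and q = 1/2, the fractional index is 3/2, so the result lies halfway between the sorted elements 2 and 3, giving 5/2.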

R-SCORE (NUMBER-LIST-1 NUMBER-LIST-2)

Takes two sequences and returns the correlation coefficient. Formula: Sum (Cross-product (Difference-list (number-list-1) Difference-list (number-list-2)) / (Sqrt (Sum-of-Squares (number-list-1) * Sum-of-Squares (number-list-2)))).

RADIANS->DEGREES (RADIANS)

Convert radians to degrees. Does not round the result.

RANGE (&REST ARGS)

RANGE (DATA &KEY START END KEY) Returns the range of the sequence `data.' Signals `no-data' if there is no data. The range is given by max - min.

ROUND-TO-FACTOR (N FACTOR)

Equivalent to (* factor (round n factor)). For example, `round-to-factor' of 65 and 60 is 60. Useful for converting to certain units, say when converting minutes to the nearest hour. See also `truncate-to-factor.'
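Because the definition is given in terms of the standard `round' function, exact halfway cases follow its round-to-even rule. The sketch below simply wraps the stated expression under a hypothetical name to show a few values.

```lisp
;; Hypothetical helper, for illustration only (not part of CL-MATHSTATS).
(defun sketch-round-to-factor (n factor)
  "Round N to the nearest multiple of FACTOR."
  ;; ROUND with two arguments rounds the quotient N/FACTOR;
  ;; only its first value (the rounded quotient) is used here.
  (* factor (round n factor)))
```

So 65 minutes rounds to 60 and 100 minutes to 120, while the halfway cases 30 and 90 go to 0 and 120 respectively, since 0.5 rounds to the even 0 and 1.5 to the even 2.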

SAFE-EXP (X)

Eliminates floating-point underflow for the exponential function: where computing the exponential would underflow, it just returns 0.0d0 instead.

SCHEFFE-TESTS (GROUP-MEANS GROUP-SIZES MS-ERROR DF-ERROR)

Performs all pairwise comparisons between group means, testing for significance using Scheffe's F-test. Returns an upper-triangular table in a format described in the manual. Also see the function `print-scheffe-table.' `Group-means' and `group-sizes' should be sequences. The arguments `ms-error' and `df-error' are the mean square error within groups and its degrees of freedom, both of which are computed by the analysis of variance. An ANOVA test should always be run first, to see if there are any significant differences.

SIGNIFICANCE (&REST ARGS)

SIGNIFICANCE NIL NIL

SKEWNESS (&REST ARGS)

SKEWNESS (DATA &KEY START END KEY) Returns the skewness of `data', which is the sum of cubed distances from the mean divided by the standard deviation, divided by N.

SMOOTH-4253H (DATA)

Smooths `data' by successive smoothing: 4,median; then 2,median; then 5,median; then 3,median; then hanning. The ends are handled by duplicating the end elements. This function is not destructive; it returns a list the same length as `data,' which should be a list of numbers.

SMOOTH-HANNING (DATA)

Smooths `data' by replacing each element with the weighted mean of it and its two neighbors. The weights are 1/2 for itself and 1/4 for each neighbor. The ends are handled by duplicating the end elements. This function is not destructive; it returns a list the same length as `data,' which should be a sequence of numbers.
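The weighting scheme can be sketched as below. It reads ``duplicating the end elements'' as treating the missing neighbor at each end as a copy of the end element itself, which is an assumption; the name `sketch-smooth-hanning' is hypothetical.

```lisp
;; Hypothetical helper, for illustration only (not part of CL-MATHSTATS).
(defun sketch-smooth-hanning (data)
  "Return a new list: each element becomes 1/4 left + 1/2 self + 1/4 right."
  (let* ((v (coerce data 'vector))
         (n (length v)))
    (loop for i from 0 below n
          ;; At the ends, the missing neighbor is the end element itself.
          for left = (aref v (max 0 (1- i)))
          for right = (aref v (min (1- n) (1+ i)))
          collect (+ (/ left 4) (/ (aref v i) 2) (/ right 4)))))
```

With integer input the results are exact rationals: (1 2 3) becomes (5/4 2 11/4), and a constant sequence is left unchanged.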

SMOOTH-MEAN-2 (DATA)

With a window of size two, the median and mean smooth functions are the same.

SMOOTH-MEAN-3 (DATA)

Smooths `data' by replacing each element with the mean of it and its two neighbors. The ends are handled by duplicating the end elements. This function is not destructive; it returns a list the same length as `data,' which should be a sequence of numbers.

SMOOTH-MEAN-4 (DATA)

Smooths `data' by replacing each element with the mean of it, its left neighbor, and its two right neighbors. The ends are handled by duplicating the end elements. This function is not destructive; it returns a list the same length as `data,' which should be a sequence of numbers.

SMOOTH-MEAN-5 (DATA)

Smooths `data' by replacing each element with the mean of it, its two left neighbors and its two right neighbors. The ends are handled by duplicating the end elements. This function is not destructive; it returns a list the same length as `data,' which should be a sequence of numbers.

SMOOTH-MEDIAN-2 (DATA)

Smooths `data' by replacing each element with the median of it and its neighbor on the left. A median of two elements is the same as their mean. The end is handled by duplicating the end element. This function is not destructive; it returns a list the same length as `data,' which should be a sequence of numbers.

SMOOTH-MEDIAN-3 (DATA)

Smooths `data' by replacing each element with the median of it and its two neighbors. The ends are handled by duplicating the end elements. This function is not destructive; it returns a list the same length as `data,' which should be a sequence of numbers.
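
A running median of three can be sketched as below, assuming the same end handling as in the description (the end elements act as their own missing neighbor). Both function names are hypothetical and are not part of CL-MATHSTATS.

```lisp
;; Hypothetical helpers for illustration; not the library's implementation.
(defun median-of-3 (a b c)
  "Return the middle value of three real numbers."
  (max (min a b) (min (max a b) c)))

(defun smooth-median-3-sketch (data)
  "Return a fresh list in which each element is the median of itself and
its two neighbors; the end elements are duplicated to fill the window."
  (let* ((v (coerce data 'vector))
         (n (length v)))
    (loop for i from 0 below n
          collect (median-of-3 (aref v (max 0 (1- i)))
                               (aref v i)
                               (aref v (min (1- n) (1+ i)))))))
```

An isolated spike such as the 9 in `(1 9 2 3 8)' is replaced by a neighboring value, which is the usual reason for preferring a median smooth over a mean smooth.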

SMOOTH-MEDIAN-4 (DATA)

Smooths `data' by replacing each element with the median of it, its left neighbor, and its two right neighbors. The ends are handled by duplicating the end elements. This function is not destructive; it returns a list the same length as `data,' which should be a sequence of numbers.

SMOOTH-MEDIAN-5 (DATA)

Smooths `data' by replacing each element with the median of it, its two left neighbors and its two right neighbors. The ends are handled by duplicating the end elements. This function is not destructive; it returns a list the same length as `data,' which should be a sequence of numbers.

STANDARD-DEVIATION (&REST ARGS)

STANDARD-DEVIATION (DATA &KEY START END KEY) Returns the standard deviation of `data,' which is just the square root of the variance. Signals `no-data' if there is no data. Signals `insufficient-data' if there is only one datum.

STATISTICAL-SUMMARY (&REST ARGS)

STATISTICAL-SUMMARY (DATA &KEY START END KEY) Computes the length, minimum, maximum, range, median, mode, mean, variance, standard deviation, and interquartile range of `data' from `start' to `end', accessed by `key'.

STUDENTS-T-SIGNIFICANCE (T-STATISTIC DOF TAILS)

Student's distribution is much like the Gaussian distribution except with heavier tails, depending on the number of degrees of freedom, `dof.' As `dof' goes to infinity, Student's distribution approaches the Gaussian. This function computes the significance of `t-statistic.' Values range from 0.0 to 1.0: small values suggest that the null hypothesis---that `t-statistic' is drawn from a t distribution---should be rejected. The `t-statistic' parameter should be a float, while `dof' should be an integer. The null hypothesis is roughly that `t-statistic' is zero; you must specify your alternative hypothesis (H1) via the `tails' parameter, which must be :both, :positive or :negative. The first corresponds to a two-tailed test: H1 is that `t-statistic' is not zero, but you are not specifying a direction. If the parameter is :positive, H1 is that `t-statistic' is positive, and similarly for :negative. This implementation follows Numerical Recipes in C, section 6.3.

T-SIGNIFICANCE (&REST ARGS)

T-SIGNIFICANCE NIL NIL

T-TEST (&REST ARGS)

T-TEST (SAMPLE-1 SAMPLE-2 &OPTIONAL (TAILS :BOTH) (H0MEAN 0)) Returns the t-statistic for the difference in the means of two samples, which should each be a sequence of numbers. Let D=mean1-mean2. The null hypothesis is that D=0. The alternative hypothesis is specified by `tails': `:both' means D/=0, `:positive' means D>0, and `:negative' means D<0. Unless you're using :both tails, be careful what order the two samples are in: it matters! The function also returns the significance, the standard error, and the degrees of freedom. Signals `standard-error-is-zero' if that condition occurs. Signals `insufficient-data' unless there are at least two elements in each sample.

T-TEST-MATCHED (&REST ARGS)

T-TEST-MATCHED (SAMPLE1 SAMPLE2 &OPTIONAL (TAILS :BOTH)) Returns the t-statistic for two matched samples, which should be equal-length sequences of numbers. Let D=mean1-mean2. The null hypothesis is that D=0. The alternative hypothesis is specified by `tails': `:both' means D/=0, `:positive' means D>0, and `:negative' means D<0. Unless you're using :both tails, be careful what order the two samples are in: it matters! The function also returns the significance, the standard error, and the degrees of freedom. Signals `standard-error-is-zero' if that condition occurs. Signals `insufficient-data' unless there are at least two elements in each sample.

T-TEST-ONE-SAMPLE (&REST ARGS)

T-TEST-ONE-SAMPLE (DATA TAILS &OPTIONAL (H0-MEAN 0) &KEY START END KEY) Returns the t-statistic for the mean of the data, which should be a sequence of numbers. Let D be the sample mean. The null hypothesis is that D equals the `H0-mean.' The alternative hypothesis is specified by `tails': `:both' means D /= H0-mean, `:positive' means D > H0-mean, and `:negative' means D < H0-mean. The function also returns the significance, the standard error, and the degrees of freedom. Signals `zero-variance' if that condition occurs. Signals `insufficient-data' unless there are at least two elements in the sample.

TIMES2 (I &OPTIONAL (POWER 1))

Multiply `i' by a power of 2.

TRIMMED-MEAN (&REST ARGS)

TRIMMED-MEAN (DATA PERCENTAGE &KEY START END KEY) Returns a trimmed mean of `data.' A trimmed mean is an ordinary, arithmetic mean of the data, except that an outlying percentage has been discarded. For example, suppose there are ten elements in `data,' and `percentage' is 0.1: the result would be the mean of the middle eight elements, having discarded the biggest and smallest elements. If `percentage' doesn't result in a whole number of elements being discarded, then a fraction of the remaining biggest and smallest is discarded. For example, suppose `data' is '(1 2 3 4 5) and `percentage' is 0.25: the result is (.75(2) + 3 + .75(4))/(.75+1+.75) or 3. By convention, the 0.5 trimmed mean is the median, which is always returned as a number.
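
The fractional trimming in the second example can be sketched as a weighted mean of the sorted data. This is a sketch under stated assumptions, not the library's code: it assumes `percentage' is at least 0 and strictly less than 1/2, that the data is a non-empty sequence of real numbers, and it ignores the `start', `end' and `key' arguments. The name `trimmed-mean-sketch' is hypothetical.

```lisp
;; Hypothetical helper for illustration; not the library's implementation.
(defun trimmed-mean-sketch (data percentage)
  "Mean of DATA after discarding PERCENTAGE of the elements from each end.
Assumes 0 <= PERCENTAGE < 1/2 and a non-empty sequence of reals."
  (let* ((sorted (sort (coerce (copy-seq data) 'vector) #'<))
         (n (length sorted))
         (cut (* percentage n)))          ; elements dropped per end, may be fractional
    (multiple-value-bind (whole frac) (floor cut)
      (let ((lo whole)                    ; first index that keeps some weight
            (hi (- n whole 1))            ; last index that keeps some weight
            (edge-weight (- 1 frac))      ; weight left on the two edge elements
            (sum 0)
            (weight 0))
        (loop for i from lo to hi
              for w = (if (or (= i lo) (= i hi)) edge-weight 1)
              do (incf sum (* w (aref sorted i)))
                 (incf weight w))
        (/ sum weight)))))
```

Passing the percentage as a rational, for example 1/4 rather than 0.25, keeps the arithmetic exact and reproduces the weights .75, 1 and .75 from the example above.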

TRUNC2 (N POWER)

Truncate `n' to a power of 2.

TRUNCATE-TO-FACTOR (N FACTOR)

Equivalent to (* factor (truncate n factor)). For example, `truncate-to-factor' of 65 and 60 is 60. Useful for converting to certain units, say when converting minutes to hours and minutes. See also `round-to-factor.'
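
The equivalence stated above, together with the minutes-to-hours use case, can be written out as a short sketch. Both function names are hypothetical and are defined here only for illustration.

```lisp
;; Hypothetical helpers for illustration.
(defun truncate-to-factor-sketch (n factor)
  "Largest multiple of FACTOR reached by truncating N toward zero."
  (* factor (truncate n factor)))

(defun minutes-to-hours-and-minutes (minutes)
  "Return whole hours and leftover minutes as two values."
  (let ((whole (truncate-to-factor-sketch minutes 60)))
    (values (/ whole 60) (- minutes whole))))
```

For instance, 135 minutes truncates to 120, which gives 2 hours with 15 minutes left over.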

TUKEY-SUMMARY (&REST ARGS)

TUKEY-SUMMARY (DATA &KEY START END KEY) Computes a Tukey five-number summary of the data. That is, it returns, in increasing order, the extremes and the quartiles: the minimum, the first quartile (the 1/4 quantile), the median, the third quartile (the 3/4 quantile), and the maximum.

VARIANCE (&REST ARGS)

VARIANCE (DATA &KEY START END KEY) Returns the variance of `data,' that is, the `sum-of-squares' divided by n-1. Signals `no-data' if there is no data. Signals `insufficient-data' if there is only one datum.
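
The definition above, the sum of squared distances from the mean divided by n-1, can be sketched as follows. The sketch ignores `start', `end' and `key', and it signals plain errors rather than the `no-data' and `insufficient-data' conditions; the name `variance-sketch' is hypothetical.

```lisp
;; Hypothetical helper for illustration; not the library's implementation.
(defun variance-sketch (data)
  "Sample variance: sum of squared distances from the mean over n-1."
  (let ((n (length data)))
    (cond ((zerop n) (error "No data."))
          ((= n 1)   (error "Insufficient data: need at least two elements."))
          (t
           (let* ((mean (/ (reduce #'+ data) n))
                  (sum-of-squares
                    (reduce #'+ data
                            :key (lambda (x) (expt (- x mean) 2)))))
             (/ sum-of-squares (1- n)))))))
```

The standard deviation described earlier is then simply `(sqrt (variance-sketch data))'.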

Z-TEST-ONE-SAMPLE (&REST ARGS)

Z-TEST-ONE-SAMPLE (DATA TAILS &OPTIONAL (H0-MEAN 0) (H0-STD-DEV 1) &KEY START END KEY) NIL

Undocumented

ENSURE-FLOAT (NUMBER)

MATRIX-TRACE (MATRIX)

PARTIALS-FROM-PARENTS (FROM TO PARENTS-LIST)

SQUARE (X)

SUM-OF-ARRAY-ELEMENTS (ARRAY)

TRANSPOSE-MATRIX (MATRIX &OPTIONAL INTO-MATRIX &AUX DIM-1 DIM-2)

Private

ANOVA-ONE-WAY-GROUPS (DATA &OPTIONAL (SCHEFFE-TESTS-P T) CONFIDENCE-INTERVALS)

Performs a one-way analysis of variance (ANOVA) on the `data,' which should be a sequence of sequences, where each interior sequence is the data for a particular group. Furthermore, each sequence should consist entirely of numbers, and each should have at least 2 elements. The significance of the result indicates that the group means are not all equal; that is, at least two of the groups have significantly different means. If there were only two groups, this would be semantically equivalent to an unmatched, two-tailed t-test, so you can think of the one-way ANOVA as a multi-group, two-tailed t-test. This function returns five values: 1. an ANOVA table; 2. a list of group means; 3. either a Scheffe table or nil depending on `scheffe-tests-p'; 4. an alternate value for SST; and 5. a list of confidence intervals in the form `(,mean ,lower ,upper) for each group, if `confidence-intervals' is a number between zero and one, giving the kind of confidence interval, such as 0.9. The fourth value is only interesting if you think there are numerical accuracy problems; it should be approximately equal to the SST value in the ANOVA table. This function differs from `anova-one-way-variables' only in its input representation. See the manual for more information.

ANOVA-ONE-WAY-VARIABLES-INTERNAL (IV DV &OPTIONAL (SCHEFFE-TESTS-P T) CONFIDENCE-INTERVALS)

See ANOVA-ONE-WAY-VARIABLES

ANOVA-TWO-WAY-GROUPS (DATA-ARRAY)

Calculates the analysis of variance when there are two factors that may affect the dependent variable. Because the input is represented as an array, we can refer to these two factors as the row-effect and the column effect. Unlike the one-way ANOVA, there are mathematical difficulties with the two-way ANOVA if there are unequal cell sizes; therefore, we require all cells to be the same size, and so the input is a three-dimensional array. The result of the analysis is an anova-table, as described in the manual. This function differs from `anova-two-way-variables' only in its input representation. See the manual for further discussion of analysis of variance.

ANOVA-TWO-WAY-VARIABLES-INTERNAL (DV IV1 IV2)

See ANOVA-TWO-WAY-VARIABLES

ANOVA-TWO-WAY-VARIABLES-UNEQUAL-CELL-SIZES-INTERNAL (IV1 IV2 DV)

See ANOVA-TWO-WAY-VARIABLES-UNEQUAL-CELL-SIZES

AUTOCORRELATION-INTERNAL (SAMPLE MAX-LAG &OPTIONAL (MIN-LAG 0))

See AUTOCORRELATION

CHI-SQUARE-2X2 (V1 V2)

Performs a chi-square test for independence of the two variables, `v1' and `v2.' These should be categorical variables with only two values; the function will construct a 2x2 contingency table by counting the number of occurrences of each combination of the variables. See the manual for more details.

CHI-SQUARE-2X2-COUNTS (A B C D &OPTIONAL (YATES T))

Runs a chi-square test for association on a simple 2 x 2 table. If `yates' is nil, the correction for continuity is not done; default is t. Returns the chi-square statistic and the significance of the value.
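
As a rough illustration of the statistic, the sketch below uses the textbook 2 x 2 formula N(|ad-bc| - N/2)^2 / ((a+b)(c+d)(a+c)(b+d)), with the N/2 term dropped when the continuity correction is off. Whether the library computes it in exactly this form is an assumption; the sketch returns only the statistic, not the significance, and the name `chi-square-2x2-sketch' is hypothetical.

```lisp
;; Hypothetical helper for illustration; not the library's implementation.
(defun chi-square-2x2-sketch (a b c d &optional (yates t))
  "Chi-square statistic for the table ((A B) (C D)).
Assumes every row and column total is positive."
  (let* ((n (+ a b c d))
         (diff (abs (- (* a d) (* b c))))
         ;; Yates's correction shrinks |ad - bc| by N/2, but never below zero.
         (adjusted (if yates (max 0 (- diff (/ n 2))) diff)))
    (/ (* n adjusted adjusted)
       (* (+ a b) (+ c d) (+ a c) (+ b d)))))
```

With integer counts the result is an exact rational, which makes the effect of the correction easy to compare: it always gives a statistic no larger than the uncorrected one.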

CHI-SQUARE-RXC (V1 V2)

Performs a chi-square test for independence of the two variables, `v1' and `v2.' These should be categorical variables; the function will construct a contingency table by counting the number of occurrences of each combination of the variables. See the manual for more details.

CHI-SQUARE-RXC-COUNTS (CONTINGENCY-TABLE)

Calculates the chi-square statistic and corresponding p-value for the given contingency table. The result says whether the row factor is independent of the column factor. Does not apply Yates's correction.

CONFIDENCE-INTERVAL-PROPORTION-INTERNAL (X N CONFIDENCE)

See CONFIDENCE-INTERVAL-PROPORTION

CONFIDENCE-INTERVAL-T-INTERNAL (DATA CONFIDENCE)

See CONFIDENCE-INTERVAL-T

CONFIDENCE-INTERVAL-Z-INTERNAL (DATA CONFIDENCE)

See CONFIDENCE-INTERVAL-Z

CONFIDENCE-INTERVAL-Z-SUMMARIES (MEAN STANDARD-ERROR CONFIDENCE)

This function is just like `confidence-interval-z,' except that instead of its arguments being the actual data, it takes the following summary statistics: `mean', a point estimator of the mean of some normally distributed population; and the `standard-error' of the estimator, that is, the estimated standard deviation of the normal population. `Confidence' should be a number between 0 and 1, exclusive.

CORRELATION-INTERNAL (SAMPLE1 SAMPLE2 &REST ARGS &KEY START1 END1 START2 END2)

See CORRELATION

COVARIANCE-INTERNAL (SAMPLE1 SAMPLE2 &REST ARGS &KEY START1 END1 START2 END2)

See COVARIANCE

CROSS-CORRELATION-INTERNAL (SEQUENCE1 SEQUENCE2 MAX-LAG &OPTIONAL (MIN-LAG 0))

See CROSS-CORRELATION

D-TEST-INTERNAL (SAMPLE-1 SAMPLE-2 TAILS &KEY (TIMES 1000) (H0MEAN 0))

See D-TEST

DATA-LENGTH-INTERNAL (DATA &KEY START END KEY)

See DATA-LENGTH

DIFFERENCE-LIST (NUMBER-LIST)

Takes a sequence of numbers and returns a sequence of differences from the mean. Formula: xi = Xi - Mean (X).

FIND-CRITICAL-VALUE (P-FUNCTION P-VALUE &OPTIONAL (X-TOLERANCE 1.e-5) (Y-TOLERANCE 1.e-5))

Returns the critical value of some statistic. The function `p-function' should be a unary function mapping statistics---x values---to their significance---p values. The function will find the value of x such that the p-value is `p-value.' The function works by binary search. A secant method might be better, but this seems to be acceptably fast. Only positive values of x are considered, and `p-function' should be monotonically decreasing from its value at x=0. The binary search ends when either the function value is within `y-tolerance' of `p-value' or the size of the search region shrinks to less than `x-tolerance.'
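
The binary search described above can be sketched as follows. The bracketing step, which doubles the upper bound until the p-value drops to or below the target, is an assumption of this sketch and may differ from what the library does; the sketch also assumes the target p-value is actually reached for some positive x, otherwise it does not terminate. The name `find-critical-value-sketch' is hypothetical.

```lisp
;; Hypothetical helper for illustration; not the library's implementation.
(defun find-critical-value-sketch (p-function p-value
                                   &optional (x-tolerance 1e-5)
                                             (y-tolerance 1e-5))
  "Find x > 0 with (funcall P-FUNCTION x) close to P-VALUE.
P-FUNCTION must be monotonically decreasing for x >= 0."
  (let ((lo 0d0)
        (hi 1d0))
    ;; Bracket the answer: grow HI until the p-value is at or below target.
    (loop while (> (funcall p-function hi) p-value)
          do (setf lo hi
                   hi (* 2 hi)))
    ;; Bisect until either tolerance is met.
    (loop
      (let* ((mid (/ (+ lo hi) 2))
             (p (funcall p-function mid)))
        (cond ((or (<= (abs (- p p-value)) y-tolerance)
                   (< (- hi lo) x-tolerance))
               (return mid))
              ((> p p-value) (setf lo mid))   ; p still too large: move right
              (t (setf hi mid)))))))          ; p too small: move left
```

As a check that does not depend on any statistical distribution, using `(lambda (x) (exp (- x)))' as the p-function with a target of 0.05 should land near -ln(0.05), which is about 2.9957.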

G-TEST (CONTINGENCY-TABLE &OPTIONAL EXPECTED-VALUE-MATRIX (ERROR-P T))

Calculates the G-test for a contingency table. The formula for the G-test statistic is 2 * sum[f_ij log [f_ij/f-hat_ij]] where f_ij is the cell in row i and column j of the table and f-hat_ij is the expected value of that cell. If an expected-value-matrix is supplied, it must be the same size as the table and it is used for expected values, in which case the G-test is a test of goodness-of-fit. If the expected value matrix is unsupplied, it is calculated using the formula e_ij = [f_i* * f_*j] / f_** where f_i*, f_*j and f_** are the row, column and grand totals respectively. In this case, the G-test is a test of independence. The degrees of freedom is the same as for the chi-square statistic and the significance is obtained by comparing
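
For the test-of-independence case, the two formulas quoted above can be sketched directly. The sketch computes only the statistic, treats a zero cell as contributing nothing to the sum (an assumption), and requires a positive grand total; the name `g-statistic-sketch' is hypothetical.

```lisp
;; Hypothetical helper for illustration; not the library's implementation.
(defun g-statistic-sketch (table)
  "G = 2 * sum f_ij * log(f_ij / e_ij), with e_ij = row_i * col_j / grand.
TABLE is a two-dimensional array of non-negative counts."
  (let* ((r (array-dimension table 0))
         (c (array-dimension table 1))
         (row-totals (make-array r :initial-element 0))
         (col-totals (make-array c :initial-element 0))
         (grand 0))
    ;; Accumulate the row, column and grand totals.
    (dotimes (i r)
      (dotimes (j c)
        (let ((f (aref table i j)))
          (incf (aref row-totals i) f)
          (incf (aref col-totals j) f)
          (incf grand f))))
    (* 2 (loop for i from 0 below r
               sum (loop for j from 0 below c
                         for f = (aref table i j)
                         for e = (/ (* (aref row-totals i) (aref col-totals j))
                                    grand)
                         unless (zerop f)
                           sum (* f (log (/ f e))))))))
```

When the observed counts match the expected counts exactly, as in a table whose rows are proportional, every logarithm is zero and so is the statistic.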

INNER-PRODUCT (SAMPLE1 SAMPLE2 &KEY START1 END1 START2 END2)

Returns the inner product of the two samples, which should be sequences of numbers. The inner product, also called the dot product or scalar product, is the sum of the pairwise multiplication of the numbers. Stops when either sample runs out; it doesn't check that they have the same length.
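
A minimal sketch of the same computation, ignoring the `start' and `end' keywords. It relies on `map' stopping at the end of the shorter sequence, which matches the behavior described above; the name `inner-product-sketch' is hypothetical.

```lisp
;; Hypothetical helper for illustration; not the library's implementation.
(defun inner-product-sketch (sample1 sample2)
  "Sum of pairwise products; stops when the shorter sequence runs out."
  (reduce #'+ (map 'list #'* sample1 sample2)))
```

Because the lengths are not checked, `(inner-product-sketch '(1 2 3) '(4 5))' silently ignores the trailing 3.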

INTERQUARTILE-RANGE-INTERNAL (DATA &REST STANDARD-ARGS)

See INTERQUARTILE-RANGE

INVERT-MATRIX (MATRIX &OPTIONAL INTO-MATRIX)

If matrix is singular returns nil, else returns its inverse. If into-matrix is supplied, inverse is returned in it, otherwise a new array is created.

INVERT-MATRIX-ITERATE (MATRIX &OPTIONAL INTO-MATRIX)

If matrix is singular returns nil, else returns the inverse of matrix. Uses iterative improvement until no further improvement is possible.

MAKE-3D-TABLE (DV IV1 IV2)

Collects the `dv' values for each unique combination of an element of `iv1' and an element of `iv2.' Returns a three-dimensional table of dv values.

MAKE-CONTINGENCY-TABLE (V1 V2)

Counts each unique combination of an element of `v1' and an element of `v2.' Returns a two-dimensional table of integers.
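
The counting step can be sketched as below. The row and column order (order of first appearance), the use of `equal' to compare category values, and the extra return values are all assumptions of this sketch; the name `contingency-table-sketch' is hypothetical.

```lisp
;; Hypothetical helper for illustration; not the library's implementation.
(defun contingency-table-sketch (v1 v2)
  "Return a 2D array of counts, plus the row and column category lists.
Categories are ordered by first appearance in V1 and V2."
  (let* ((rows (remove-duplicates (coerce v1 'list) :test #'equal :from-end t))
         (cols (remove-duplicates (coerce v2 'list) :test #'equal :from-end t))
         (table (make-array (list (length rows) (length cols))
                            :initial-element 0)))
    ;; Walk the two sequences in parallel and bump the matching cell.
    (map nil (lambda (a b)
               (incf (aref table
                           (position a rows :test #'equal)
                           (position b cols :test #'equal))))
         v1 v2)
    (values table rows cols)))
```

A table built this way has one row per distinct value of `v1' and one column per distinct value of `v2', so its cell counts always sum to the number of paired observations.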

MATRIX-ADDITION (&REST ARGS)

MATRIX-NORM (MATRIX)

Returns the norm of matrix. The norm is the maximum, taken over the rows, of the sum of the absolute values of the entries in that row (the infinity norm).
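
A minimal sketch of this row-sum norm for a two-dimensional array. The name `matrix-norm-sketch' is hypothetical, and the sketch assumes the matrix has at least one row.

```lisp
;; Hypothetical helper for illustration; not the library's implementation.
(defun matrix-norm-sketch (matrix)
  "Largest row sum of absolute values in a two-dimensional array."
  (loop for i from 0 below (array-dimension matrix 0)
        maximize (loop for j from 0 below (array-dimension matrix 1)
                       sum (abs (aref matrix i j)))))
```

For the matrix with rows (1 -2) and (3 4) the row sums are 3 and 7, so the norm is 7.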

MATRIX-PLUS-MATRIX (MAT1 MAT2)

Adds two matrices together

MATRIX-PLUS-SCALAR (MATRIX SCALAR)

Add a scalar value to a matrix

MATRIX-TIMES-MATRIX (MAT1 MAT2)

Multiplies two matrices together

MATRIX-TIMES-SCALAR (MATRIX SCALAR)

Multiply a matrix by a scalar value

MATRIX-TIMES-SCALAR! (MATRIX SCALAR)

Multiply a matrix by a scalar value

MAXIMUM-INTERNAL (DATA &REST STANDARD-ARGS &KEY START END KEY)

See MAXIMUM

MEAN-INTERNAL (DATA &REST STANDARD-ARGS &KEY START END KEY)

See MEAN

MEDIAN-INTERNAL (DATA &REST STANDARD-ARGS &KEY START END KEY)

See MEDIAN

MINIMUM-INTERNAL (DATA &REST STANDARD-ARGS &KEY START END KEY)

See MINIMUM

MODE-FOR-CONTINUOUS-DATA (DATA &REST STANDARD-ARGS &KEY START END KEY WINDOW)

Returns the most frequent element of `data,' which should be a sequence. The algorithm involves sorting, and so the data must be numbers or the `key' function must produce numbers. Consider `sxhash' if no better function is available. Also returns the number of occurrences of the mode. If there is more than one mode, this returns the first mode, as determined by the sorting of the numbers. Keep in mind that if the data has multiple runs of like values that are bigger than the window size (currently defaults to 10% of the size of the data) this function will blindly pick the first one. If this is the case you probably should be calling `mode' instead of this function.

MODE-INTERNAL (DATA &REST STANDARD-ARGS &KEY START END KEY)

See MODE

MULTIPLE-MODES-INTERNAL (DATA K &REST STANDARD-ARGS &KEY START END KEY)

See MULTIPLE-MODES

MULTIPLY-MATRICES (MATRIX-1 MATRIX-2 &OPTIONAL MATRIX-3 &AUX SAVED-MATRIX-3)

Multiply matrices MATRIX-1 and MATRIX-2, storing into MATRIX-3 if supplied. If MATRIX-3 is not supplied, then a new (ART-Q type) array is returned, else MATRIX-3 must have exactly the right dimensions for holding the result of the multiplication. Both MATRIX-1 and MATRIX-2 must be either one- or two-dimensional. The first dimension of MATRIX-2 must equal the second dimension of MATRIX-1, unless MATRIX-1 is one-dimensional, when the first dimensions must match (thus allowing multiplications of the form VECTOR x MATRIX).

PYTHAG-DF (A B)

Computes square root of a*a + b*b without destructive overflow or underflow.

PYTHAG-SF (A B)

Computes square root of a*a + b*b without destructive overflow or underflow.
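
One common way to avoid overflow and underflow here is to factor out the larger magnitude before squaring, so that only a ratio of at most 1 is ever squared. The sketch below shows that scaling technique; whether `pythag-df' and `pythag-sf' use exactly this form is an assumption, and the name `pythag-sketch' is hypothetical.

```lisp
;; Hypothetical helper for illustration; not the library's implementation.
(defun pythag-sketch (a b)
  "Compute sqrt(a^2 + b^2) by scaling with the larger of |a| and |b|."
  (let ((absa (abs a))
        (absb (abs b)))
    (cond ((> absa absb)
           (* absa (sqrt (+ 1 (expt (/ absb absa) 2)))))
          ((zerop absb) absb)             ; both arguments are zero
          (t
           (* absb (sqrt (+ 1 (expt (/ absa absb) 2))))))))
```

With both arguments equal to 1d200, squaring directly would exceed the double-float range, while the scaled form only ever squares the ratio 1.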

QUANTILE-INTERNAL (DATA Q &REST STANDARD-ARGS &KEY START END KEY)

See QUANTILE

RANGE-INTERNAL (DATA &REST STANDARD-ARGS &KEY START END KEY)

See RANGE

REDUCE-MATRIX (MAT)

Uses the Gauss-Jordan reduction method to reduce a matrix.

REMOVE-&REST (LIST)

Removes the '&rest arg' part from a lambda-list (strictly for documentation purposes).

SCALAR-MATRIX-MULTIPLY (SCALAR MATRIX)

Multiplies a matrix by a scalar value in the form M[i,j] = s*M[i,j].

SINGULAR-VALUE-DECOMPOSITION (MATRIX)

Returns as three values the U W and V of singular value decomposition. If you have already consed up these matrices, you should call `svdcmp-sf' or `svdcmp-df' directly. The input matrix is preserved.

SKEWNESS-INTERNAL (DATA &REST STANDARD-ARGS &KEY START END KEY)

See SKEWNESS

STANDARD-DEVIATION-INTERNAL (DATA &REST STANDARD-ARGS &KEY START END KEY)

See STANDARD-DEVIATION

STATISTICAL-SUMMARY-INTERNAL (DATA &REST STANDARD-ARGS &KEY START END KEY)

See STATISTICAL-SUMMARY

SUM-LIST (NUMBER-LIST)

Takes a sequence of numbers and returns their sum. Formula: Sum(X).

SUM-OF-SQUARES (DATA &REST STANDARD-ARGS &KEY START END KEY)

Returns the sum of squared distances from the mean of `data'. Signals `no-data' if there is no data.

SVBKSB-DF (U W V M N B X &OPTIONAL (TMP (MAKE-ARRAY N :ELEMENT-TYPE 'DOUBLE-FLOAT)))

Solves A X = B for a vector `X,' where A is specified by the mxn array U, `n' vector W, and nxn matrix V as returned by svdcmp. `m' and `n' are the dimensions of `A,' and will be equal for square matrices. `B' is the 1xm input vector for the right-hand side. `X' is the 1xn output solution vector. All arrays are of double-floats. No input quantities are destroyed, so the routine may be called sequentially with different B's. See the discussion in Numerical Recipes in C, section 2.6. This routine assumes that near zero singular values have already been zeroed. It returns no values, storing the result in `X.' It does use some auxiliary storage, which can be passed in as `tmp,' a double-float array of length `n,' if you want to avoid consing.

SVBKSB-SF (U W V M N B X &OPTIONAL (TMP (MAKE-ARRAY N :ELEMENT-TYPE 'SINGLE-FLOAT)))

Solves A X = B for a vector `X,' where A is specified by the mxn array U, `n' vector W, and nxn matrix V as returned by svdcmp. `m' and `n' are the dimensions of `A,' and will be equal for square matrices. `B' is the 1xm input vector for the right-hand side. `X' is the 1xn output solution vector. All arrays are of single-floats. No input quantities are destroyed, so the routine may be called sequentially with different B's. See the discussion in Numerical Recipes in C, section 2.6. This routine assumes that near zero singular values have already been zeroed. It returns no values, storing the result in `X.' It does use some auxiliary storage, which can be passed in as `tmp,' a single-float array of length `n,' if you want to avoid consing.

SVD-BACK-SUBSTITUTE (U W V B)

Returns the solution vector of Ax=b, where A has been decomposed into `u,' `w' and `v' by `singular-value-decomposition.' This function is just a minor wrapping of `svbksb-sf' and `svbksb-df.'

SVD-INVERSE-FAST-DF (U W V &OPTIONAL (A-1 (MAKE-ARRAY (LIST (LENGTH W) (LENGTH W)) :ELEMENT-TYPE 'DOUBLE-FLOAT)) (TMP (MAKE-ARRAY (LENGTH W) :ELEMENT-TYPE 'DOUBLE-FLOAT)))

Computes the inverse of a matrix that has been decomposed into `u,' `w' and `v' by singular value decomposition. It assumes the ``small'' elements of `w' have already been zeroed. It computes the inverse by taking advantage of the known zeros in the full 2-dimensional `w' matrix. It uses the backsubstitution algorithm, only with the B vectors fixed at the columns of the identity matrix, which lets us take advantage of its zeros. It's about twice as fast as the slow version and conses a lot less. Note that if you are computing the inverse merely to solve one or more systems of equations, you are better off using the decomposition and backsubstitution routines directly.

SVD-INVERSE-FAST-SF (U W V &OPTIONAL (A-1 (MAKE-ARRAY (LIST (LENGTH W) (LENGTH W)) :ELEMENT-TYPE 'SINGLE-FLOAT)) (TMP (MAKE-ARRAY (LENGTH W) :ELEMENT-TYPE 'SINGLE-FLOAT)))

Computes the inverse of a matrix that has been decomposed into `u,' `w' and `v' by singular value decomposition. It assumes the ``small'' elements of `w' have already been zeroed. It computes the inverse by taking advantage of the known zeros in the full 2-dimensional `w' matrix. It uses the backsubstitution algorithm, only with the B vectors fixed at the columns of the identity matrix, which lets us take advantage of its zeros. It's about twice as fast as the slow version and conses a lot less. Note that if you are computing the inverse merely to solve one or more systems of equations, you are better off using the decomposition and backsubstitution routines directly.

SVD-INVERSE-SLOW-DF (U W V &OPTIONAL (A-1 (MAKE-ARRAY (LIST (LENGTH W) (LENGTH W)) :ELEMENT-TYPE 'DOUBLE-FLOAT)))

Computes the inverse of a matrix that has been decomposed into `u,' `w' and `v' by singular value decomposition. It assumes the ``small'' elements of `w' have already been zeroed. It computes the inverse by constructing a diagonal matrix `w2' from `w' (which is just a vector of the diagonal elements), and then explicitly multiplying u^t, w2 and v. Note that if you are computing the inverse merely to solve one or more systems of equations, you are better off using the decomposition and backsubstitution routines directly.

SVD-INVERSE-SLOW-SF (U W V &OPTIONAL (A-1 (MAKE-ARRAY (LIST (LENGTH W) (LENGTH W)) :ELEMENT-TYPE 'SINGLE-FLOAT)))

Computes the inverse of a matrix that has been decomposed into `u,' `w' and `v' by singular value decomposition. It assumes the ``small'' elements of `w' have already been zeroed. It computes the inverse by constructing a diagonal matrix `w2' from `w' (which is just a vector of the diagonal elements), and then explicitly multiplying u^t, w2 and v. Note that if you are computing the inverse merely to solve one or more systems of equations, you are better off using the decomposition and backsubstitution routines directly.

SVD-MATRIX-INVERSE (A &OPTIONAL (SINGULARITY-THRESHOLD 1.d-10))

Use singular value decomposition to compute the inverse of `A.' If an exact inverse is not possible, then zero the otherwise infinite inverted singular value and compute the inverse. The inverse is returned; `A' is not destroyed. If you're using this to solve several systems of equations, you're better off computing the singular value decomposition and using it several times, because this function computes it anew each time.

SVD-SOLVE-LINEAR-SYSTEM (MATRIX B-VECTOR &OPTIONAL (REPORT? T) (THRESHOLD 1.e-6))

Returns solution of linear system matrix * solution = b-vector. Employs the singular value decomposition method. See the discussion in Numerical Recipes in C, section 2.6, especially as to the semantics of `threshold.'

SVD-ZERO (W &OPTIONAL (THRESHOLD 1.e-6) (REPORT? T))

If the relative magnitude of an element in `w' compared to the largest element is less than `threshold,' then zero that element. Returns a list of indices of the zeroed elements. This function is just a convenient wrapper for `svzero-sf' and `svzero-df.'
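
The relative-threshold rule can be sketched as follows. The sketch assumes `w' is a non-empty vector of non-negative double-float singular values, modifies it in place, and omits the `report?' printing; the name `svd-zero-sketch' is hypothetical.

```lisp
;; Hypothetical helper for illustration; not the library's implementation.
(defun svd-zero-sketch (w &optional (threshold 1e-6))
  "Zero every element of W smaller than THRESHOLD times the largest element.
Destructively modifies W and returns the list of zeroed indices."
  (let ((cutoff (* threshold (reduce #'max w)))
        (zeroed '()))
    (dotimes (i (length w) (nreverse zeroed))
      (when (< (aref w i) cutoff)
        (setf (aref w i) 0d0)
        (push i zeroed)))))
```

Note that the comparison is relative: with a largest singular value of 10 and the default threshold, anything below about 1e-5 is zeroed, including elements that were already zero.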

SVDCMP-DF (A M N W V &OPTIONAL (RV1 (MAKE-ARRAY N :ELEMENT-TYPE 'DOUBLE-FLOAT)))

Given an `m'x`n' matrix `A,' this routine computes its singular value decomposition, A = U W V^T. The matrix U replaces `A' on output. The diagonal matrix of singular values W is output as a vector `W' of length `n.' The matrix `V' -- not the transpose V^T -- is output as an `n'x`n' matrix `V.' The row dimension `m' must be greater than or equal to `n'; if it is smaller, then `A' should be filled up to square with zero rows. See the discussion in Numerical Recipes in C, section 2.6. This routine returns no values, storing the results in `A,' `W,' and `V.' It does use some auxiliary storage, which can be passed in as `rv1,' a double-float array of length `n,' if you want to avoid consing.

SVDCMP-SF (A M N W V &OPTIONAL (RV1 (MAKE-ARRAY N :ELEMENT-TYPE 'SINGLE-FLOAT)))

Given an `m'x`n' matrix `A,' this routine computes its singular value decomposition, A = U W V^T. The matrix U replaces `A' on output. The diagonal matrix of singular values W is output as a vector `W' of length `n.' The matrix `V' -- not the transpose V^T -- is output as an `n'x`n' matrix `V.' The row dimension `m' must be greater than or equal to `n'; if it is smaller, then `A' should be filled up to square with zero rows. See the discussion in Numerical Recipes in C, section 2.6. This routine returns no values, storing the results in `A,' `W,' and `V.' It does use some auxiliary storage, which can be passed in as `rv1,' a single-float array of length `n,' if you want to avoid consing. All input arrays should be of single-floats.

SVDVAR (V W &OPTIONAL CVM)

Given `v' and `w' as computed by singular value decomposition, computes the covariance matrix among the predictors. Based on Numerical Recipes in C, section 15.4, algorithm `svdvar.' The covariance matrix is returned. It can be supplied as the third argument.

SVZERO-DF (W N THRESHOLD &OPTIONAL (REPORT? T))

If the relative magnitude of an element in `w' compared to the largest element is less than `threshold,' then zero that element. If `report?' is true, the indices of zeroed elements are printed. Returns a list of the indices of zeroed elements. This routine uses double-floats.

SVZERO-SF (W N THRESHOLD &OPTIONAL (REPORT? T))

If the relative magnitude of an element in `w' compared to the largest element is less than `threshold,' then zero that element. If `report?' is true, the indices of zeroed elements are printed. Returns a list of indices of the zeroed elements. This routine uses single-floats.

T-TEST-INTERNAL (SAMPLE-1 SAMPLE-2 &OPTIONAL (TAILS :BOTH) (H0MEAN 0))

See T-TEST

T-TEST-MATCHED-INTERNAL (SAMPLE1 SAMPLE2 &OPTIONAL (TAILS :BOTH))

See T-TEST-MATCHED

T-TEST-ONE-SAMPLE-INTERNAL (DATA TAILS &OPTIONAL (H0-MEAN 0) &REST STANDARD-ARGS &KEY START END KEY)

See T-TEST-ONE-SAMPLE

TRIMMED-MEAN-INTERNAL (DATA PERCENTAGE &REST STANDARD-ARGS &KEY START END KEY)

See TRIMMED-MEAN

TUKEY-SUMMARY-INTERNAL (DATA &REST STANDARD-ARGS &KEY START END KEY)

See TUKEY-SUMMARY

VARIANCE-INTERNAL (DATA &REST STANDARD-ARGS &KEY START END KEY)

See VARIANCE

Z-TEST-ONE-SAMPLE-INTERNAL (DATA TAILS &OPTIONAL (H0-MEAN 0) (H0-STD-DEV 1) &REST STANDARD-ARGS &KEY START END KEY)

See Z-TEST-ONE-SAMPLE

Undocumented

1-OR-2D-ARRAYP (ARRAY)

CONFIDENCE-INTERVAL-INTERNAL

DATA-CONTINUOUS-P (SEQUENCE)

ERROR-FUNCTION-COMPLEMENT-SHORT-1 (Y Z)

ERROR-FUNCTION-COMPLEMENT-SHORT-2 (Y)

FILL-2D-ARRAY (ARRAY LIST)

LIST-2D-ARRAY (ARRAY)

SIGNIFICANCE-INTERNAL

SMART-MODE (SEQUENCE &REST ARGS)

T-SIGNIFICANCE-INTERNAL

MACRO

Public

UNDERFLOW-GOES-TO-ZERO (&BODY BODY)

Protects against floating point underflow errors and sets the value to 0.0 instead.
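
A macro with this behavior could be sketched with `handler-case' on the standard `floating-point-underflow' condition. This is only a guess at the approach, not the library's definition, and whether an implementation signals underflow at all depends on its floating-point trap settings. The name `underflow-to-zero-sketch' is hypothetical.

```lisp
;; Hypothetical macro for illustration; not the library's definition.
(defmacro underflow-to-zero-sketch (&body body)
  "Evaluate BODY; if it signals FLOATING-POINT-UNDERFLOW, return 0.0."
  `(handler-case (progn ,@body)
     (floating-point-underflow () 0.0)))
```

Forms that complete normally return their own value unchanged, so the wrapper only matters for the computation that actually underflows.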

WITH-TEMP-TABLE ((TEMP) &BODY FORMS)

Binds `temp' to a hash table.

WITH-TEMP-VECTOR ((TEMP MIN-SIZE) &BODY FORMS)

Binds `temp' to a vector of length at least `min-size.' It's a vector of pointers and has a fill-pointer, initialized to `min-size.'

Private

CHECK-TYPE-OF-ARG (ARG-NAME PREDICATE TYPE-STRING &OPTIONAL ERROR-TYPE-NAME)

Generate error if the value of ARG-NAME doesn't satisfy PREDICATE. PREDICATE is a function name (a symbol) or an expression to compute. TYPE-STRING is a string to use in the error message, such as "a list". ERROR-TYPE-NAME is a keyword that tells condition handlers what type was desired.

DEFINE-STATISTIC (NAME &OPTIONAL SUPERCLASSES SLOTS VALUES ARGUMENT-TYPES LAMBDA-LIST &BODY BODY)

In clasp, statistical objects have two parts: a class, which stores the various parts of the object, and a computing function, which computes the value of the object from arguments. The define-statistic macro allows the definition of new statistical types. It must be provided with all the information necessary to create a statistical object, that is, everything required to create a new class, everything required to create a computing function, and some information to connect the two. This last part consists of a list of arguments and their types, and a list which determines how the values of a statistical function should be used to fill the slots of a statistical object. When define-statistic is invoked, two things happen. First, a class is defined which is a subclass of 'statistic and any other named `superclasses'. Second, a pair of functions is defined. `clasp-statistics::name' is an internal function which has the supplied `body' and `lambda-list' and must return as many values as there are slots in the class `name'. The function `name' is also defined; it is basically a wrapper function which converts its arguments to those which are accepted by `body' and then calls `clasp-statistics::name'. The parameter clasp:*create-statistical-objects* determines whether the wrapper function packages the values returned by the internal function into a statistical object or just returns them as multiple values. The `argument-types' argument must be an alist in which the keys are the names of arguments as given in `lambda-list' and the values are lisp types to which those arguments will be converted before calling the internal statistical function. The primary purpose of this is to allow for coercion of clasp variables to sequences, but any coercion which is allowed by lisp is acceptable.
The `values' argument is intended to allow the programmer to specify which slots in the statistical object are filled by which of the values returned by the statistical function. By default, the order of the values is assumed to be direct slots in order of specification, then inherited slots in order of specification in the superclasses which are also statistics.

Undocumented

AREF1 (A I)

AREF11 (A I J)

SIGN-DF (A B)

SIGN-SF (A B)

START/END (CALL-FORM START-N END-N)

WITH-ROUTINE-ERROR-HANDLING (&BODY BODY)

GENERIC-FUNCTION

Public

DOT-PRODUCT (SEQUENCE-1 SEQUENCE-2)

http://en.wikipedia.org/wiki/Dot_product

Undocumented

CONVERT (OBJECT TYPE)

CROSS-PRODUCT (NUMBER-LIST-1 NUMBER-LIST-2)

Private

Undocumented

COMPOSITE-STATISTIC-P (IT)

MAKE-STATISTIC (TYPE &REST ARGS)

SIMPLE-STATISTIC-P (IT)

STATISTICP (IT)

VARIABLE

Private

*TEMPORARY-TABLE*

A temporary table. This avoids consing.

*TEMPORARY-VECTOR*

A temporary vector for use by statistical functions such as `quantile,' which uses it for sorting data. This avoids consing or rearranging the user's data.

Undocumented

*CONTINOUS-DATA-WINDOW-DIVISOR*

*CONTINUOUS-VARIABLE-UNIQUENESS-FACTOR*

*CREATE-STATISTICAL-OBJECTS*

*GAUSSIAN-CDF-SIGNALS-ZERO-STANDARD-DEVIATION-ERROR*

*WAY-TOO-BIG-CONTINGENCY-TABLE-DIMENSION*

CLASS

Public

Undocumented

ANOVA-ONE-WAY-VARIABLES (&REST ARGS)

ANOVA-TWO-WAY-VARIABLES (&REST ARGS)

ANOVA-TWO-WAY-VARIABLES-UNEQUAL-CELL-SIZES (&REST ARGS)

AUTOCORRELATION (&REST ARGS)

CONFIDENCE-INTERVAL (&REST ARGS)

CONFIDENCE-INTERVAL-PROPORTION (&REST ARGS)

CONFIDENCE-INTERVAL-T (&REST ARGS)

CONFIDENCE-INTERVAL-Z (&REST ARGS)

CORRELATION (&REST ARGS)

COVARIANCE (&REST ARGS)

CROSS-CORRELATION (&REST ARGS)

D-TEST (&REST ARGS)

DATA-LENGTH (&REST ARGS)

INTERQUARTILE-RANGE (&REST ARGS)

MAXIMUM (&REST ARGS)

MEAN (&REST ARGS)

MEDIAN (&REST ARGS)

MINIMUM (&REST ARGS)

MODE (&REST ARGS)

MULTIPLE-MODES (&REST ARGS)

QUANTILE (&REST ARGS)

RANGE (&REST ARGS)

SIGNIFICANCE (&REST ARGS)

SKEWNESS (&REST ARGS)

STANDARD-DEVIATION (&REST ARGS)

STATISTICAL-SUMMARY (&REST ARGS)

T-SIGNIFICANCE (&REST ARGS)

T-TEST (&REST ARGS)

T-TEST-MATCHED (&REST ARGS)

T-TEST-ONE-SAMPLE (&REST ARGS)

TRIMMED-MEAN (&REST ARGS)

TUKEY-SUMMARY (&REST ARGS)

VARIANCE (&REST ARGS)

Z-TEST-ONE-SAMPLE (&REST ARGS)

Private

Undocumented

COMPOSITE-STATISTIC

DATA

SIMPLE-STATISTIC

STATISTIC

CONDITION

Private

Undocumented

DATA-ERROR

ENORMOUS-CONTINGENCY-TABLE

INSUFFICIENT-DATA

NO-DATA

NOT-BINARY-VARIABLES

UNMATCHED-SEQUENCES

ZERO-STANDARD-DEVIATION

ZERO-VARIANCE

CONSTANT

Public

+E+

An approximation of the constant e (named for Euler!).

2FPI

The constant 2*pi, in single-float format. Using this constant avoids run-time double-float contagion.

FPI

The constant pi, in single-float format. Using this constant avoids run-time double-float contagion.

Undocumented

+0DEGREES+

+10DEGREES+

+120DEGREES+

+135DEGREES+

+150DEGREES+

+15DEGREES+

+180DEGREES+

+210DEGREES+

+225DEGREES+

+240DEGREES+

+270DEGREES+

+300DEGREES+

+30DEGREES+

+315DEGREES+

+330DEGREES+

+360DEGREES+

+45DEGREES+

+5DEGREES+

+60DEGREES+

+90DEGREES+

Private

Undocumented

+LOG-PI+

+SQRT-PI+