Learn R Programming

lmPerm (version 2.1.0)

lmp: Fitting and testing linear models with permutation tests.

Description

lmp is lm modified to use permutation tests instead of normal theory tests. Like lm, it can be used to carry out regression, single stratum analysis of variance and analysis of covariance . Timing differences between lmp and lm are negligible.

Usage

lmp(formula, data, perm="Exact", seqs=FALSE, center=TRUE, subset, weights, na.action, method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, contrasts = NULL, offset, ...)

Arguments

formula
a symbolic description of the model to be fit.
data
an optional data frame containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which lmp is called.
perm
"Exact", "Prob", "SPR" will produce permutation probabilities. Anyting else, such as "", will produce F-test probabilites.
seqs
If TRUE, will calculate sequential SS. If FALSE, unique SS.
center
If TRUE will center numerical variables
subset
an optional vector specifying a subset of observations to be used in the fitting process.
weights
an optional vector of weights to be used in the fitting process. If specified, weighted least squares is used with weights weights (that is, minimizing sum(w*e^2)); otherwise ordinary least squares is used.
na.action
a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The “factory-fresh” default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.
method
the method to be used; for fitting, currently only method = "qr" is supported; method = "model.frame" returns the model frame (the same as with model = TRUE, see below).
model, x, y, qr
logicals. If TRUE the corresponding components of the fit (the model frame, the model matrix, the response, the QR decomposition) are returned.
singular.ok
logical. If FALSE (the default in S but not in R) a singular fit is an error.
contrasts
an optional list. See the contrasts.arg of model.matrix.default.
offset
this can be used to specify an a priori known component to be included in the linear predictor during fitting. An offset term can be included in the formula instead or as well, and if both are specified their sum is used.
...
additional arguments to be passed.

Value

The usual output from lm, with permutation p-values or F-test p-values. The p-values for the coefficients are of necessity, two-sided.

Additional parameters

These are the same as for aovp.

Details

The usual regression model EY=Xb is assumed. The vector b is divided into sources with dfi degrees of freedom for the ith source, and anova(lmp()) will produce an ANOVA table for these sources. Either permutation test p-values or the usual F-test p-values will be output. Polynomial model terms are collected into sources, so that Y~A+B+I(A^2) will contain two sources, one for A with 2 df, and one for B with 1 df. Sources for factors are treated as usual, and polynomial terms and factors may be mixed in one model. The function poly.formula may be used to create polynomial models, and the function multResp may be used to create a multi-response matrix for the lhs from variables in data.

One may also use summary(lm()) to obtain coefficient estimates and estimates of the permutation test p-values. The Exact method will permute the values exactly. The Prob and SPR methods will approximate the permutation distribution by randomly exchanging pairs of Y elements. The Exact method will be used by default when the number of observations is less than or equal to maxExact, otherwise Prob will be used.

Prob: Iterations terminate when the estimated standard error of the estimated proportion p is less than p*Ca. The iteration continues until all sources and coefficients meet this criterion or until maxIter is reached. See Anscome(1953) for the origin of the criterion.

SPR: This method uses sequential probability ratio tests to decide between the hypotheses p0 and p1 for a strength (alpha, beta) test. The test terminates upon the acceptance or rejection of p0 or if maxIter is reached. See Wald (1947). The power of the SPR is beta at p0 and increases to 1-beta at p1. Placing p0 and p1 close together makes the cut off sharp.

Exact: This method generates all permutations of Y. It will generally be found too time consuming for more than 10 or 11 observations, but note that aovp may be used to divide the data into small enough blocks for which exact permutation tests may be possible.

For Prob and SPR, one may set nCycle to unity to exchange all elements instead of just pairs at each iteration, but there seems to be no advantage to doing this unless the number of iterations is small -- say less than 100.

The SS will be calculated sequentially, just as lm() does; or they may be calculated uniquely, which means that the SS for each source is calculated conditionally on all other sources. This is SAS type III, which is also what drop1() produces, except that drop1() will not drop main effects when interactions are present. The parameter seqs may be used to override the default unique calculation behavior.

References

John Wiley & Sons, New York.

See Also

summary.lmp, aovp

Examples

Run this code
# 3x3 factorial with ordered factors, each is average of 12. 
# This is a saturated design with no df for error. The results tend to support 
# Cochran and Cox who used a guessed residual SS for their analysis. The design
# is balanced, so the sequential SS are the same as the unique SS. 
data(CC164)
summary(lmp(y ~ N * P, data = CC164, perm="")) # F-value output as if lm() was used.
summary(lmp(y ~ N * P, data = CC164,)) # Default, using "Exact" if possible.
summary(lmp(y ~ N * P, data = CC164, perm="SPR"))
anova(lmp(y ~ N * P, data = CC164))

# A two level factorial. The artificial data is N(0,1) with an effect of
# 1.5 added to factor X4. When the number of iterations are small, as in
# this case, using nCycle=1 is advantageous. 
X<-expand.grid(X1=1:2,X2=1:2,X3=1:2,X4=1:2)
X$Y<-c(0.99,1.34,0.88,1.94,0.63,0.29,-0.78,-0.89,0.43,-0.03,0.50,1.66,1.65,1.78,1.31,1.51)
summary(lmp(Y~(X1+X2+X3+X4)^2,X,"SP"))  # The prob method is used because "SP" is not recognized.
summary(lmp(Y~(X1+X2+X3+X4)^2,X,"SPR"))
summary(lmp(Y~(X1+X2+X3+X4)^2,X,"SPR",nCycle=1)) #An additional parameter being passed.

# A saturated design with 15 variables in 16 runs. The orginal analysis by Quinlan pooled the mean
# squares from the 7 smallest effcts and found many variables to be significant. Box, reanalyzed
# the data using half-normal plots and found only variables E and G to be important. The permutation
# analysis agrees with this conclusion.
data(Quinlan)
summary(lmp(SN~.,Quinlan))

# A design containing both a polynomial variable and a factor
data(simDesignPartNumeric)
anova(lmp(poly.formula(Y~quad(A,B)+C),simDesignPartNumeric))


Run the code above in your browser using DataLab