procD.lm: Procrustes ANOVA/regression for shape data

Description

Function performs Procrustes ANOVA with permutation procedures to assess statistical hypotheses describing patterns of shape variation and covariation for a set of Procrustes-aligned coordinates.

Usage

procD.lm(f1, iter = 999, seed = NULL, RRPP = TRUE, int.first = FALSE,
  data = NULL, print.progress = TRUE, ...)

Arguments

f1

A formula for the linear model (e.g., y~x1+x2)

iter

Number of iterations for significance testing

seed

An optional argument for setting the seed for random permutations of the resampling procedure. If left NULL (the default), the exact same P-values will be found for repeated runs of the analysis (with the same number of iterations). If seed = "random", a random seed will be used, and P-values will vary. One can also specify an integer for specific seed values, which might be of interest for advanced users.

RRPP

A logical value indicating whether residual randomization should be used for significance testing

int.first

A logical value to indicate if interactions of first main effects should precede subsequent main effects

data

A data frame for the function environment; see geomorph.data.frame.

print.progress

A logical value to indicate whether a progress bar should be printed to the screen. This is helpful for long-running analyses.

...

Arguments passed on to procD.fit (typically associated with the lm function).
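
The seed and int.first arguments described above can be illustrated with base R alone. The following is a minimal sketch, not geomorph's internal code: perm_indices is a hypothetical helper, and using iter as the fixed default seed is an assumption made only so that seed = NULL is repeatable. The second part shows how base R's terms function orders model terms with and without keep.order; whether procD.lm uses this mechanism internally is not stated here.

```r
# Hypothetical helper (not part of geomorph): permutation index sets for
# n specimens, following the rules described for the seed argument.
perm_indices <- function(n, iter, seed = NULL) {
  # Assumption: when seed is NULL, fall back to a fixed default (here, iter)
  if (is.null(seed)) seed <- iter
  # seed = "random" leaves the RNG state alone, so results vary between runs
  if (!identical(seed, "random")) set.seed(seed)
  # The observed order comes first, followed by iter random shuffles
  c(list(seq_len(n)), replicate(iter, sample.int(n), simplify = FALSE))
}

p1 <- perm_indices(10, iter = 5)
p2 <- perm_indices(10, iter = 5)
same_default <- identical(p1, p2)  # the default seed is repeatable
n_perms <- length(p1)              # iter + 1, counting the observed order

# Term order: base R sorts terms by order unless keep.order = TRUE
main_first <- attr(terms(y ~ a * b + c), "term.labels")
int_first  <- attr(terms(y ~ a * b + c, keep.order = TRUE), "term.labels")
main_first  # "a" "b" "c" "a:b"
int_first   # "a" "b" "a:b" "c"
```

The first ordering corresponds to adding all main effects before interactions; the second lets the a:b interaction precede the main effect of c.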

Value

An object of class "procD.lm" is a list containing the following:

aov.table

An analysis of variance table; the same as the summary.

call

The matched call.

coefficients

A vector or matrix of linear model coefficients.

The response data, in matrix form.

The model matrix.

The QR decompositions of the model matrix.

fitted

The fitted values.

residuals

The residuals (observed responses - fitted responses).

weights

The weights used in weighted least-squares fitting. If no weights are used, NULL is returned.

Terms

The results of the terms function applied to the model matrix.

term.labels

The terms used in constructing the aov.table.

The sums of squares for each term, model residuals, and the total.

The type of sums of squares. One of type I or type III.

The degrees of freedom for each SS.

The coefficient of determination for each model term.

The F values for each model term.

permutations

The number of random permutations (including observed) used.

random.SS

A matrix or vector of random SS found via the resampling procedure used.

perm.method

A value indicating whether "Raw" values were shuffled or "RRPP" performed.

Details

The function quantifies the relative amount of shape variation attributable to one or more factors in a linear model and estimates the probability of this variation ("significance") for a null model, via distributions generated from resampling permutations. Data input is specified by a formula (e.g., y~X), where 'y' specifies the response variables (shape data), and 'X' contains one or more independent variables (discrete or continuous). The response matrix 'y' can be either a two-dimensional data matrix of dimension (n x [p x k]) or a 3D array (p x k x n). It is assumed that, if the data are based on landmark coordinates, the landmarks have previously been aligned using Generalized Procrustes Analysis (GPA) [e.g., with gpagen]. The names specified for the independent (x) variables in the formula represent one or more vectors containing continuous data or factors. It is assumed that the order of the specimens in the shape matrix matches the order of values in the independent variables. Linear model fits (using the lm function) can also be input in place of a formula. Arguments for lm can also be passed on via this function.
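
As a rough illustration of what "relative amount of shape variation attributable to a factor" means, the sketch below partitions sums of squares for a one-factor model in base R. The data are simulated stand-ins for aligned shape variables, and all object names are hypothetical; this is not the code procD.lm runs.

```r
set.seed(1)
n <- 20
group <- factor(rep(c("a", "b"), each = n / 2))
# Simulated stand-in for aligned shape data: n specimens x 6 shape variables
Y <- matrix(rnorm(n * 6), nrow = n)
Y[group == "b", 1] <- Y[group == "b", 1] + 2   # add a group effect

fit <- lm(Y ~ group)                            # multivariate linear model
Ybar <- matrix(colMeans(Y), n, ncol(Y), byrow = TRUE)

SS_total <- sum((Y - Ybar)^2)            # squared distances to the overall mean
SS_resid <- sum(resid(fit)^2)            # squared distances to fitted values
SS_model <- sum((fitted(fit) - Ybar)^2)  # squared distances, fitted to mean

df_model <- nlevels(group) - 1
df_resid <- n - nlevels(group)
R2 <- SS_model / SS_total                                # proportion explained
F_stat <- (SS_model / df_model) / (SS_resid / df_resid)  # F value for the term
```

Here SS_model and SS_resid sum to SS_total, so R2 is the share of total variation associated with the factor.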

The function two.d.array can be used to obtain a two-dimensional data matrix from a 3D array of landmark coordinates; however, this step is no longer necessary, as procD.lm can receive 3D arrays as dependent variables. It is also recommended that geomorph.data.frame be used to create and input a data frame. This will reduce problems caused by conflicts between the global and function environments. In the absence of a specified data frame, procD.lm will attempt to coerce input data into a data frame, but success is not guaranteed.
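
For readers who want to see what the flattening step involves, here is a base R sketch of converting a landmark array to a two-dimensional matrix. It assumes the array is arranged as p landmarks x k dimensions x n specimens and that columns are ordered x1, y1, x2, y2, and so on; flatten_landmarks is a hypothetical helper, not the two.d.array source.

```r
# Simulated landmark array: p landmarks, k dimensions, n specimens
p <- 4; k <- 2; n <- 3
A <- array(seq_len(p * k * n), dim = c(p, k, n))

# Hypothetical helper: one row per specimen, columns ordered
# x1, y1, x2, y2, ... (all coordinates of landmark 1, then landmark 2, ...)
flatten_landmarks <- function(A) {
  n <- dim(A)[3]
  # aperm moves dimensions first so each landmark's coordinates stay adjacent
  t(matrix(aperm(A, c(2, 1, 3)), ncol = n))
}

Y <- flatten_landmarks(A)
dim(Y)   # n rows, p * k columns
```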

The function performs statistical assessment of the terms in the model using Procrustes distances among specimens, rather than explained covariance matrices among variables. With this approach, sums of squared Procrustes distances are used as a measure of SS (see Goodall 1991). The observed SS are evaluated through permutation. In morphometrics this approach is known as a Procrustes ANOVA (Goodall 1991), which is equivalent to distance-based ANOVA designs (Anderson 2001). Two possible resampling procedures are provided. First, if RRPP=FALSE, the rows of the matrix of shape variables are randomized relative to the design matrix. This is analogous to a 'full' randomization. Second, if RRPP=TRUE, a residual randomization permutation procedure is utilized (Collyer et al. 2015). Here, residual shape values from a reduced model are obtained, and are randomized with respect to the linear model under consideration. These are then added to predicted values from the remaining effects to obtain pseudo-values from which SS are calculated. NOTE: for single-factor designs, the two approaches are identical. However, when evaluating factorial models it has been shown that RRPP attains higher statistical power and thus has greater ability to identify patterns in data should they be present (see Anderson and ter Braak 2003). Effect sizes (Z-scores) are computed as standard deviates of the SS sampling distributions generated, which might be a more intuitive complement to P-values than F-values are (see Collyer et al. 2015). In the case that multiple factor or factor-covariate interactions are used in the model formula, one can use int.first to specify whether all main effects should be added to the model first, or interactions should precede subsequent main effects (i.e., Y ~ a + b + c + a:b + ..., or Y ~ a + b + a:b + c + ..., respectively).
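
The two resampling procedures can be sketched in base R for a single term. The example below tests factor b after factor a on simulated data, once by shuffling raw rows and once by shuffling residuals from the reduced model (Y ~ a). All names are hypothetical, and the P-value and Z-score definitions follow the description above; this is a sketch of the idea, not the procD.lm implementation.

```r
set.seed(2)
n <- 24
a <- factor(rep(c("a1", "a2"), each = n / 2))
b <- factor(rep(c("b1", "b2", "b3"), times = n / 3))
Y <- matrix(rnorm(n * 4), nrow = n)     # simulated shape variables

# SS for term b, given a: drop in residual SS from reduced to full model
ss_b <- function(Y) {
  sum(resid(lm(Y ~ a))^2) - sum(resid(lm(Y ~ a + b))^2)
}

reduced <- lm(Y ~ a)                    # reduced model lacks the tested term
fit_red <- fitted(reduced)
res_red <- resid(reduced)

iter <- 199
ss_raw <- ss_rrpp <- numeric(iter + 1)
ss_raw[1] <- ss_rrpp[1] <- ss_b(Y)      # observed value counts as one permutation
for (i in seq_len(iter)) {
  idx <- sample.int(n)
  ss_raw[i + 1]  <- ss_b(Y[idx, , drop = FALSE])       # full randomization
  ss_rrpp[i + 1] <- ss_b(fit_red + res_red[idx, , drop = FALSE])  # RRPP
}

p_value <- function(ss) mean(ss >= ss[1])            # includes observed value
z_score <- function(ss) (ss[1] - mean(ss)) / sd(ss)  # standard deviate of SS
c(raw = p_value(ss_raw), rrpp = p_value(ss_rrpp))

# Single-factor case: the reduced model is the mean only, so shuffling its
# residuals and adding the mean back is the same as shuffling the raw rows
null_fit <- lm(Y ~ 1)
idx <- sample.int(n)
max_diff <- max(abs(fitted(null_fit) + resid(null_fit)[idx, ] - Y[idx, ]))
```

The last lines show why the two procedures coincide for single-factor designs: max_diff is zero up to floating-point error.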

The generic functions, print, summary, and plot all work with procD.lm. The generic function, plot, produces diagnostic plots for Procrustes residuals of the linear fit.

References

Anderson, M.J. 2001. A new method for non-parametric multivariate analysis of variance. Austral Ecology 26:32-46.

Anderson, M.J. and C.J.F. ter Braak. 2003. Permutation tests for multi-factorial analysis of variance. Journal of Statistical Computation and Simulation 73:85-113.

Collyer, M.L., D.J. Sekora, and D.C. Adams. 2015. A method for analysis of phenotypic change for phenotypes described by high-dimensional data. Heredity 115:357-365.

Goodall, C. R. 1991. Procrustes methods in the statistical analysis of shape. Journal of the Royal Statistical Society B 53:285-339.

Examples


### MANOVA example for Goodall's F test (multivariate shape vs. factors)
data(plethodon) 
Y.gpa <- gpagen(plethodon$land)    #GPA-alignment  
gdf <- geomorph.data.frame(shape = Y.gpa$coords, 
site = plethodon$site, species = plethodon$species) # geomorph data frame

procD.lm(shape ~ species * site, data = gdf, iter = 999, RRPP = FALSE) # randomize raw values
procD.lm(shape ~ species * site, data = gdf, iter = 999, RRPP = TRUE) # randomize residuals

### Regression example
data(ratland)
rat.gpa <- gpagen(ratland)         #GPA-alignment
gdf <- geomorph.data.frame(rat.gpa) # geomorph data frame is easy without additional input

procD.lm(coords ~ Csize, data = gdf, iter = 999, RRPP = FALSE) # randomize raw values
procD.lm(coords ~ Csize, data = gdf, iter = 999, RRPP = TRUE) # randomize residuals
# Outcomes should be exactly the same

### Extracting objects and plotting residuals
rat.anova <- procD.lm(coords ~ Csize, data = gdf, iter = 999, RRPP = TRUE)
summary(rat.anova)
plot(rat.anova) # diagnostic plots
plot(rat.anova, outliers = TRUE) # diagnostic plots, including plotOutliers
attributes(rat.anova)
rat.anova$fitted # just the fitted values
