procD.lm: Procrustes ANOVA/regression for Procrustes shape variables

Description

Function performs Procrustes ANOVA with permutation procedures to assess statistical hypotheses describing patterns of shape variation and covariation for a set of Procrustes shape variables

Usage

procD.lm(
  f1,
  iter = 999,
  seed = NULL,
  RRPP = TRUE,
  SS.type = c("I", "II", "III"),
  effect.type = c("F", "cohenf", "SS", "MS", "Rsq"),
  int.first = FALSE,
  Cov = NULL,
  data = NULL,
  print.progress = TRUE,
  ...
)

Arguments

A formula for the linear model (e.g., y~x1+x2)

iter

Number of iterations for significance testing

seed

An optional argument for setting the seed for random permutations of the resampling procedure. If left NULL (the default), the exact same P-values will be found for repeated runs of the analysis (with the same number of iterations). If seed = "random", a random seed will be used, and P-values will vary. One can also specify an integer for specific seed values, which might be of interest for advanced users.

RRPP

A logical value indicating whether residual randomization should be used for significance testing

SS.type

SS.type A choice between type I (sequential), type II (hierarchical), or type III (marginal) sums of squares and cross-products computations.

effect.type

One of "F", "SS", or "cohen", to choose from which random distribution to estimate effect size. (The option, "cohen", is for Cohen's f-squared values. The default is "F". Values are log-transformed before z-score calculation to assure normally distributed data.)

int.first

A logical value to indicate if interactions of first main effects should precede subsequent main effects

Cov

An optional covariance matrix that can be used for generalized least squares estimates of coefficients and sums of squares and cross-products (see Adams and Collyer 2018).

data

A data frame for the function environment, see geomorph.data.frame

print.progress

A logical value to indicate whether a progress bar should be printed to the screen. This is helpful for long-running analyses.

...

Arguments passed on to lm.rrpp, like weights and offset.

Value

An object of class "procD.lm" is a list containing the following

aov.table

An analysis of variance table; the same as the summary.

call

The matched call.

coefficients

A vector or matrix of linear model coefficients.

The response data, in matrix form.

The model matrix.

The QR decompositions of the model matrix.

fitted

The fitted values.

residuals

The residuals (observed responses - fitted responses).

weights

The weights used in weighted least-squares fitting. If no weights are used, NULL is returned.

Terms

The results of the terms function applied to the model matrix

term.labels

The terms used in constructing the aov.table.

data

The data frame for the model.

The sums of squares for each term, model residuals, and the total.

SS.type

The type of sums of squares. One of type I or type III.

The degrees of freedom for each SS.

The coefficient of determination for each model term.

The F values for each model term.

permutations

The number of random permutations (including observed) used.

random.SS

A matrix of random SS found via the resampling procedure used.

random.F

A matrix or vector of random F values found via the resampling procedure used.

random.cohenf

A matrix or vector of random Cohen's f-squared values found via the resampling procedure used.

permutations

The number of random permutations (including observed) used.

effect.type

The distribution used to estimate effect-size.

perm.method

A value indicating whether "Raw" values were shuffled or "RRPP" performed.

gls

This prefix will be used if a covariance matrix is provided to indicate GLS computations.

Details

The function quantifies the relative amount of shape variation attributable to one or more factors in a linear model and estimates the probability of this variation ("significance") for a null model, via distributions generated from resampling permutations. Data input is specified by a formula (e.g., y~X), where 'y' specifies the response variables (Procrustes shape variables), and 'X' contains one or more independent variables (discrete or continuous). The response matrix 'y' can be either in the form of a two-dimensional data matrix of dimension (n x [p x k]), or a 3D array (p x n x k). It is assumed that -if the data based on landmark coordinates - the landmarks have previously been aligned using Generalized Procrustes Analysis (GPA) [e.g., with gpagen]. The names specified for the independent (x) variables in the formula represent one or more vectors containing continuous data or factors. It is assumed that the order of the specimens in the shape matrix matches the order of values in the independent variables. Linear model fits (using the lm function) can also be input in place of a formula. Arguments for lm can also be passed on via this function.

The function two.d.array can be used to obtain a two-dimensional data matrix from a 3D array of landmark coordinates; however this step is no longer necessary, as procD.lm can receive 3D arrays as dependent variables. It is also recommended that geomorph.data.frame is used to create and input a data frame. This will reduce problems caused by conflicts between the global and function environments. In the absence of a specified data frame, procD.lm will attempt to coerce input data into a data frame, but success is not guaranteed.

The function performs statistical assessment of the terms in the model using Procrustes distances among specimens, rather than explained covariance matrices among variables. With this approach, the sum-of-squared Procrustes distances are used as a measure of SS (see Goodall 1991). The observed SS are evaluated through permutation. In morphometrics this approach is known as a Procrustes ANOVA (Goodall 1991), which is equivalent to distance-based anova designs (Anderson 2001). Two possible resampling procedures are provided. First, if RRPP=FALSE, the rows of the matrix of shape variables are randomized relative to the design matrix. This is analogous to a 'full' randomization. Second, if RRPP=TRUE, a residual randomization permutation procedure is utilized (Collyer et al. 2015). Here, residual shape values from a reduced model are obtained, and are randomized with respect to the linear model under consideration. These are then added to predicted values from the remaining effects to obtain pseudo-values from which SS are calculated. NOTE: for single-factor designs, the two approaches are identical. However, when evaluating factorial models it has been shown that RRPP attains higher statistical power and thus has greater ability to identify patterns in data should they be present (see Anderson and terBraak 2003).

Effect-sizes (Z scores) are computed as standard deviates of either the SS, F, or Cohen's f-squared sampling distributions generated, which might be more intuitive for P-values than F-values (see Collyer et al. 2015). Values from these distributions are log-transformed prior to effect size estimation, to assure normally distributed data. The SS type will influence how Cohen's f-squared values are calculated. Cohen's f-squared values are based on partial eta-squared values that can be calculated sequentially or marginally, as with SS.

In the case that multiple factor or factor-covariate interactions are used in the model formula, one can specify whether all main effects should be added to the model first, or interactions should precede subsequent main effects (i.e., Y ~ a + b + c + a:b + ..., or Y ~ a + b + a:b + c + ..., respectively.)

The generic functions, print, summary, and plot all work with procD.lm. The generic function, plot has several options for plotting, using plot.procD.lm. Diagnostics plots, principal component plots (rotated to first PC of covariance matrix of fitted values), and regression plots can be performed. One must provide a linear predictor, and can choose among predicted values (PredLine) or regression scores (RegScore). In these plotting options, the predictor does not need to be size, and fitted values and residuals from the procD.lm fit are used rather than mean-centered values.

Notes for geomorph 3.1.0 and subsequent versions

The procD.lm function is now a wrapper for the lm.rrpp function in the RRPP package. Examples below illustrate how to utilize RRPP functions along with geomorph functions for procD.lm objects, increasing the breadth of possible downstream analyses.

An important update in version 3.1.0 is that advanced.procD.lm and nested.update have been deprecated. The examples emphasize how pairwise comparisons can now be accomplished with pairwise and ANOVA updates for nested factors can be made with the anova.lm.rrpp, utilizing the error argument. These functions work on procD.lm objects that have already been created.

Notes for geomorph 3.0.6 and subsequent versions

Compared to previous versions, GLS computations in random permutations are now possible in procD.lm. One should use RRPP = TRUE if a covariance matrix is provided as an argument. The method of SS calculations follows Adams and Collyer 2018. Additional output with a "gls." prefix is also available.

Notes for geomorph 3.0.4 and subsequent versions

Compared to previous versions of geomorph, users might notice differences in effect sizes. Previous versions used z-scores calculated with expected values of statistics from null hypotheses (sensu Collyer et al. 2015); however Adams and Collyer (2016) showed that expected values for some statistics can vary with sample size and variable number, and recommended finding the expected value, empirically, as the mean from the set of random outcomes. Geomorph 3.0.4 and subsequent versions now center z-scores on their empirically estimated expected values and where appropriate, log-transform values to assure statistics are normally distributed. This can result in negative effect sizes, when statistics are smaller than expected compared to the average random outcome. For ANOVA-based functions, the option to choose among different statistics to measure effect size is now a function argument.

References

Anderson MJ. 2001. A new method for non-parametric multivariate analysis of variance. Austral Ecology 26: 32-46.

Anderson MJ. and C.J.F. terBraak. 2003. Permutation tests for multi-factorial analysis of variance. Journal of Statistical Computation and Simulation 73: 85-113.

Goodall, C.R. 1991. Procrustes methods in the statistical analysis of shape. Journal of the Royal Statistical Society B 53:285-339.

Collyer, M.L., D.J. Sekora, and D.C. Adams. 2015. A method for analysis of phenotypic change for phenotypes described by high-dimensional data. Heredity. 115:357-365.

Adams, D.C. and M.L. Collyer. 2016. On the comparison of the strength of morphological integration across morphometric datasets. Evolution. 70:2623-2631.

Adams, D.C. and M.L. Collyer. 2018. Multivariate comparative methods: evaluations, comparisons, and recommendations. Systematic Biology. 67:14-31.

Examples

Run this code

# NOT RUN {
### ANOVA example for Goodall's F test (multivariate shape vs. factors)
data(plethodon) 
Y.gpa <- gpagen(plethodon$land)    #GPA-alignment  
gdf <- geomorph.data.frame(Y.gpa, 
site = plethodon$site, 
species = plethodon$species) # geomorph data frame

fit1 <- procD.lm(coords ~ species * site, 
data = gdf, iter = 999, 
RRPP = FALSE, print.progress = FALSE) # randomize raw values
fit2 <- procD.lm(coords ~ species * site, 
data = gdf, iter = 999, 
RRPP = TRUE, print.progress = FALSE) # randomize residuals

summary(fit1)
summary(fit2)

# RRPP example applications

coef(fit2)
coef(fit2, test = TRUE)
anova(fit2) # same as summary
anova(fit2, effect.type = "Rsq")
# if species and site were modeled as random effects ...
anova(fit2, error = c("species:site", "species:site", "Residuals"))  
# not run, because it is a large object to print 
# remove # to run
# predict(fit2) 

# TPS plots for fitted values of some specimens

M <- Y.gpa$consensus
plotRefToTarget(M, fit2$GM$fitted[,,1], mag = 3)
plotRefToTarget(M, fit2$GM$fitted[,,20], mag = 3)

### THE FOLLOWING ILLUSTRATES SIMPLER SOLUTIONS FOR 
### THE NOW DEPRECATED advanced.procD.lm FUNCTION AND
### PERFORM ANALYSES ALSO FOUND VIA THE morphol.disparity FUNCTION
### USING THE pairwise FUNCTION

# Comparison of LS means, with log(Csize) as a covariate

# Model fits
fit.null <- procD.lm(coords ~ log(Csize) + species + site, data = gdf, 
iter = 999, print.progress = FALSE)
fit.full <- procD.lm(coords ~ log(Csize) + species * site, data = gdf, 
iter = 999, print.progress = FALSE)

# ANOVA, using anova.lm.rrpp function from the RRPP package (replaces advanced.procD.lm)
anova(fit.null, fit.full, print.progress = FALSE)

# Pairwise tests, using pairwise function from the RRPP package
gp <-  interaction(gdf$species, gdf$site)

PW <- pairwise(fit.full, groups = gp, covariate = NULL)

# Pairwise distances between means, summarized two ways (replaces advanced.procD.lm):
summary(PW, test.type = "dist", confidence = 0.95, stat.table = TRUE)
summary(PW, test.type = "dist", confidence = 0.95, stat.table = FALSE)

# Pairwise comaprisons of group variances, two ways (same as morphol.disaprity):
summary(PW, test.type = "var", confidence = 0.95, stat.table = TRUE)
summary(PW, test.type = "var", confidence = 0.95, stat.table = FALSE)
morphol.disparity(fit.full, groups = gp, iter=999)

### Regression example
data(ratland)
rat.gpa<-gpagen(ratland)         #GPA-alignment
gdf <- geomorph.data.frame(rat.gpa) # geomorph data frame is easy without additional input

fit <- procD.lm(coords ~ Csize, data = gdf, iter = 999, 
RRPP = TRUE, print.progress = FALSE) 
summary(fit)

### Extracting objects and plotting options
# diagnostic plots
plot(fit, type = "diagnostics") 
# diagnostic plots, including plotOutliers
plot(fit, type = "diagnostics", outliers = TRUE) 

# PC plot rotated to major axis of fitted values
plot(fit, type = "PC", pch = 19, col = "blue") 
# Regression-type plots

# Use fitted values from the model to make prediction lines
plot(fit, type = "regression", 
predictor = gdf$Csize, reg.type = "RegScore", 
pch = 19, col = "green")

# Uses coefficients from the model to find the projected regression scores
rat.plot <- plot(fit, type = "regression", 
predictor = gdf$Csize, reg.type = "RegScore", 
pch = 21, bg = "yellow") 

# TPS grids for min and max scores in previous plot
preds <- shape.predictor(fit$GM$fitted, x = rat.plot$RegScore, 
                        predmin = min(rat.plot$RegScore), 
                        predmax = max(rat.plot$RegScore))
M <- rat.gpa$consensus
plotRefToTarget(M, preds$predmin, mag=2)
plotRefToTarget(M, preds$predmax, mag=2)
                        
attributes(fit)
fit$fitted[1:3, ] # the fitted values (first three specimens)
fit$GM$fitted[,, 1:3] # the fitted values as Procrustes coordinates (same specimens)

### THE FOLLOWING ILLUSTRATES SIMPLER SOLUTIONS FOR 
### THE NOW DEPRECATED nested.update FUNCTION USING
### THE anova GENERIC FUNCTION

data("larvalMorph")
Y.gpa <- gpagen(larvalMorph$tailcoords, curves = larvalMorph$tail.sliders,
ProcD = TRUE, print.progress = FALSE)
gdf <- geomorph.data.frame(Y.gpa, treatment = larvalMorph$treatment, 
family = larvalMorph$family)

fit <- procD.lm(coords ~ treatment/family, data = gdf, 
print.progress = FALSE, iter = 199)
anova(fit) # treatment effect not adjusted
anova(fit, error = c("treatment:family", "Residuals")) # treatment effect updated (adjusted)



# }

Run the code above in your browser using DataLab