advanced.procD.lm: Procrustes ANOVA and pairwise tests for shape data, using complex linear models

Description

The function quantifies the relative amount of shape variation explained by a suite of factors and covariates in a "full" model, after accounting for variation in a "reduced" model. Inputs are formulae for the full and reduced models (order is not important, but it is better to list the model with the most terms first or to use a geomorph data frame), plus an indication of whether means or slopes are to be compared among groups, with appropriate formulae to define how they should be compared.

Usage

advanced.procD.lm(f1, f2, groups = NULL, slope = NULL, angle.type = c("r",
  "deg", "rad"), phy = NULL, Cov = NULL, pc.shape = FALSE,
  effect.type = c("F", "SS", "Rsq"), iter = 999, seed = NULL,
  print.progress = TRUE, data = NULL, ...)

Arguments

f1

A formula for a linear model, containing the response matrix (e.g., y ~ x1 + x2).

f2

A formula for another linear model (e.g., ~ x1 + x2 + x3 + a*b). f1 and f2 should be nested.

groups

A formula for grouping factors (e.g., ~ a, or ~ a*b). This argument should be left NULL unless one wishes to perform pairwise comparisons of different group levels. Note that this argument is used in conjunction with the slope argument. If slope is NULL, a pairwise comparison test is performed on group least squares (LS) means. If slope is not NULL, this argument designates the group levels whose slopes are compared.

slope

A formula with one, and only one, covariate (e.g., ~ x3). This argument must be used in conjunction with the groups argument; it does not make sense if groups is left NULL. The groups argument defines the groups; the slope argument defines the covariate for which group slopes are compared. Group slopes can differ in the magnitude and the direction of shape change.

angle.type

A value specifying whether directional differences between slopes should be represented by vector correlations (r), radians (rad) or degrees (deg).
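
The three angle.type options express the same quantity in different units: the vector correlation r is the cosine of the angle between two slope vectors, so radians are acos(r) and degrees are acos(r) * 180 / pi. A minimal base-R sketch follows; the helper name slope_direction is hypothetical and is not part of geomorph or RRPP.

```r
# Hypothetical helper: directional difference between two slope vectors,
# expressed as a vector correlation ("r"), radians ("rad"), or degrees ("deg").
slope_direction <- function(b1, b2, angle.type = c("r", "deg", "rad")) {
  angle.type <- match.arg(angle.type)
  # vector correlation = cosine of the angle between the two vectors
  r <- sum(b1 * b2) / (sqrt(sum(b1^2)) * sqrt(sum(b2^2)))
  r <- max(min(r, 1), -1)  # guard against rounding slightly outside [-1, 1]
  switch(angle.type,
         r   = r,
         rad = acos(r),
         deg = acos(r) * 180 / pi)
}

slope_direction(c(1, 0, 0), c(1, 1, 0), "r")    # about 0.7071
slope_direction(c(1, 0, 0), c(1, 1, 0), "deg")  # about 45
```

Parallel slope vectors give r = 1 (an angle of 0), which is the null expectation for these directional comparisons.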

phy

A phylogenetic tree of class phylo - see read.tree in library ape (optional).

Cov

A covariance matrix to guide GLS computations. If both a phylogenetic tree and a covariance matrix are provided, the covariance matrix is used and the tree is ignored (no Brownian motion covariance matrix is calculated from the phylogeny). Currently this function cannot handle multiple covariance matrices.

pc.shape

An argument for whether analysis should be performed on the principal component scores of shape. This option is useful if the data are high-dimensional (many more variables than observations), and it will not affect results.

effect.type

An optional argument for which distribution of statistics should be used for calculating effect sizes (and P-values). The default is "F", for the distribution of random F-statistics, but "SS" and "Rsq" are also possible, for the distributions of random SS between models or of R-squared values, respectively. One should not choose "SS" if a PGLS model is considered. P-values should be similar in most cases, regardless of the statistic chosen, as the rank correlations between statistics are either perfect (SS and Rsq for OLS) or generally large.

iter

Number of iterations for significance testing.

seed

An optional argument for setting the seed for random permutations of the resampling procedure. If left NULL (the default), the exact same P-values will be found for repeated runs of the analysis (with the same number of iterations). If seed = "random", a random seed will be used, and P-values will vary. One can also specify an integer for specific seed values, which might be of interest for advanced users.

print.progress

A logical value to indicate whether a progress bar should be printed to the screen. This is helpful for long-running analyses.

data

A data frame for the function environment; see geomorph.data.frame. If variables are transformed in formulae, they should also be transformed in the geomorph data frame. (See examples.)

...

Arguments passed on to procD.fit (typically associated with the lm function, such as weights or offset).

Value

Function returns an ANOVA table of statistical results for model comparison: error df (for each model), SS, MS, F ratio, Z, and Prand. A list of essentially the same components as procD.lm is also returned; in addition, LS means or slopes, pairwise comparisons of these, effect sizes, and P-values may be returned. If a groups formula is provided but the slope formula is NULL, pairwise differences are Procrustes distances between least squares (LS) means for the defined groups. If a slope formula is provided, two sets of pairwise differences, plus effect sizes and P-values, are provided. The first is for differences in slope vector length (magnitude). The length of a slope vector corresponds to the amount of shape change per unit of covariate change, so large differences in length indicate differences in the amount of shape change between groups. The second is for differences in slope vector orientation. Differences in the direction of shape change (covariance of shape variables) can be summarized as a vector correlation or as an angle between vectors. See summary.advanced.procD.lm for summary options.
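
The two slope statistics described above can be illustrated with plain OLS in base R. This is a minimal sketch, not the RRPP implementation: the helper group_slopes is hypothetical, the data are simulated, and only the observed statistics are computed (no permutation distributions, effect sizes, or P-values). For OLS, the per-group slopes from a model with unique slopes (Y ~ x * group) equal those from separate within-group regressions, which is what the helper computes.

```r
# Hypothetical helper: per-group slope vectors of an n x p response Y on one
# covariate x, as in a model with unique slopes (Y ~ x * group).
group_slopes <- function(Y, x, group) {
  Y <- as.matrix(Y)
  group <- factor(group)
  slopes <- lapply(levels(group), function(lev) {
    idx <- group == lev
    xc <- x[idx] - mean(x[idx])                          # centred covariate
    Yc <- scale(Y[idx, , drop = FALSE], scale = FALSE)   # centred responses
    drop(crossprod(xc, Yc)) / sum(xc^2)                  # one slope per variable
  })
  do.call(rbind, setNames(slopes, levels(group)))        # groups x variables
}

x <- rep(1:10, times = 2)
g <- rep(c("A", "B"), each = 10)
b <- ifelse(g == "A", 1, 2)              # group B changes twice as fast
Y <- cbind(3 + b * x, 5 + b * x)         # same direction, different magnitude
S <- group_slopes(Y, x, g)
len <- sqrt(rowSums(S^2))                # slope vector lengths (magnitudes)
abs(outer(len, len, "-"))                # pairwise differences in magnitude
tcrossprod(S) / outer(len, len)          # pairwise vector correlations (direction)
```

In this constructed example the slope vector for group B is twice as long as that for group A, while their vector correlation is 1, so the groups differ in the amount but not the direction of change.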

Details

This function calculates residual sums of squares via either ordinary least squares (OLS) estimation or phylogenetic generalized least squares (PGLS) estimation for both full and reduced models. Residuals from the reduced model are used in a randomized residual permutation procedure (RRPP) to find the difference in residual sums of squares (the trace of the residual sums of squares and cross-products matrix, SSCP) over many permutations, thus creating a distribution of sums of squares (SS) for the parameters that differ between models (Collyer et al. 2015). The SS can be converted to F-values to generate an empirical F-distribution. A P-value is estimated from the percentile of the observed value in this distribution (i.e., the proportion of random values at least as large as the observed value).
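
The RRPP steps above can be sketched in base R for OLS. This is a minimal illustration under stated assumptions, not geomorph's or RRPP's code: rrpp_nested is a hypothetical helper, X.red and X.full are assumed to be full-rank model matrices with X.red nested in X.full, and the function sets the random seed itself (mirroring the seed argument) as a side effect.

```r
# Minimal base-R sketch of RRPP for two nested OLS models (after Collyer et al. 2015).
# Assumes Y is an n x p response matrix and X.red / X.full are full-rank model
# matrices, with the columns of X.red spanning a subspace of those of X.full.
rrpp_nested <- function(Y, X.red, X.full, iter = 999, seed = 1) {
  Y <- as.matrix(Y)
  n <- nrow(Y)
  fit.red <- lm.fit(X.red, Y)
  Yhat <- as.matrix(fit.red$fitted.values)   # reduced-model fitted values
  E <- as.matrix(fit.red$residuals)          # reduced-model residuals
  # residual SS = trace of the residual SSCP matrix = sum of squared residuals
  rss <- function(X, Yr) sum(as.matrix(lm.fit(X, Yr)$residuals)^2)
  df1 <- ncol(X.full) - ncol(X.red)
  df2 <- n - ncol(X.full)
  set.seed(seed)
  # the first "permutation" is the identity, so the observed statistic is included
  perms <- c(list(seq_len(n)), replicate(iter, sample(n), simplify = FALSE))
  F.random <- vapply(perms, function(idx) {
    Yr <- Yhat + E[idx, , drop = FALSE]      # permuted residuals added to fitted values
    ss.red <- rss(X.red, Yr)
    ss.full <- rss(X.full, Yr)
    ((ss.red - ss.full) / df1) / (ss.full / df2)
  }, numeric(1))
  list(F = F.random[1], F.random = F.random,
       P = mean(F.random >= F.random[1]))    # proportion of random F >= observed F
}

set.seed(42)
x <- rnorm(30)
g <- gl(2, 15)
Y <- cbind(x + rnorm(30), as.numeric(g) + rnorm(30))
res <- rrpp_nested(Y, model.matrix(~ x), model.matrix(~ x + g), iter = 199)
res$F
res$P
```

For a single response variable, the observed F from this sketch matches the classical F-test from anova() on the two nested lm fits; only the reference distribution differs.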

The response matrix 'Y' can be in the form of a two-dimensional data matrix of dimension (n x [p x k]) or a 3D array (p x k x n). It is assumed that the landmarks have previously been aligned using Generalized Procrustes Analysis (GPA) [e.g., with gpagen]. The names specified for the independent (x) variables in the formula represent one or more vectors containing continuous data or factors. It is assumed that the order of the specimens in the shape matrix matches the order of values in the independent variables. Linear model fits (using the lm function) can also be input in place of a formula. Arguments for lm can also be passed on via this function.
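
To make the two accepted layouts concrete, the sketch below flattens a p x k x n landmark array into an n x (p x k) matrix with one row per specimen. geomorph provides its own conversion helpers; this base-R version, the helper name array_to_matrix, and the column ordering it produces (all coordinates of landmark 1, then landmark 2, and so on) are assumptions made for illustration.

```r
# Hypothetical helper: flatten a p x k x n landmark array into an n x (p*k)
# matrix, one row per specimen, with columns ordered x1, y1, (z1,) x2, y2, ...
array_to_matrix <- function(A) {
  d <- dim(A)                                   # p landmarks, k dimensions, n specimens
  M <- matrix(aperm(A, c(3, 2, 1)), nrow = d[3])
  colnames(M) <- paste0(c("x", "y", "z")[seq_len(d[2])], rep(seq_len(d[1]), each = d[2]))
  M
}

A <- array(1:12, dim = c(3, 2, 2))              # 3 landmarks, 2D, 2 specimens
array_to_matrix(A)                              # row 1: 1 4 2 5 3 6
```

Reordering the dimensions with aperm() before filling the matrix avoids an explicit loop over specimens.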

The SS calculated is the same as the sum of squared Procrustes distances among specimens, as used as a measure of SS in Procrustes ANOVA (see Goodall 1991). Procrustes ANOVA, often used in morphometric applications, is equivalent to distance-based ANOVA designs (Anderson 2001). Unlike procD.lm, this function is strictly for the comparison of two nested models. (Use of procD.lm will be more suitable in most cases.) Effect sizes (Z-scores) are computed as standard deviates of the sampling distribution of the statistic chosen for ANOVA (see arguments), or of the sampling distributions generated for pairwise statistics; these might be more intuitive than F-values for interpreting P-values (see Collyer et al. 2015). For ANOVA Z-scores, a log-transformation is performed first, to ensure an approximately normal sampling distribution.
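
Two points above can be checked numerically with simulated data: the trace of the SSCP matrix equals a sum of squared inter-specimen distances (summed over all ordered pairs and divided by 2n), and an effect size is a standard deviate of log-transformed statistics. In the sketch below, effect_size is a hypothetical helper; details such as the exact variance estimator and transformation used internally by RRPP are not confirmed here and may differ.

```r
# Sketch 1: SS as the trace of the SSCP matrix equals a sum of squared distances.
set.seed(7)
Y <- matrix(rnorm(40), nrow = 10)            # 10 hypothetical specimens, 4 shape variables
Yc <- scale(Y, scale = FALSE)                # deviations from the mean shape
ss.trace <- sum(diag(crossprod(Yc)))         # trace of the SSCP matrix
ss.dist <- sum(as.matrix(dist(Y))^2) / (2 * nrow(Y))  # all ordered pairs of specimens
all.equal(ss.trace, ss.dist)                 # TRUE

# Sketch 2: effect size as a standard deviate of (log-transformed) statistics.
effect_size <- function(obs, random, log.transform = TRUE) {
  if (log.transform) {
    obs <- log(obs)
    random <- log(random)
  }
  (obs - mean(random)) / sd(random)          # centred on the empirical mean of random outcomes
}
effect_size(obs = exp(3), random = exp(c(1, 2, 3)))  # 1, up to rounding
```

Because the effect size is centered on the mean of the random outcomes, an observed statistic smaller than that mean yields a negative value.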

Pairwise tests have two flavors: 1) tests for differences in group means (based on the length of the vector between means for pairwise comparisons) and 2) tests for angular differences in slopes between groups. These tests are similar in concept to trajectory analysis (Adams and Collyer 2007; Collyer and Adams 2007; Adams and Collyer 2009; Collyer and Adams 2013), in that pairwise statistics are either vector lengths or angular differences between vectors. These tests differ from trajectory analysis (see trajectory.analysis), however, because a factorial model is not explicitly needed to contrast vectors between point factor levels nested within group factor levels. For angular differences between factor-covariate slopes, either the angle or the vector correlation can be tested. It should be understood that the null hypothesis is a vector correlation of 1 (parallel vectors, i.e., an angle of 0), not a correlation of 0, meaning that slopes have the same direction.
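
For a model whose only terms are grouping factors, LS means are simply group means, and the first kind of pairwise statistic is the Euclidean (Procrustes) distance between mean shape vectors. A minimal base-R sketch, assuming Y is already an aligned n x (p x k) shape matrix; the helper group_mean_distances is hypothetical, and only the observed distances are computed, not the RRPP sampling distributions used for P-values.

```r
# Hypothetical helper: Euclidean (Procrustes) distances among group mean shapes.
group_mean_distances <- function(Y, group) {
  Y <- as.matrix(Y)
  group <- factor(group)
  rows <- split(seq_len(nrow(Y)), group)          # row indices for each group
  means <- do.call(rbind, lapply(rows, function(idx) colMeans(Y[idx, , drop = FALSE])))
  as.matrix(dist(means))                          # groups x groups distance matrix
}

Y <- rbind(c(0, 0), c(0, 0), c(3, 4), c(3, 4), c(6, 8))
g <- c("a", "a", "b", "b", "c")
group_mean_distances(Y, g)                        # a-b: 5, b-c: 5, a-c: 10
```

When the model also contains covariates, LS means are predicted values at covariate means rather than raw group means, so this shortcut no longer applies.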

Pairwise tests are only performed if formulae are provided to compute such results. The generic functions print, summary, and plot all work with advanced.procD.lm. The generic function plot produces diagnostic plots for residuals of the linear fit. Note that the print and summary generic functions have an argument to print formulas as row names of the ANOVA table. If formulas are long, it is recommended to set this argument to formula = FALSE, in which case the models will be labeled "reduced" and "full".

Notes for geomorph 3.0.7 and subsequent versions

The advanced.procD.lm function now defers to the R package RRPP, specifically its anova.lm.rrpp and pairwise functions. These functions perform all computations needed for advanced.procD.lm, as well as other analyses; advanced.procD.lm is therefore now a wrapper for these functions. If one wishes to work directly in RRPP, the lm.rrpp function can be used to fit multiple models before using the anova.lm.rrpp and pairwise functions. The only difference in results (compared to version 3.0.6 and earlier) should occur when comparing univariate slopes. Version 3.0.6 and earlier versions appended a vector of 1s to slopes as an ad hoc strategy to make computations work. This is no longer needed, as the RRPP functions handle univariate data better.

Notes for geomorph 3.0.6 and subsequent versions

For pairwise tests, previous versions assumed that pairwise comparisons of least-squares means used models with parallel slopes. Under most circumstances, this assumption is safe (and preferred), as estimating mean differences otherwise requires assuming that the mean values of covariates are appropriate locations at which to estimate means. Version 3.0.6 and subsequent versions find least-squares means that are truer to the model defined. For example, if a user defines a full model with parallel slopes, e.g., shape ~ x + A + B + A:B, where x is a covariate and A and B are factors, results should be no different than before. However, if a user defines a full model that allows unique slopes, e.g., shape ~ x + A + B + x:A + x:B + A:B + x:A:B, least-squares means will now be estimated at the mean value of x using the coefficients for x:A, x:B, and x:A:B (previous versions did not use these coefficients). This change was made to be consistent with least-squares means estimation functions in other packages.
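
With unique slopes, LS means depend on where along the covariate they are evaluated. The base-R sketch below illustrates the idea described above: each group's fitted model, including its interaction coefficients, is evaluated at the mean covariate value. It uses simulated data and plain lm with a matrix response rather than geomorph or RRPP, whose internal computations may differ in detail (for example, in how other factors are averaged over).

```r
set.seed(2)
n <- 40
x <- runif(n, 0, 10)                                   # hypothetical covariate
A <- factor(rep(c("a1", "a2"), each = n / 2))          # hypothetical factor
Y <- cbind(y1 = 1 + 0.5 * x * (A == "a2") + rnorm(n, sd = 0.1),
           y2 = 2 - 0.2 * x + rnorm(n, sd = 0.1))
fit <- lm(Y ~ x * A)                                   # unique slopes for each level of A
nd <- data.frame(x = mean(x), A = factor(levels(A), levels = levels(A)))
ls.means <- predict(fit, newdata = nd)                 # predicted values at mean(x), per group
rownames(ls.means) <- levels(A)
ls.means
```

Because the x:A coefficient enters the prediction for level a2, dropping it (as in a parallel-slopes model) would generally change that group's LS mean.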

Notes for geomorph 3.0.4 and subsequent versions

Compared to previous versions of geomorph, users might notice differences in effect sizes. Previous versions used z-scores calculated with expected values of statistics from null hypotheses (sensu Collyer et al. 2015); however, Adams and Collyer (2016) showed that expected values for some statistics can vary with sample size and variable number, and recommended finding the expected value empirically, as the mean of the set of random outcomes. Geomorph 3.0.4 and subsequent versions now center z-scores on their empirically estimated expected values and, where appropriate, log-transform values to ensure statistics are approximately normally distributed. This can result in negative effect sizes when statistics are smaller than the average random outcome. For ANOVA-based functions, the choice among different statistics for measuring effect size is now a function argument.

An optional argument for including a phylogenetic tree of class phylo is included in this function. ANOVA performed on separate PGLS models is analogous to a likelihood ratio test between models (Adams and Collyer 2018). Pairwise tests can also be performed after PGLS estimation of coefficients but users should be aware that no formal research on the statistical properties (type I error rates and statistical power) of pairwise statistics with PGLS has yet been performed. Using PGLS and analysis of pairwise statistics, therefore, assumes some risk.
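
One common way to obtain GLS (including PGLS) coefficients is to "whiten" the data with the Cholesky factor of the covariance matrix and then apply OLS; this is offered as an illustration and is not necessarily how RRPP implements PGLS. In practice the covariance matrix could be a Brownian motion matrix from a phylogeny (e.g., via ape's vcv.phylo, as in the examples); here a hypothetical AR(1)-type matrix stands in so the sketch runs without extra packages, and gls_coef is a hypothetical helper.

```r
# Minimal GLS-by-whitening sketch. Assumes Cov is a known n x n
# positive-definite covariance matrix among specimens.
gls_coef <- function(Y, X, Cov) {
  L <- t(chol(Cov))                        # Cov = L %*% t(L), L lower triangular
  Yt <- forwardsolve(L, as.matrix(Y))      # L^{-1} Y
  Xt <- forwardsolve(L, X)                 # L^{-1} X
  qr.coef(qr(Xt), Yt)                      # OLS on whitened data = GLS estimates
}

set.seed(3)
n <- 10
Cov <- 0.5^abs(outer(1:n, 1:n, "-"))       # hypothetical AR(1)-type covariance
X <- cbind(1, rnorm(n))                    # intercept + one covariate
Y <- matrix(rnorm(n * 3), nrow = n)        # 3 hypothetical shape variables
B <- gls_coef(Y, X, Cov)
B
```

In this sketch, residual sums of squares for an ANOVA table would be computed from the whitened residuals, Yt - Xt %*% B, so the same nested-model logic as in OLS applies after the transformation.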

References

Adams, D.C., and M.L. Collyer. 2007. The analysis of character divergence along environmental gradients and other covariates. Evolution 61:510-515.

Adams, D.C., and M.L. Collyer. 2009. A general framework for the analysis of phenotypic trajectories in evolutionary studies. Evolution 63:1143-1154.

Adams, D.C., and M.L. Collyer. 2016. On the comparison of the strength of morphological integration across morphometric datasets. Evolution 70:2623-2631.

Adams, D.C., and M.L. Collyer. 2018. Multivariate phylogenetic comparative methods: evaluations, comparisons, and recommendations. Systematic Biology 67:14-31.

Collyer, M.L., and D.C. Adams. 2007. Analysis of two-state multivariate phenotypic change in ecological studies. Ecology 88:683-692.

Collyer, M.L., and D.C. Adams. 2013. Phenotypic trajectory analysis: comparison of shape change patterns in evolution and ecology. Hystrix 24:75-83.

Collyer, M.L., D.J. Sekora, and D.C. Adams. 2015. A method for analysis of phenotypic change for phenotypes described by high-dimensional data. Heredity 115:357-365.

Examples


# NOT RUN {
library(geomorph)
data(plethodon)
Y.gpa <- gpagen(plethodon$land, print.progress = FALSE)    # GPA alignment
gdf <- geomorph.data.frame(Y.gpa, species = plethodon$species, 
site = plethodon$site)

# Example of a nested model comparison (as with ANOVA with RRPP)
ANOVA <-  advanced.procD.lm(f1= coords ~ log(Csize) + species,
f2= ~ log(Csize)*species*site, iter=99, data = gdf)
summary(ANOVA, formula = FALSE) # formulas too long to print

# Example of a test of a factor interaction, plus pairwise comparisons
PW.means.test <- advanced.procD.lm(f1= coords ~ site*species, f2= ~ site + species, 
groups = ~site*species, iter=99, data = gdf)
summary(PW.means.test, formula = TRUE)

# Example of a test of a factor interaction, plus pairwise comparisons,
# accounting for a common allometry
PW.ls.means.test <- advanced.procD.lm(f1= coords ~ log(Csize) + site*species,
f2= ~ log(Csize) + site + species,
groups = ~ site*species, iter = 99, data = gdf)
summary(PW.ls.means.test, formula = TRUE)

# Example of a test of homogeneity of slopes, plus pairwise slopes comparisons
gdf$group <- factor(paste(gdf$species, gdf$site, sep="."))
HOS <- advanced.procD.lm(f1= coords ~ log(Csize) + group,
f2= ~ log(Csize) * group, groups = ~ group,
slope = ~ log(Csize), angle.type = "deg", iter = 99, data = gdf)
summary(HOS, formula = FALSE) # formulas too long to print

# Example of partial pairwise comparisons, given greater model complexity.
# Plus, working with class advanced.procD.lm objects.
aov.pleth <- advanced.procD.lm(f1= coords ~ log(Csize)*site*species,
f2= ~ log(Csize) + site*species, groups = ~ species, 
slope = ~ log(Csize), angle.type = "deg", iter = 99, data = gdf)

summary(aov.pleth, formula = FALSE)  # formulas too long to print

# Diagnostic plots
plot(aov.pleth) 

# Extracting objects from results
aov.pleth$slopes # extract the slope vectors

# GLS examples (same as the procD.pgls example)
data(plethspecies)
Y.gpa <- gpagen(plethspecies$land)
gdf <- geomorph.data.frame(Y.gpa, tree = plethspecies$phy)
procD.pgls(coords ~ Csize, phy = tree, data = gdf, iter = 999)

advanced.procD.lm(coords ~ Csize, ~1, phy = gdf$tree, data = gdf, iter = 999)

# Could also do this with the ape function vcv.phylo
# library(ape)
# phyCov <- vcv.phylo(plethspecies$phy)
# advanced.procD.lm(coords ~ Csize, ~1, Cov = phyCov, data = gdf, iter = 999)

# }
