The function quantifies the relative amount of shape variation attributable to one or more factors in a
linear model and estimates the probability of this variation ("significance") for a null model, via distributions generated
from resampling permutations. Data input is specified by a formula (e.g.,
y~X), where 'y' specifies the response variables (shape data), and 'X' contains one or more independent
variables (discrete or continuous). The response matrix 'y' can be either in the form of a two-dimensional data
matrix of dimension (n x [p x k]), or a 3D array (p x k x n). It is assumed that, if the data are based
on landmark coordinates, the landmarks have previously been aligned using Generalized Procrustes Analysis (GPA)
[e.g., with gpagen].
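The relationship between the two input forms can be illustrated with a minimal base R sketch. The toy array and the helper name `flatten_array` are hypothetical, and the sketch assumes the p x k x n array layout (landmarks, dimensions, specimens); it is not the package's own conversion routine.

```r
# Hypothetical toy data: p = 3 landmarks, k = 2 dimensions, n = 4 specimens.
p <- 3; k <- 2; n <- 4
set.seed(1)
A <- array(rnorm(p * k * n), dim = c(p, k, n))

# Flatten each p x k specimen landmark by landmark (x1, y1, x2, y2, ...)
# so that every specimen becomes one row of an n x (p * k) matrix.
flatten_array <- function(A) {
  t(apply(A, 3, function(spec) as.vector(t(spec))))
}

Y <- flatten_array(A)
dim(Y)  # n rows, p * k columns
```

Each row of `Y` holds one specimen, which is the layout a linear model expects for a multivariate response.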
The names specified for the independent (x) variables in the formula represent one or more
vectors containing continuous data or factors. It is assumed that the order of the specimens in the
shape matrix matches the order of values in the independent variables. Linear model fits (using the lm function)
can also be input in place of a formula. Arguments for lm can also be passed on via this function.
The function two.d.array can be used to obtain a two-dimensional data matrix from a 3D array of landmark
coordinates; however, this step is no longer necessary, as procD.lm can receive 3D arrays as dependent variables. It is also
recommended that geomorph.data.frame is used to create and input a data frame. This will reduce problems caused
by conflicts between the global and function environments. In the absence of a specified data frame, procD.lm will attempt to
coerce input data into a data frame, but success is not guaranteed.
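The idea of bundling the response and predictors into one object can be sketched in base R. This is only an illustration with hypothetical toy data and names (`dat`, `shape`, `size`, `group`); it uses a plain list with `lm`, not geomorph.data.frame or procD.lm themselves.

```r
# Hypothetical toy data: 10 specimens, 4 shape variables, one covariate, one factor.
set.seed(42)
n <- 10
dat <- list(
  shape = matrix(rnorm(n * 4), nrow = n),        # n x 4 response matrix
  size  = runif(n, 1, 5),                        # continuous predictor
  group = factor(rep(c("a", "b"), each = n / 2)) # two-level factor
)

# All inputs must describe the same specimens in the same order.
stopifnot(nrow(dat$shape) == length(dat$size),
          nrow(dat$shape) == length(dat$group))

# Multivariate linear model fit; variables are found in `dat`,
# so nothing depends on objects in the global environment.
fit <- lm(shape ~ size + group, data = dat)
dim(coef(fit))  # one row per coefficient, one column per shape variable
```

Because every variable lives in `dat`, a stray object with the same name elsewhere in the workspace cannot silently be used in its place.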
The function performs statistical assessment of the terms in the model using Procrustes distances among
specimens, rather than explained covariance matrices among variables. With this approach, the sum-of-squared
Procrustes distances are used as a measure of SS (see Goodall 1991). The observed SS are evaluated through
permutation. In morphometrics this approach is known as a Procrustes ANOVA (Goodall 1991), which is equivalent
to distance-based ANOVA designs (Anderson 2001). Two possible resampling procedures are provided. First, if RRPP=FALSE,
the rows of the matrix of shape variables are randomized relative to the design matrix.
This is analogous to a 'full' randomization. Second, if RRPP=TRUE, a residual randomization permutation procedure is utilized
(Collyer et al. 2015). Here, residual shape values from a reduced model are
obtained, and are randomized with respect to the linear model under consideration. These are then added to
predicted values from the remaining effects to obtain pseudo-values from which SS are calculated. NOTE: for
single-factor designs, the two approaches are identical. However, when evaluating factorial models it has been
shown that RRPP attains higher statistical power and thus has greater ability to identify patterns in data should
they be present (see Anderson and ter Braak 2003).
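The two resampling schemes described above can be sketched in base R for a single term. This is a simplified illustration under stated assumptions, not the package's implementation: the helper names (`sse`, `term_ss`, `perm_ss`) and the toy data are hypothetical, the SS for a term is taken as the drop in residual SS between a reduced and a full model, and the observed value is counted as one of the permutations.

```r
# Sum of squared residuals over all shape variables (trace of the residual SSCP).
sse <- function(Y, X) sum(lm.fit(X, Y)$residuals^2)

# SS for the term that distinguishes the full design matrix from the reduced one.
term_ss <- function(Y, X_red, X_full) sse(Y, X_red) - sse(Y, X_full)

perm_ss <- function(Y, X_red, X_full, iter = 199, RRPP = TRUE) {
  red <- lm.fit(X_red, Y)              # reduced-model fit, needed for RRPP
  n <- nrow(Y)
  out <- numeric(iter + 1)
  out[1] <- term_ss(Y, X_red, X_full)  # observed SS
  for (i in seq_len(iter)) {
    idx <- sample(n)
    Y_star <- if (RRPP) {
      # Shuffle reduced-model residuals and add them back to reduced-model fitted values.
      red$fitted.values + red$residuals[idx, , drop = FALSE]
    } else {
      # Full randomization: shuffle rows of the response relative to the design.
      Y[idx, , drop = FALSE]
    }
    out[i + 1] <- term_ss(Y_star, X_red, X_full)
  }
  out
}

# Hypothetical toy data: a three-level factor `a` and a covariate `b` with a real effect.
set.seed(7)
n <- 30
a <- factor(rep(c("g1", "g2", "g3"), each = 10))
b <- rnorm(n)
Y <- matrix(rnorm(n * 4), n) + 0.8 * b
X_red  <- model.matrix(~ a)
X_full <- model.matrix(~ a + b)

ss_rrpp <- perm_ss(Y, X_red, X_full, RRPP = TRUE)
ss_full <- perm_ss(Y, X_red, X_full, RRPP = FALSE)
p_rrpp <- mean(ss_rrpp >= ss_rrpp[1])  # permutation P-value for `b` under RRPP
p_full <- mean(ss_full >= ss_full[1])  # permutation P-value for `b` under full randomization
```

When the reduced model contains only an intercept, the reduced-model fitted values are the column means, so shuffling residuals and shuffling rows of the response produce the same pseudo-values; this is why the two procedures coincide for single-factor designs.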
Effect sizes (Z scores) are computed as standard deviates of the SS,
F, or Cohen's f-squared sampling distributions generated; these might correspond more intuitively with P-values than F-values do
(see Collyer et al. 2015). Values from these distributions are log-transformed prior to effect size estimation,
so that they are approximately normally distributed. The SS type will influence how Cohen's f-squared values are calculated.
Cohen's f-squared values are based on partial eta-squared values that can be calculated sequentially or marginally, as with SS.
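The effect size calculation described above can be sketched as a standard deviate of a log-transformed sampling distribution. The helper `effect_size` and the toy distribution are hypothetical; whether the observed value is included when estimating the mean and standard deviation is an assumption of this sketch, not something stated in the text.

```r
effect_size <- function(stat_dist, log_transform = TRUE) {
  # stat_dist[1] is assumed to be the observed statistic;
  # the remaining values are outcomes from random permutations.
  x <- if (log_transform) log(stat_dist) else stat_dist
  # Center on the empirical mean and scale by the empirical standard deviation.
  (x[1] - mean(x)) / sd(x)
}

# Hypothetical sampling distribution of a positive statistic (e.g., SS or F).
set.seed(3)
rand <- rchisq(999, df = 4)

z_large <- effect_size(c(15, rand))   # observed value well above the random outcomes
z_small <- effect_size(c(0.5, rand))  # observed value below the average random outcome
```

An observed statistic below the average random outcome yields a negative Z score, which is the behavior described in the notes for geomorph 3.0.4 and subsequent versions.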
In the case that multiple factor or factor-covariate interactions are used in the model
formula, one can specify whether all main effects should be added to the
model first, or interactions should precede subsequent main effects
(i.e., Y ~ a + b + c + a:b + ..., or Y ~ a + b + a:b + c + ..., respectively.)
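The two term orderings can be illustrated with base R's `terms` function, which by default moves interactions after all main effects and, with `keep.order = TRUE`, keeps the terms as written. This only illustrates the orderings themselves; how procD.lm arranges terms internally is not shown here.

```r
# Formula written with the interaction before the last main effect.
f <- Y ~ a + b + a:b + c

# Default: terms are sorted by order, so all main effects come first.
main_first <- attr(terms(f), "term.labels")

# keep.order = TRUE: terms stay in the order given in the formula.
as_written <- attr(terms(f, keep.order = TRUE), "term.labels")

main_first  # "a" "b" "c" "a:b"
as_written  # "a" "b" "a:b" "c"
```

With sequential sums of squares, the order of terms determines which effects each term is adjusted for, so the two orderings can give different SS for `c` and `a:b`.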
The generic functions, print, summary, and plot all work with procD.lm.
The generic function, plot, has several options for plotting, using plot.procD.lm. Diagnostic plots,
principal component plots (rotated to the first PC of the covariance matrix of fitted values), and regression plots can be produced. The
latter is fundamentally similar to the plotting options for procD.allometry. One must provide a linear predictor, and
can choose among common regression component (CRC), predicted values (PredLine), or regression scores (RegScore). See procD.allometry
for details. In these plotting options, the predictor does not need to be size, and fitted values and residuals from the procD.lm fit are used rather
than mean-centered values.
Notes for geomorph 3.0.4 and subsequent versions
Compared to previous versions of geomorph, users might notice differences in effect sizes. Previous versions used z-scores calculated with
expected values of statistics from null hypotheses (sensu Collyer et al. 2015); however Adams and Collyer (2016) showed that expected values
for some statistics can vary with sample size and variable number, and recommended finding the expected value, empirically, as the mean from the set
of random outcomes. Geomorph 3.0.4 and subsequent versions now center z-scores on their empirically estimated expected values and, where appropriate,
log-transform values to assure statistics are normally distributed. This can result in negative effect sizes when statistics are smaller than
expected compared to the average random outcome. For ANOVA-based functions, the option to choose among different statistics to measure effect size
is now a function argument.