Function calculates the degree of phylogenetic signal from Procrustes shape variables, using effect size estimated from likelihood.
physignal.z(
A,
phy,
lambda = c("burn", "mean", "front", "all"),
iter = 999,
seed = NULL,
tol = 1e-04,
PAC.no = NULL,
print.progress = FALSE,
verbose = FALSE
)
Function returns a list with the following components:
Phylogenetic signal effect size.
The significance level of the observed signal.
The determinants of the rates matrix for all random permutations.
The log-likelihoods for all random permutations.
The optimized lambda value.
The multivariate K value.
A phylogenetically aligned component analysis, based on OLS residuals.
The multivariate K value in 1, 1:2, 1:3, ..., 1:p dimensions, for the p components from PACA.
The optimized lambda in 1, 1:2, 1:3, ..., 1:p dimensions, for the p components from PACA.
The log-likelihood in 1, 1:2, 1:3, ..., 1:p dimensions, for the p components from PACA.
The number of random permutations used in the resampling procedure.
Method of optimization used for lambda.
Number of data dimensions for which optimization was performed.
The matched call
A matrix (n x [p x k]) or 3D array (p x k x n) containing Procrustes shape variables for a set of specimens
A phylogenetic tree of class = "phylo" - see read.tree
in library ape
An indication for how lambda should be optimized. This can be a numeric value between 0 and 1 to override optimization. Alternatively, it can be one of four methods: burn-in ("burn"); mean lambda across all data dimensions ("mean"); use a subset of all data dimensions where phylogenetic signal is possible, based on a front-loading of phylogenetic signal in the lowest phylogenetically aligned components ("front"); or use of all data dimensions to calculate the log-likelihood in the optimization process ("all"). See Details for more explanation.
Number of iterations for significance testing
An optional argument for setting the seed for random permutations of the resampling procedure. If left NULL (the default), the exact same P-values will be found for repeated runs of the analysis (with the same number of iterations). If seed = "random", a random seed will be used, and P-values will vary. One can also specify an integer for specific seed values, which might be of interest for advanced users.
A value indicating the magnitude below which phylogenetically aligned components should
be omitted. (Components are omitted if their standard deviations are less than or equal to tol times
the standard deviation of the first component.) See ordinate
for more details.
The number of phylogenetically aligned components (PAC) on which to project the data. Users can choose fewer PACs than would be found, naturally.
A logical value to indicate whether a progress bar should be printed to the screen. This is helpful for long-running analyses.
Whether to include verbose output, including phylogenetically-aligned component analysis (PACA), plus K, lambda, and the log-likelihood profiles across PAC dimensions.
Michael Collyer
The function estimates the degree of phylogenetic signal present in Procrustes shape variables for a given
phylogeny, based on the standardized location of the log-likelihood in a distribution generated from
randomization of residuals in a permutation procedure (RRPP). Unlike the analysis performed in
physignal
, this function finds the optimal branch-length transformation, lambda, which maximizes
likelihood. This step results in better statistical properties (Collyer et al. 2022), and the effect
size (Z) -- the standardized location of the observed log-likelihood in its sampling distribution -- has a linear
relationship with lambda. Although, Pagel's lambda (Pagel et al. 1999) and Blomberg's K (Blomberg et al. 2003;
Adams 2014) are frequently used as measures of the amount of phylogenetic signal (and both are also reported),
the Z-score is an effect size that can be compared among different traits
(see compare.physignal.z
), and it more precisely describes the fit of data to a tree
(Collyer et al. 2022).
It is assumed that the landmarks have previously been aligned
using Generalized Procrustes Analysis (GPA) [e.g., with gpagen
]. Multivariate data are subjected
to a phylogenetically aligned component analysis to assure that data dimensions do not exceed the number of
observations, making estimation of multivariate log-likelihoods possible. The tolerance argument can be used to assure
an appropriate number of data dimensions are used. The number of phylogenetically aligned components (PAC.no)
can also be used to estimate log-likelihood in fewer dimensions. This might be necessary for some data sets,
as residual covariance matrices can be ill-conditioned, especially if the number of variables exceeds the number
of observations.
It is assumed that lambda is constant for all shape variables. It would be possible to consider, for example,
whether phylogenetic signal is consistent among different modules via separate analyses, followed by
analysis with compare.physignal.z
. This could be facilitated by using
coords.subset
to partition shape data.
This function can also be used with univariate data (i.e. centroid size) if imported as matrix with rownames
giving the taxa names.
Optimization of the tree branch scaling parameter, lambda, is not a process with a single, resolved solution. For univariate data, optimization is fairly simple: lambda is optimized at the value that yields maximum likelihood. For multivariate data, generalization is not straightforward. Although this function currently assumes all variables in multivariate data have the same lambda, maximizing likelihood over all variables can ignore strong phylogenetic signal restricted to just some of the variables. Therefore, optimization can can consider the latent strength of phylogenetic signal in different ways. Below is a summary of optimization methods, including advantages or disadvantages that might be incurred. Future versions of this function might update optimization methods, as more research affirms better methods.
A burn-in optimization samples a spectrum of lambda from 0 to 1, finding the effect size, Z, from several distributions of log-likelihoods, in order to find the lambda value that maximizes the effect size rather than the log-likelihood. Once this value of lambda is obtained, the requested number of permutations are run and the final effect size is estimated from the distribution of random log-likelihoods generated. This is perhaps the best optimization method to assure that the result effect size is maximized, but it requires considerably more time than the other methods.
Lambda is optimized for every variable and the mean of the lambdas is used as the optimized lambda for estimating multivariate log-likelihoods. This method will guard against a tendency for multivariate optimization to be biased toward 0 or 1, but will also tend to find intermediate values of lambda, even if there is strong phylogenetic signal for some variables.
This method performs phylogenetically aligned component analysis (PACA) and
determines the relevant number of components for which covariation between data and phylogeny is
possible (as many or fewer than the number of variables). PACA front-loads phylogenetic signal in first
components. Thus, lambda is optimized with multivariate likelihood
but only for a portion of the full data space where phylogenetic signal might be found. This method
will tend to bias lambda toward 1, making analysis more similar to using multivariate K, as in
physignal
.
This method simply optimizes lambda using a multivariate likelihood for optimization. This is the simplest generalization of univariate lambda optimization, but it will tend to optimize lambda at 0 or 1.
Users can override optimization, by choosing a fixed value. Choosing
lambda = 1 will be fundamentally the same as performing the analysis on multivariate K, as in
physignal
.
The generic functions, print
, summary
, and plot
all work with physignal
.
The generic function, plot
, produces a histogram of random log-likelihoods,
associated with the resampling procedure.
Collyer, M.L., E.K. Baken, & D.C. Adams. 2022. A standardized effect size for evaluating and comparing the strength of phylogenetic signal. Methods in Ecology and Evolution. 13:367-382.
Blomberg SP, Garland T, Ives AR. 2003. Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution, 57:717-745.
Adams, D.C. 2014. A generalized K statistic for estimating phylogenetic signal from shape and other high-dimensional multivariate data. Systematic Biology. 63:685-697.
Adams, D.C. and M.L. Collyer. 2019. Comparing the strength of modular signal, and evaluating alternative modular hypotheses, using covariance ratio effect sizes with morphometric data. Evolution. 73:2352-2367.
Pagel, M. D. (1999). Inferring the historical patterns of biological evolution. Nature. 401:877-884.
gm.prcomp
, physignal
if (FALSE) {
data(plethspecies)
Y.gpa <- gpagen(plethspecies$land) #GPA-alignment
# Test for phylogenetic signal in shape
PS.shape <- physignal.z(A = Y.gpa$coords, phy = plethspecies$phy,
lambda = "front")
summary(PS.shape)
# Problem with ill-conditioned residual covariance matrix; try shaving one dimension
PS.shape <- physignal.z(A = Y.gpa$coords, phy = plethspecies$phy,
lambda = "front", PAC.no = 7)
summary(PS.shape)
plot(PS.shape)
plot(PS.shape$PACA, phylo = TRUE)
# Test for phylogenetic signal in size
PS.size <- physignal.z(A = Y.gpa$Csize, phy = plethspecies$phy,
lambda = "front")
summary(PS.size)
plot(PS.size)
}
Run the code above in your browser using DataLab