Learn R Programming

geomorph (version 4.0.8)

physignal.z: Assessing phylogenetic signal effect size in Procrustes shape variables

Description

Function calculates the degree of phylogenetic signal from Procrustes shape variables, using effect size estimated from likelihood.

Usage

physignal.z(
  A,
  phy,
  lambda = c("burn", "mean", "front", "all"),
  iter = 999,
  seed = NULL,
  tol = 1e-04,
  PAC.no = NULL,
  print.progress = FALSE,
  verbose = FALSE
)

Value

Function returns a list with the following components:

Z

Phylogenetic signal effect size.

pvalue

The significance level of the observed signal.

random.detR

The determinants of the rates matrix for all random permutations.

random.logL

The log-likelihoods for all random permutations.

lambda

The optimized lambda value.

K

The multivariate K value.

PACA

A phylogenetically aligned component analysis, based on OLS residuals.

K.by.p

The multivariate K value in 1, 1:2, 1:3, ..., 1:p dimensions, for the p components from PACA.

lambda.by.p

The optimized lambda in 1, 1:2, 1:3, ..., 1:p dimensions, for the p components from PACA.

logL.by.p

The log-likelihood in 1, 1:2, 1:3, ..., 1:p dimensions, for the p components from PACA.

permutations

The number of random permutations used in the resampling procedure.

opt.method

Method of optimization used for lambda.

opt.dim

Number of data dimensions for which optimization was performed.

call

The matched call

Arguments

A

A matrix (n x [p x k]) or 3D array (p x k x n) containing Procrustes shape variables for a set of specimens

phy

A phylogenetic tree of class = "phylo" - see read.tree in library ape

lambda

An indication for how lambda should be optimized. This can be a numeric value between 0 and 1 to override optimization. Alternatively, it can be one of four methods: burn-in ("burn"); mean lambda across all data dimensions ("mean"); use a subset of all data dimensions where phylogenetic signal is possible, based on a front-loading of phylogenetic signal in the lowest phylogenetically aligned components ("front"); or use of all data dimensions to calculate the log-likelihood in the optimization process ("all"). See Details for more explanation.

iter

Number of iterations for significance testing

seed

An optional argument for setting the seed for random permutations of the resampling procedure. If left NULL (the default), the exact same P-values will be found for repeated runs of the analysis (with the same number of iterations). If seed = "random", a random seed will be used, and P-values will vary. One can also specify an integer for specific seed values, which might be of interest for advanced users.

tol

A value indicating the magnitude below which phylogenetically aligned components should be omitted. (Components are omitted if their standard deviations are less than or equal to tol times the standard deviation of the first component.) See ordinate for more details.

PAC.no

The number of phylogenetically aligned components (PAC) on which to project the data. Users can choose fewer PACs than would be found, naturally.

print.progress

A logical value to indicate whether a progress bar should be printed to the screen. This is helpful for long-running analyses.

verbose

Whether to include verbose output, including phylogenetically-aligned component analysis (PACA), plus K, lambda, and the log-likelihood profiles across PAC dimensions.

Author

Michael Collyer

Details

The function estimates the degree of phylogenetic signal present in Procrustes shape variables for a given phylogeny, based on the standardized location of the log-likelihood in a distribution generated from randomization of residuals in a permutation procedure (RRPP). Unlike the analysis performed in physignal, this function finds the optimal branch-length transformation, lambda, which maximizes likelihood. This step results in better statistical properties (Collyer et al. 2022), and the effect size (Z) -- the standardized location of the observed log-likelihood in its sampling distribution -- has a linear relationship with lambda. Although, Pagel's lambda (Pagel et al. 1999) and Blomberg's K (Blomberg et al. 2003; Adams 2014) are frequently used as measures of the amount of phylogenetic signal (and both are also reported), the Z-score is an effect size that can be compared among different traits (see compare.physignal.z), and it more precisely describes the fit of data to a tree (Collyer et al. 2022).

It is assumed that the landmarks have previously been aligned using Generalized Procrustes Analysis (GPA) [e.g., with gpagen]. Multivariate data are subjected to a phylogenetically aligned component analysis to assure that data dimensions do not exceed the number of observations, making estimation of multivariate log-likelihoods possible. The tolerance argument can be used to assure an appropriate number of data dimensions are used. The number of phylogenetically aligned components (PAC.no) can also be used to estimate log-likelihood in fewer dimensions. This might be necessary for some data sets, as residual covariance matrices can be ill-conditioned, especially if the number of variables exceeds the number of observations. It is assumed that lambda is constant for all shape variables. It would be possible to consider, for example, whether phylogenetic signal is consistent among different modules via separate analyses, followed by analysis with compare.physignal.z. This could be facilitated by using coords.subset to partition shape data. This function can also be used with univariate data (i.e. centroid size) if imported as matrix with rownames giving the taxa names.

Optimization of the tree branch scaling parameter, lambda, is not a process with a single, resolved solution. For univariate data, optimization is fairly simple: lambda is optimized at the value that yields maximum likelihood. For multivariate data, generalization is not straightforward. Although this function currently assumes all variables in multivariate data have the same lambda, maximizing likelihood over all variables can ignore strong phylogenetic signal restricted to just some of the variables. Therefore, optimization can can consider the latent strength of phylogenetic signal in different ways. Below is a summary of optimization methods, including advantages or disadvantages that might be incurred. Future versions of this function might update optimization methods, as more research affirms better methods.

burn

A burn-in optimization samples a spectrum of lambda from 0 to 1, finding the effect size, Z, from several distributions of log-likelihoods, in order to find the lambda value that maximizes the effect size rather than the log-likelihood. Once this value of lambda is obtained, the requested number of permutations are run and the final effect size is estimated from the distribution of random log-likelihoods generated. This is perhaps the best optimization method to assure that the result effect size is maximized, but it requires considerably more time than the other methods.

mean

Lambda is optimized for every variable and the mean of the lambdas is used as the optimized lambda for estimating multivariate log-likelihoods. This method will guard against a tendency for multivariate optimization to be biased toward 0 or 1, but will also tend to find intermediate values of lambda, even if there is strong phylogenetic signal for some variables.

front

This method performs phylogenetically aligned component analysis (PACA) and determines the relevant number of components for which covariation between data and phylogeny is possible (as many or fewer than the number of variables). PACA front-loads phylogenetic signal in first components. Thus, lambda is optimized with multivariate likelihood but only for a portion of the full data space where phylogenetic signal might be found. This method will tend to bias lambda toward 1, making analysis more similar to using multivariate K, as in physignal.

all

This method simply optimizes lambda using a multivariate likelihood for optimization. This is the simplest generalization of univariate lambda optimization, but it will tend to optimize lambda at 0 or 1.

numeric override

Users can override optimization, by choosing a fixed value. Choosing lambda = 1 will be fundamentally the same as performing the analysis on multivariate K, as in physignal.

The generic functions, print, summary, and plot all work with physignal. The generic function, plot, produces a histogram of random log-likelihoods, associated with the resampling procedure.

References

Collyer, M.L., E.K. Baken, & D.C. Adams. 2022. A standardized effect size for evaluating and comparing the strength of phylogenetic signal. Methods in Ecology and Evolution. 13:367-382.

Blomberg SP, Garland T, Ives AR. 2003. Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution, 57:717-745.

Adams, D.C. 2014. A generalized K statistic for estimating phylogenetic signal from shape and other high-dimensional multivariate data. Systematic Biology. 63:685-697.

Adams, D.C. and M.L. Collyer. 2019. Comparing the strength of modular signal, and evaluating alternative modular hypotheses, using covariance ratio effect sizes with morphometric data. Evolution. 73:2352-2367.

Pagel, M. D. (1999). Inferring the historical patterns of biological evolution. Nature. 401:877-884.

See Also

gm.prcomp, physignal

Examples

Run this code
if (FALSE) {
data(plethspecies) 
Y.gpa <- gpagen(plethspecies$land)    #GPA-alignment    

# Test for phylogenetic signal in shape
PS.shape <- physignal.z(A = Y.gpa$coords, phy = plethspecies$phy, 
lambda = "front")
summary(PS.shape)

# Problem with ill-conditioned residual covariance matrix; try shaving one dimension

PS.shape <- physignal.z(A = Y.gpa$coords, phy = plethspecies$phy, 
lambda = "front", PAC.no = 7)
summary(PS.shape)
plot(PS.shape)
plot(PS.shape$PACA, phylo = TRUE)

# Test for phylogenetic signal in size
PS.size <- physignal.z(A = Y.gpa$Csize, phy = plethspecies$phy, 
lambda = "front")
summary(PS.size)
plot(PS.size)
}

Run the code above in your browser using DataLab