extended.pgls: Extended Phylogenetic ANOVA/regression for Procrustes shape variables

Description

Function performs extended PGLS that enables more than one individual per species to be included in a phylogenetic least squares analysis.

Usage

extended.pgls(
  f1,
  phy = NULL,
  Cov = NULL,
  species = NULL,
  delta = 0.001,
  gamma = c("sample", "equal"),
  iter = 999,
  seed = NULL,
  int.first = FALSE,
  turbo = FALSE,
  Parallel = FALSE,
  verbose = FALSE,
  data = NULL,
  print.progress = TRUE,
  ...
)

Value

An object of class lm.rrpp.ws is a list containing the following

call: The matched call.
LM: Linear Model objects, including data (Y), coefficients, design matrix (X), sample size (n), number of dependent variables (p), dimension of data space (p.prime), QR decomposition of the design matrix, fitted values, residuals, weights, offset, model terms, data (model) frame, random coefficients (through permutations), random vector distances for coefficients (through permutations), whether OLS or GLS was performed, and the mean for OLS and/or GLS methods. Note that the data returned resemble a model frame rather than a data frame; i.e., it contains the values used in analysis, which might have been transformed according to the formula. The response variables are always labeled Y.1, Y.2, ..., in this frame.
ANOVA: Analysis of variance objects, including the SS type, random SS outcomes, random MS outcomes, random R-squared outcomes, random F outcomes, random Cohen's f-squared outcomes, P-values based on random F outcomes, effect sizes for random outcomes, sample size (n), number of variables (p), and degrees of freedom for model terms (df). These objects are used to construct ANOVA tables.
PermInfo: Permutation procedure information, including the number of permutations (perms), The method of residual randomization (perm.method), and each permutation's sampling frame (perm.schedule), which is a list of reordered sequences of 1:n, for how residuals were randomized.

Arguments

f1: A formula for the linear model (e.g., y~x1+x2).
phy: A phylogenetic tree of class = "phylo" - see read.tree in library ape
Cov: An argument for including a phylogenetic covariance matrix if a phylogeny is not provided. This may be obtained under any evolutionary model, and if included, any weights are ignored. This matrix must match in dimensions the number of species.
species: A variable that can be found in the data frame indicating the species to which each individual belongs. This variable must be in the data frame. It is imperative that these names match the species names in the phylogeny or phylogenetic covariance matrix. The data do not need to have row names but the species variable has to be provided.
delta: A within-species scaling parameter for covariances, ranging from 0 to 1. If delta = 0, a sight value (0.001) is added to assure variances of the covariance matrix are 0.1 percent larger than covariances.
gamma: A sample-size scaling parameter that is adjusted to be 1 ("equal") scaling or the square-root of the sample size for species observations ("sample").
iter: Number of iterations for significance testing
seed: An optional argument for setting the seed for random permutations of the resampling procedure. If left NULL (the default), the exact same P-values will be found for repeated runs of the analysis (with the same number of iterations). If seed = "random", a random seed will be used, and P-values will vary. One can also specify an integer for specific seed values, which might be of interest for advanced users.
int.first: A logical value to indicate if interactions of first main effects should precede subsequent main effects
turbo: A logical value that if TRUE, suppresses coefficient estimation in every random permutation. This will affect subsequent analyses that require random coefficients (see coef.lm.rrpp) but might be useful for large data sets for which only ANOVA is needed.
Parallel: Either a logical value to indicate whether parallel processing should be used, a numeric value to indicate the number of cores to use, or a predefined socket cluster.
verbose: A logical value to indicate if all possible output from an analysis should be retained. Generally this should be FALSE, unless one wishes to extract, e.g., all possible terms, model matrices, QR decomposition, or random permutation schemes.
data: A data frame for the function environment, see rrpp.data.frame. A data frame is required for this analysis.
print.progress: A logical value to indicate whether a progress bar should be printed to the screen. This is helpful for long-running analyses.
...: Arguments typically used in lm, such as weights or offset, passed on to lm.rrpp for estimation of coefficients. If both weights and a covariance matrix are included, weights are ignored (since inverses of weights are the diagonal elements of weight matrix, used in lieu of a covariance matrix.)

Author

Dean Adams

Details

The function performs linear models in a phylogenetic context in a manner that can accommodate multiple individuals per species and can accommodate high-dimensional datasets. The approach utilizes a hierarchical linear model, an expanded phylogenetic covariance matrix, and permutation procedures to obtain empirical sampling distributions and effect sizes for model effects that can evaluate differences in intraspecific trends across species for both univariate and multivariate data, while conditioning them on the phylogeny (Adams and Collyer 2024).

Data input is specified by several components, including minimally: 1) a formula (e.g., y ~ X), where 'y' specifies the response variables (shape data), and 'X' contains one or more independent variables (discrete or continuous). The response matrix 'Y' can be either in the form of a two-dimensional data matrix of dimension (n x [p x k]), or a 3D array (p x n x k). It is assumed that the landmarks have previously been aligned using Generalized Procrustes Analysis (GPA) [e.g., with gpagen].

2) An argument for species is also required, which specifies the species to which each observation belongs.

3) Either phy or COV must be included, which specifies the expected phylogenetic non-independence among species. Here, 'phy' represents a phylogenetic tree of class 'phylo'. In this case, the expected covariances among species will be estimated under a Brownian motion model of evolution. Alternatively, 'COV' is a phylogenetic covariance matrix obtained previously by the user, and may represent the expected covariances among species under any model of evolutionary change (BM, OU, etc.).

From the phylogenenetic covariance matrix, a phylogenetic transformation matrix is obtained and used to transform the Y variables and the model design matrices constructed from X, and parameter estimates are obtained. Next, variance components for all model terms are calculated, using a combination of type II and type III sums of squares, so that within-species effects may be isolated from interspecies effects (see Adams and Collyer 2024). Likewise, residuals from the model are permuted in such a manner as to isolate intra- and inter-species effects, to account for fact that multiple observations are included for each species. Additional statistical and philosophical details are found in Adams and Collyer 2024.

References

Adams, D.C and M.L Collyer. 2024. Extending phylogenetic regression models for comparing within-species patterns across the tree of life. Methods in Ecology and Evolution. 15:2234-2246.

Examples

Run this code

if (FALSE) {
data(pupfish.ws)

# With phylogeny
fit <- extended.pgls(f1 = coords~Species * Sex + Population, 
  data = pupfish.ws, species = "Species",
  phy = pupfish.ws$phy) 
  
anova(fit) 
 
# multivariate stats 
fit.mult <- manova.update(fit, PC.no = 40)
summary(fit.mult, test = "Wilks") 

# With phylogenetic covariance matrix

fit2 <- extended.pgls(f1 = coords ~ Species * Sex + Population, 
  data = pupfish.ws, species = "Species",
  Cov = pupfish.ws$Cov)
  
anova(fit2)

#sultivariate stats
fit2.mult <- manova.update(fit2, PC.no = 40)
summary(fit2.mult, test = "Wilks") 
}

Run the code above in your browser using DataLab