brown.fit: Function to fit Brownian-motion models of trait evolution

Description

Function to fit Brownian-motion models of trait evolution

Usage

brown.fit(
  phy,
  species = NULL,
  sigma2_y_values = NULL,
  response,
  mv.response = NULL,
  fixed.fact = NULL,
  direct.cov = NULL,
  mv.direct.cov = NULL,
  mcov.direct.cov = NULL,
  random.cov = NULL,
  mv.random.cov = NULL,
  mcov.random.cov = NULL,
  ace = NULL,
  anc_maps = "regimes",
  estimate.Ya = FALSE,
  interactions = FALSE,
  hessian = FALSE,
  support = 2,
  convergence = 1e-06,
  nCores = 1,
  hillclimb = TRUE,
  lower = 1e-08,
  upper = NULL,
  verbose = FALSE
)

Value

An object of class 'slouch', essentially a list with the following fields:

parameter_space

a list of the entire parameter space traversed by the grid search and the hillclimber as applicable.

tree

a list of parameters concerning the tree:

phy - an object of class 'phylo'
T.term - a numeric vector including the time from the root of the tree to the tip, for all taxa 1,2,3... n.
ta - for all pairs of species, the time from their most recent common ancestor (mrca) to the root of the tree.
tia - for all pairs of species, the time from their mrca to the tip of species i.
tja - the transpose of tia.
tij - for all pairs of species, the time from species i to their mrca, plus the time from their mrca to species j. In other words, tia + transpose(tia).
times - for all nodes (1,2,3... n, root, root+1, ...) in the tree, the time from the root to said node.
lineages - for all species (1,2,3... n), a list of their branch times and regimes as painted on the tree.
regimes - for all nodes (1,2,3... n, root, root+1, ...) in the tree, the respective regime as specified by "phy$node.label" and "fixed.fact".
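
The relationships among the time matrices above can be sketched in base R for a toy three-taxon tree. This is only an illustration under stated assumptions, not the package's internal code: the tree ((A:1,B:1):1,C:2) and the hand-written `ta` matrix are hypothetical, while the names `T.term`, `ta`, `tia`, `tja` and `tij` follow the field descriptions above.

```r
# Hypothetical rooted tree ((A:1,B:1):1,C:2); every tip is 2 time units from the root.
tips <- c("A", "B", "C")
T.term <- c(A = 2, B = 2, C = 2)   # root-to-tip time for each taxon

# ta[i, j]: time from the root to the mrca of species i and j
# (on the diagonal the mrca is the tip itself). Written by hand for this toy tree.
ta <- matrix(c(2, 1, 0,
               1, 2, 0,
               0, 0, 2),
             nrow = 3, byrow = TRUE, dimnames = list(tips, tips))

# tia[i, j]: time from the mrca of (i, j) to the tip of species i.
# The vector T.term is recycled down the columns, so row i uses T.term[i].
tia <- T.term - ta

# tja is the transpose of tia, and tij is the total path length between tips.
tja <- t(tia)
tij <- tia + tja

print(tij)
```

Here the entry of `tij` for A and B is the sum of the two branches that separate them, and the entry for A and C passes through the root.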

modfit

a list of statistics to characterize model fit

supportplot

a list or matrix used to plot the grid search

supported_range

a matrix indicating the interval of grid search that is within the support region. If the grid search values are carefully selected, this may be used to estimate the true support region.

V

the residual variance-covariance matrix for the maximum likelihood model as found by parameter search.

evolpar

maximum likelihood estimates of parameters under the chosen model.

beta_primary

regression coefficients and associated objects. Whether the regression coefficients are to be interpreted as optima or not depends on the type of model and the model estimates.

beta_evolutionary

under a random effect model, "beta_evolutionary" contains the evolutionary regression coefficients and associated objects.

n.par

number of free parameters with which the likelihood criteria are penalized.

brownian_predictors

under a random effect model, a matrix of means and standard errors for the independent Brownian motion variable(s). Not to be confused with the regression coefficients when the residuals are under a "bm" model.

climblog_df

a matrix of the path trajectory of the hillclimber routine.

fixed.fact

the respective regimes for all species (1,2,3... n).

control

internal parameters for control flow.
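
To make the idea behind "parameter_space" and "supported_range" concrete, the following base-R sketch evaluates a Gaussian log-likelihood over a grid of candidate sigma squared (y) values and keeps the values within `support` units of the maximum. It is an illustration, not the package's implementation. It assumes an intercept-only model, no observational error, and a residual covariance of the form `sigma2_y * ta`; the toy data `y`, the matrix `ta` and the helper `bm_loglik` are hypothetical.

```r
# Hypothetical root-to-mrca times for the tree ((A:1,B:1):1,C:2) and toy trait values.
ta <- matrix(c(2, 1, 0,
               1, 2, 0,
               0, 0, 2), nrow = 3, byrow = TRUE)
y <- c(1.2, 0.8, -0.5)
X <- matrix(1, nrow = 3, ncol = 1)   # intercept-only design matrix

# Gaussian log-likelihood for one candidate sigma2_y (hypothetical helper).
# Assumes V = sigma2_y * ta, i.e. Brownian motion without observational error.
bm_loglik <- function(sigma2_y) {
  V <- sigma2_y * ta
  V_inv <- solve(V)
  # Generalized least squares estimate of the intercept
  beta <- solve(t(X) %*% V_inv %*% X, t(X) %*% V_inv %*% y)
  res <- y - X %*% beta
  logdet <- as.numeric(determinant(V, logarithm = TRUE)$modulus)
  quad <- as.numeric(t(res) %*% V_inv %*% res)
  -0.5 * (length(y) * log(2 * pi) + logdet + quad)
}

# Grid search over candidate values
grid <- seq(0.05, 5, length.out = 100)
loglik <- vapply(grid, bm_loglik, numeric(1))

# Grid values within `support` log-likelihood units of the best grid point
support <- 2
in_support <- loglik >= max(loglik) - support
supported_range <- range(grid[in_support])

best <- grid[which.max(loglik)]
print(best)
print(supported_range)
```

If either end of `supported_range` coincides with the edge of the grid, the grid is too narrow to bracket the support region and should be widened.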

Arguments

phy: an object of class 'phylo', must be rooted.
species: a character vector of species tip labels, typically the "species" column in a data frame. This column needs to match phy$tip.label exactly and be in the same order.
sigma2_y_values: a vector of one or more candidates for sigma squared (y) to be evaluated in grid search.
response: a numeric vector of a trait to be treated as the response variable.
mv.response: numeric vector of the observational variances of each response trait. For example, if response is a mean trait value, mv.response is the within-species squared standard error of the mean.
fixed.fact: factor of regimes on the terminal edges of the tree, in same order as species. If this is used, phy$node.label needs to be filled with the corresponding internal node regimes, in the order of node indices (root: n+1),(n+2),(n+3), ...
direct.cov: Direct effect independent variables
mv.direct.cov: Estimation variances for direct effect independent variables. Must be the same shape as direct.cov
mcov.direct.cov: Estimation covariances between the response variable and direct effect independent variables. Must be the same shape as direct.cov
random.cov: Independent variables each modeled as a Brownian motion
mv.random.cov: Estimation variances for the Brownian covariates. Must be the same shape as random.cov
mcov.random.cov: Estimation covariances between the response variable and random effect independent variables. Must be the same shape as random.cov
ace: An ape::ace object, with estimated ancestral character states. Optional
anc_maps: One of "regimes", "ace" or "simmap". "regimes" tells slouch to use `phy$node.label` to assign internal regimes. "ace" tells slouch to use ancestral posterior probabilities for ancestral regimes. "simmap" tells slouch to use the simmap mappings associated with `phy`
estimate.Ya: a logical value. Whether to estimate the ancestral state independently under Brownian motion. Note that, for an intercept model, the intercept IS the ancestral state estimate (since there are no directional or stabilizing trends in a standard Brownian motion). Defaults to FALSE
interactions: a logical value. Whether to model interactions between (all) direct-effect continuous covariates and categorical regimes (experimental). Defaults to FALSE
hessian: a logical value. Whether to use the approximate Hessian matrix at the likelihood peak, as found by the hillclimber, to compute standard errors for the parameters that enter the parameter search. Defaults to FALSE
support: a scalar indicating the size of the support set, defaults to 2 units of log-likelihood.
convergence: threshold of iterative GLS estimation for when beta is considered to be converged.
nCores: number of CPU cores used in the grid search. If 2 or more cores are used, all print statements are silenced during the grid search. If performance is critical, it is recommended to compile and link R to a multithreaded BLAS, since most of the heavy computations are common matrix operations. With a single-threaded BLAS, using several cores may or may not improve performance, and performance may vary with OS.
hillclimb: logical, whether to use the hillclimb parameter estimation routine or not. This routine (L-BFGS-B from optim()) may be combined with the grid search, in which case it will by default start on the sigma and halflife for the local ML found by the grid search.
lower: lower bounds for the optimization routine, defaults to 1e-8. When running direct effect models without observational error, it may be useful to specify a positive lower bound for sigma squared, since the residual variance-covariance matrix is degenerate when sigma = 0.
upper: upper bounds for the optimization routine, defaults to 10 * var(response) * max(treeheight).
verbose: a logical value indicating whether to print a summary in each iteration of parameter search. May be useful when diagnosing unexpected behaviour or crashes.
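
The requirements on "species" and "mv.response" can be prepared in base R as in the sketch below. All names and values here are hypothetical: `tip_labels` stands in for phy$tip.label, and `raw` is a made-up table of individual measurements. The sketch only shows how to compute the within-species squared standard error of the mean and how to reorder the rows so that they match the tip labels.

```r
# Hypothetical tip labels; in practice these would come from phy$tip.label
tip_labels <- c("sp_c", "sp_a", "sp_b")

# Hypothetical raw measurements, several individuals per species
raw <- data.frame(
  species = c("sp_a", "sp_a", "sp_a", "sp_b", "sp_b",
              "sp_c", "sp_c", "sp_c", "sp_c"),
  trait = c(2.1, 2.4, 2.2, 3.0, 3.4, 1.1, 0.9, 1.3, 1.0),
  stringsAsFactors = FALSE
)

# Per-species mean and squared standard error of the mean (variance / n)
means <- tapply(raw$trait, raw$species, mean)
se2 <- tapply(raw$trait, raw$species, function(x) var(x) / length(x))

dat <- data.frame(
  species = names(means),
  trait_mean = as.numeric(means),
  trait_se2 = as.numeric(se2),
  stringsAsFactors = FALSE
)

# Reorder rows to follow the tip labels, then verify the exact match
dat <- dat[match(tip_labels, dat$species), ]
stopifnot(identical(dat$species, tip_labels))

print(dat)
```

With data arranged this way, `dat$species`, `dat$trait_mean` and `dat$trait_se2` would be the natural candidates for the species, response and mv.response arguments, respectively.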