Function to fit Ornstein-Uhlenbeck models of trait evolution
slouch.fit(
phy,
species = NULL,
hl_values = NULL,
a_values = NULL,
vy_values = NULL,
sigma2_y_values = NULL,
response,
mv.response = NULL,
fixed.fact = NULL,
direct.cov = NULL,
mv.direct.cov = NULL,
mcov.direct.cov = NULL,
random.cov = NULL,
mv.random.cov = NULL,
mcov.random.cov = NULL,
ace = NULL,
anc_maps = "regimes",
estimate.Ya = FALSE,
estimate.bXa = FALSE,
interactions = FALSE,
hessian = FALSE,
support = 2,
convergence = 1e-06,
nCores = 1,
hillclimb = TRUE,
lower = c(1e-08, 1e-08),
upper = Inf,
verbose = FALSE
)
An object of class 'slouch', essentially a list with the following fields:
a list of the entire parameter space traversed by the grid search and the hillclimber as applicable.
a list of parameters concerning the tree:
phy - an object of class 'phylo'
T.term - a numeric vector including the time from the root of the tree to the tip, for all taxa 1,2,3... n.
ta - for all pairs of species, the time from their most recent common ancestor (mrca) to the root of the tree.
tia - for all pairs of species, the time from their mrca to the tip of species i.
tja - the transpose of tia.
tij - for all pairs of species, the time from species i to their mrca, plus the time from their mrca to species j. In other words, tia + transpose(tia).
times - for all nodes (1,2,3... n, root, root+1, ...) in the tree, the time from the root to said node.
lineages - for all species (1,2,3... n), a list of their branch times and regimes as painted on the tree.
regimes - for all nodes (1,2,3... n, root, root+1, ...) in the tree, the respective regime as specified by "phy$node.label" and "fixed.fact".
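The relationships among these time matrices can be illustrated on a small tree. The following is a minimal sketch in base R, not slouch's internal code. The toy tree ((A:1,B:1):1,C:3) is built by hand, and treating the mrca of a species with itself as its own tip (so that the diagonal of ta equals T.term) is an assumption made for the example.

```r
# Toy tree ((A:1,B:1):1,C:3); hypothetical example, built by hand.
tips <- c("A", "B", "C")

# Time from the root to each tip.
T.term <- c(2, 2, 3)

# ta[i, j]: time from the root to the mrca of species i and j.
# Assumption: the mrca of a species with itself is its own tip.
ta <- matrix(c(2, 1, 0,
               1, 2, 0,
               0, 0, 3),
             nrow = 3, byrow = TRUE, dimnames = list(tips, tips))

# tia[i, j]: time from the mrca of i and j to the tip of species i.
# T.term is recycled down the columns, so row i is computed from T.term[i].
tia <- T.term - ta

# tja is the transpose of tia, and tij is the total path length between tips.
tja <- t(tia)
tij <- tia + tja
```

Because the toy tree is not ultrametric, tia is not symmetric, while tij is symmetric with a zero diagonal.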
a list of statistics to characterize model fit
a list or matrix used to plot the grid search
a matrix indicating the interval of grid search that is within the support region. If the grid search values are carefully selected, this may be used to estimate the true support region.
the residual variance-covariance matrix for the maximum likelihood model as found by parameter search.
maximum likelihood estimates of parameters under the chosen model.
regression coefficients and associated objects. Whether the regression coefficients are to be interpreted as optima or not depends on the type of model and model estimates.
under a random effect model, "beta_evolutionary" holds the evolutionary regression coefficients and associated objects.
number of free parameters with which the likelihood criteria are penalized.
under a random effect model, a matrix of means and standard errors for the independent Brownian motion variable(s). Not to be confused with the regression coefficients when the residuals are under a "bm" model.
a matrix of the path trajectory of the hillclimber routine.
the respective regimes for all species (1,2,3... n).
internal parameters for control flow.
an object of class 'phylo', must be rooted.
a character vector of species tip labels, typically the "species" column in a data frame. This vector must match phy$tip.label exactly, with the same labels in the same order.
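One way to satisfy the ordering requirement is to reorder the data before fitting. The sketch below uses base R only; the list `phy` is a hypothetical stand-in for a 'phylo' object that carries nothing but tip labels, and the data frame `dat` and its values are invented for illustration.

```r
# Hypothetical stand-in for a 'phylo' object: only tip.label is needed here.
phy <- list(tip.label = c("sp_c", "sp_a", "sp_b"))

# Hypothetical trait data, stored in a different order than the tips.
dat <- data.frame(species = c("sp_a", "sp_b", "sp_c"),
                  trait = c(1.2, 0.7, 2.1),
                  stringsAsFactors = FALSE)

# Reorder the rows so that dat$species follows phy$tip.label.
dat <- dat[match(phy$tip.label, dat$species), ]

# Check that the labels agree in both content and order before fitting.
stopifnot(identical(dat$species, phy$tip.label))
```

After this step, columns of `dat` can be passed as species, response and so on in tip order.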
a vector of candidate phylogenetic half-life values to be evaluated in grid search. Optional.
a vector of candidate rate of adaptation values to be evaluated in grid search. Optional.
a vector of candidate stationary variances for the response trait, to be evaluated in grid search. Optional.
alternative to vy_values, if the stationary variance is reparameterized as the variance parameter for the Brownian motion.
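The grid-search arguments above describe two parameters in alternative parameterizations. The sketch below assumes the usual Ornstein-Uhlenbeck relations, half-life = log(2) / a and vy = sigma2_y / (2 * a); the grid values themselves are arbitrary and only illustrative.

```r
# Candidate grid of phylogenetic half-lives (arbitrary illustrative values,
# in the same time units as the branch lengths of the tree).
hl_values <- seq(0.5, 5, length.out = 10)

# Usual OU relation: half-life = log(2) / a.
a_values <- log(2) / hl_values

# Candidate stationary variances (arbitrary illustrative values).
vy_values <- seq(0.01, 0.1, length.out = 10)

# Usual OU relation: vy = sigma2_y / (2 * a). For one pair (a, vy):
a <- a_values[1]
vy <- vy_values[1]
sigma2_y <- 2 * a * vy
```

Since hl_values and a_values describe the same parameter, as do vy_values and sigma2_y_values, presumably only one of each pair would be supplied in a given call.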
a numeric vector of a trait to be treated as response variable
numeric vector of the observational variances of each response trait. For example, if response is a mean trait value, mv.response is the within-species squared standard error of the mean.
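As an illustration of the squared standard error of the mean, the sketch below computes species means and their observational variances from within-species measurements. The species names and values are hypothetical.

```r
# Hypothetical within-species measurements for two species.
measurements <- list(sp_a = c(4.1, 3.9, 4.3, 4.0),
                     sp_b = c(2.2, 2.6, 2.4))

# Species means, used as the response.
response <- sapply(measurements, mean)

# Squared standard error of each mean: sample variance divided by sample size.
mv.response <- sapply(measurements, function(x) var(x) / length(x))
```

Both vectors are named by species, which makes it easy to check them against the tip order before fitting.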
factor of regimes on the terminal edges of the tree, in same order as species. If this is used, phy$node.label needs to be filled with the corresponding internal node regimes, in the order of node indices (root: n+1),(n+2),(n+3), ...
Direct effect independent variables
Estimation variances for direct effect independent variables. Must be the same shape as direct.cov
Estimation covariances between the response variable and direct effect independent variables. Must be the same shape as direct.cov
Independent variables, each modeled as a Brownian motion
Estimation variances for the Brownian covariates. Must be the same shape as random.cov
Estimation covariances between the response variable and random effect independent variables. Must be the same shape as random.cov
An ape::ace object, with estimated ancestral character states. Optional
One of "regimes", "ace" or "simmap". "regimes" tells slouch to use `phy$node.label` to assign internal regimes. "ace" tells slouch to use ancestral posterior probabilities for ancestral regimes. "simmap" tells slouch to use the simmap mappings associated with `phy`
a logical value indicating whether "Ya" should be estimated. If true, the intercept K = 1 is expanded to Ya = exp(-a*t) and b0 = 1-exp(-a*t). If models with categorical covariates are used, this will instead estimate a separate primary optimum for the root niche, "Ya". This only makes sense for non-ultrametric trees. If the tree is ultrametric, the model matrix becomes singular.
a logical value indicating whether "bXa" should be estimated. If true, bXa = 1-exp(-a*t) - (1-(1-exp(-a*t))/(a*t)) is added to the model matrix, estimating b*Xa. Same requirements as for estimating Ya.
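The expressions above can be evaluated directly to see why an ultrametric tree is a problem. The sketch below is an illustration in base R, not slouch's model-matrix code; the half-life and the root-to-tip times are hypothetical, and the variable names (w_Ya, w_b0, w_bXa) are introduced only for this example.

```r
a <- log(2) / 1.5   # rate of adaptation for a hypothetical half-life of 1.5
t <- c(1, 2, 4)     # hypothetical root-to-tip times (non-ultrametric)

w_Ya <- exp(-a * t)        # model-matrix column for Ya
w_b0 <- 1 - exp(-a * t)    # model-matrix column for b0

# Column for bXa, transcribed from the expression in the text.
w_bXa <- 1 - exp(-a * t) - (1 - (1 - exp(-a * t)) / (a * t))

# With unequal tip times the Ya and b0 columns are linearly independent.
rank_non_ultra <- qr(cbind(w_Ya, w_b0))$rank

# With equal tip times both columns are constant, hence collinear.
t_ultra <- rep(2, 3)
rank_ultra <- qr(cbind(exp(-a * t_ultra), 1 - exp(-a * t_ultra)))$rank
```

The Ya and b0 columns always sum to one, which is how they replace the single intercept column K = 1.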
a logical value. Whether to model interactions between (all) direct-effect continuous covariates and categorical regimes (experimental). Defaults to FALSE
use the approximate hessian matrix at the likelihood peak as found by the hillclimber, to compute standard errors for the parameters that enter in parameter search.
a scalar indicating the size of the support set, defaults to 2 units of log-likelihood.
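The idea of a support set can be sketched on a one-dimensional grid. The example below is a simplified illustration in base R, not the routine slouch uses; the half-life grid and log-likelihood values are invented.

```r
# Hypothetical log-likelihood values over a grid of half-lives.
hl_grid <- c(0.5, 1, 1.5, 2, 2.5, 3)
loglik  <- c(-14.0, -11.2, -10.1, -10.6, -12.5, -15.3)

support <- 2

# Grid points whose log-likelihood lies within 'support' units of the maximum.
in_support <- loglik >= max(loglik) - support

# Interval of grid values inside the support region.
support_interval <- range(hl_grid[in_support])
```

The interval is only as precise as the grid, which is why the grid values need to be chosen carefully if the result is to approximate the true support region.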
threshold of iterative GLS estimation for when beta is considered to be converged.
number of CPU cores used in grid-search. If 2 or more cores are used, all print statements are silenced during grid search. If performance is critical it is recommended to compile and link R to a multithreaded BLAS, since most of the heavy computations are common matrix operations. Even if a single-threaded BLAS is used, using several cores may or may not improve performance, and performance may vary with OS.
logical, whether to use the hillclimb parameter estimation routine or not. This routine (L-BFGS-B from optim()) may be combined with the grid-search, in which case it will by default start at the sigma and half-life of the local ML found by the grid-search.
lower bounds for the optimization routine, defaults to c(1e-08, 1e-08). The first entry in the vector is the half-life, the second is the stationary variance. When running direct effect models without observational error, it may be useful to specify a positive lower bound for the stationary variance, e.g. c(0, 0.001), since the residual variance-covariance matrix is degenerate when sigma = 0.
upper bounds for the optimization routine, defaults to Inf, i.e. no upper bound on either parameter.
a logical value indicating whether to print a summary in each iteration of parameter search. May be useful when diagnosing unexpected behaviour or crashes.