multi.ace: Ancestral states estimations with multiple trees

Description

Fast ancestral states estimations run on multiple trees using the Mk model from castor::asr_mk_model.

Usage

multi.ace(
  data,
  tree,
  models = "ER",
  threshold = TRUE,
  special.tokens,
  special.behaviours,
  brlen.multiplier,
  verbose = FALSE,
  parallel = FALSE,
  output,
  castor.options,
  estimation.details = NULL
)

Value

Returns a "matrix" or "list" of ancestral states. By default, the function returns the ancestral states in the same format as the input matrix. This can be changed using the option output = "matrix" or "list" to force the class of the output. To output the combined ancestral states and input, you can use "combined" (using the input format) or "combined.matrix" or "combined.list".

Arguments

data: A matrix or list with the characters for each taxa.
tree: A phylo or mutiPhylo object (if the tree argument contains node labels, they will be used to name the output).
models: A vector of models to be passed to asr_mk_model.
threshold: either logical for applying a relative threshold (TRUE - default) or no threshold (FALSE) or a numeric value of the threshold (e.g. 0.95). See details.
special.tokens: optional, a named vector of special tokens to be passed to grep (make sure to protect the character with "\\"). By default special.tokens <- c(missing = "\\?", inapplicable = "\\-", polymorphism = "\\&", uncertainty = "\\/"). Note that NA values are not compared and that the symbol "@" is reserved and cannot be used.
special.behaviours: optional, a list of one or more functions for a special behaviour for special.tokens. See details.
brlen.multiplier: optional, a vector of branch length modifiers (e.g. to convert time branch length in changes branch length) or a list of vectors (the same length as tree).
verbose: logical, whether to be verbose (TRUE) or not (FALSE - default).
parallel: logical, whether to use parallel algorithm (TRUE) or not (FALSE - default).
output: optional, see Return section below.
castor.options: optional, a named list of options to be passed to function called by asr_mk_model.
estimation.details: optional, whether to also return the details for each estimation as returned by asr_mk_model. This argument can be left NULL (default) or be any combination of the elements returned by asr_mk_model (e.g. c("loglikelihood", "transition_matrix")).

Author

Thomas Guillerme

Details

The models argument can be a single or a list of transition matrix, a single or a a vector of built-in model(s) (see below) or a list of both matrices and built-in models: The available built-in models in asr_mk_model are:

"ER" for all equal rates
"SYM" for symmetric rates
"ARD" all rates are different
"SUEDE" equal stepwise transitions (e.g. for meristic/counting characters)
"SRD" different stepwise transitions

See directly asr_mk_model for more models.

The threshold option allows to convert ancestral states likelihoods into discrete states. When threshold = FALSE, the ancestral state estimated is the one with the highest likelihood (or at random if likelihoods are equal). When threshold = TRUE, the ancestral state estimated are all the ones that are have a scaled likelihood greater than the maximum observed scaled likelihood minus the inverse number of possible states (i.e. select_state >= (max(likelihood) - 1/n_states)). This option makes the threshold selection depend on the number of states (i.e. if there are more possible states, a lower scaled likelihood for the best state is expected). Finally using a numerical value for the threshold option (e.g. threshold = 0.95) will simply select only the ancestral states estimates with a scaled likelihood equal or greater than the designated value. This option makes the threshold selection absolute. Regardless, if more than one value is select, the uncertainty token (special.tokens["uncertainty"]) will be used to separate the states. If no value is selected, the uncertainty token will be use between all observed characters (special.tokens["uncertainty"]).

special.behaviours allows to generate a special rule for the special.tokens. The functions should can take the arguments character, all_states with character being the character that contains the special token and all_states for the character (which is automatically detected by the function). By default, missing data returns and inapplicable returns all states, and polymorphisms and uncertainties return all present states.

missing = function(x,y) y
inapplicable = function(x,y) y
polymorphism = function(x,y) strsplit(x, split = "\\&")[[1]]
uncertainty = function(x,y) strsplit(x, split = "\\/")[[1]]

Functions in the list must be named following the special token of concern (e.g. missing), have only x, y as inputs and a single output a single value (that gets coerced to integer automatically). For example, the special behaviour for the special token "?" can be coded as: special.behaviours = list(missing = function(x, y) return(NA) to make ignore the character for taxa containing "?".

When using the parallel option (either through using parallel = TRUE by using the number of available cores minus on or manually setting the number of cores - e.g. parallel = 5), the asr_mk_model function will use the designated number of cores (using the option Nthreads = <requested_number_of_cores>). Additionally, if the input tree is a "multiPhylo" object, the trees will be run in parallel for each number of cores, thus decreasing computation time accordingly (e.g. if 3 cores are requested and tree contains 12 "phylo" objects, 4 different "phylo" objects will be run in parallel on the 3 cores making the calculation around 3 times faster).

Examples

Run this code

set.seed(42)
## A simple example:
## A random tree with 10 tips
tree <- rcoal(10)
## Setting up the parameters
my_rates = c(rgamma, rate = 10, shape = 5)

## A random Mk matrix (10*50)
matrix_simple <- sim.morpho(tree, characters = 50, model = "ER", rates = my_rates,
                            invariant = FALSE)

## Run a basic ancestral states estimations
ancestral_states <- multi.ace(matrix_simple, tree)
ancestral_states[1:5, 1:5]

## A more complex example
## Create a multiple list of 5 trees
multiple_trees <- rmtree(5, 10)

## Modify the matrix to contain missing and special data
matrix_complex <- matrix_simple
matrix_complex[sample(1:length(matrix_complex), 50)] <- "-"
matrix_complex[sample(1:length(matrix_complex), 50)] <- "0%2"
matrix_complex[sample(1:length(matrix_complex), 50)] <- "?"
matrix_complex[1:5,1:5]

## Set a list of extra special tokens
my_spec_tokens <- c("weirdtoken" = "%")

## Set some special behaviours for the "weirdtoken" and for "-" and "?"
my_spec_behaviours <- list()
## Inapplicable tokens "-" are ignored
my_spec_behaviours$inapplicable <- function(x,y) return(NA)
## Missing tokens "?" are considered as all states
my_spec_behaviours$missing      <- function(x,y) return(y)
## Weird tokens are considered as state 0 and 3
my_spec_behaviours$weirdtoken   <- function(x,y) return(c(1,2))

## Create a random branch length modifier to apply to each tree
branch_lengths <- rnorm(18)^2

## Setting a list of model ("ER" for the 25 first characters and then "SYM")
my_models <- c(rep("ER", 25), rep("SYM", 25))

## Run the ancestral states on all the tree with multiple options
ancestral_states <- multi.ace(matrix_complex, multiple_trees,
                              verbose = TRUE,
                              models = my_models,
                              threshold = 0.95,
                              special.tokens = my_spec_tokens,
                              special.behaviours = my_spec_behaviours,
                              brlen.multiplier = branch_lengths,
                              output = "combined.matrix")

## The results for the the two first characters for the first tree
ancestral_states[[1]][, 1:2]

if (FALSE) {
## The same example but running in parallel
ancestral_states <- multi.ace(matrix_complex, multiple_trees,
                              verbose = TRUE,
                              models = my_models,
                              threshold = 0.95,
                              special.tokens = my_spec_tokens,
                              special.behaviours = my_spec_behaviours,
                              brlen.multiplier = branch_lengths,
                              output = "combined.matrix",
                              parallel = TRUE)
}

Run the code above in your browser using DataLab