Learn R Programming

Claddis (version 0.7.0)

safe_taxonomic_reinsertion: Reinsert Safely Removed Taxa Into A Tree

Description

Safely reinsert taxa in a tree after they were removed from a matrix by Safe Taxonomic Reduction.

Usage

safe_taxonomic_reinsertion(
  input_filename,
  output_filename,
  str_taxa,
  multiple_placement_option = "exclude"
)

Value

A vector of taxa which were not reinserted is returned (will be empty if all taxa have been reinserted) and a file is written to (output_filename).

Arguments

input_filename

A Newick-formatted tree file containing tree(s) without safely removed taxa.

output_filename

A file name where the newly generated trees will be written out to (required).

str_taxa

The safe taxonomic reduction table as generated by safe_taxonomic_reduction.

multiple_placement_option

What to do with taxa that have more than one possible reinsertion position. Options are "exclude" (does not reinsert them; the default) or "random" (picks one of the possible positions and uses that - will vary stochastically if multiple trees exist).

Author

Graeme T. Lloyd graemetlloyd@gmail.com

Details

The problem with Safe Taxonomic Reduction (safe_taxonomic_reduction) is that it generates trees without the safely removed taxa, but typically the user will ultimately want to include these taxa and thus there is also a need to perform "Safe Taxonomic Reinsertion".

This function performs that task, given a Newick-formatted tree file and a list of the taxa that were safely removed and the senior taxon and rule used to do so (i.e., the $str_taxa part of the output from safe_taxonomic_reduction).

Note that this function operates on tree files rather than reading the trees directly into R (e.g., with ape's read.tree or read.nexus functions) as in practice this turned out to be impractically slow for the types of data sets this function is intended for (supertrees or metatrees). Importantly this means the function operates on raw Newick text strings and hence will only work on data where there is no extraneous information encoded in the Newick string, such as node labels or branch lengths.

Furthermore, in some cases safely removed taxa will have multiple taxa with which they can be safely placed. These come in two forms. Firstly, the multiple taxa can already form a clade, in which case the safely removed taxon will be reinserted in a polytomy with these taxa. In other words, the user should be aware that the function can result in non-bifurcating trees even if the input trees are all fully bifurcating. Secondly, the safely removed taxon can have multiple positions on the tree where it can be safely reinserted. As this generates ambiguity, by default (multiple_placement_option = "exclude") these taxa will simply not be reinserted. However, the user may wish to still incorporate these taxa and so an additional option (multiple_placement_option = "random") allows these taxa to be inserted at any of its' possible positions, chosen at random for each input topology (to give a realistic sense of phylognetic uncertainty. (Note that an exhaustive list of all possible combinations of positions is not implemented as, again, in practice this turned out to generate unfeasibly large numbers of topologies for the types of applications this function is intended for.)

See Also

safe_taxonomic_reduction

Examples

Run this code

# Generate dummy four taxon trees (where taxa B, D and F were
# previously safely excluded):
trees <- ape::read.tree(text = c("(A,(C,(E,G)));", "(A,(E,(C,G)));"))

# Write trees to file:
ape::write.tree(phy = trees, file = "test_in.tre")

# Make dummy safe taxonomic reduction taxon list:
str_taxa <- matrix(data = c("B", "A", "rule_2b", "D", "C", "rule_2b",
  "F", "A", "rule_2b", "F", "C", "rule_2b"), byrow = TRUE, ncol = 3,
  dimnames = list(c(), c("junior", "senior", "rule")))

# Show that taxa B and D have a single possible resinsertion position,
# but that taxon F has two possible positions (with A or with C):
str_taxa

# Resinsert taxa safely (F will be excluded due to the ambiguity of
# its' position - multiple_placement_option = "exclude"):
safe_taxonomic_reinsertion(input_filename = "test_in.tre",
  output_filename = "test_out.tre", str_taxa = str_taxa,
  multiple_placement_option = "exclude")

# Read in trees with F excluded:
exclude_str_trees <- ape::read.tree(file = "test_out.tre")

# Show first tree with B and D reinserted:
ape::plot.phylo(x = exclude_str_trees[[1]])

# Repeat, but now with F also reinserted with its' position (with
# A or with C) chosen at random:
safe_taxonomic_reinsertion(input_filename = "test_in.tre",
  output_filename = "test_out.tre", str_taxa = str_taxa,
  multiple_placement_option = "random")

# Read in trees with F included:
random_str_trees <- ape::read.tree(file = "test_out.tre")

# Confirm F has now also been reinserted:
ape::plot.phylo(x = random_str_trees[[1]])

# Clean up example files:
file.remove(file1 = "test_in.tre", file2 = "test_out.tre")

Run the code above in your browser using DataLab