phylo4d-methods: Combine a phylogenetic tree with data

Description

phylo4d is a generic constructor that merges a phylogenetic tree with data frames to create a combined object of class phylo4d.

Usage

phylo4d(x, ...)
# S4 method for phylo4
phylo4d(
  x,
  tip.data = NULL,
  node.data = NULL,
  all.data = NULL,
  merge.data = TRUE,
  metadata = list(),
  ...
)
# S4 method for matrix
phylo4d(
  x,
  tip.data = NULL,
  node.data = NULL,
  all.data = NULL,
  merge.data = TRUE,
  metadata = list(),
  edge.length = NULL,
  tip.label = NULL,
  node.label = NULL,
  edge.label = NULL,
  order = "unknown",
  annote = list(),
  ...
)
# S4 method for phylo
phylo4d(
  x,
  tip.data = NULL,
  node.data = NULL,
  all.data = NULL,
  check.node.labels = c("keep", "drop", "asdata"),
  annote = list(),
  metadata = list(),
  ...
)
# S4 method for phylo4d
phylo4d(x, ...)
# S4 method for nexml
phylo4d(x)

Value

An object of class phylo4d.

Arguments

x: an object of class phylo4, phylo, nexml or a matrix of edges (see above)
...: further arguments to control the behavior of the constructor in the case of missing/extra data and where to look for labels in the case of non-unique labels that cannot be stored as row names in a data frame (see Details).
tip.data: a data frame (or object to be coerced to one) containing only tip data (Optional)
node.data: a data frame (or object to be coerced to one) containing only node data (Optional)
all.data: a data frame (or object to be coerced to one) containing both tip and node data (Optional)
merge.data: if both tip.data and node.data are provided, should columns with common names be merged together (TRUE, the default) or kept separate (FALSE)? See Details.
metadata: any additional metadata to be passed to the new object
edge.length: Edge (branch) length. (Optional)
tip.label: A character vector of species names (names of "tip" nodes). (Optional)
node.label: A character vector of internal node names. (Optional)
edge.label: A character vector of edge (branch) names. (Optional)
order: character: tree ordering (allowable values are listed in phylo4_orderings, currently "unknown", "preorder" (="cladewise" in ape), and "postorder", with "cladewise" and "pruningwise" also allowed for compatibility with ape)
annote: any additional annotation data to be passed to the new object
check.node.labels: if x is of class phylo, use either “keep” (the default) to retain internal node labels, “drop” to drop them, or “asdata” to convert them to numeric tree data. This argument is useful if the phylo object has non-unique node labels or node labels with informative data (e.g., posterior probabilities).
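
As a hedged illustration of check.node.labels, the sketch below reads a small made-up tree whose internal node labels are numeric (standing in for support values) and asks the constructor to store them as data. The Newick string and taxon names are invented for this example; it assumes the phylobase and ape packages are installed.

```r
library(phylobase)

# Hypothetical four-tip tree; 0.95 and 0.80 are made-up internal node labels
# standing in for support values.
tr <- ape::read.tree(text = "((A:1,B:1)0.95:1,(C:1,D:1)0.80:1);")

# "asdata" converts the node labels to numeric tree data instead of keeping
# them as labels.
obj <- phylo4d(tr, check.node.labels = "asdata")

# Inspect the data now attached to the internal nodes.
tdata(obj, "internal")
```

The root carries no label in this string, so its value in the resulting data should be missing.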

Methods

x = "phylo4": merges a tree of class phylo4 with a data.frame into a phylo4d object
x = "matrix": merges a matrix of tree edges similar to the edge slot of a phylo4 object (or to $edge of a phylo object) with a data.frame into a phylo4d object
x = "phylo": merges a tree of class phylo with a data.frame into a phylo4d object
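
A minimal sketch of the matrix method, assuming a hand-written edge matrix in the ancestor/descendant layout used by the edge slot of a phylo4 object. The species names and trait values are invented for illustration.

```r
library(phylobase)

# Edge matrix (ancestor, descendant) for a three-tip tree: tips are nodes 1-3,
# node 4 is the root (its ancestor is coded 0) and node 5 is internal.
edges <- matrix(c(4, 1,
                  4, 5,
                  5, 2,
                  5, 3,
                  0, 4), ncol = 2, byrow = TRUE)

# Made-up trait values, linked to tips through the row names.
traits <- data.frame(mass = c(1.2, 3.4, 2.2),
                     row.names = c("speciesA", "speciesB", "speciesC"))

# tip.label is passed on to the tree; tip.data is matched by label.
obj <- phylo4d(edges, tip.data = traits,
               tip.label = c("speciesA", "speciesB", "speciesC"))
tdata(obj, "tip")
```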

Author

Ben Bolker, Thibaut Jombart, Steve Kembel, Francois Michonneau, Jim Regetz

Details

You can provide several data frames to define traits associated with tip and/or internal nodes. By default, data row names are used to link data to nodes in the tree, with any number-like names (e.g., “10”) matched against node ID numbers, and any non-number-like names (e.g., “n10”) matched against node labels. Alternative matching rules can be specified by passing additional arguments (listed later in this section); these include positional matching, matching exclusively on node labels, and matching based on a column of data rather than on row names.
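
The default rule can be sketched on a three-tip tree as follows. The wing values are invented, and the example assumes ape numbers the tips in the order they appear in the Newick string (Strix_aluco = 1, Asio_otus = 2, Athene_noctua = 3).

```r
library(phylobase)

tr <- as(ape::read.tree(
  text = "((Strix_aluco:4.2,Asio_otus:4.2):3.1,Athene_noctua:7.3);"), "phylo4")

# Non-number-like row names are matched against tip labels, so the row order
# of the data frame does not matter.
byLabel <- phylo4d(tr, tip.data = data.frame(
  wing = c(30, 10, 20),
  row.names = c("Athene_noctua", "Strix_aluco", "Asio_otus")))

# Number-like row names are matched against node ID numbers instead.
byId <- phylo4d(tr, tip.data = data.frame(
  wing = c(30, 10, 20),
  row.names = c("3", "1", "2")))

tdata(byLabel, "tip")
tdata(byId, "tip")
```

Both calls are expected to attach the same value to each tip, since the row names in the second data frame are the node IDs of the labels used in the first.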

Matching rules will apply the same way to all supplied data frames. This means that you need to be consistent with the row names of your data frames. It is good practice to use tip and node labels (or node numbers if you use duplicated labels) when you combine data with a tree.

If you provide both tip.data and node.data, the treatment of columns with common names will depend on the merge.data argument. If TRUE, columns with the same name in both data frames will be merged; when merging columns of different data types, coercion to a common type will follow standard R rules. If merge.data is FALSE, columns with common names will be preserved independently, with “.tip” and “.node” appended to the names. This argument has no effect if tip.data and node.data have no column names in common.
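
A small sketch of the two merge.data settings, using invented values and a column named a in both data frames:

```r
library(phylobase)

tr <- as(ape::read.tree(
  text = "((Strix_aluco:4.2,Asio_otus:4.2):3.1,Athene_noctua:7.3);"), "phylo4")

# Tip and node data share the column name "a"; row names are node IDs.
tip.dt  <- data.frame(a = c(1, 2, 3), row.names = nodeId(tr, "tip"))
node.dt <- data.frame(a = c(10, 20),  row.names = nodeId(tr, "internal"))

merged   <- phylo4d(tr, tip.data = tip.dt, node.data = node.dt,
                    merge.data = TRUE)
separate <- phylo4d(tr, tip.data = tip.dt, node.data = node.dt,
                    merge.data = FALSE)

names(tdata(merged))    # a single shared column
names(tdata(separate))  # one column per source, with suffixes
```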

If you provide all.data along with either of tip.data and node.data, it must have distinct column names, otherwise an error will result. Additionally, although supplying columns with the same names within data frames is not illegal, automatic renaming for uniqueness may lead to surprising results, so this practice should be avoided.
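
The safe pattern, with all.data and tip.data using different column names, might look like the sketch below. The column names size and wing and their values are made up for illustration.

```r
library(phylobase)

tr <- as(ape::read.tree(
  text = "((Strix_aluco:4.2,Asio_otus:4.2):3.1,Athene_noctua:7.3);"), "phylo4")

# One value per node (tips and internal nodes), keyed by node ID.
all.dt <- data.frame(size = c(5, 6, 7, 8, 9),
                     row.names = nodeId(tr, "all"))

# Tip-only data under a different column name, so nothing clashes.
tip.dt <- data.frame(wing = c(10, 20, 30),
                     row.names = nodeId(tr, "tip"))

obj <- phylo4d(tr, all.data = all.dt, tip.data = tip.dt)
names(tdata(obj))
```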

This is the list of additional arguments that can be used to control matching between the tree and the data:

match.data: (logical) should the row names of the data frame be matched against tip and internal node identifiers?
rownamesAsLabels: (logical) should the row names of the data provided be matched only to labels (TRUE), or should any number-like row names be matched to node numbers (FALSE, the default)?
label.type: (character) either "rownames" or "column": should the labels be taken from the row names of the data frame or from its label.column column?
label.column: if (and only if) label.type is "column", the column specifier (number or name) of the column containing tip labels
missing.data: action to take if there are missing data or if there are data labels that don't match
extra.data: action to take if there are extra data or if there are labels that don't match
keep.all: (logical), should the returned data have rows for all nodes (with NA values for internal rows when type='tip', and vice versa) (TRUE and default) or only rows corresponding to the type argument
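
As a rough sketch of two of these arguments, the example below takes labels from a data column rather than from row names and tolerates a tip with no data. The data are invented, and the value "OK" for missing.data is an assumption based on the options of phylobase's formatData function, which this page does not list.

```r
library(phylobase)

tr <- as(ape::read.tree(
  text = "((Strix_aluco:4.2,Asio_otus:4.2):3.1,Athene_noctua:7.3);"), "phylo4")

# Labels live in the first column; Athene_noctua is deliberately absent.
dt <- data.frame(species = c("Asio_otus", "Strix_aluco"),
                 wing = c(20, 10),
                 stringsAsFactors = FALSE)

obj <- phylo4d(tr, tip.data = dt,
               label.type = "column", label.column = 1,
               missing.data = "OK")  # assumed value: accept tips without data
tdata(obj, "tip")
```

The tip without data is expected to appear with NA in the wing column.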

Rules for matching rows of data to tree nodes are determined jointly by the match.data and rownamesAsLabels arguments. If match.data is TRUE, data frame rows will be matched exclusively against tip and node labels if rownamesAsLabels is also TRUE, whereas any all-digit row names will be matched against tip and node numbers if rownamesAsLabels is FALSE (the default). If match.data is FALSE, rownamesAsLabels has no effect, and row matching is purely positional with respect to the order returned by nodeId(phy, type).
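
The three cases can be compared on a tree whose tip labels are themselves numbers that differ from the node IDs. This is a deterministic sketch with invented labels and values, assuming ape numbers the tips 1 to 3 in Newick order.

```r
library(phylobase)

tr <- as(ape::read.tree(
  text = "((Strix_aluco:4.2,Asio_otus:4.2):3.1,Athene_noctua:7.3);"), "phylo4")

# Relabel so that tip 1 is called "3", tip 2 is called "1", tip 3 is called "2".
tipLabels(tr) <- c("3", "1", "2")

dt <- data.frame(x = c("a", "b", "c"), row.names = c("1", "2", "3"),
                 stringsAsFactors = FALSE)

# Row names matched only against labels: row "1" goes to the tip labelled "1".
byLabel <- phylo4d(tr, tip.data = dt, rownamesAsLabels = TRUE)

# All-digit row names matched against node numbers: row "1" goes to node 1.
byId <- phylo4d(tr, tip.data = dt, rownamesAsLabels = FALSE)

# Purely positional: rows are assigned in the order of nodeId(tr, "tip").
byPos <- phylo4d(tr, tip.data = dt, match.data = FALSE)

tdata(byLabel, "tip")
tdata(byId, "tip")
tdata(byPos, "tip")
```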

Examples

treeOwls <- "((Strix_aluco:4.2,Asio_otus:4.2):3.1,Athene_noctua:7.3);"
tree.owls.bis <- ape::read.tree(text=treeOwls)
try(phylo4d(as(tree.owls.bis,"phylo4"),data.frame(wing=1:3)), silent=TRUE)
obj <- phylo4d(as(tree.owls.bis,"phylo4"),data.frame(wing=1:3), match.data=FALSE)
obj
print(obj)

####

data(geospiza_raw)
geoTree <- geospiza_raw$tree
geoData <- geospiza_raw$data

## fix differences in tip names between the tree and the data
geoData <- rbind(geoData, array(NA, dim = c(1, ncol(geoData)),
                  dimnames = list("olivacea", colnames(geoData))))

### Example using a tree of class 'phylo'
exGeo1 <- phylo4d(geoTree, tip.data = geoData)

### Example using a tree of class 'phylo4'
geoTree <- as(geoTree, "phylo4")

## some random node data
rNodeData <- data.frame(randomTrait = rnorm(nNodes(geoTree)),
                        row.names = nodeId(geoTree, "internal"))

exGeo2 <- phylo4d(geoTree, tip.data = geoData, node.data = rNodeData)

### Example using 'merge.data'
data(geospiza)
trGeo <- extractTree(geospiza)
tDt <- data.frame(a=rnorm(nTips(trGeo)), row.names=nodeId(trGeo, "tip"))
nDt <- data.frame(a=rnorm(nNodes(trGeo)), row.names=nodeId(trGeo, "internal"))

(matchData1 <- phylo4d(trGeo, tip.data=tDt, node.data=nDt, merge.data=FALSE))
(matchData2 <- phylo4d(trGeo, tip.data=tDt, node.data=nDt, merge.data=TRUE))

## Example with 'all.data'
nodeLabels(geoTree) <- as.character(nodeId(geoTree, "internal"))
rAllData <- data.frame(randomTrait = rnorm(nTips(geoTree) + nNodes(geoTree)),
                       row.names = labels(geoTree, 'all'))

exGeo5 <- phylo4d(geoTree, all.data = rAllData)

## Examples using 'rownamesAsLabels' and comparing with match.data=FALSE
tDt <- data.frame(x=letters[1:nTips(trGeo)],
                  row.names=sample(nodeId(trGeo, "tip")))
tipLabels(trGeo) <- as.character(sample(1:nTips(trGeo)))
(exGeo6 <- phylo4d(trGeo, tip.data=tDt, rownamesAsLabels=TRUE))
(exGeo7 <- phylo4d(trGeo, tip.data=tDt, rownamesAsLabels=FALSE))
(exGeo8 <- phylo4d(trGeo, tip.data=tDt, match.data=FALSE))

## generate a tree and some data
set.seed(1)
p3 <- ape::rcoal(5)
dat <- data.frame(a = rnorm(5), b = rnorm(5), row.names = p3$tip.label)
dat.defaultnames <- dat
row.names(dat.defaultnames) <- NULL
dat.superset <- rbind(dat, rnorm(2))
dat.subset <- dat[-1, ]

## create a phylo4 object from a phylo object
p4 <- as(p3, "phylo4")

## create phylo4d objects with tip data
p4d <- phylo4d(p4, dat)
###checkData(p4d)
p4d.sorted <- phylo4d(p4, dat[5:1, ])
try(p4d.nonames <- phylo4d(p4, dat.defaultnames))
p4d.nonames <- phylo4d(p4, dat.defaultnames, match.data=FALSE)

if (FALSE) {
## dat.subset lacks one tip, so this call is expected to fail by default
try(p4d.subset <- phylo4d(p4, dat.subset))
p4d.subset <- phylo4d(p4, dat.subset, missing.data = "OK")
try(p4d.superset <- phylo4d(p4, dat.superset))
p4d.superset <- phylo4d(p4, dat.superset)
}

## create phylo4d objects with node data
nod.dat <- data.frame(a = rnorm(4), b = rnorm(4))
p4d.nod <- phylo4d(p4, node.data = nod.dat, match.data=FALSE)


## create phylo4d objects with node and tip data
p4d.all1 <- phylo4d(p4, node.data = nod.dat, tip.data = dat, match.data=FALSE)
nodeLabels(p4) <- as.character(nodeId(p4, "internal"))
p4d.all2 <- phylo4d(p4, all.data = rbind(dat, nod.dat), match.data=FALSE)
