caper (version 1.0.3)

comparative.data: Comparative dataset creation

Description

A simple tool to combine phylogenies with datasets and ensure consistent structure and ordering for use in functions.

Usage

comparative.data(phy, data, names.col, vcv=FALSE, vcv.dim=2, na.omit=TRUE,
                 force.root=FALSE, warn.dropped=FALSE, scope=NULL)
# S3 method for comparative.data
print(x, ...)
# S3 method for comparative.data
na.omit(object, scope=NULL, ...)
# S3 method for comparative.data
subset(x, subset, select, ...)
# S3 method for comparative.data
reorder(x, order, ...)
# S3 method for comparative.data
[(x, i, j)
as.comparative.data(x, ...)
caicStyleArgs(phy, data, names.col, warn.dropped=FALSE)

Value

A list of class 'comparative.data':

phy

An object of class 'phylo'

data

A data frame of matched data

data.name

The original object name of the data

phy.name

The original object name of the phylogeny

dropped

A list of taxon names dropped from the dataset:

unmatched.rows

Data rows that do not match to tips

tips

Tips that do not match to data rows

And optionally:

vcv

A variance covariance matrix of the phylogeny

vcv.dim

The dimension of the VCV - 2 for a standard VCV matrix and 3 for an expanded array retaining individual branch lengths

Arguments

data

A data frame containing variables that can be attributed to the taxa at the tips of a phylogeny.

phy

A phylogeny (class 'phylo') to be matched to the data above.

names.col

The name of a column in the provided data frame that will be used to match data rows to phylogeny tips.

vcv

A logical value indicating whether to include a variance covariance array representing the phylogeny within the comparative dataset.

vcv.dim

Either 2 (a standard VCV matrix) or 3 (an array retaining the individual branches contributing to the standard values). The array form is of use for optimising some branch length transformations.

na.omit

A logical value indicating whether to reduce the comparative dataset to those tips for which all selected variables are complete. Note that some functions cannot handle missing data and will return an error.

force.root

Many functions consider a basal polytomy to indicate an unrooted tree. Using force.root=TRUE will set an arbitrary root edge below this polytomy.

warn.dropped

A logical value indicating whether to warn the user when data or tips are dropped in creating the comparative data object.

scope

A model formula indicating which variables to consider when omitting rows containing NA values.

x

An object of class 'comparative.data'.

object

An object of class 'comparative.data'.

subset

A logical expression indicating rows of data to keep: missing values are taken as false.

select

An expression, indicating columns to select from the data frame.

order

One of 'cladewise' or 'pruningwise'. See reorder.phylo.

i,j

Indices specifying tips or data columns to extract. See details.

...

Further arguments to functions.

Author

David Orme

Details

The function matches rows in a data frame to tips on a phylogeny and ensures correct ordering of the data with respect to the tips. It can also add a variance covariance representation of the phylogeny. Mismatched rows and tips are removed and their taxon labels are stored in the 'dropped' slot of the 'comparative.data' object. The 'print' method displays a brief summary of the dataset contents and the names of the original 'phylo' and 'data.frame' objects. If any rows or tips were dropped, 'print' will also show a Venn diagram of the data shared and dropped from each source. Node labels are preserved but must be unique - unlabelled nodes will be assigned numeric codes.
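
As a rough illustration of the matching step described above, the sketch below removes a few rows from the shorebird data and then inspects the 'dropped' slot, the row ordering and the optional VCV. It assumes the caper package and its shorebird dataset are installed; the object names trimmed, removed, cd and cd3 are made up for this example, and removing the first five rows is an arbitrary choice.

```r
library(caper)
data(shorebird)

# Remove five data rows so that five tips in the tree have no matching data.
# (Arbitrary choice, purely for illustration.)
trimmed <- shorebird.data[-(1:5), ]
removed <- as.character(shorebird.data$Species[1:5])

# Build the comparative dataset with a standard (2D) VCV matrix attached.
cd <- comparative.data(shorebird.tree, trimmed, names.col = 'Species',
                       vcv = TRUE, vcv.dim = 2)

# Tips without data are pruned from the tree and recorded by name;
# data rows without a tip would be listed under unmatched.rows.
print(cd$dropped$tips)
print(cd$dropped$unmatched.rows)

# Data rows are stored in the same order as the tips of the pruned tree.
print(all(rownames(cd$data) == cd$phy$tip.label))

# With vcv.dim = 3 the VCV is stored as a 3D array rather than a matrix.
cd3 <- comparative.data(shorebird.tree, trimmed, names.col = 'Species',
                        vcv = TRUE, vcv.dim = 3)
print(dim(cd$vcv))
print(dim(cd3$vcv))
```

Setting warn.dropped=TRUE in the same call should additionally raise a warning when rows or tips are dropped, which can be useful in scripts where a silent mismatch would go unnoticed.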

The 'na.omit' and 'subset' methods provide simple ways to clean up and extract parts of the comparative dataset. In particular, 'subset' acts exclusively on the data component of the object and, like subset on a data frame, expects the subset argument to produce a logical vector of data rows to include. The 'reorder' method is used to restructure all the components within the 'comparative.data' object into either pruningwise or cladewise order. This uses code from the 'ape' library: see reorder.phylo.
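
The following sketch shows one way the 'na.omit' and 'subset' methods might be used together. It assumes the caper package and its shorebird dataset are installed; the missing values are introduced artificially for the example, and the object names dat, cd, cd_complete, cd_scoped and mo are hypothetical.

```r
library(caper)
data(shorebird)

# Introduce some missing values into a copy of the data (illustration only).
dat <- shorebird.data
dat$Egg.Mass[1:3] <- NA

# Keep incomplete rows at construction time so they can be handled later.
cd <- comparative.data(shorebird.tree, dat, names.col = 'Species',
                       na.omit = FALSE)

# Drop every tip that has an NA in any variable...
cd_complete <- na.omit(cd)

# ...or only tips with an NA in the variables named in the scope formula.
cd_scoped <- na.omit(cd, scope = M.Mass ~ F.Mass)

print(c(all = nrow(cd$data),
        complete = nrow(cd_complete$data),
        scoped = nrow(cd_scoped$data)))

# subset() filters on data columns; the tree is pruned to match the rows kept.
mo <- subset(cd, subset = Mat.syst == 'MO')
print(nrow(mo$data) == length(mo$phy$tip.label))
```

Restricting the scope to the variables a model actually uses avoids discarding tips because of missing values in columns that play no part in the analysis.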

The '[' method allows subsets to be taken of the data. There are no replace methods ('[<-'). If only one index is specified (e.g. x[2]), then this is interpreted as extracting data columns from the object. Otherwise (e.g. x[2,], x[1,1]), the first index will specify tips to extract and the second index will specify columns. Indices for tips are permitted to be numeric, logical or character vectors or empty (missing) or NULL. Numeric values are coerced to integer as by as.integer (and hence truncated towards zero). Character vectors will be matched to the names of the object (or for matrices/arrays, the dimnames): see the 'Character indices' section of the base R help page for Extract for further details.
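
To make the one-index versus two-index behaviour concrete, the sketch below checks what each form returns. It assumes the caper package and its shorebird dataset are installed; the object names cd, cols, first15, calidris and sandpipers are hypothetical, and the character index uses tip labels in the same way as the Examples section.

```r
library(caper)
data(shorebird)
cd <- comparative.data(shorebird.tree, shorebird.data, names.col = 'Species')

# One index: selects data columns only; all tips are kept.
cols <- cd[2:3]
print(names(cols$data))
print(nrow(cols$data) == nrow(cd$data))

# Two indices: the first selects tips, the second selects columns.
first15 <- cd[1:15, 2:3]
print(dim(first15$data))
print(length(first15$phy$tip.label))

# A character index, given here as a vector of tip labels.
calidris <- grep('Calidris', cd$phy$tip.label, value = TRUE)
sandpipers <- cd[calidris, ]
print(sandpipers$phy$tip.label)
```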

The function 'caicStyleArgs' handles turning 'phy', 'data' and 'names.col' arguments into a 'comparative.data' object when they are provided separately to a function. This argument structure was used in older versions of many functions.

All of these functions are in part a substitute for the considerably more sophisticated handling of such data in the package 'phylobase', which will be integrated into later releases.

See Also

crunch, pgls

Examples

data(shorebird)
shorebird <- comparative.data(shorebird.tree, shorebird.data, 'Species')
print(shorebird)

subset(shorebird, subset=Mat.syst == 'MO')

sandpipers <- grep('Calidris', shorebird$phy$tip.label)
shorebird[-sandpipers, ]

sandpipers <- grep('Calidris', shorebird$phy$tip.label, value=TRUE)
shorebird[sandpipers, ]

shorebird[]
shorebird[,]
shorebird[2:3]
shorebird[, 2:3]
shorebird[1:15, ]
shorebird[1:15, 2:3]
