This function is meant to expedite the creation of NEXUS files formatted
for tip-dating analyses in the popular phylogenetics software MrBayes,
particularly clock-less tip-dating analyses executed with 'empty' morphological matrices
(i.e., where all taxa are coded for a single missing character), although a pre-existing
morphological matrix can also be input by the user (see argument origNexusFile).
Under some options, this pre-existing matrix may be edited by this function.
The resulting full NEXUS script is output as a set of character strings, either
printed to the R console or written to a file, overwriting any existing file of that name.
createMrBayesTipDatingNexus(
tipTimes,
outgroupTaxa = NULL,
treeConstraints = NULL,
ageCalibrationType,
whichAppearance = "first",
treeAgeOffset,
minTreeAge = NULL,
collapseUniform = TRUE,
anchorTaxon = TRUE,
newFile = NULL,
origNexusFile = NULL,
parseOriginalNexus = TRUE,
createEmptyMorphMat = TRUE,
orderedChars = NULL,
morphModel = "strong",
morphFiltered = "parsInf",
runName = NULL,
ngen = "100000000",
doNotRun = FALSE,
autoCloseMrB = FALSE,
cleanNames = TRUE,
printExecute = TRUE
)
If argument newFile is NULL, then the text of the generated NEXUS script is output
to the console as a series of character strings. Otherwise, the script is written to newFile.
tipTimes: This input may be either (a) a timeList object, consisting of a list
of length = 2, composed of a table of interval upper and lower time boundaries
(i.e., the earlier and later bounds of the intervals) and a table of first and last
intervals for taxa, or (b) a matrix with row names corresponding to taxon names,
matching those names listed in the MrBayes block, with one, two, or four columns.
A single column gives precise dates for point occurrences, two columns give uncertainty
bounds on a point occurrence, and four columns give uncertainty bounds on both the first
and last occurrence. Note that precise first and last occurrence dates should not be
entered as a two-column matrix, as this will instead be interpreted as uncertainty bounds
on a single occurrence. Instead, either select which date you want to use for tip-dating
and give a one-column matrix, or repeat (and collate) the columns, so that the first and
last appearances have uncertainty bounds of zero.
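The column-repeating workaround described above can be illustrated with a minimal base-R sketch. It assumes precise dates are held in two hypothetical named vectors, `fad` and `lad` (in Ma), for hypothetical taxa; the column names are illustrative only, since the text requires only that row names match the taxon names in the other inputs.

```r
# Hypothetical precise first (fad) and last (lad) appearance dates, in Ma
fad <- c(taxonA = 430.5, taxonB = 428.0, taxonC = 425.2)
lad <- c(taxonA = 427.1, taxonB = 428.0, taxonC = 421.9)

# Four-column form: repeat each column so every pair of bounds has zero width
tipTimes4 <- cbind(fad, fad, lad, lad)
# Row names must match taxon names in the other inputs; column names are illustrative
dimnames(tipTimes4) <- list(names(fad),
                            c("fadEarly", "fadLate", "ladEarly", "ladLate"))
tipTimes4

# Alternative: keep only first appearances as a one-column matrix
tipTimes1 <- matrix(fad, ncol = 1, dimnames = list(names(fad), NULL))
tipTimes1
```

Repeating the columns keeps both dates while avoiding the two-column layout, which would be read as bounds on a single occurrence.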
outgroupTaxa: A vector of type 'character', containing taxon names designating the outgroup.
All taxa not listed in the outgroup will be constrained to form a monophyletic ingroup,
for the sake of rooting the resulting dated tree.
Either treeConstraints or outgroupTaxa must be defined, but not both.
If the outgroup-ingroup split is not present on the supplied treeConstraints,
add that split to treeConstraints manually.
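The outgroup-to-ingroup conversion can be sketched in base R as follows. The taxon names are hypothetical. The output uses MrBayes's `constraint` and `prset topologypr = constraints(...)` commands, but the exact text paleotree writes may differ.

```r
# Hypothetical taxon set and outgroup
allTaxa  <- c("taxonA", "taxonB", "taxonC", "taxonD")
outgroup <- "taxonA"

# Every taxon not listed in the outgroup is treated as the ingroup
ingroup <- setdiff(allTaxa, outgroup)

# A hard monophyly constraint on the ingroup, then activation of that constraint
constraintCmd <- paste0("constraint ingroup = ", paste(ingroup, collapse = " "), ";")
topologyCmd   <- "prset topologypr = constraints(ingroup);"
cat(constraintCmd, topologyCmd, sep = "\n")
```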
treeConstraints: An object of class phylo, from which the set of topological constraints
is derived, as described for argument tree of function createMrBayesConstraints.
Either treeConstraints or outgroupTaxa must be defined, but not both.
If the outgroup-ingroup split is not present on the supplied treeConstraints,
add that split to treeConstraints manually.
ageCalibrationType: This argument decides how age calibrations are defined,
and currently allows for four options: "fixedDateEarlier", which fixes tip
ages at the earlier (lower) bound for the selected age of appearance (see argument
whichAppearance for how that selection is made); "fixedDateLatter", which fixes the
date to the later (upper) bound of the selected age of appearance; "fixedDateRandom",
which fixes tips to a date randomly drawn from a uniform distribution bounded by the
upper and lower bounds on the selected age of appearance; or (the recommended option)
"uniformRange", which places a uniform prior on the age of the tip, bounded by the
latest and earliest (upper and lower) bounds on the selected age.
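The four calibration types can be compared with a minimal sketch that turns one taxon's bounds into a MrBayes `calibrate` command. The helper `makeCalibration` is hypothetical and is not the function paleotree uses internally. It assumes ages in Ma, with `earlier` the older bound, and it uses the `fixed()` and `uniform()` age priors of MrBayes's `calibrate` command.

```r
# Hypothetical helper: one calibrate command per calibration type.
# 'earlier' is the older bound and 'later' the younger bound, in Ma.
makeCalibration <- function(taxon, earlier, later, type) {
  age <- switch(type,
    fixedDateEarlier = sprintf("fixed(%g)", earlier),
    fixedDateLatter  = sprintf("fixed(%g)", later),
    # a single draw from a uniform distribution between the two bounds
    fixedDateRandom  = sprintf("fixed(%g)", runif(1, min = later, max = earlier)),
    # uniform prior: MrBayes expects (minimum age, maximum age)
    uniformRange     = sprintf("uniform(%g, %g)", later, earlier),
    stop("unknown ageCalibrationType: ", type)
  )
  paste0("calibrate ", taxon, " = ", age, ";")
}

makeCalibration("taxonA", earlier = 430.5, later = 427.1, type = "uniformRange")
# "calibrate taxonA = uniform(427.1, 430.5);"
```

Only "uniformRange" preserves the stratigraphic uncertainty. The three fixed options each collapse it to a single date.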
whichAppearance: Which appearance date of the taxa should be used:
their 'first' or their 'last' appearance date? The default
option is to use the 'first' appearance date. Note that use of the last
appearance date means that tips will be constrained to occur before their
last occurrence, and thus could occur long after their first occurrence (!).
In addition, createMrBayesTipDatingNexus allows for two options for this argument
beyond those offered by createMrBayesTipCalibrations. Both of these options duplicate
the taxa in the inputs multiple times, modifying their OTU labels, thus allowing
multiple occurrences of long-lived morphotaxa to be listed as multiple OTUs
arrayed across their stratigraphic duration. If whichAppearance = "firstLast",
taxa will be duplicated so that each taxon is listed as occurring twice: once at its
first appearance, and a second time at its last appearance. Note that if a taxon first
and last appears in the same interval, and ageCalibrationType = "uniformRange", then
the resulting posterior trees may place the OTU assigned to the last occurrence before the
first occurrence in temporal order (but the assignment, in that case, was entirely
arbitrary). When whichAppearance = "rangeThrough", each taxon will be duplicated into
as many OTUs as there are intervals that the taxon ranges through (in a timeList format;
see other paleotree functions), with the corresponding age uncertainties for those intervals.
If the input tipTimes is not a list of length = 2, however, the function will
return an error under this option.
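The "firstLast" duplication can be pictured with a minimal base-R sketch on a hypothetical four-column matrix. The "_first" and "_last" suffixes are illustrative and not necessarily the OTU labels paleotree generates.

```r
# Hypothetical four-column tip-age matrix (Ma): bounds on first appearance
# (columns 1-2) and bounds on last appearance (columns 3-4)
ranges <- matrix(c(431, 429, 428, 426,
                   429, 427, 429, 427),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("taxonA", "taxonB"), NULL))

# Duplicate each taxon into a first-appearance OTU and a last-appearance OTU
firstLast <- rbind(ranges[, 1:2, drop = FALSE], ranges[, 3:4, drop = FALSE])
rownames(firstLast) <- c(paste0(rownames(ranges), "_first"),
                         paste0(rownames(ranges), "_last"))
firstLast
```

Here taxonB first and last appears within the same bounds, so under "uniformRange" its two OTUs carry identical priors. This is the case in which their temporal order in the posterior is arbitrary.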
treeAgeOffset: A parameter given by the user controlling the offset between the minimum and expected tree age prior. The mean tree age for the offset exponential prior on tree age will be set to the minimum tree age plus this offset value. Thus, an offset of 10 million years would equate to a prior assuming that the expected tree age is around 10 million years before the minimum age.
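The arithmetic linking treeAgeOffset, the minimum tree age, and the prior mean can be worked through in a minimal base-R sketch. The values are hypothetical. The command assumes MrBayes's `prset treeagepr = offsetexponential(<min>, <mean>)` form, and the exact text paleotree writes may differ.

```r
# Hypothetical minimum tree age (Ma) and user-supplied offset (Myr)
minTreeAge    <- 431
treeAgeOffset <- 10

# Prior mean = minimum age + offset; the exponential part has rate 1 / offset
meanTreeAge <- minTreeAge + treeAgeOffset
rate        <- 1 / (meanTreeAge - minTreeAge)

treeAgeCmd <- sprintf("prset treeagepr = offsetexponential(%g, %g);",
                      minTreeAge, meanTreeAge)
treeAgeCmd
# "prset treeagepr = offsetexponential(431, 441);"
```

A larger offset flattens the exponential (smaller rate), which places more prior weight on tree ages well before the oldest tip.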
minTreeAge: If NULL (the default), then minTreeAge will be set as the oldest date among
the tip ages used (which dates are used, and thus which is the oldest bound on a tip age,
is determined by the user's other choices). Otherwise, the user can supply their own
minimum tree age, which must be greater than the oldest tip age used.
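The default and the validity check can be sketched in base R, assuming the tip ages used are collected in a hypothetical two-column matrix of bounds in Ma. The helper `checkMinTreeAge` is hypothetical.

```r
# Hypothetical uncertainty bounds on the tip ages used (Ma)
tipAges <- matrix(c(431, 429,
                    428, 426,
                    425, 421),
                  ncol = 2, byrow = TRUE,
                  dimnames = list(c("taxonA", "taxonB", "taxonC"), NULL))

# Default: the oldest bound among the tip ages used
defaultMinTreeAge <- max(tipAges)
defaultMinTreeAge   # 431

# A user-supplied minimum must be older (greater) than every tip age used
checkMinTreeAge <- function(userMin, ages) {
  if (userMin <= max(ages)) {
    stop("minTreeAge must be greater than the oldest tip age (", max(ages), ")")
  }
  userMin
}
checkMinTreeAge(435, tipAges)
```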
collapseUniform: MrBayes won't accept uniform age priors where the maximum and
minimum age are identical (i.e., it's actually a fixed age). Thus, if this argument
is TRUE (the default), this function will treat any taxon ages where the maximum and
minimum are identical as a fixed age, and will override setting
ageCalibrationType = "uniformRange" for those dates.
All taxa with their ages set to fixed by the behavior of anchorTaxon or collapseUniform
are returned as a list within a commented line of the returned MrBayes block.
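The collapsing rule can be shown with a minimal base-R sketch: zero-width ranges become `fixed()` calibrations and the rest stay `uniform()`. The matrix and taxon names are hypothetical, and the command strings mimic MrBayes `calibrate` syntax rather than reproducing paleotree's output.

```r
# Hypothetical bounds on the selected appearance (Ma); taxonB has identical
# bounds, i.e. it is effectively a fixed age
bounds <- matrix(c(431, 429,
                   428, 428,
                   425, 421),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("taxonA", "taxonB", "taxonC"), NULL))

# With collapseUniform = TRUE, zero-width ranges become fixed calibrations
isFixed <- bounds[, 1] == bounds[, 2]
calibs <- ifelse(isFixed,
                 sprintf("calibrate %s = fixed(%g);", rownames(bounds), bounds[, 1]),
                 sprintf("calibrate %s = uniform(%g, %g);", rownames(bounds),
                         pmin(bounds[, 1], bounds[, 2]), pmax(bounds[, 1], bounds[, 2])))
cat(calibs, sep = "\n")

# Taxa whose ages were fixed (the kind of list reported in a comment line)
fixedTaxa <- names(isFixed)[isFixed]
fixedTaxa   # "taxonB"
```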
anchorTaxon: This argument may be a logical (default is TRUE) or a character string
of length = 1. This argument has no effect if ageCalibrationType is not set to
"uniformRange", but the argument may still be evaluated.
If ageCalibrationType = "uniformRange", MrBayes will do a tip-dating analysis with
uniform age uncertainties on all taxa (if such uncertainties exist; see collapseUniform).
However, MrBayes does not record how each tree sits on an absolute time-scale,
so if the placement of every tip is uncertain, lining up multiple dated trees
sampled from the posterior (where each tip's true age might differ) could be
a nightmare to back-calculate, if not impossible.
Thus, if ageCalibrationType = "uniformRange", and there are no tip taxa given
fixed dates due to collapseUniform (i.e., all of the tip ages have a range of
uncertainty on them), then a particular taxon will be selected and given a fixed
date equal to its earliest appearance time for the selected whichAppearance.
This taxon can either be indicated by the user (as a character string), or else the
first taxon listed in tipTimes will be arbitrarily selected. All taxa with their ages
set to fixed by the behavior of anchorTaxon or collapseUniform are returned as a list
within a commented line of the returned MrBayes block.
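The anchoring rule can be sketched in base R. The helper `chooseAnchor` is hypothetical. Treating `anchorTaxon = FALSE` as "do not anchor" is an assumption that the text above does not state. Earliest appearance is taken as the older (numerically larger) bound in Ma.

```r
# Hypothetical helper mimicking the anchoring rule: if no tip is already
# fixed, fix one taxon at the earliest (oldest) bound of its selected appearance
chooseAnchor <- function(bounds, anchorTaxon = TRUE) {
  isFixed <- bounds[, 1] == bounds[, 2]
  # nothing to do if some tip is already fixed (or, by assumption, if FALSE)
  if (any(isFixed) || isFALSE(anchorTaxon)) return(NULL)
  # a character value names the anchor; TRUE takes the first listed taxon
  taxon <- if (is.character(anchorTaxon)) anchorTaxon else rownames(bounds)[1]
  if (!taxon %in% rownames(bounds)) stop("anchor taxon not found: ", taxon)
  list(taxon = taxon, fixedAge = max(bounds[taxon, ]))
}

bounds <- matrix(c(431, 429, 428, 426), ncol = 2, byrow = TRUE,
                 dimnames = list(c("taxonA", "taxonB"), NULL))
chooseAnchor(bounds)             # taxonA fixed at 431
chooseAnchor(bounds, "taxonB")   # taxonB fixed at 428
```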
newFile: Filename (possibly with path) as a character string leading to a file
which will be overwritten with the generated NEXUS script.
If NULL, the script is output to the console.
origNexusFile: Filename (possibly with path) as a character string leading to a NEXUS
text file, presumably containing a matrix of character data formatted for MrBayes.
If supplied (it does not need to be supplied), the listed file is read as a text file
and concatenated with the MrBayes script produced by this function, so as to
reproduce the original NEXUS matrix for executing in MrBayes.
Note that if parseOriginalNexus = FALSE, the taxa in this NEXUS file are NOT checked
against the user input tipTimes and treeConstraints, so it is up to the user to
ensure the taxa are the same across the three data sources.
parseOriginalNexus: If TRUE (the default), the original NEXUS file is parsed and
the taxon names listed in the matrix are compared against the other inputs, which
must match completely across all inputs that include taxon names. Thus, it is up
to the user to ensure the same taxa are found in all inputs. However, some NEXUS
files may not parse correctly (particularly if character data for taxa stretches
across more than a single line in the matrix). This may necessitate setting this
argument to FALSE, which will instead do a straight scan of the NEXUS matrix without
parsing it, and without checking the taxon names against the other inputs.
Some options for whichAppearance will not be available in that case, however.
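A minimal base-R sketch shows why multi-line character data can defeat a simple parse. It reads the first token of each line between `matrix` and the closing `;` as a taxon name. The NEXUS text is hypothetical, and this naive approach is an illustration, not paleotree's parser.

```r
# Hypothetical NEXUS text with one line per taxon in the MATRIX command
nexusLines <- c(
  "#NEXUS",
  "begin data;",
  "  dimensions ntax=2 nchar=3;",
  "  format datatype=standard missing=? gap=-;",
  "  matrix",
  "    taxonA 01?",
  "    taxonB 110",
  "  ;",
  "end;"
)

# Locate the rows between 'matrix' and the first lone ';' after it
start <- grep("^\\s*matrix\\s*$", nexusLines, ignore.case = TRUE) + 1
semis <- grep("^\\s*;\\s*$", nexusLines)
end   <- semis[semis > start][1] - 1

# Naive parse: the first whitespace-delimited token of each row is the taxon
rows <- trimws(nexusLines[start:end])
taxa <- sub("\\s.*$", "", rows)
taxa
```

If a taxon's character data wrapped onto a second line, that continuation line would be misread as an extra taxon name. This is the kind of failure that parseOriginalNexus = FALSE sidesteps.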
createEmptyMorphMat: If origNexusFile is not specified (implying there is no
pre-existing morphological character matrix for this dataset), then an 'empty'
NEXUS-formatted matrix will be appended to the set of MrBayes commands if this
argument is TRUE (the default). This 'empty' matrix will have each taxon in tipTimes
coded for a single missing character (i.e., '?'). This allows tip-dating analyses
with hard topological constraints, and ages determined entirely by the fossilized
birth-death prior, with no impact from a presupposed morphological clock
(thus a 'clock-less analysis').
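An 'empty' matrix of this kind can be written as a NEXUS DATA block in a minimal base-R sketch. The taxon names are hypothetical. The `dimensions`, `format`, and `matrix` commands are standard NEXUS as read by MrBayes, but the exact text paleotree emits may differ.

```r
# Hypothetical taxon names, standing in for the row names of tipTimes
taxa <- c("taxonA", "taxonB", "taxonC")

# DATA block with every taxon coded '?' for a single character
emptyMatrix <- c(
  "begin data;",
  sprintf("  dimensions ntax=%d nchar=1;", length(taxa)),
  "  format datatype=standard missing=? gap=-;",
  "  matrix",
  paste0("    ", taxa, " ?"),
  "  ;",
  "end;"
)
cat(emptyMatrix, sep = "\n")
```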
orderedChars: Should be a vector of numbers, indicating which characters should have
their character-type in MrBayes changed to 'ordered'.
If NULL (the default), then all characters will be treated as unordered.
No character ID listed should be higher than the number of characters in the matrix
provided in origNexusFile. If origNexusFile is not provided while orderedChars
is defined, then an error will be returned.
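The index check and the resulting MrBayes `ctype ordered:` command can be sketched in base R. The helper `makeCtypeOrdered` is hypothetical, and the character count would in practice come from the matrix in origNexusFile.

```r
# Hypothetical helper: build a MrBayes 'ctype' command after checking that
# no character index exceeds the number of characters in the matrix
makeCtypeOrdered <- function(orderedChars, nChar) {
  if (is.null(orderedChars)) return(NULL)   # all characters stay unordered
  if (any(orderedChars > nChar)) {
    stop("orderedChars contains indices beyond the ", nChar,
         " characters in the matrix")
  }
  paste0("ctype ordered: ", paste(sort(unique(orderedChars)), collapse = " "), ";")
}

makeCtypeOrdered(c(3, 1, 7), nChar = 20)   # "ctype ordered: 1 3 7;"
```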
morphModel: This argument can be used to switch between two end-member models of
morphological evolution in MrBayes, here named 'strong' and 'relaxed', for the
'strong assumptions' and 'relaxed assumptions' models described by Bapst et al.
(2018, Syst. Biol.). The default is a model which makes very 'strong' assumptions
about the process of morphological evolution, while the 'relaxed' alternative allows
for considerably more heterogeneity in the rate of morphological evolution across
characters, and in the forward and reverse transition rates between states.
Also see argument morphFiltered.
morphFiltered: This argument controls what type of filtering the input
morphological data is assumed to have been collected under. The likelihood of
the character data will be modified to take into account the apparent filtering
(Lewis, 2001; Allman et al., 2010). The default value, "parsInf", forces
characters to be treated as if they were collected as part of a parsimony-based
study, with constant characters and autapomorphies (characters that differ
in state in only a single taxon unit) ignored or otherwise filtered out; any such
characters in the presented matrix will be ignored. morphFiltered = "variable"
assumes that while constant characters are still filtered out (i.e., it is
difficult or impossible to count the number of morphological characters that
show no variation across a group), the autapomorphies were intentionally collected
and included in the presented matrix. Thus, constant characters in the included
matrix will be ignored, but autapomorphies will be considered.
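The character classes these filtering assumptions refer to can be identified with a minimal base-R sketch on a hypothetical matrix. The helper `classifyChar` is hypothetical. Presumably the two settings correspond to MrBayes's `lset coding=informative` and `lset coding=variable`, but that mapping is an assumption.

```r
# Hypothetical matrix: rows are taxa, columns are characters ('?' = missing)
chars <- rbind(
  taxonA = c("0", "0", "1", "0"),
  taxonB = c("0", "1", "1", "0"),
  taxonC = c("0", "0", "0", "1"),
  taxonD = c("0", "0", "0", "1")
)

# Classify one character by how many taxa carry each observed state
classifyChar <- function(x) {
  counts <- table(x[x != "?"])
  if (length(counts) < 2) return("constant")
  # parsimony-informative: at least two states, each present in >= 2 taxa
  if (sum(counts >= 2) >= 2) return("parsInf")
  # variable, but at most one state is shared by two or more taxa
  "autapomorphic"
}
apply(chars, 2, classifyChar)
# "constant" "autapomorphic" "parsInf" "parsInf"
```

Under "parsInf" only the last two characters fit the assumed collection scheme. Under "variable" the second (autapomorphic) character is also expected.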
runName: The name of the run, used for naming the log files and MCMC output files.
If not set, the name will be taken from the name given for outputting
the NEXUS script (newFile). If newFile is not given, and runName is not set
by the user, the default run name will be "new_run_paleotree".
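The fallback order for the run name can be sketched in base R. The helper `chooseRunName` is hypothetical. Stripping the directory and file extension from newFile is an assumption about how "the name given for outputting the NEXUS script" is turned into a run name.

```r
# Hypothetical helper for the run-name fallback: runName, then newFile, then default
chooseRunName <- function(runName = NULL, newFile = NULL) {
  if (!is.null(runName)) return(runName)
  if (!is.null(newFile)) {
    # assumed: drop directory and extension, "dir/exampleRetio.nex" -> "exampleRetio"
    return(tools::file_path_sans_ext(basename(newFile)))
  }
  "new_run_paleotree"
}

chooseRunName(runName = "retio_dating")        # "retio_dating"
chooseRunName(newFile = "dir/exampleRetio.nex") # "exampleRetio"
chooseRunName()                                 # "new_run_paleotree"
```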
ngen: Number of generations to set the MCMCMC (Metropolis-coupled MCMC) to run for.
The default (ngen = "100000000", i.e., 10^8 generations) is very high.
doNotRun: If TRUE, the commands that cause a script to automatically begin running in
MrBayes will be left out. Useful for troubleshooting initial runs of scripts for
non-fatal errors and warnings (such as ignored constraints).
The default for this argument is FALSE.
autoCloseMrB: If TRUE, the MrBayes script created by this function will
'autoclose', so that when an MCMC run finishes the specified number of generations,
MrBayes does not interactively ask whether to continue the MCMC. This is often
necessary for batch analyses.
cleanNames: If TRUE (the default), then special characters
(currently, only forward slashes: '/') are removed from
taxon names before construction of the NEXUS file.
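The name cleaning amounts to a literal substitution, shown here as a minimal base-R sketch with hypothetical names. The duplicate check is an extra precaution added for illustration, not a documented step of this function.

```r
# Hypothetical taxon names containing forward slashes
taxa <- c("genusA/speciesB", "genusC", "genusA/speciesD")

# Remove '/' as a literal character (fixed = TRUE skips regex interpretation)
cleaned <- gsub("/", "", taxa, fixed = TRUE)

# Stripping characters could make two names collide, which would break
# the one-name-per-taxon matching across inputs
if (anyDuplicated(cleaned)) stop("cleaning produced duplicate taxon names")
cleaned   # "genusAspeciesB" "genusC" "genusAspeciesD"
```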
printExecute: If TRUE (the default) and if output is directed to a newFile
(i.e., newFile is specified), a line for pasting into MrBayes to execute the newly
created file will be printed to the R console as a message.
David W. Bapst. This code was produced as part of a project funded by National Science Foundation grant EAR-1147537 to S. J. Carlson.
The basic MrBayes commands utilized in the output script are a collection of best practices taken from studying NEXUS files supplied by April Wright, William Gearty, Graham Slater, and Davey Wright, and guided by the recommendations of Matzke and Wright (2016, Biology Letters).
Users must supply a data set of tip ages (in various formats),
which are used to construct age calibration commands on the tip taxa
(via paleotree function createMrBayesTipCalibrations).
The user must also supply some topological constraint. This may be
a set of taxa designated as the outgroup, which is then converted into
a command constraining the monophyly of the ingroup taxa, presumed to be
all taxa not listed in the outgroup.
Alternatively, a user may supply a tree, which is then
converted into a series of hard topological constraints
(via function createMrBayesConstraints).
The two types of topological constraint cannot be applied together.
Many of the options available with createMrBayesTipCalibrations
are available with this function, allowing users to choose between fixed
calibrations or uniform priors that approximate stratigraphic uncertainty.
In addition, the user may also supply a path to a text file
presumed to be a NEXUS file containing character
data formatted for use with MrBayes.
The taxa listed in tipTimes must match the taxa in treeConstraints, if supplied.
If supplied, the taxa in outgroupTaxa must be contained within this same set of taxa.
These all must have matches in the set of taxa in origNexusFile, if it is provided
and if parseOriginalNexus is TRUE.
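The matching rules above can be expressed as a minimal base-R check. The helper `checkTaxa` and the taxon names are hypothetical. For a `phylo` object, `treeTaxa` would be the tree's `tip.label` element.

```r
# Hypothetical helper applying the taxon-matching rules described above
checkTaxa <- function(tipTaxa, treeTaxa = NULL, outgroup = NULL, nexusTaxa = NULL) {
  # tipTimes and treeConstraints must list exactly the same taxa
  if (!is.null(treeTaxa) && !setequal(tipTaxa, treeTaxa))
    stop("taxa in tipTimes and treeConstraints differ")
  # the outgroup must be a subset of that taxon set
  if (!is.null(outgroup) && !all(outgroup %in% tipTaxa))
    stop("outgroupTaxa not found in tipTimes: ",
         paste(setdiff(outgroup, tipTaxa), collapse = ", "))
  # taxa parsed from origNexusFile must match as well
  if (!is.null(nexusTaxa) && !setequal(tipTaxa, nexusTaxa))
    stop("taxa in origNexusFile do not match the other inputs")
  invisible(TRUE)
}

checkTaxa(c("taxonA", "taxonB", "taxonC"), treeTaxa = c("taxonC", "taxonB", "taxonA"))
```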
Note that because the same set of taxa must be contained in all inputs,
relationships are constrained as 'hard' constraints, rather than as 'partial'
constraints (which would allow some taxa to float across a partially fixed topology).
See the documentation for createMrBayesConstraints for more details.
The basic fundamentals of tip-dating, and tip-dating with the fossilized birth-death model are introduced in these two papers:
Ronquist, F., S. Klopfstein, L. Vilhelmsen, S. Schulmeister, D. L. Murray, and A. P. Rasnitsyn. 2012. A Total-Evidence Approach to Dating with Fossils, Applied to the Early Radiation of the Hymenoptera. Systematic Biology 61(6):973-999.
Zhang, C., T. Stadler, S. Klopfstein, T. A. Heath, and F. Ronquist. 2016. Total-Evidence Dating under the Fossilized Birth-Death Process. Systematic Biology 65(2):228-249.
For recommended best practices in tip-dating analyses, please see:
Matzke, N. J., and A. Wright. 2016. Inferring node dates from tip dates in fossil Canidae: the importance of tree priors. Biology Letters 12(8).
The rationale behind the two alternative morphological models are described in more detail here:
Bapst, D. W., H. A. Schreiber, and S. J. Carlson. 2018. Combined Analysis of Extant Rhynchonellida (Brachiopoda) using Morphological and Molecular Data. Systematic Biology 67(1):32-48.
This function wraps various aspects of the functions createMrBayesConstraints
and createMrBayesTipCalibrations. In many ways, this functionality is a
replacement for the probabilistic dating method represented by the cal3
dating functions.
For putting the posterior estimated trees on an absolute time scale, see
function obtainDatedPosteriorTreesMrB. Use its argument getFixedTimes = TRUE
if you used a taxon with a fixed age, and function setRootAges to set the root age.
library(paleotree)
# load retiolitid dataset
data(retiolitinae)
# let's try making a NEXUS file!
# Use uniform priors on tip ages ("uniformRange"), with a 10 million year offset,
# so the expected tree age is 10 Ma earlier than the earliest first appearance
outgroupRetio <- "Rotaretiolites"
# this taxon will now be sister to all other included taxa
# the following will create a NEXUS script (printed to the console, as newFile = NULL)
# with an 'empty' morph matrix
# where the only topological constraint is on ingroup monophyly
# Probably shouldn't do this: leaves too much to the FBD prior
# with doNotRun set to TRUE for troubleshooting
createMrBayesTipDatingNexus(
tipTimes = retioRanges,
outgroupTaxa = outgroupRetio,
treeConstraints = NULL,
ageCalibrationType = "uniformRange",
whichAppearance = "first",
treeAgeOffset = 10,
newFile = NULL,
origNexusFile = NULL,
createEmptyMorphMat = TRUE,
runName = "retio_dating",
doNotRun = TRUE
)
# let's try it with a tree for topological constraints
# this requires setting outgroupTaxa to NULL
# let's also set doNotRun to FALSE
createMrBayesTipDatingNexus(
tipTimes = retioRanges,
outgroupTaxa = NULL,
treeConstraints = retioTree,
ageCalibrationType = "uniformRange",
whichAppearance = "first",
treeAgeOffset = 10,
newFile = NULL,
origNexusFile = NULL,
createEmptyMorphMat = TRUE,
runName = "retio_dating",
doNotRun = FALSE
)
# the above is essentially cal3 with a better algorithm,
# and no need for a priori rate estimates
# just need a tree and age estimates for the tips!
####################################################
# some more variations for testing purposes
# no morph matrix supplied or generated
# you'll need to manually append to an existing NEXUS file
createMrBayesTipDatingNexus(
tipTimes = retioRanges,
outgroupTaxa = NULL,
treeConstraints = retioTree,
ageCalibrationType = "uniformRange",
whichAppearance = "first",
treeAgeOffset = 10,
newFile = NULL,
origNexusFile = NULL,
createEmptyMorphMat = FALSE,
runName = "retio_dating",
doNotRun = TRUE
)
if (FALSE) {
# let's actually try writing an example with topological constraints
# to file and see what happens
# write to a temporary file (substitute your own MrBayes directory here)
file <- file.path(tempdir(), "exampleRetio.nex")
createMrBayesTipDatingNexus(
tipTimes = retioRanges,
outgroupTaxa = NULL,
treeConstraints = retioTree,
ageCalibrationType = "uniformRange",
whichAppearance = "first",
treeAgeOffset = 10,
newFile = file,
origNexusFile = NULL,
createEmptyMorphMat = TRUE,
runName = "retio_dating",
doNotRun = FALSE
)
}