MrBayes is not great for getting samples of dated posterior phylogenies, or for obtaining certain summary trees from the posterior (specifically the MCCT and MAP, which are specific trees in the posterior). This is because the tree samples as returned are scaled relative to rate parameters in a separate file. This function attempts to automate the handling of multiple files (both .t tree files and .p parameter files), as well as multiple files associated with separate runs, to obtain samples of posterior trees, or summary trees such as the MCCT or MAP. These resulting trees are now scaled to units of time, but not be placed correctly on an absolute time-scale if all tips are extinct. See details of output below.
obtainDatedPosteriorTreesMrB(
runFile,
nRuns = 2,
burnin = 0.5,
outputTrees,
labelPostProb = FALSE,
getFixedTimes = FALSE,
getRootAges = FALSE,
originalNexusFile = NULL,
file = NULL
)
Depending on argument file
, the output tree or trees is either
returned directly, or instead written out in NEXUS format via
ape's write.NEXUS
function to an external file. The output
will consist either of multiple trees sampled from the post-burn-in posterior,
or will consist of a single phylogeny (a summary tree, either
the MCCT or the MAP - see the details for the argument outputTrees
).
If the argument setRootAges = TRUE
is not used,
users are warned that the resulting dated trees will
not have $root.time
elements necessary
for comparison against an absolute time-scale. Wile the
trees may be scaled to units of absolute
time now, rather than with branch lengths expressed in
the rate of character change, the dates
estimated by some phylogenetics functions in
R may give inaccurate estimates of when events
occur on the absolute time-scale if all tips are extinct.
This is because most functions for phylogenetics in R (and
elsewhere) will instead presume that the latest tip
will be at time 0 (the modern), which
may be wrong if you are using paleotree
for
analyzing paleontological datasets
consisting of entirely extinct taxa. This can be
solved by using argument getFixedTimes = TRUE
to obtain fixed tip ages, and then scaling the resulting output to absolute time using
the argument setRootAges = TRUE
, which obtains
a $root.time
element for each tree
using the functions setRootAge
and
setRootAges
(for single and multiple phylogenies).
A filename in the current directory,
or a path to a file that is either a .p
or .t file from a MrBayes analysis. This filename
and path will be used for finding additional
.t and .p files, via the nRuns
settings and assuming that files are in the
same directory and these files are named under
typical MrBayes file naming conventions. (In other words,
if you have renamed your .p or .t files,
this function probably won't be able to find them.)
The number of runs in your analysis. This variable is used for figuring out what filenames will be searched for: if you specify that you have less runs than you actually ran in reality, then some runs won't be examined in this function. Conversely, specify too many, and this function will throw an error when it cannot find files it expects but do not exist. The default for this argument (two runs) is based on the default number of runs in MrBayes.
The fraction of trees sampled in the posterior discarded and not returned by this function directly, nor included in calculation of summary trees. Must be a numeric value greater than 0 and less than 1.
Determines the output trees produced; for format of output, see section
on returned Value below. Must be of length one, and either "all"
,
which means all trees from the post-burn-in posterior will
returned, a number greater than zero, which will be the number of trees
randomly sampled from across the post-burn-in posterior and returned,
or a label for a type of summary tree selected from the posterior based on various
properties. The two most commonly seen such point-estimate-summaries are
the MCCT tree, which stands for the 'maximum clade compatibility tree',
and the MAP tree, which stands for the 'maximum a posteriori tree'.
The MCCT is the single tree from the post-burn-in posterior which
contains the set of bifurcations (clades) with the highest product of posterior
probabilities (i.e. are found on the most trees in the post-burn-in posterior).
The MCCT tree is returned if the argument outputTrees = "MCCT"
is used.
The MAP is the single tree from the post-burn-in posterior with the highest
posterior probabilty associated with it. Unfortunately, versions of
paleotree
prior to version 3.2.1 did not use the posterior probability
to select the supposed 'MAP' tree. MrBayes provides two values
for each sampled tree and corresponding parameters:
LnPr
, the log prior probability of the current parameter
proposals under the specified prior distributions,
and LnL
, the log likelihood of the cold chain, i.e. the log-likelihood
of the sampled parameter values, given the observed data and specified models.
Neither of these are the posterior probability.
The true posterior probability (as given by Bayes Theorem) is
the product of the likelihood and the prior probability, divided by
the likelihood of the model, the latter of which is very rarely known.
More commonly, the calculable portion of the posterior probability is
the product of the likelihood and the prior probability; or, here, easily
calculated as the log posterior probability, as the sum of the
log likelihood and log prior probability. Given confusion over application of 'MAP'
trees in previous version of paleotree, three options are available:
"MAPosteriori"
for the Maximum A Posteriori tree (the MAP tree,
or the single tree in the posterior with the highest
posterior probability, as given by LnPr + LnL
),
"MAPriori"
for the Maximum A Priori tree (the tree in the
posterior sample with the highest prior probability, indep of
the data), and "MaxLikelihood"
, the
tree with the highest model likelihood given the data, ignoring the prior probability.
The former option of outputTrees = "MAP"
is deprecated,
as its previous implementation only examine LnPr
and thus
returned the tree now referred to here as the "MAPriori"
tree.
Interestingly, this bug had no effect when tip-dating methods is applied to
datasets with no character matrix is provided (an empty matrix of '?' missing values is used)
in order to find dated phylogenies that maximize the fit to the dated tree prior,
as the log likelihood for a tree with an empty matrix is always zero, and thus the posterior
probability is always exactly identical to the prior probability.
Overall, the Maximum A Posteriori tree is the "best" tree based
on the metric most directly considered by Bayesian analysis for
proposal acceptance, but the MCCT may be
the best tree for summarizing topological support.
In either case, point estimates of topology are often
problematic summaries for phylogenetic analyses.
Logical. If TRUE
, then nodes of the
output tree will be labeled with their respective posterior
probabilities, as calculated based on the frequency of a clade
occurring across the post-burn-in posterior tree sample. If FALSE
,
this is skipped.
If TRUE
, this function will also look for, scan, and parse an
associated NEXUS file. Ignoring any commented lines
(i.e., anything between a set of rectangular brackets []
),
commands for fixing taxa will be identified, parsed and returned to the user, either as a message
printed to the R console if output is written to a file, or as a attribute named 'fixed ages'
if output as an R object (formatted as a two-column
table of OTU names and their respective fixed ages).
Please note: the code for getFixedTimes = TRUE
contains a while()
loop in it for removing nested series of
square brackets (i.e. treated as comments in NEXUS files). Thus files with
ridiculously nested series of brackets may cause this code to take a while
to complete, or may even cause it to hang.
FALSE
by default.
If TRUE
, and getFixedTimes = TRUE
as well as file = NULL
(such that trees will be assigned within the R memory
rather than saved to an external file), the
functions setRootAge
and its wrapper function
setRootAges
will be applied to the output
so that all output trees have root.time
elements for use with other functions in paleotree
as well as other packages.
Filename (and possibly path too) to the original NEXUS file for this analysis.
Only tried if getFixedTimes = TRUE
. If NULL
(the default), then this function will
instead try to find a NEXUS file with the same name as implied by the filename used in other inputs. If
this file cannot be found, the function will fail.
Filename (possibly with path) as a character string
leading to a file which will be overwritten with the output trees (or summary tree),
as a NEXUS file. If NULL
(the default), the output will
instead be directly returned by this function.
David Bapst, with rescaling of raw output trees via code originally written by Nicholas Crouch.
This function is most useful for dealing with dating
analyses in MrBayes, particularly when tip-dating
a tree with fossil taxa, as the half-compatibility
and all-compatibility summary trees offered by the
'sumt
' command in MrBayes can have issues properly
portraying summary trees from such datasets.
When the arguments getFixedTimes = TRUE
and
setRootAges = TRUE
are used, the resulting output will be scaled to absolute time
with the available fixed ages using functions setRootAge
and setRootAges
(for single and multiple phylogenies).
This is only done if fixed ages are available and if the tree is not
being saved to an external file.
Maximum Clade Credibility trees are estimated using the function
maxCladeCred
in package phangorn.
See function link{tipDatingCompatabilitySummaryMrB}
for additional
ways of solely evaluating the topoligical information
in trees taken from MrBayes posterior samples.
if (FALSE) {
MCCT <- obtainDatedPosteriorTreesMrB(
runFile = "C:\\myTipDatingAnalysis\\MrB_run_fossil_05-10-17.nex.run1.t",
nRuns = 2,
burnin = 0.5,
outputTrees = "MCCT",
file = NULL)
MAP <- obtainDatedPosteriorTreesMrB(
runFile = "C:\\myTipDatingAnalysis\\MrB_run_fossil_05-10-17.nex.run1.t",
nRuns = 2,
burnin = 0.5,
getFixedTimes = TRUE,
outputTrees = "MAPosteriori",
file = NULL)
# get a root age from the fixed ages for tips
setRootAge(tree = MAP)
#pull a hundred trees randomly from the posterior
hundredRandomlySelectedTrees <- obtainDatedPosteriorTreesMrB(
runFile = "C:\\myTipDatingAnalysis\\MrB_run_fossil_05-10-17.nex.run1.t",
nRuns = 2,
burnin = 0.5,
getFixedTimes = TRUE,
getRootAges = TRUE,
outputTrees = 100,
file = NULL)
}
Run the code above in your browser using DataLab