The bootstrap datasets are computed by resampling independent pairs of
observations at successive times (for non-hidden models without censoring),
or independent individual series (for hidden models or models with
censoring). Therefore this approach doesn't work if, for example, the data
for a HMM consist of a series of observations from just one individual, and
is inaccurate for small numbers of independent transitions or individuals.
Confidence intervals or standard errors for the corresponding statistic can
be calculated by summarising the returned list of B replicated outputs. This
is currently implemented for most of the output functions: qmatrix.msm,
ematrix.msm, qratio.msm, pmatrix.msm, pmatrix.piecewise.msm, totlos.msm and
prevalence.msm. For other outputs, users will have to write their own code to
summarise the output of boot.msm.
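As a rough illustration of summarising the replicates by hand, the sketch below uses a simulated list of B matrices as a stand-in for the list returned by boot.msm. The object names, the 2 x 2 dimension and the simulated values are assumptions made for illustration, not output from a fitted model. It stacks the replicates into an array and computes element-wise bootstrap standard errors and percentile 95% confidence limits.

```r
# Stand-in for the list returned by something like
#   boot.list <- boot.msm(x, stat = pmatrix.msm, B = B)
# The replicates are simulated here so that the example runs on its own.
set.seed(1)
B <- 200
boot.list <- lapply(seq_len(B), function(i) {
  p12 <- plogis(rnorm(1, mean = -1, sd = 0.2))  # hypothetical P(1 -> 2)
  p21 <- plogis(rnorm(1, mean = -2, sd = 0.2))  # hypothetical P(2 -> 1)
  matrix(c(1 - p12, p21, p12, 1 - p21), nrow = 2)  # each row sums to 1
})

# Stack the B matrices into a 2 x 2 x B array
boot.array <- array(unlist(boot.list), dim = c(dim(boot.list[[1]]), B))

# Element-wise bootstrap standard errors
boot.se <- apply(boot.array, c(1, 2), sd)

# Element-wise percentile 95% confidence limits
boot.lower <- apply(boot.array, c(1, 2), quantile, probs = 0.025)
boot.upper <- apply(boot.array, c(1, 2), quantile, probs = 0.975)
```

The same pattern should carry over to any statistic that returns a numeric matrix or vector of fixed dimension for every replicate.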
Most of msm's output functions present confidence intervals based on
asymptotic standard errors calculated from the Hessian. These are expected
to be underestimates of the true standard errors (Cramer-Rao lower bound).
Some of these functions use a further approximation, the delta method (see
deltamethod), to obtain standard errors of transformed
parameters. Bootstrapping should give a more accurate estimate of the
uncertainty.
An alternative method, which is faster than bootstrapping though less
accurate, and more accurate than the delta method, is to draw a sample from
the asymptotic multivariate normal distribution implied by the maximum
likelihood estimates and their covariance matrix, and summarise the
transformed estimates. See pmatrix.msm.
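The idea behind this alternative can be sketched in base R as follows. The estimates and covariance matrix are made-up numbers standing in for the maximum likelihood estimates of two log transition intensities and their covariance matrix, and the transformation (a ratio of intensities) is chosen only for illustration. This is not the code that msm runs internally.

```r
set.seed(1)
est <- c(log(0.10), log(0.05))             # hypothetical MLEs of two log intensities
covmat <- matrix(c(0.04, 0.01,
                   0.01, 0.09), nrow = 2)  # hypothetical covariance matrix

# Draw B samples from N(est, covmat) using the Cholesky factor of covmat
B <- 1000
z <- matrix(rnorm(B * length(est)), nrow = B)
sims <- sweep(z %*% chol(covmat), 2, est, "+")

# Transform each draw (here, the ratio of the two intensities) and summarise
ratio.sims <- exp(sims[, 1]) / exp(sims[, 2])
ratio.est <- exp(est[1]) / exp(est[2])
ratio.ci <- quantile(ratio.sims, c(0.025, 0.975))
```

Because the transformation is applied to every draw, no linearisation is needed, which is why this approach can be expected to do better than the delta method for strongly nonlinear functions of the parameters.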
All objects used in the original call to msm which produced x, such as the
qmatrix, should be in the working environment, or else boot.msm will produce
an “object not found” error. This enables boot.msm to refit the original
model to the replicate datasets. However, there is currently a limitation. In
the original call to msm, the "formula" argument should be specified
directly, as, for example,
msm(state ~ time, data = ...)
and not, for example,
form = data$state ~ data$time
msm(formula=form, data = ...)
otherwise boot.msm will be unable to draw the replicate datasets.
boot.msm will also fail with an incomprehensible error if the original call
to msm used a user-defined object whose name is the same as a built-in R
object, or an object in any other loaded package. For example, this happens
if you have called a Q matrix q, since q() is the built-in function for
quitting R.
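One way to guard against this is to check a candidate object name against the attached packages before using it. The helper below is a hypothetical convenience function, not part of msm, and it only inspects what is on the current search path.

```r
# Hypothetical helper: which attached packages already define an object
# called `name`? The user's own workspace (.GlobalEnv) is ignored.
name.clashes <- function(name) {
  envs <- setdiff(search(), ".GlobalEnv")
  found <- vapply(envs, function(e) {
    exists(name, envir = as.environment(e), inherits = FALSE)
  }, logical(1))
  envs[found]
}

name.clashes("q")       # includes "package:base", so avoid calling a Q matrix q
name.clashes("psor.q")  # expected to be empty unless a loaded package defines it
```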
If stat is NULL, then B different msm model objects will be stored in memory.
This is inadvisable, as msm objects tend to be large, since they contain the
original data used for the msm fit, so this will be wasteful of memory.
To specify more than one statistic, write a function that returns a list of
the results of different function calls, for example,
stat = function(x) list (pmatrix.msm(x, t=1), pmatrix.msm(x, t=2))
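With a statistic of this form, each of the B replicates is itself a list with one component per function call. A minimal sketch of pulling the components apart is given below, using a simulated stand-in for the boot.msm output. The object names, the two-state structure and the simulated values are assumptions for illustration.

```r
set.seed(1)
B <- 100

# Simulate one hypothetical 2 x 2 transition probability matrix
# for a two-state model in which state 2 is absorbing
sim.pmat <- function(mean.logit) {
  p <- plogis(rnorm(1, mean.logit, 0.2))
  matrix(c(1 - p, 0, p, 1), nrow = 2)
}

# Stand-in for the result of boot.msm with the two-statistic function above:
# a list of B elements, each a list of two matrices
boot.list <- lapply(seq_len(B), function(i) list(sim.pmat(-1), sim.pmat(0)))

# Extract the replicates of each statistic separately
p1.list <- lapply(boot.list, function(r) r[[1]])
p2.list <- lapply(boot.list, function(r) r[[2]])

# Summarise one of them, e.g. bootstrap standard errors for the second statistic
p2.array <- array(unlist(p2.list), dim = c(2, 2, B))
p2.se <- apply(p2.array, c(1, 2), sd)
```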