simulate.HLfit: Simulate realizations of a fitted model.

Description

From an HLfit object, simulate.HLfit function generates new samples given the estimated fixed effects and dispersion parameters. Simulation may be unconditional (the default, useful in many applications of parametric bootstrap), or conditional on the predicted values of random effects, or may draw from the conditional distribution of random effects given the observed response. Simulations may be run for the original sampling design (i.e., original values of fixed-effect predictor variables and of random-effect levels, including spatial locations if relevant), or for a new design as specified by the newdata argument.

simulate4boot is a wrapper around simulate.HLfit that can be used to precompute the bootstrap samples to be used by spaMM_boot or spaMM2boot through their boot_samples argument (and is called internally by these functions when boot_samples is NULL).

simulate_ranef will only simulate and return a vector of random effects, more specifically some elements of the b vector appearing in the standard form offset+ X$\beta$ + Z b for the linear predictor.

Usage

# S3 method for HLfit
simulate(object, nsim = 1, seed = NULL, newdata = NULL, 
                         type = "marginal", re.form, conditional = NULL, 
                         verbose = c(type=TRUE,
                                     showpbar=eval(spaMM.getOption("barstyle"))), 
                         sizes = if (is.null(newdata)) object$BinomialDen, 
                         resp_testfn = NULL, phi_type = "predict", 
                         prior.weights = if (is.null(newdata)) object$prior.weights, 
                         variances=list(), ...)
# S3 method for HLfitlist
simulate(object, nsim = 1, seed = NULL, 
                             newdata = object[[1]]$data, sizes = NULL, ...)
simulate4boot(object, nsim, seed=NULL, resp_testfn=NULL, type="marginal", 
              showpbar=eval(spaMM.getOption("barstyle")), ...)
simulate_ranef(object, which=NULL, newdata = NULL, nsim = 1L)

Value

simulate.HLfit returns a vector (if nsim=1) or a matrix with nsim columns, each containing simulated responses (or simulated random effects, for simulated_ranef()). For multivariate-response simulations, an nobs attribute gives the number of responses for each submodel if no resp_testfn was applied.

simulate4boot returns a list with elements

bootreps: the result of simulate.HLfit as a matrix with nsim columns;
RNGstate: the state of .Random.seed at the beginning of the sample simulation.

The simulate.HLfitlist method returns a list of simulated responses.

Arguments

object: The return object of HLfit or similar function.
nsim: number of response vectors to simulate. Defaults to '1'.
seed: A seed for set.seed. If such a value is provided, the initial state of the random number generator at a global level is restored on exit from simulate.
newdata: A data frame closely matching the original data, except that response values are not needed. May provide new values of fixed predictor variables, new spatial locations, new individuals within a block, or new values of the LHS in random-effect terms of the form (<LHS>|<RHS>).
prior.weights: Prior weights that may be substituted to those of the original fit, with the same effect on the residual variance. See Details for the definition of the default when newdata are provided. For multivariate-response fits, this is a list with one element per submodel, each element being a vector whose size is the number of response levels to be simulated for each submodel (the object$prior.weights provides an example).
sizes: A vector of sample sizes to simulate in the case of a binomial or betabin fit. See Details for the definition of the default when newdata are provided. For multivariate-response fits, the sizes argument should contain elements for response levels for all submodels whatever their response families (e.g. for submodels with families and response levels poisson: 3 and binomial: 2, respectively, the sizes vector should contain 5 elements, e.g. 1 1 1 5 10, only the last two of which having nontrivial meaning).
re.form: formula for random effects to condition on. Default behaviour depends on the type argument. The joint default is the latter's default, i.e., unconditional simulation. re.form is currently ignored when type="predVar" (with a warning). Otherwise, re.form=NULL conditions on all random effects (as type="residual" does), and re.form=NA conditions on none of the random effects (as type="marginal" or re.form=~0 do).
type: character string specifying which uncertainties are taken into account in the linear predictor and notably in the random effect terms. Whatever the type, the residual variance is always accounted in the simulation output. "marginal" accounts for the marginal variance of the random effect (and, by default, also for the uncertainty in fixed effects); "predVar" accounts for the conditional distribution of the random effects given the data (see Details); and "residual" should perhaps be "none" as no uncertainty is accounted in the linear predictor: the simulation variance is only the residual variance of the fitted model.
conditional: Obsolete and will be deprecated. Boolean; TRUE and FALSE are equivalent to type="residual" and type="marginal", respectively.
verbose: Either a single boolean (which determines verbose[["type"]], or a vector of booleans with possible elements "type" (to display basic information about the type of simulation) and "showpbar" (see predict(.,verbose)).
resp_testfn: NULL, or a function that tests a condition which simulated samples should satisfy. This function takes a response vector as argument and return a boolean (TRUE indicating that the sample satisfies the condition).
phi_type: Character string, either "predict" or one of the values possible for type. This controls the residual variance parameter $\phi$. The default is to use predicted $\phi$ values from the fit, which are the fitted $\phi$ values except when a structured-dispersion model is involved together with non-NULL newdata. However, when a structured-dispersion model is involved, it is also possible to simulate new $\phi$ values, and for a mixed-effects structured-dispersion model, the same types of simulation controlled by type for the mean response can be performed as controlled by phi_type. For a fixed-effects structured-dispersion model, these types cannot be distinguished, and any phi_type distinct from "predict" will imply simulation under the fixed-effect model (see Examples).
variances: Used when type="predVar": see Details.
...: For simulate4boot, further arguments passed to simulate.HLfit (e.g., newdata). For simulate.HLfit, further arguments only passed to predict in a speculative bit of code (see Details).
which: Integer, or integer vector: the random effect(s) (indexed as ordered as in the model formula) to be simulated. If NULL, all of them are simulated.
showpbar: Controls display of progress bar. See barstyle option for details.

Details

type="predVar" accounts for the uncertainty of the linear predictor, by drawing new values of the predictor in a multivariate gaussian distribution with mean and covariance matrix of prediction. In this case, the user has to provide a variances argument, passed to predict, which controls what goes into this covariance matrix. For example, with variances=list(linPred=TRUE,disp=TRUE)), the covariance matrix takes into account the joint uncertainty in the fixed-effect coefficients and of any random effects given the response and the point estimates of dispersion and correlation parameters ("linPred" variance component), and in addition accounts for uncertainty in the dispersion parameters (effect of "disp" variance component as further described in predict.HLfit). The total simulation variance is then the response variance. Uncertainty in correlation parameters (such a parameters of the Matern family) is not taken into account. The "linPred" uncertainty is known exactly in LMMs, and otherwise approximated as a Gaussian distribution with mean vector and covariance matrix given as per the Laplace approximation.

type="(ranef|response)" can be viewed as a special version of type="predVar" where
variances=list(linPred=TRUE,disp=FALSE)) and only the uncertainty in the random effects is taken into account.

A full discussion of the merits of the different types is beyond the scope of this documentation, but these different types may not all be useful. type="marginal" is typically used for computation of confidence intervals by parametric bootstrap methods. type="residual" is used by get_cPredVar for its evaluation of a bias term. The other types may be used to simulate the uncertainty in the random effects, conditionally on the data, and may therefore be more akin to the computation of prediction intervals conditionally on an (unknown but inferred) realization of the random effects. But these should presumably not be used in a bootstrap computation of such intervals, as this would represent a double accounting of the uncertainty that the bootstrap aims to quantify.

There are cases where simulation without a newdata argument may give results of different length than simulation with newdata=<original data>, as for predict.

When newdata are provided but new values of prior.weights or sizes are missing, new values of these missing arguments are guessed, and warnings may be issued depending on the kind of guess made for response families dependent on such arguments. The prior.weights values used in the fit are re-used without warning when such values were identical (generally, unit) for all response values, and labelled as such in the object$prior.weights. Unit weights will be used otherwise, with a warning. Unit binomial sizes will be used, with a warning, whenever there are newdata.

Examples

Run this code

data("Loaloa")
HLC <- HLCor(cbind(npos,ntot-npos)~Matern(1|longitude+latitude),
           data=Loaloa,family=binomial(),
           ranPars=list(lambda=1,nu=0.5,rho=1/0.7)) 
simulate(HLC,nsim=2)

## Structured dispersion model 
data("wafers")
hl <- HLfit(y ~X1+X2+X1*X3+X2*X3+I(X2^2)+(1|batch),family=Gamma(log),
            resid.model = ~ X3+I(X3^2) ,data=wafers)
simulate(hl,type="marginal",phi_type="simulate",nsim=2)
simulate_ranef(hl,nsim=2)

Run the code above in your browser using DataLab