update.HLfit: Updates a fit

Description

update and update_resp will update and (by default) re-fit a model. They do this mostly by extracting the call stored in the object, updating the call and evaluating that call. Using update(<fit>) is a risky programming style (see Details). update_formulas(<mv fit>, ...) can update formulas from a fitmv fit as well as the single formula of a fit by the other fitting functions.

update_resp handles a new response vector as produced by simulate.

Usage

# S3 method for HLfit
update(object, formula., ..., evaluate = TRUE)
update_resp(object, newresp, ..., evaluate = TRUE)
update_formulas(object, formula., ...)
# $respName[s] : see Details.

Value

update.formula(object) returns an object of the same nature as formula(object). The other functions and methods return an HLfit fit of the same type as the input object, or a call object, depending on the evaluate value. Warning: The object returned by update_resp cannot be used safely for further programming, for the reason explained in the Details section.

Arguments

object: A return object from an HLfit call.
formula.: A standard formula; or a formula with a peculiar syntax only describing changes to the original model formula (see update.formula for details); or (for multivariate-response models) a list of formula of such types.
newresp: New response vector.
...: Additional arguments to the call, or arguments with changed values. Use name = NULL to remove the argument with given name.
evaluate: If TRUE, evaluate the new call else return the call.

Details

Controlling response updating: Updating the data may be tricky when the response specified in a formula is not simply the name of a variable in the data. For example, if the response was specified as I(foo^2) the variable foo is not what simulate.HLfit will simulate, so foo should not be updated with such simulation results, yet this is what should be updated in the data. For some time spaMM has handled such cases by using an alternative way to provide updated response information, but this has some limitations. So spaMM now update the data after checking that this is correct, which the consequence that when response updating is needed (notably, for bootstrap procedures), the response should preferably be specified as the name of a variable in the data, rather than a more complicated expression.

However, in some cases, dynamic evaluation of the response variable may be helpful. For example, for bootstrapping hurdle models, the zero-truncated response may be specified as I(count[presence>0] <- NA; count) (where both the zero-truncated count and binary presence variables are both updated by the bootstrap simulation). In that case the names of the two variables to be updated is provided by setting (say)
<fit object>$respNames <- c("presence", "count")
for an hurdle model fit as a bivariate-response model, with first submodel for presence/absence, and second submodel for zero-truncated response. A full example is developed in the “Gentle introduction” to spaMM ( https://gitlab.mbb.univ-montp2.fr/francois/spamm-ref/-/blob/master/vignettePlus/spaMMintro.pdf). Alternatively for univariate-response fits, use
<fit object>$respName <- "count"

Controlling formula updating: Early versions of spaMM's update method relied on stats::update.formula whose results endorse stats's (sometimes annoying) convention that a formula without an explicit intercept term actually includes an intercept. spaMM::update.HLfit was then defined to avoid this problem. Formula updates should still be carefully checked, as getting them perfect has not been on the priority list.

Various post-fit functions from base R may use update.formula directly, rather than using automatic method selection for update. update.formula is not itself a generic, which leads to the following problem. To make update.formula() work on multivariate-response fits, one would like to be able to redefine it as a generic, with an HLfit method that would perform what update_formulas does, but such a redefinition appears to be forbidden in a package distributed on CRAN. Instead it is suggested to define a new generic spaMM::update, which could have a spaMM::update.formula as a method (possibly itself a generic). This would be of limited interest as the new spaMM::update.formula would be visible to spaMM::update but not to stats::update, and thus the post-fit functions from base R would still not use this method.

Safe updating: update(<fit>, ...), as a general rule, is tricky. update methods are easily affected in a non-transparent way by changes in variables used in the original call. For example


foo <- rep(1,10)
m <- lm(rnorm(10)~1, weights=foo)
rm(foo)
update(m, .~.) # Error

To avoid such problems, spaMM tries to avoid references to variables in the global environment, by enforcing that the data are explicitly provided to the fitting functions by the data argument, and that any variable used in the prior.weights argument is in the data.

Bugs can also result when calling update on a fit produced within some function, say function somefn calling fitme(data=mydata,...), as e.g. update(<fit>) will then seek a global variable mydata that may differ from the fitted mydata which was local to somefn.

Examples

Run this code

data("wafers")
## First the fit to be updated:
wFit <- HLfit(y ~X1*X3+X2*X3+I(X2^2)+(1|batch),family=Gamma(log),
          resid.model = ~ X3+I(X3^2) ,data=wafers)

newresp <- simulate(wFit)
update_resp(wFit,newresp=newresp)

# For estimates given by Lee et al., Appl. Stochastic Models Bus. Ind. (2011) 27:  315-328:
# Refit with given beta or/and phi values:
 
betavals <- c(5.55,0.08,-0.14,-0.21,-0.08,-0.09,-0.09)
# reconstruct fitted phi value from predictor for log(phi)
Xphi <- with(wafers,cbind(1,X3,X3^2)) ## design matrix
phifit <- exp(Xphi %*% c(-2.90,0.1,0.95))
upd_wafers <- wafers
designX <- get_matrix(wFit)
upd_wafers$off_b <- designX %*% betavals
update(wFit,formula.= . ~ offset(off_b)+(1|batch), data=upd_wafers,
       ranFix=list(lambda=exp(-3.67),phi=phifit))

## There are subtlety in performing REML fits of constrained models,
##   illustrated by the fact that the following fit does not recover 
##   the original likelihood values, because dispersion parameters are
##   estimated but the REML correction changes with the formula:
upd_wafers$off_f <- designX %*% fixef(wFit) ## = predict(wFit,re.form=NA,type="link")
update(wFit,formula.= . ~ offset(off_f)+(1|batch), data=upd_wafers)
#
## To maintain the original REML correction, Consider instead
update(wFit,formula.= . ~ offset(off_f)+(1|batch), data=upd_wafers,
       REMLformula=formula(wFit))  ## recover original p_v and p_bv     
## Alternatively, show original wFit as differences from betavals:  
update(wFit,formula.= . ~ . +offset(off_f), data=upd_wafers)

Run the code above in your browser using DataLab