Controlling response updating: Updating the data may be tricky when the response specified in a formula is not simply the name of a variable in the data. For example, if the response was specified as I(foo^2), the variable foo is not what simulate.HLfit will simulate, so foo should not be overwritten with the simulation results, yet foo is the variable that would have to be updated in the data. For some time spaMM has handled such cases by using an alternative way to provide updated response information, but this has some limitations. So spaMM now updates the data after checking that this is correct, with the consequence that when response updating is needed (notably, for bootstrap procedures), the response should preferably be specified as the name of a variable in the data, rather than as a more complicated expression.
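The recommendation to name the response as a plain data variable can be illustrated with a minimal sketch. It uses stats::lm and stats::simulate as stand-ins for the spaMM fitting and simulation functions (an assumption made only to keep the example self-contained), and the names dat, foo2 and x are hypothetical.

```r
# Toy data; 'dat', 'foo', 'foo2' and 'x' are illustrative names only.
set.seed(123)
dat <- data.frame(x = seq_len(20))
dat$foo <- rnorm(20, mean = 3)
dat$foo2 <- dat$foo^2                      # response stored under its own name

fit_expr <- lm(I(foo^2) ~ x, data = dat)   # response given as an expression
fit_name <- lm(foo2 ~ x, data = dat)       # same model, response given by name

# Simulated values are on the scale of the response (foo^2), not of foo.
sim <- simulate(fit_name, nsim = 1, seed = 1)[[1]]

# With a plain variable name, writing the simulation back is unambiguous.
boot_dat <- dat
boot_dat$foo2 <- sim
boot_fit <- update(fit_name, data = boot_dat)
```

Both fits estimate the same coefficients, but only the second one tells unambiguously which column of the data should receive the simulated values.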
However, in some cases, dynamic evaluation of the response variable may be helpful. For example, for bootstrapping hurdle models, the zero-truncated response may be specified as I(count[presence>0] <- NA; count) (where the zero-truncated count and binary presence variables are both updated by the bootstrap simulation). In that case the names of the two variables to be updated are provided by setting (say)
<fit object>$respNames <- c("presence", "count")
for a hurdle model fit as a bivariate-response model, with the first submodel for presence/absence and the second submodel for the zero-truncated response. A full example is developed in the “Gentle introduction” to spaMM (https://gitlab.mbb.univ-montp2.fr/francois/spamm-ref/-/blob/master/vignettePlus/spaMMintro.pdf).
Alternatively, for univariate-response fits, use
<fit object>$respName <- "count"
Controlling formula updating: Early versions of spaMM's update method relied on stats::update.formula, whose results follow stats's (sometimes annoying) convention that a formula without an explicit intercept term actually includes an intercept. spaMM::update.HLfit was then defined to avoid this problem. Formula updates should still be carefully checked, as getting them perfect has not been on the priority list.
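The intercept convention mentioned above can be checked directly in base R. The following is only a sketch of the stats behaviour, not of what spaMM::update.HLfit does; has_intercept is a hypothetical helper introduced for the illustration.

```r
# terms() records whether a formula carries an intercept (1) or not (0).
# 'has_intercept' is a hypothetical helper, not part of stats or spaMM.
has_intercept <- function(f) attr(terms(f), "intercept") == 1

has_intercept(y ~ x)       # TRUE: the intercept is implicit
has_intercept(y ~ 0 + x)   # FALSE: the intercept was explicitly removed

# The new right-hand side contains no '.', so nothing of the old one is kept,
# and the updated formula again has an implicit intercept.
f_new <- update.formula(y ~ 0 + x, . ~ z)
has_intercept(f_new)       # TRUE
```

An intercept removed in the original formula can thus silently reappear after an update, which is one reason to inspect updated formulas before relying on them.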
Various post-fit functions from base R may call update.formula directly, rather than relying on automatic method selection for update. update.formula is not itself a generic, which leads to the following problem. To make update.formula() work on multivariate-response fits, one would like to redefine it as a generic, with an HLfit method that would perform what update_formulas does, but such a redefinition appears to be forbidden in a package distributed on CRAN. Instead it is suggested to define a new generic spaMM::update, which could have spaMM::update.formula as a method (possibly itself a generic). This would be of limited interest, as the new spaMM::update.formula would be visible to spaMM::update but not to stats::update, and thus the post-fit functions from base R would still not use this method.
Safe updating: update(<fit>, ...) is, as a general rule, tricky. update methods are easily affected in a non-transparent way by changes in variables used in the original call. For example,
foo <- rep(1,10)
m <- lm(rnorm(10)~1, weights=foo)
rm(foo)
update(m, .~.) # Error
To avoid such problems, spaMM tries to avoid references to variables in the global environment, by enforcing that the data are explicitly provided to the fitting functions by the data argument, and that any variable used in the prior.weights argument is in the data.
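The benefit of keeping every variable in the data can be sketched with the lm example rewritten accordingly. This again uses stats::lm rather than a spaMM fitting function, and the names dat, y and w are hypothetical.

```r
# Response and weights both live in the data frame; names are illustrative.
set.seed(42)
dat <- data.frame(y = rnorm(10), w = rep(1, 10))
m <- lm(y ~ 1, data = dat, weights = w)

# The refit does not depend on any free-standing weights vector:
# 'w' is looked up in 'dat' when the call is re-evaluated.
m2 <- update(m, . ~ .)
```

The data frame itself must of course still be reachable when the call is re-evaluated, so this removes one source of fragility rather than all of them.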
Bugs can also result when calling update on a fit produced within some function, say function somefn calling fitme(data=mydata,...), as e.g. update(<fit>) will then seek a global variable mydata that may differ from the fitted mydata, which was local to somefn.
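The scoping problem just described can be reproduced with base R alone. In this sketch lm stands in for fitme (an assumption made to keep the example self-contained), and somefn and mydata are the illustrative names used above.

```r
# A data frame named 'mydata' in the calling environment...
mydata <- data.frame(y = c(10, 20, 30))

somefn <- function() {
  # ...and a different one, local to the function, used for the fit.
  mydata <- data.frame(y = c(1, 2, 3))
  lm(y ~ 1, data = mydata)
}

fit <- somefn()
coef(fit)             # intercept is the mean of the local data: 2

# update() re-evaluates the stored call in the caller's environment,
# where 'mydata' now refers to the other data frame.
refit <- update(fit)
coef(refit)           # intercept is the mean of the outer data: 20
```

No error or warning is raised here, so the refit silently describes different data than the original fit.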