The primary purpose of the predict method is to explore marginal effects of covariates. The uncertainty present in these predictions refers to uncertainty in the expected value of the model. The expectation does not include the error term of the model (nb: one expects actual observations to form a cloud of points around the expected value). By contrast, posterior_predict returns the complete (posterior) predictive distribution of the model (the expectation plus noise).
The model formula will be taken from object$formula
, and then a model matrix will be created by passing newdata
to the model.frame function (as in: model.frame(object$formula, newdata)
. Parameters are taken from as.matrix(object, pars = c("intercept", "beta"))
.
The examples illustrate how to use the function in most cases.
Special considerations apply to models with spatially-lagged covariates and a spatially-lagged dependent variable (i.e., the 'SLM' and 'SDLM' models fit by stan_sar).
Spatial lag of X
Spatially-lagged covariates which were included via the slx
argument will, by default, not be included in the predicted values. (The user can have greater control by manually adding the spatially-lagged covariate to the main model formula.) The slx
term will be be included in predictions if add_slx = TRUE
or if the fitted model is a SAR model of type 'SLM' or 'SDLM'. In either of those cases, newdata
must have the same number of rows as were used to fit the original data.
Spatial lag of Y
The typical 'marginal effect' interpretation of the regression coefficients does not hold for the SAR models of type 'SLM' or 'SDLM'. For details on these 'spillover effects', see LeSage and Pace (2009), LeSage (2014), and impacts.
Predictions for the spatial lag model (SAR models of type 'SLM') are equal to:
$$
(I - \rho W)^{-1} X \beta
$$
where \(X \beta\) contains the intercept and covariates. Predictions for the spatial Durbin lag model (SAR models of type 'SDLM') are equal to:
$$
(I - \rho W)^{-1} (X \beta + WX \gamma)
$$
where \(WX \gamma\) are spatially lagged covariates multiplied by their coefficients. For this reason, the predict.geostan_fit
method requires that newdata
have as many rows as the original data (so that nrow(newdata) == nrow(object$C)
); the spatial weights matrix will be taken from object$C
.
The inverse of the matrix \((I - \rho W)\) can be time consuming to compute (especially when iterating over MCMC samples). You can use approx = TRUE
to approximate the inverse using a series of matrix powers. The argument \(K\) controls how many powers to use for the approximation. As a rule, higher values of \(\rho\) require larger \(K\) to obtain accuracy. Notice that \(\rho^K\) should be close to zero for the approximation to hold. For example, for \(\rho = .5\) a value of \(K=8\) may suffice (\(0.5^8 = 0.004\)), but larger values of \(\rho\) require higher values of \(K\).
Generalized linear models
In generalized linear models (such as Poisson and Binomial models) marginal effects plots on the response scale may be sensitive to the level of other covariates in the model and to geographic location (given a spatially-varying mean value). If the model includes a spatial autocorrelation component (for example, you used a spatial CAR, SAR, or ESF model, or used the re
argument for random effects), by default these terms will be fixed at zero for the purposes of calculating marginal effects. If you want to change this, you can introduce a varying intercept manually using the alpha
argument.