predict.sm: Predict method for Smooth Model Fits

Description

predict method for class "sm".

Usage

# S3 method for sm
predict(object, newdata = NULL, se.fit = FALSE, 
        interval = c("none", "confidence", "prediction"),
        level = 0.95, type = c("response", "terms"), 
        terms = NULL, na.action = na.pass,
        intercept = NULL, combine = TRUE, design = FALSE, 
        check.newdata = TRUE, ...)

Value

Default use returns a vector of predictions. Otherwise the form of the output will depend on the combination of argumments: se.fit, interval, type, combine, and design.

type = "response":

When se.fit = FALSE and design = FALSE, the output will be the predictions (possibly with lwr and upr interval bounds). When se.fit = TRUE or design = TRUE, the output is a list with components fit, se.fit (if requested), and X (if requested).

type = "terms":

When se.fit = FALSE and design = FALSE, the output will be the predictions for each term (possibly with lwr and upr interval bounds). When se.fit = TRUE or design = TRUE, the output is a list with components fit, se.fit (if requested), and X (if requested).

Regardless of the type, setting combine = FALSE decomposes the requested result(s) into the parametric and smooth contributions.

Arguments

object: a fit from sm.
newdata: an optional list or data frame in which to look for variables with which to predict. If omitted, the original data are used.
se.fit: a switch indicating if standard errors are required.
interval: type of interval calculation. Can be abbreviated.
level: tolerance/confidence level.
type: type of prediction (response or model term). Can be abbreviated.
terms: which terms to include in the fit. The default of NULL uses all terms. This input is used regardless of the type of prediction.
na.action: function determining what should be done with missing values in newdata. The default is to predict NA.
intercept: a switch indicating if the intercept should be included in the prediction. If NULL (default), the intercept is included in the fit only when type = "r" and terms includes all model terms.
combine: a switch indicating if the parametric and smooth components of the prediction should be combined (default) or returned separately.
design: a switch indicating if the model (design) matrix for the prediction should be returned.
check.newdata: a switch indicating if the newdata should be checked for consistency (e.g., class and range). Ignored if newdata is not provided.
...: additional arguments affecting the prediction produced (currently ignored).

Author

Nathaniel E. Helwig <helwig@umn.edu>

Details

Inspired by the predict.lm function in R's stats package.

Produces predicted values, obtained by evaluating the regression function in the frame newdata (which defaults to model.frame(object)). If the logical se.fit is TRUE, standard errors of the predictions are calculated. Setting intervals specifies computation of confidence or prediction (tolerance) intervals at the specified level, sometimes referred to as narrow vs. wide intervals.

If newdata is omitted the predictions are based on the data used for the fit. Regardless of the newdata argument, how cases with missing values are handled is determined by the na.action argument. If na.action = na.omit omitted cases will not appear in the predictions, whereas if na.action = na.exclude they will appear (in predictions, standard errors or interval limits), with value NA.

Similar to the lm function, setting type = "terms" returns a matrix giving the predictions for each of the requested model terms. Unlike the lm function, this function allows for predictions using any subset of the model terms. Specifically, when type = "response" the predictions will only include the requested terms, which makes it possible to obtain estimates (and standard errors and intervals) for subsets of model terms. In this case, the newdata only needs to contain data for the subset of variables that are requested in terms.

References

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/predict.lm.html

Craven, P. and Wahba, G. (1979). Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik, 31, 377-403. tools:::Rd_expr_doi("10.1007/BF01404567")

Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer. tools:::Rd_expr_doi("10.1007/978-1-4614-5369-7")

Helwig, N. E. (2020). Multiple and Generalized Nonparametric Regression. In P. Atkinson, S. Delamont, A. Cernat, J. W. Sakshaug, & R. A. Williams (Eds.), SAGE Research Methods Foundations. tools:::Rd_expr_doi("10.4135/9781526421036885885")

Examples

Run this code

# generate data
set.seed(1)
n <- 100
x <- seq(0, 1, length.out = n)
z <- factor(sample(letters[1:3], size = n, replace = TRUE))
fun <- function(x, z){
  mu <- c(-2, 0, 2)
  zi <- as.integer(z)
  fx <- mu[zi] + 3 * x + sin(2 * pi * x + mu[zi]*pi/4)
}
fx <- fun(x, z)
y <- fx + rnorm(n, sd = 0.5)

# define marginal knots
probs <- seq(0, 0.9, by = 0.1)
knots <- list(x = quantile(x, probs = probs),
              z = letters[1:3])

# fit sm with specified knots
smod <- sm(y ~ x * z, knots = knots)

# get model "response" predictions
fit <- predict(smod)
mean((smod$fitted.values - fit)^2)

# get model "terms" predictions
trm <- predict(smod, type = "terms")
attr(trm, "constant")
head(trm)
mean((smod$fitted.values - rowSums(trm) - attr(trm, "constant"))^2)

# get predictions with "newdata" (= the original data)
fit <- predict(smod, newdata = data.frame(x = x, z = z))
mean((fit - smod$fitted.values)^2)

# get predictions and standard errors
fit <- predict(smod, se.fit = TRUE)
mean((fit$fit - smod$fitted.values)^2)
mean((fit$se.fit - smod$se.fit)^2)

# get 99% confidence interval
fit <- predict(smod, interval = "c", level = 0.99)
head(fit)

# get 99% prediction interval
fit <- predict(smod, interval = "p", level = 0.99)
head(fit)

# get predictions only for x main effect
fit <- predict(smod, newdata = data.frame(x = x), 
               se.fit = TRUE, terms = "x")
plotci(x, fit$fit, fit$se.fit)

# get predictions only for each group
fit.a <- predict(smod, newdata = data.frame(x = x, z = "a"), se.fit = TRUE)
fit.b <- predict(smod, newdata = data.frame(x = x, z = "b"), se.fit = TRUE)
fit.c <- predict(smod, newdata = data.frame(x = x, z = "c"), se.fit = TRUE)

# plot results (truth as dashed line)
plotci(x = x, y = fit.a$fit, se = fit.a$se.fit,
       col = "red", col.ci = "pink", ylim = c(-6, 6))
lines(x, fun(x, rep(1, n)), lty = 2, col = "red")
plotci(x = x, y = fit.b$fit, se = fit.b$se.fit,
       col = "blue", col.ci = "cyan", add = TRUE)
lines(x, fun(x, rep(2, n)), lty = 2, col = "blue")
plotci(x = x, y = fit.c$fit, se = fit.c$se.fit,
       col = "darkgreen", col.ci = "lightgreen", add = TRUE)
lines(x, fun(x, rep(3, n)), lty = 2, col = "darkgreen")

# add legends
legend("bottomleft", legend = c("Truth", "Estimate", "CI"), 
       lty = c(2, 1, NA), lwd = c(1, 2, NA), 
       col = c("black", "black","gray80"),
       pch = c(NA, NA, 15), pt.cex = 2, bty = "n")
legend("bottomright", legend = letters[1:3], 
       lwd = 2, col = c("red", "blue", "darkgreen"), bty = "n")

Run the code above in your browser using DataLab