
npreg (version 1.1.0)

varimp: Variable Importance Indices

Description

Computes variable importance indices for terms of a smooth model.

Usage

varimp(object, newdata = NULL, combine = TRUE)

Value

If combine = TRUE, returns a named vector containing the importance indices for each effect function (in object$terms).

If combine = FALSE, returns a data frame where the first column gives the importance indices for the parametric components and the second column gives the importance indices for the smooth (nonparametric) components.

Arguments

object

an object of class "sm" output by the sm function or an object of class "gsm" output by the gsm function.

newdata

the data used for the variable importance calculation (if NULL, the training data are used).

combine

a logical switch indicating whether the parametric and smooth components of the importance should be combined (default) or returned separately.
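The following is a minimal sketch of how the three arguments might be used together, assuming the npreg package is installed. The simulated data mirror Example 1 on this page; the object names dat, newdat, fit, vi, vi.sep, and vi.new are illustrative only and are not part of the package.

```r
library(npreg)

# simulate data (same design as Example 1 on this page)
set.seed(1)
n <- 100
x <- seq(0, 1, length.out = n)
z <- factor(sample(letters[1:3], size = n, replace = TRUE))
y <- c(-2, 0, 2)[as.integer(z)] + 3 * x + sin(2 * pi * x) + rnorm(n, sd = 0.5)
dat <- data.frame(x = x, z = z, y = y)

# marginal knots, then fit an additive smooth model
knots <- list(x = quantile(x, probs = seq(0, 0.9, by = 0.1)),
              z = letters[1:3])
fit <- sm(y ~ x + z, data = dat, knots = knots)

# default: one combined index per effect, computed on the training data
vi <- varimp(fit)

# parametric and smooth parts of each index returned separately
vi.sep <- varimp(fit, combine = FALSE)

# indices evaluated on other data (illustrative: every other training row)
newdat <- dat[seq(1, n, by = 2), c("x", "z")]
vi.new <- varimp(fit, newdata = newdat)
```

Taking newdat as a subset of the training rows keeps the factor levels of z identical to those seen during fitting, which sidesteps level mismatches when the model is evaluated on new data.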

Author

Nathaniel E. Helwig <helwig@umn.edu>

Details

Suppose that the fitted function can be written as $$\eta = \eta_0 + \eta_1 + \eta_2 + ... + \eta_p$$ where \(\eta_0\) is a constant (intercept) term, and \(\eta_j\) denotes the \(j\)-th effect function, which is assumed to have mean zero. Each \(\eta_j\), for \(j = 1, ..., p\), may be either a main effect or an interaction effect function.

The variable importance index for the \(j\)-th effect term is defined as $$\pi_j = (\eta_j^\top \eta_*) / (\eta_*^\top \eta_*)$$ where \(\eta_* = \eta_1 + \eta_2 + ... + \eta_p\). Because \(\sum_{j = 1}^p \eta_j^\top \eta_* = \eta_*^\top \eta_*\), the indices satisfy \(\sum_{j = 1}^p \pi_j = 1\), but there is no guarantee that \(\pi_j > 0\).

If all \(\pi_j\) are non-negative, then \(\pi_j\) gives the proportion of the model's R-squared that can be accounted for by the \(j\)-th effect term. Thus, values of \(\pi_j\) closer to 1 indicate that \(\eta_j\) is more important, whereas values of \(\pi_j\) closer to 0 (including negative values) indicate that \(\eta_j\) is less important.
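The definition above can be sketched in a few lines of base R. The helper importance() below is hypothetical (it is not part of npreg) and assumes the effect functions have already been evaluated at a common set of points and stored as the columns of a matrix. The two toy cases are chosen so that the first gives only positive indices, while in the second a weak effect partly cancels a strong one and therefore receives a negative index.

```r
# Hypothetical helper (not part of npreg): importance indices from a matrix
# whose columns are effect functions evaluated at the same n points.
importance <- function(eff) {
  eff <- scale(eff, center = TRUE, scale = FALSE)  # enforce mean-zero effects
  eta.star <- rowSums(eff)                         # eta_* = eta_1 + ... + eta_p
  colSums(eff * eta.star) / sum(eta.star^2)        # pi_j = eta_j' eta_* / eta_*' eta_*
}

x <- seq(-1, 1, length.out = 101)

# two effects that reinforce each other: a = 2/3, b = 1/3
pi.pos <- importance(cbind(a = 2 * x, b = x))

# a weak effect that partly cancels a strong one: a = 4/3, b = -1/3
pi.neg <- importance(cbind(a = 2 * x, b = -0.5 * x))

# in both cases the indices sum to one
sum(pi.pos)
sum(pi.neg)
```

In the second case the index for a exceeds 1 and the index for b is negative, so the "proportion of R-squared" reading does not apply there even though the indices still sum to 1.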

References

Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer. doi:10.1007/978-1-4614-5369-7

Helwig, N. E. (2020). Multiple and Generalized Nonparametric Regression. In P. Atkinson, S. Delamont, A. Cernat, J. W. Sakshaug, & R. A. Williams (Eds.), SAGE Research Methods Foundations. doi:10.4135/9781526421036885885

See Also

See summary.sm for more thorough summaries of smooth models.

See summary.gsm for more thorough summaries of generalized smooth models.

Examples


##########   EXAMPLE 1   ##########
### 1 continuous and 1 nominal predictor

# generate data
set.seed(1)
n <- 100
x <- seq(0, 1, length.out = n)
z <- factor(sample(letters[1:3], size = n, replace = TRUE))
fun <- function(x, z){
  mu <- c(-2, 0, 2)
  zi <- as.integer(z)
  mu[zi] + 3 * x + sin(2 * pi * x)
}
fx <- fun(x, z)
y <- fx + rnorm(n, sd = 0.5)

# define marginal knots
probs <- seq(0, 0.9, by = 0.1)
knots <- list(x = quantile(x, probs = probs),
              z = letters[1:3])

# fit correct (additive) model
sm.add <- sm(y ~ x + z, knots = knots)

# fit incorrect (interaction) model
sm.int <- sm(y ~ x * z, knots = knots)

# true importance indices
eff <- data.frame(x = 3 * x + sin(2 * pi * x), z = c(-2, 0, 2)[as.integer(z)])
eff <- scale(eff, scale = FALSE)
fstar <- rowSums(eff)
colSums(eff * fstar) / sum(fstar^2)

# estimated importance indices
varimp(sm.add)
varimp(sm.int)



##########   EXAMPLE 2   ##########
### 4 continuous predictors
### additive model

# generate data
set.seed(1)
n <- 100
fun <- function(x){
  sin(pi*x[,1]) + sin(2*pi*x[,2]) + sin(3*pi*x[,3]) + sin(4*pi*x[,4])
}
data <- as.data.frame(replicate(4, runif(n)))
colnames(data) <- c("x1v", "x2v", "x3v", "x4v")
fx <- fun(data)
y <- fx + rnorm(n)

# define ssa knot indices
knots.indx <- c(bin.sample(data$x1v, nbin = 10, index.return = TRUE)$ix,
                bin.sample(data$x2v, nbin = 10, index.return = TRUE)$ix,
                bin.sample(data$x3v, nbin = 10, index.return = TRUE)$ix,
                bin.sample(data$x4v, nbin = 10, index.return = TRUE)$ix)

# fit correct (additive) model
sm.add <- sm(y ~ x1v + x2v + x3v + x4v, data = data, knots = knots.indx)

# fit incorrect (interaction) model
sm.int <- sm(y ~ x1v * x2v + x3v + x4v, data = data, knots = knots.indx)

# true importance indices
eff <- data.frame(x1v = sin(pi*data[,1]), x2v = sin(2*pi*data[,2]),
                  x3v = sin(3*pi*data[,3]), x4v = sin(4*pi*data[,4]))
eff <- scale(eff, scale = FALSE)
fstar <- rowSums(eff)
colSums(eff * fstar) / sum(fstar^2)

# estimated importance indices
varimp(sm.add)
varimp(sm.int)
