sail: Fit Sparse Additive Interaction Model with Strong Heredity

Description

Function to fit the Sparse Additive Interaction Model with strong heredity for a sequence of tuning parameters. This is a penalized regression method that ensures the interaction term is non-zero only if its corresponding main-effects are non-zero. This model only considers the interactions between a single exposure (E) variable and a high-dimensional matrix (X). Additive (non-linear) main effects and interactions can be specified by the user. This can also be seen as a varying-coefficient model.

Usage

sail(x, y, e, basis = function(i) splines::bs(i, df = 5),
  strong = TRUE, group.penalty = c("gglasso", "grMCP", "grSCAD"),
  family = c("gaussian", "binomial"), center.x = TRUE,
  center.e = TRUE, expand = TRUE, group, weights,
  penalty.factor = rep(1, 1 + 2 * nvars), lambda.factor = ifelse(nobs <
  (1 + 2 * bscols * nvars), 0.01, 1e-04), lambda = NULL, alpha = 0.5,
  nlambda = 100, thresh = 1e-04, fdev = 1e-05, maxit = 1000,
  dfmax = 2 * nvars + 1, verbose = 0)

Arguments

input matrix of dimension n x p, where n is the number of subjects and p is number of X variables. Each row is an observation vector. Can be a high-dimensional (n < p) matrix. Can be a user defined design matrix of main effects only (without intercept) if expand=FALSE

response variable. For family="gaussian" should be a 1 column matrix or numeric vector. For family="binomial", should be a 1 column matrix or numeric vector with -1 for failure and 1 for success.

exposure or environment vector. Must be a numeric vector. Factors must be converted to numeric.

basis

user defined basis expansion function. This function will be applied to every column in x. Specify function(i) i if no expansion is desired. Default: function(i) splines::bs(i, df = 5).

strong

Flag specifying strong hierarchy (TRUE) or weak hierarchy (FALSE). Default FALSE.

group.penalty

group lasso penalty. Can be one of "gglasso" (group lasso), "grMCP" (group MCP) or "grSCAD" (group SCAD). See references for details. Default: "gglasso".

family

response type. See y for details. Currently only family = "gaussian" is implemented. Default: "gaussian".

center.x

should the columns of x (after basis expansion) be centered (logical). Default: TRUE.

center.e

should exposure variable e be centered. Default: TRUE.

expand

should basis be applied to every column of x (logical). Set to FALSE if you want a user defined main effects design matrix. If FALSE the group membership argument must also be supplied. Default: TRUE.

group

a vector of consecutive integers, starting from 1, describing the grouping of the coefficients. Only required when expand=FALSE.

weights

observation weights. Default is 1 for each observation. Currently NOT IMPLEMENTED.

penalty.factor

separate penalty factors can be applied to each coefficient. This is a number that multiplies lambda to allow differential shrinkage. Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model. Default is 1 for all variables. Must be of length 1 + 2*ncol(x) where the first entry corresponds to the penalty.factor for e, the next ncol(x) entries correspond to main effects, and the following ncol(x) entries correspond to the interactions.

lambda.factor

the factor for getting the minimal lambda in the lambda sequence, where min(lambda) = lambda.factor * max(lambda). max(lambda) is the smallest value of lambda for which all coefficients are zero. The default depends on the relationship between N (the number of rows in the matrix of predictors) and q (the total number of predictors in the design matrix - including interactions). If N > q, the default is 1e-4, close to zero. If N < p, the default is 0.01. A very small value of lambda.factor will lead to a saturated fit.

lambda

a user supplied lambda sequence. Typically, by leaving this option unspecified users can have the program compute its own lambda sequence based on nlambda and lambda.factor. Supplying a value of lambda overrides this. It is better to supply a decreasing sequence of lambda values than a single (small) value, if not, the program will sort user-defined lambda sequence in decreasing order automatically. Default: NULL.

alpha

the mixing tuning parameter, with $0<\alpha<1$. It controls the penalization strength between the main effects and the interactions. The penalty is defined as $$\lambda(1-\alpha)(w_e|\beta_e|+ \sum w_j ||\beta_j||_2) + \lambda\alpha(\sum w_{je} |\gamma_j|)$$Larger values of alpha will favor selection of main effects over interactions. Smaller values of alpha will allow more interactions to enter the final model. Default: 0.5

nlambda

the number of lambda values. Default: 100

thresh

convergence threshold for coordinate descent. Each coordinate-descent loop continues until the change in the objective function after all coefficient updates is less than thresh. Default: 1e-04.

fdev

minimum fractional change in deviance for stopping path. Default: 1e-5.

maxit

maximum number of outer-loop iterations allowed at fixed lambda value. If models do not converge, consider increasing maxit. Default: 1000.

dfmax

limit the maximum number of variables in the model. Useful for very large q (the total number of predictors in the design matrix - including interactions), if a partial path is desired. Default: 2 * p + 1 where p is the number of columns in x.

verbose

display progress. Can be either 0,1 or 2. 0 will not display any progress, 2 will display very detailed progress and 1 is somewhere in between. Default: 1.

Value

an object with S3 class "sail", "*", where "*" is "lspath" or "logitreg". Results are provided for converged values of lambda only.

a0: intercept sequence of length nlambda
beta: a (# main effects after basis expansion x nlambda) matrix of main effects coefficients, stored in sparse column format ("dgCMatrix")
alpha: a (# interaction effects after basis expansion x nlambda) matrix of interaction effects coefficients, stored in sparse column format ("dgCMatrix")
gamma: A p x nlambda matrix of ($\gamma$) coefficients, stored in sparse column format ("dgCMatrix")
bE: exposure effect estimates of length nlambda
active: list of length nlambda containing character vector of selected variables
lambda: the actual sequence of lambda values used
lambda2: value for the mixing tuning parameter $0<\alpha<1$
dfbeta: the number of nonzero main effect coefficients for each value of lambda
dfalpha: the number of nonzero interaction coefficients for each value of lambda
dfenviron: the number of nonzero exposure (e) coefficients for each value of lambda
dev.ratio: the fraction of (null) deviance explained (for "lspath", this is the R-square). The deviance calculations incorporate weights if present in the model. The deviance is defined to be 2*(loglike_sat - loglike), where loglike_sat is the log-likelihood for the saturated model (a model with a free parameter per observation). Hence dev.ratio=1-dev/nulldev.
converged: vector of logicals of length nlambda indicating if the algorithm converged
nlambda: number of converged lambdas
design: design matrix (X, E, X:E), of dimension n x (2*ncols*p+1) if expand=TRUE. This is used in the predict method.
nobs: number of observations
nvars: number of main effect variables
vnames: character of variable names for main effects (without expansion)
ncols: an integer of basis for each column of x if expand=TRUE, or an integer vector of basis for each variable if expand=FALSE
center.x: were the columns of x (after expansion) centered?
center.e: was e centered?
basis: user defined basis expansion function
expand: was the basis function applied to each column of x?
group: a vector of consecutive integers describing the grouping of the coefficients. Only if expand=FALSE
interaction.names: character vector of names of interaction variables
main.effect.names: character vector of names of main effect variables (with expansion)

Details

The objective function for family="gaussian" is $$RSS/2n + \lambda(1-\alpha)(w_e|\beta_e|+ \sum w_j ||\beta_j||_2) + \lambda\alpha(\sum w_{je} |\gamma_j|)$$ where RSS is the residual sum of squares and n is the number of observations. See Bhatnagar et al. (2018+) for details.

It is highly recommended to specify center.x = TRUE and center.e = TRUE for both convergence and interpretation reasons. If centered, the final estimates can be interpreted as the effect of the predictor on the response while holding all other predictors at their mean value. For computing speed reasons, if models are not converging or running slow, consider increasing thresh, decreasing nlambda, or increasing lambda.factor before increasing maxit. Then try increasing the value of alpha (which translates to more penalization on the interactions).

By default, sail uses the group lasso penalty on the basis expansions of x. To use the group MCP and group SCAD penalties (see Breheny and Huang 2015), the grpreg package must be installed.

References

Jerome Friedman, Trevor Hastie, Robert Tibshirani (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22. http://www.jstatsoft.org/v33/i01/.

Breheny P and Huang J (2015). Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Statistics and Computing, 25: 173-187.

Yang Y, Zou H (2015). A fast unified algorithm for solving group-lasso penalize learning problems. Statistics and Computing. Nov 1;25(6):1129-41. http://www.math.mcgill.ca/yyang/resources/papers/STCO_gglasso.pdf

Bhatnagar SR, Yang Y, Greenwood CMT. Sparse additive interaction models with the strong heredity property (2018+). Preprint.

Examples

Run this code

# NOT RUN {
f.basis <- function(i) splines::bs(i, degree = 3)
# we specify dfmax to early stop the solution path to
# limit the execution time of the example
fit <- sail(x = sailsim$x, y = sailsim$y, e = sailsim$e,
            basis = f.basis, nlambda = 10, dfmax = 10,
            maxit = 100)

# estimated coefficients at each value of lambda
coef(fit)

# predicted response at each value of lambda
predict(fit)

#predicted response at a specific value of lambda
predict(fit, s = 0.5)
# plot solution path for main effects and interactions
plot(fit)
# plot solution path only for main effects
plot(fit, type = "main")
# plot solution path only for interactions
plot(fit, type = "interaction")

# }

Run the code above in your browser using DataLab