Function to fit the Sparse Additive Interaction Model with strong heredity for a sequence of tuning parameters. This is a penalized regression method that ensures the interaction term is non-zero only if its corresponding main-effects are non-zero. This model only considers the interactions between a single exposure (E) variable and a high-dimensional matrix (X). Additive (non-linear) main effects and interactions can be specified by the user. This can also be seen as a varying-coefficient model.
sail(x, y, e, basis = function(i) splines::bs(i, df = 5),
strong = TRUE, group.penalty = c("gglasso", "grMCP", "grSCAD"),
family = c("gaussian", "binomial"), center.x = TRUE,
center.e = TRUE, expand = TRUE, group, weights,
penalty.factor = rep(1, 1 + 2 * nvars), lambda.factor = ifelse(nobs <
(1 + 2 * bscols * nvars), 0.01, 1e-04), lambda = NULL, alpha = 0.5,
nlambda = 100, thresh = 1e-04, fdev = 1e-05, maxit = 1000,
dfmax = 2 * nvars + 1, verbose = 0)
input matrix of dimension n x p
, where n
is the number
of subjects and p is number of X variables. Each row is an observation
vector. Can be a high-dimensional (n < p) matrix. Can be a user defined
design matrix of main effects only (without intercept) if
expand=FALSE
response variable. For family="gaussian"
should be a 1 column
matrix or numeric vector. For family="binomial"
, should be a 1
column matrix or numeric vector with -1 for failure and 1 for success.
exposure or environment vector. Must be a numeric vector. Factors must be converted to numeric.
user defined basis expansion function. This function will be
applied to every column in x
. Specify function(i) i
if no
expansion is desired. Default: function(i) splines::bs(i, df = 5)
.
Flag specifying strong hierarchy (TRUE) or weak hierarchy (FALSE). Default FALSE.
group lasso penalty. Can be one of "gglasso"
(group lasso), "grMCP"
(group MCP) or "grSCAD"
(group SCAD).
See references for details. Default: "gglasso"
.
response type. See y
for details. Currently only
family = "gaussian"
is implemented. Default: "gaussian"
.
should the columns of x
(after basis expansion) be
centered (logical). Default: TRUE
.
should exposure variable e
be centered. Default:
TRUE
.
should basis
be applied to every column of x
(logical). Set to FALSE
if you want a user defined main effects
design matrix. If FALSE
the group
membership argument must
also be supplied. Default: TRUE
.
a vector of consecutive integers, starting from 1, describing
the grouping of the coefficients. Only required when expand=FALSE
.
observation weights. Default is 1 for each observation. Currently NOT IMPLEMENTED.
separate penalty factors can be applied to each
coefficient. This is a number that multiplies lambda to allow differential
shrinkage. Can be 0 for some variables, which implies no shrinkage, and
that variable is always included in the model. Default is 1 for all
variables. Must be of length 1 + 2*ncol(x)
where the first entry
corresponds to the penalty.factor for e
, the next ncol(x)
entries correspond to main effects, and the following ncol(x)
entries correspond to the interactions.
the factor for getting the minimal lambda in the lambda
sequence, where min(lambda) = lambda.factor * max(lambda)
.
max(lambda)
is the smallest value of lambda for which all
coefficients are zero. The default depends on the relationship between
N
(the number of rows in the matrix of predictors) and q
(the
total number of predictors in the design matrix - including interactions).
If N > q
, the default is 1e-4
, close to zero. If N <
p
, the default is 0.01
. A very small value of lambda.factor will
lead to a saturated fit.
a user supplied lambda sequence. Typically, by leaving this
option unspecified users can have the program compute its own lambda
sequence based on nlambda
and lambda.factor
. Supplying a
value of lambda overrides this. It is better to supply a decreasing
sequence of lambda values than a single (small) value, if not, the program
will sort user-defined lambda sequence in decreasing order automatically.
Default: NULL
.
the mixing tuning parameter, with \(0<\alpha<1\). It controls
the penalization strength between the main effects and the interactions.
The penalty is defined as $$\lambda(1-\alpha)(w_e|\beta_e|+ \sum w_j
||\beta_j||_2) + \lambda\alpha(\sum w_{je} |\gamma_j|)$$Larger values of
alpha
will favor selection of main effects over interactions.
Smaller values of alpha
will allow more interactions to enter the
final model. Default: 0.5
the number of lambda values. Default: 100
convergence threshold for coordinate descent. Each
coordinate-descent loop continues until the change in the objective
function after all coefficient updates is less than thresh
. Default:
1e-04
.
minimum fractional change in deviance for stopping path. Default:
1e-5
.
maximum number of outer-loop iterations allowed at fixed lambda
value. If models do not converge, consider increasing maxit
.
Default: 1000.
limit the maximum number of variables in the model. Useful for
very large q
(the total number of predictors in the design matrix -
including interactions), if a partial path is desired. Default: 2 * p
+ 1
where p is the number of columns in x
.
display progress. Can be either 0,1 or 2. 0 will not display any progress, 2 will display very detailed progress and 1 is somewhere in between. Default: 1.
an object with S3 class "sail", "*"
, where "*"
is
"lspath" or "logitreg". Results are provided for converged values of lambda
only.
intercept sequence of length nlambda
a (#
main effects after basis expansion x nlambda
) matrix of main effects
coefficients, stored in sparse column format ("dgCMatrix")
a (# interaction effects after basis expansion x
nlambda
) matrix of interaction effects coefficients, stored in
sparse column format ("dgCMatrix")
A p x
nlambda
matrix of (\(\gamma\)) coefficients, stored in sparse column
format ("dgCMatrix")
exposure effect estimates of length
nlambda
list of length nlambda
containing
character vector of selected variables
the actual sequence of lambda values used
value for the mixing tuning parameter \(0<\alpha<1\)
the number of nonzero main effect coefficients for each value of lambda
the number of nonzero interaction coefficients for each value of lambda
the
number of nonzero exposure (e
) coefficients for each value of
lambda
the fraction of (null) deviance explained (for "lspath", this is the R-square). The deviance calculations incorporate weights if present in the model. The deviance is defined to be 2*(loglike_sat - loglike), where loglike_sat is the log-likelihood for the saturated model (a model with a free parameter per observation). Hence dev.ratio=1-dev/nulldev.
vector of logicals of length
nlambda
indicating if the algorithm converged
number of converged lambdas
design matrix (X, E, X:E), of dimension
n x (2*ncols*p+1)
if expand=TRUE
. This is used in the
predict
method.
number of observations
number of main effect variables
character of variable names for main effects (without expansion)
an
integer of basis for each column of x if expand=TRUE
, or an integer
vector of basis for each variable if expand=FALSE
were the columns of x (after expansion) centered?
was e
centered?
user defined basis expansion function
was the basis function applied to each column of x?
a vector of consecutive integers describing the grouping of the coefficients. Only if expand=FALSE
character vector of names of interaction variables
character vector of names of main effect variables (with expansion)
The objective function for family="gaussian"
is $$RSS/2n
+ \lambda(1-\alpha)(w_e|\beta_e|+ \sum w_j ||\beta_j||_2) +
\lambda\alpha(\sum w_{je} |\gamma_j|)$$ where RSS
is the residual sum
of squares and n
is the number of observations. See Bhatnagar et al.
(2018+) for details.
It is highly recommended to specify center.x = TRUE
and
center.e = TRUE
for both convergence and interpretation reasons. If
centered, the final estimates can be interpreted as the effect of the
predictor on the response while holding all other predictors at their mean
value. For computing speed reasons, if models are not converging or running
slow, consider increasing thresh
, decreasing nlambda
, or
increasing lambda.factor
before increasing maxit
. Then try
increasing the value of alpha
(which translates to more penalization
on the interactions).
By default, sail
uses the group lasso penalty on the basis
expansions of x
. To use the group MCP and group SCAD penalties (see
Breheny and Huang 2015), the grpreg
package must be installed.
Jerome Friedman, Trevor Hastie, Robert Tibshirani (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22. http://www.jstatsoft.org/v33/i01/.
Breheny P and Huang J (2015). Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Statistics and Computing, 25: 173-187.
Yang Y, Zou H (2015). A fast unified algorithm for solving group-lasso penalize learning problems. Statistics and Computing. Nov 1;25(6):1129-41. http://www.math.mcgill.ca/yyang/resources/papers/STCO_gglasso.pdf
Bhatnagar SR, Yang Y, Greenwood CMT. Sparse additive interaction models with the strong heredity property (2018+). Preprint.
# NOT RUN {
f.basis <- function(i) splines::bs(i, degree = 3)
# we specify dfmax to early stop the solution path to
# limit the execution time of the example
fit <- sail(x = sailsim$x, y = sailsim$y, e = sailsim$e,
basis = f.basis, nlambda = 10, dfmax = 10,
maxit = 100)
# estimated coefficients at each value of lambda
coef(fit)
# predicted response at each value of lambda
predict(fit)
#predicted response at a specific value of lambda
predict(fit, s = 0.5)
# plot solution path for main effects and interactions
plot(fit)
# plot solution path only for main effects
plot(fit, type = "main")
# plot solution path only for interactions
plot(fit, type = "interaction")
# }
Run the code above in your browser using DataLab