
BeSS (version 2.0.2)

bess: Best subset selection

Description

Best subset selection for generalized linear models and Cox's proportional hazards model.

Usage

bess(
  x,
  y,
  family = c("gaussian", "binomial", "poisson", "cox"),
  type = c("bss", "bsrr"),
  method = c("gsection", "sequential", "pgsection", "psequential"),
  tune = c("gic", "ebic", "bic", "aic", "cv"),
  s.list,
  lambda.list = 0,
  s.min,
  s.max,
  lambda.min = 0.001,
  lambda.max = 100,
  nlambda = 100,
  always.include = NULL,
  screening.num = NULL,
  normalize = NULL,
  weight = NULL,
  max.iter = 20,
  warm.start = TRUE,
  nfolds = 5,
  group.index = NULL,
  seed = NULL
)

Arguments

x

Input matrix, of dimension \(n \times p\); each row is an observation vector and each column is a predictor/feature/variable.

y

The response variable, of n observations. For family = "binomial", y should be a factor with two levels. For family = "poisson", y should be a vector of positive integers. For family = "cox", y should be a two-column matrix with columns named time and status.
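For instance, a response in the format required for family = "cox" might be assembled as follows (hypothetical toy data, for illustration only):

```r
# Survival times and censoring indicators (1 = event observed, 0 = censored)
time   <- c(4.2, 7.9, 1.3, 10.0, 6.5)
status <- c(1, 0, 1, 0, 1)

# Two-column matrix with the column names bess() expects for family = "cox"
y <- cbind(time = time, status = status)
colnames(y)  # "time" "status"
```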

family

One of the following models, chosen according to the response: "gaussian", "binomial", "poisson", or "cox". Any unambiguous substring can be given.

type

One of the two types of problem: type = "bss" for best subset selection, and type = "bsrr" for best subset ridge regression.

method

The method used to select the optimal model size and \(L_2\) shrinkage. For method = "sequential", we solve the best subset selection or best subset ridge regression problem for each s in 1, 2, ..., s.max and each \(\lambda\) in lambda.list. For method = "gsection", which is only valid for type = "bss", we solve the best subset selection problem with model sizes ranging between s.min and s.max, where the specific model sizes to be considered are determined by a golden section search. For method = "pgsection" and "psequential", the Powell method is used to solve the best subset ridge regression problem. Any unambiguous substring can be given.

tune

The criterion for choosing the model size and \(L_2\) shrinkage parameters. Available options are "gic", "ebic", "bic", "aic" and "cv". Default is "gic".

s.list

An increasing sequence of integer values representing the model sizes. Only used for method = "sequential". Default is 1:min(p, round(n/log(n))).

lambda.list

A lambda sequence for "bsrr". Default is exp(seq(log(100), log(0.01), length.out = 100)).

s.min

The minimum value of model sizes. Only used for method = "gsection", "psequential" and "pgsection". Default is 1.

s.max

The maximum value of model sizes. Only used for method = "gsection", "psequential" and "pgsection". Default is min(p, round(n/log(n))).

lambda.min

The minimum value of lambda. Only used for the Powell methods (method = "pgsection" or "psequential"). Default is 0.001.

lambda.max

The maximum value of lambda. Only used for the Powell methods (method = "pgsection" or "psequential"). Default is 100.

nlambda

The number of \(\lambda\)s for the Powell path with sequential line search method. Only valid for method = "psequential".

always.include

An integer vector containing the indexes of variables that should always be included in the model.

screening.num

Users can pre-exclude irrelevant variables before fitting a model by passing an integer to screening.num: sure independence screening, based on the maximum marginal likelihood estimators, retains a set of variables of this size, and the active set updates are then restricted to this subset.
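For the Gaussian case, the flavor of screening described above can be sketched with marginal correlations (an illustrative stand-in for the package's maximum marginal likelihood screening, not its actual implementation):

```r
set.seed(1)
n <- 100; p <- 500
x <- matrix(rnorm(n * p), n, p)
y <- x[, 3] + x[, 40] + rnorm(n)  # only columns 3 and 40 carry signal

# Keep the screening.num columns with the largest absolute marginal correlation
screening.num <- 50
keep <- order(abs(cor(x, y)), decreasing = TRUE)[1:screening.num]

# Check that the truly relevant columns survive the screening step
all(c(3, 40) %in% keep)
```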

normalize

Options for normalization. normalize = 0 for no normalization. normalize = 1 only subtracts the column means of x. normalize = 2 scales the columns of x to have \(\sqrt n\) norm. normalize = 3 subtracts the column means of x and y, and also scales the columns of x to have \(\sqrt n\) norm. If normalize = NULL (the default), normalize is set to 1 for "gaussian", 2 for "binomial" and "poisson", and 3 for "cox".
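As an illustration, the normalize = 3 convention can be sketched in base R (this is a reading of the description above, not the package internals):

```r
set.seed(1)
n <- 100; p <- 5
x <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)

# normalize = 3: subtract column means of x and y, then scale each
# column of x so its Euclidean norm equals sqrt(n)
xc <- scale(x, center = TRUE, scale = FALSE)
xs <- sweep(xc, 2, sqrt(colSums(xc^2) / n), "/")
yc <- y - mean(y)

sqrt(colSums(xs^2))  # each column norm is sqrt(n) = 10
```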

weight

Observation weights. Default is 1 for each observation.

max.iter

The maximum number of iterations in the bess function. In most cases, a few iterations suffice for convergence. Default is 20.

warm.start

Whether to use the last solution as a warm start. Default is TRUE.

nfolds

The number of folds in cross-validation. Default is 5.

group.index

A vector of integers indicating which group each variable belongs to. Variables in the same group should occupy adjacent columns of x and share the same value in group.index. Denote the first group as 1, the second as 2, etc. To fit a model without a group structure, set group.index = NULL. Default is NULL.

seed

Seed used to divide the sample into K cross-validation folds. Default is NULL.

Value

A list with class attribute 'bess' and named components:

beta

The best fitting coefficients.

coef0

The best fitting intercept.

bestmodel

The best fitted model for type = "bss", the class of which is "lm", "glm" or "coxph".

loss

The training loss of the best fitting model.

ic

The information criterion of the best fitting model when model selection is based on a certain information criterion.

cvm

The mean cross-validated error for the best fitting model when model selection is based on the cross-validation.

lambda

The lambda chosen for the best fitting model.

beta.all

For bess objects obtained by the gsection, pgsection and psequential methods, beta.all is a matrix whose columns are the coefficients of the model at each iterative step along the tuning path. For bess objects obtained by the sequential method, beta.all is a list of the best fitting coefficients for model sizes \(s = 0, 1, \dots, p\) and each \(\lambda\) in lambda.list with the smallest loss function. For "bess" objects of "bsrr" type, the fitting coefficients for the \(i^{th}\) \(\lambda\) and the \(j^{th}\) s are in the \(j^{th}\) column of the \(i^{th}\) list component.

coef0.all

For bess objects obtained from the gsection, pgsection and psequential methods, coef0.all contains the intercept of the model at each iterative step along the tuning path. For bess objects obtained from the sequential method, coef0.all contains the best fitting intercepts for model sizes \(s=0,1,\dots,p\) and each \(\lambda\) in lambda.list with the smallest loss function.

loss.all

For bess objects obtained from the gsection, pgsection and psequential methods, loss.all contains the training loss of the model at each iterative step along the tuning path. For bess objects obtained from the sequential method, this is a list of the training losses of the best fitting models of size \(s=0,1,\dots,p\) for each \(\lambda\) in lambda.list. For "bess" objects of "bsrr" type, the training loss for the \(i^{th}\) \(\lambda\) and the \(j^{th}\) s is the \(j^{th}\) entry of the \(i^{th}\) list component.

ic.all

For bess objects obtained from the gsection, pgsection and psequential methods, ic.all contains the values of the chosen information criterion of the model at each iterative step along the tuning path. For bess objects obtained from the sequential method, this is a matrix of the values of the chosen information criterion for model sizes \(s=0,1,\dots,p\) and each \(\lambda\) in lambda.list with the smallest loss function. For "bess" objects of "bsrr" type, the information criterion value for the \(i^{th}\) \(\lambda\) and the \(j^{th}\) s is in the \(i^{th}\) row, \(j^{th}\) column. Only available when model selection is based on a certain information criterion.

cvm.all

For bess objects obtained from the gsection, pgsection and psequential methods, cvm.all contains the mean cross-validation error of the model at each iterative step along the tuning path. For bess objects obtained from the sequential method, this is a matrix of the mean cross-validation errors for model sizes \(s=0,1,\dots,p\) and each \(\lambda\) in lambda.list with the smallest loss function. For "bess" objects of "bsrr" type, the mean cross-validation error for the \(i^{th}\) \(\lambda\) and the \(j^{th}\) s is in the \(i^{th}\) row, \(j^{th}\) column. Only available when model selection is based on cross-validation.

lambda.all

The lambda chosen for each step in pgsection and psequential.

family

Type of the model.

s.list

The input s.list.

nsample

The sample size.

type

Either "bss" or "bsrr".

method

Method used for tuning parameter selection.

ic.type

The criterion of model selection.

Details

The best subset selection problem with model size \(s\) is $$\min_\beta -2 \log L(\beta) \;\;{\rm s.t.}\;\; \|\beta\|_0 \leq s.$$ In the GLM case, \(\log L(\beta)\) is the log-likelihood function; In the Cox model, \(\log L(\beta)\) is the log partial likelihood function.

The best ridge regression problem with model size \(s\) is $$\min_\beta -2 \log L(\beta) + \lambda\Vert\beta\Vert_2^2 \;\;{\rm s.t.}\;\; \|\beta\|_0 \leq s.$$ In the GLM case, \(\log L(\beta)\) is the log likelihood function; In the Cox model, \(\log L(\beta)\) is the log partial likelihood function.

For each candidate model size and \(\lambda\), the best subset selection and best subset ridge regression problems are solved by the primal-dual active set (PDAS) algorithm; see Wen et al. (2020) for details. This algorithm updates the active set via primal and dual variables and fits the sub-model by exploiting the fact that their support sets are non-overlapping and complementary. For method = "sequential" with warm.start = TRUE, we run the PDAS algorithm over a list of sequential model sizes and use the estimate from the last iteration as a warm start. For method = "gsection" in the best subset selection problem, a golden section search technique is adopted to determine the optimal model size efficiently. For method = "psequential" and method = "pgsection" in the best subset ridge regression problem, the Powell method, using a sequential line search or a golden section search, is used for parameter determination.
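As a rough illustration of the golden section idea (a toy sketch, not the package's actual implementation), one can search an integer range of model sizes for the minimizer of a unimodal criterion while evaluating only a few candidate sizes:

```r
# Golden-section search over integer model sizes in [lo, hi],
# assuming the criterion f is unimodal in s
golden_int <- function(f, lo, hi) {
  invphi2 <- (3 - sqrt(5)) / 2  # ~0.382, fraction cut from each end
  while (hi - lo > 2) {
    d  <- max(1, floor((hi - lo) * invphi2))
    m1 <- lo + d
    m2 <- hi - d
    if (f(m1) < f(m2)) hi <- m2 else lo <- m1  # keep the bracket holding the minimum
  }
  cand <- lo:hi
  cand[which.min(sapply(cand, f))]
}

# Toy criterion with its minimum at s = 7
golden_int(function(s) (s - 7)^2, 1, 20)  # 7
```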

References

Wen, C., Zhang, A., Quan, S. and Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models, Journal of Statistical Software, Vol. 94(4). doi:10.18637/jss.v094.i04.

See Also

plot.bess, summary.bess, coef.bess, predict.bess, bess.one.

Examples

# NOT RUN {
#-------------------linear model----------------------#
# Generate simulated data
n <- 200
p <- 20
k <- 5
rho <- 0.4
seed <- 10
Tbeta <- rep(0, p)
Tbeta[(1:k) * floor(p/k)] <- rep(1, k)  # nonzero coefficients at positions 4, 8, ..., 20
Data <- gen.data(n, p, k, rho, family = "gaussian", beta = Tbeta, seed = seed)
x <- Data$x[1:140, ]
y <- Data$y[1:140]
x_new <- Data$x[141:200, ]
y_new <- Data$y[141:200]
lm.bss <- bess(x, y)
lm.bsrr <- bess(x, y, type = "bsrr", method = "pgsection")
coef(lm.bss)
coef(lm.bsrr)
print(lm.bss)
print(lm.bsrr)
summary(lm.bss)
summary(lm.bsrr)
pred.bss <- predict(lm.bss, newx = x_new)
pred.bsrr <- predict(lm.bsrr, newx = x_new)

# generate plots
plot(lm.bss, type = "both", breaks = TRUE)
plot(lm.bsrr)
#-------------------logistic model----------------------#
#Generate simulated data
Data <- gen.data(n, p, k, rho, family = "binomial", beta = Tbeta, seed = seed)

x <- Data$x[1:140, ]
y <- Data$y[1:140]
x_new <- Data$x[141:200, ]
y_new <- Data$y[141:200]
logi.bss <- bess(x, y, family = "binomial")
lambda.list <- exp(seq(log(5), log(0.1), length.out = 10))
logi.bsrr <- bess(x, y, type = "bsrr", family = "binomial", lambda.list = lambda.list)
coef(logi.bss)
coef(logi.bsrr)
print(logi.bss)
print(logi.bsrr)
summary(logi.bss)
summary(logi.bsrr)
pred.bss <- predict(logi.bss, newx = x_new)
pred.bsrr <- predict(logi.bsrr, newx = x_new)

# generate plots
plot(logi.bss, type = "both", breaks = TRUE)
plot(logi.bsrr)
#-------------------poisson model----------------------#
Data <- gen.data(n, p, k, rho=0.3, family = "poisson", beta = Tbeta, seed = seed)

x <- Data$x[1:140, ]
y <- Data$y[1:140]
x_new <- Data$x[141:200, ]
y_new <- Data$y[141:200]
poi.bss <- bess(x, y, family = "poisson")
lambda.list <- exp(seq(log(5), log(0.1), length.out = 10))
poi.bsrr <- bess(x, y, type = "bsrr",
                 family = "poisson", lambda.list = lambda.list)
coef(poi.bss)
coef(poi.bsrr)
print(poi.bss)
print(poi.bsrr)
summary(poi.bss)
summary(poi.bsrr)
pred.bss <- predict(poi.bss, newx = x_new)
pred.bsrr <- predict(poi.bsrr, newx = x_new)

# generate plots
plot(poi.bss, type = "both", breaks = TRUE)
plot(poi.bsrr)
#-------------------coxph model----------------------#
#Generate simulated data
Data <- gen.data(n, p, k, rho, family = "cox", scal = 10, beta = Tbeta)

x <- Data$x[1:140, ]
y <- Data$y[1:140, ]
x_new <- Data$x[141:200, ]
y_new <- Data$y[141:200, ]
cox.bss <- bess(x, y, family = "cox")
lambda.list <- exp(seq(log(5), log(0.1), length.out = 10))
cox.bsrr <- bess(x, y, type = "bsrr", family = "cox", lambda.list = lambda.list)
coef(cox.bss)
coef(cox.bsrr)
print(cox.bss)
print(cox.bsrr)
summary(cox.bss)
summary(cox.bsrr)
pred.bss <- predict(cox.bss, newx = x_new)
pred.bsrr <- predict(cox.bsrr, newx = x_new)

# generate plots
plot(cox.bss, type = "both", breaks = TRUE)
plot(cox.bsrr)

#----------------------High dimensional linear models--------------------#
# }
# NOT RUN {
data <- gen.data(n, p = 1000, k, family = "gaussian", seed = seed)

# Best subset selection with SIS screening
lm.high <- bess(data$x, data$y, screening.num = 100)
# }
# NOT RUN {
#-------------------group selection----------------------#
beta <- rep(c(rep(1,2),rep(0,3)), 4)
Data <- gen.data(200, 20, 5, rho=0.4, beta = beta, seed =10)
x <- Data$x
y <- Data$y

group.index <- c(rep(1, 2), rep(2, 3), rep(3, 2), rep(4, 3),
                rep(5, 2), rep(6, 3), rep(7, 2), rep(8, 3))
lm.group <- bess(x, y, s.min=1, s.max = 8, type = "bss", group.index = group.index)
lm.groupbsrr <- bess(x, y, type = "bsrr", s.min = 1, s.max = 8, group.index = group.index)
coef(lm.group)
coef(lm.groupbsrr)
print(lm.group)
print(lm.groupbsrr)
summary(lm.group)
summary(lm.groupbsrr)
pred.group <- predict(lm.group, newx = x_new)
pred.groupl0l2 <- predict(lm.groupbsrr, newx = x_new)
#-------------------include specified variables----------------------#
Data <- gen.data(n, p, k, rho, family = "gaussian", beta = Tbeta, seed = seed)
lm.bss <- bess(Data$x, Data$y, always.include = 2)

# }
# NOT RUN {
#-------------------trim32 data analysis in doi: 10.18637/jss.v094.i04----------------------#
# import trim32 data by:
load(url('https://github.com/Mamba413/bess/tree/master/data/trim32.RData'))
# or manually downloading trim32.RData in the github page:
# "https://github.com/Mamba413/bess/tree/master/data/" and read it by:
load('trim32.RData')

X <- trim32$x
Y <- trim32$y
dim(X)

# running bess with argument method = "sequential".
fit.seq <- bess(X, Y, method = "sequential")
summary(fit.seq)

# the bess function outputs an 'lm' type of object bestmodel associated
# with the selected best model
bm.seq <- fit.seq$bestmodel
summary(bm.seq)
pred.seq <- predict(fit.seq, newx = X)
plot(fit.seq, type = "both", breaks = TRUE)

# We now call the function bess with argument method = "gsection"
fit.gs <- bess(X, Y, family = "gaussian", method = "gsection")
bm.gs <- fit.gs$bestmodel
summary(bm.gs)
beta <- coef(fit.gs, sparse = TRUE)
class(beta)
pred.gs <- predict(fit.gs, newx = X)
# }
