BeSS: An R Package for Best Subset Selection and Best Subset Ridge Regression

Introduction

The advance in modern technology, including computing power and storage, brings about more and more high-dimensional data in which the number of features can be much larger than the number of observations (Hastie et al. 2009). Examples include gene, microarray, and proteomics data, high-resolution images, high-frequency financial data, e-commerce data, warehouse data, resonance imaging, signal processing, among many others (Fan et al. 2011).

A model with many predictors is hard to interpret, and reducing the number of variables by subjective judgment can be biased by one's interests and hypotheses. Regression methods in the high-dimensional setting face at least three challenges:

  • How to find models with good prediction performance?

  • How to discover the true “sparsity pattern”?

  • How to find models combining the above-mentioned two abilities?

Best subset selection meets these challenges and offers the following advantages:

  • It obtains an unbiased estimator as long as the true active set is discovered.

  • It ranks highest in terms of model interpretation.

  • It provides an objective way to reduce the number of variables.

  • By excluding irrelevant variables, best subset selection improves out-of-sample accuracy and, to some extent, avoids overfitting.
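For concreteness, in a linear model with response y (length n) and design matrix X (n by p), best subset selection with model size s is usually stated as the cardinality-constrained least-squares problem below. The notation (y, X, β, s) is an assumption introduced here for illustration and is not taken from the package documentation.

```latex
\hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^{p}} \; \lVert y - X\beta \rVert_2^2
\quad \text{subject to} \quad \lVert \beta \rVert_0 \le s,
\qquad \text{where } \lVert \beta \rVert_0 = \#\{\, j : \beta_j \neq 0 \,\}.
```

Once the active set is fixed, the solution is ordinary least squares on the selected columns of X, which is why the estimator is unbiased when that set is the true active set.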

By introducing shrinkage on the coefficients, best subset ridge regression builds on best subset selection to provide a more refined trade-off between model parsimony and prediction.
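To make the two estimators concrete, the base R sketch below enumerates every subset of a given size and solves a ridge-penalized least-squares problem on each one. It is a brute-force illustration under stated assumptions: there is no intercept, `best_subset_ridge` is a hypothetical helper written for this example, and the data are simulated. It is not the package's own search procedure and is only feasible for a small number of predictors.

```r
# Brute-force best subset (ridge) regression for a fixed model size s.
# best_subset_ridge is a hypothetical helper written for this illustration.
# Assumes no intercept (centered data) and a small number of predictors.
best_subset_ridge <- function(x, y, s, lambda = 0) {
  subsets <- combn(ncol(x), s)          # each column is one candidate active set
  best <- list(loss = Inf)
  for (j in seq_len(ncol(subsets))) {
    active <- subsets[, j]
    xs <- x[, active, drop = FALSE]
    # Ridge solution on the subset; lambda = 0 reduces to ordinary least squares.
    coefs <- drop(solve(crossprod(xs) + lambda * diag(s), crossprod(xs, y)))
    loss <- sum((y - xs %*% coefs)^2) + lambda * sum(coefs^2)
    if (loss < best$loss) {
      best <- list(active = active, coef = coefs, loss = loss)
    }
  }
  best
}

# Simulated example: 8 predictors, 3 of which are truly active.
set.seed(1)
n <- 100
p <- 8
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
beta_true <- c(3, 0, 0, -2, 0, 0, 1.5, 0)
y <- drop(x %*% beta_true) + rnorm(n)

fit_bss  <- best_subset_ridge(x, y, s = 3)              # best subset selection
fit_bsrr <- best_subset_ridge(x, y, s = 3, lambda = 5)  # with ridge shrinkage
print(fit_bss$active)
print(fit_bss$coef)
print(fit_bsrr$coef)
```

Comparing `fit_bss$coef` with `fit_bsrr$coef` shows the effect of the shrinkage parameter on the selected coefficients. The number of candidate subsets grows combinatorially with the number of predictors, so exhaustive enumeration quickly becomes impractical.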

Software

R package

To download and install BeSS from CRAN:

install.packages("BeSS")

Or try the development version on GitHub:

# install.packages("devtools")
devtools::install_github("Mamba413/bess/R")
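After installation, a first fit might look like the sketch below. It simulates a small sparse linear model with base R and calls `bess()` with its default settings, then uses the `print` and `coef` methods listed in the package's function index. The simulated data, the variable names, and the reliance on default arguments are assumptions made for illustration; consult `?bess` for the arguments available in the installed version.

```r
# Simulate a sparse linear model with base R (illustrative data, not from the package).
set.seed(123)
n <- 200
p <- 20
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
beta_true <- c(rep(2, 3), rep(0, p - 3))   # only the first three predictors are active
y <- drop(x %*% beta_true) + rnorm(n)

# Fit only if BeSS is installed, so the script runs either way.
if (requireNamespace("BeSS", quietly = TRUE)) {
  fit <- BeSS::bess(x, y)   # default settings (assumed adequate here); see ?bess
  print(fit)                # print method for a "bess" object
  print(coef(fit))          # estimated coefficients from the fitted object
}
```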

The table below compares BeSS with other R packages for best subset selection on several metrics:

| Feature | leaps | lmSubsets | bestglm | glmulti | BeSS |
| --- | --- | --- | --- | --- | --- |
| Solve linear regression models | ✔ | ✔ | ✔ | ✔ | ✔ |
| Solve logistic regression models | ✘ | ✘ | ✔ | ✔ | ✔ |
| Solve Poisson regression models | ✘ | ✘ | ✔ | ✔ | ✔ |
| Solve CoxPH regression models | ✘ | ✘ | ✘ | ✔ | ✔ |
| Group variable selection | ✘ | ✘ | ✘ | ✘ | ✔ |
| Feature screening | ✘ | ✘ | ✘ | ✘ | ✔ |
| Tuning parameter determination on information criterion | ✘ | ✔ | ✔ | ✔ | ✔ |
| Tuning parameter determination on cross-validation | ✘ | ✘ | ✔ | ✘ | ✔ |
| Include specified variables | ✘ | ✔ | ✘ | ✘ | ✔ |
| Options for coefficient shrinkage | ✘ | ✘ | ✘ | ✘ | ✔ |
| Computational efficiency | 🚶🚶 | 🚶🏃 | 🚶🚶 (impossible for glm with more than 15 variables) | 🚶🏃 (impossible for glm with more than 32 variables) | 🏃🏃 |

See the following documents for more details about the BeSS package:

  • The vignette, which can be opened with vignette("BeSS") in R (moderate)

  • JSS paper (detailed)

References

  • Wen, C., Zhang, A., Quan, S., & Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models. Journal of Statistical Software, 94(4), 1-24. doi:10.18637/jss.v094.i04

Monthly Downloads: 634

Version: 2.0.2

License: GPL-3

Maintainer: Canhong Wen

Last Published: January 23rd, 2021

Functions in BeSS (2.0.2)

  • deviance.bess: Extract the deviance from a "bess.one" object.

  • bess.one: Best subset selection/best subset ridge regression with a specified model size and a shrinkage parameter.

  • plot.bess: Produces a coefficient profile plot of the coefficient or loss function paths.

  • gen.data: Generate simulated data.

  • predict.bess: Make predictions from a "bess" object.

  • BeSS-package: BeSS: Best Subset Selection/Ridge Regression in Linear, Logistic, Poisson and CoxPH Models.

  • logLik.bess: Extract the log-likelihood from a "bess.one" object.

  • duke: Duke breast cancer data.

  • coef.bess: Provides estimated coefficients from a fitted "bess" object.

  • print.bess: Print method for a "bess" object.

  • summary.bess: Summary method for a "bess.one" object.

  • trim32: The Bardet-Biedl syndrome gene expression data.

  • bess: Best subset selection.