SSVS

The goal of {SSVS} is to provide functions for performing stochastic search variable selection (SSVS) for binary and continuous outcomes and visualizing the results. SSVS is a Bayesian variable selection method used to estimate the probability that individual predictors should be included in a regression model. Using MCMC estimation, the method samples thousands of regression models in order to characterize the model uncertainty regarding both the predictor set and the regression parameters.
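
To make the idea of an inclusion probability concrete, here is a minimal sketch in base R. It does not call {SSVS}; the inclusion indicators are simulated stand-ins for what an MCMC sampler might produce, and the names n_draws, true_incl, and included are hypothetical.

```r
# Toy illustration only: simulated inclusion indicators, NOT output of ssvs().
set.seed(1)
n_draws   <- 5000
true_incl <- c(x1 = 0.9, x2 = 0.5, x3 = 0.1)  # hypothetical inclusion rates

# One row per sampled model, one column per predictor; TRUE = predictor included
included <- sapply(true_incl, function(p) rbinom(n_draws, 1, p) == 1)

# Inclusion probability: share of sampled models that contain each predictor
mip <- colMeans(included)
round(mip, 3)
```

Predictors that appear in most sampled models get a value near 1, and predictors that rarely appear get a value near 0.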

Installation

You can install the development version of {SSVS} from GitHub with:

# install.packages("remotes")
remotes::install_github("sabainter/SSVS")

Example 1 - continuous response variable

Consider a simple example that uses SSVS on the mtcars dataset to predict quarter-mile times. We first specify the response variable ("qsec"), then choose the predictors and run the ssvs() function.

library(SSVS)
outcome <- 'qsec'
predictors <- c('cyl', 'disp', 'hp', 'drat', 'wt',
 'vs', 'am', 'gear', 'carb','mpg')

results <- ssvs(data = mtcars, x = predictors, y = outcome, progress = FALSE)

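The results can be summarized and printed using the summary() function. For each predictor, this displays the marginal inclusion probability (MIP), the average coefficient including zeros (averaged over all sampled models) and excluding zeros (averaged over only the models that include the predictor), and a credible interval for the coefficient.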

summary_results <- summary(results, interval = 0.9, ordered = TRUE)
Variable   MIP      Avg Beta   Avg Nonzero Beta   Lower CI (90%)   Upper CI (90%)
wt         0.8433    1.0433     1.2372             0.0000           1.9513
vs         0.7512    0.6399     0.8519             0.0000           1.1982
hp         0.5413   -0.4995    -0.9228            -1.3349           0.0000
cyl        0.4551   -0.5173    -1.1367            -1.7670           0.0005
am         0.4240   -0.3107    -0.7328            -1.0805           0.0000
disp       0.4130   -0.4553    -1.1023            -1.8170           0.0012
carb       0.3938   -0.2890    -0.7338            -1.0068           0.0000
gear       0.2013   -0.0918    -0.4560            -0.5464           0.0002
mpg        0.1584    0.0563     0.3557            -0.0001           0.4160
drat       0.1003   -0.0180    -0.1794            -0.0008           0.0000
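
As a consistency check on how the two averages relate, the sketch below uses values copied from the first three rows of the summary table above. It assumes that a draw which excludes a predictor contributes a coefficient of exactly zero, in which case the average over all draws equals the MIP times the average over nonzero draws. This is plain arithmetic on the printed numbers, not a description of the package internals.

```r
# Values copied from the first three rows of the summary table above
mip         <- c(wt = 0.8433, vs = 0.7512, hp = 0.5413)
avg_nonzero <- c(wt = 1.2372, vs = 0.8519, hp = -0.9228)
avg_beta    <- c(wt = 1.0433, vs = 0.6399, hp = -0.4995)

# Assumption: excluded draws count as a coefficient of zero, so
# mean over all draws = P(included) * mean over included draws
implied <- mip * avg_nonzero

# Reported and implied averages side by side
round(cbind(reported = avg_beta, implied = implied), 4)
```

The reported and implied columns agree to the four decimal places shown in the table.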

The MIPs for each predictor can then be visualized using the plot() function.

plot(results)

Example 2 - binary response variable

In the example above, the response variable was continuous. The same workflow can be used for binary outcomes by passing continuous = FALSE to the ssvs() function.

As an example, let's create a binary variable from the Affairs data in the {AER} package, coded 1 if the affairs count is greater than zero and 0 otherwise:

library(AER)
#> Warning: package 'AER' was built under R version 4.3.3
data(Affairs)
Affairs$hadaffair[Affairs$affairs > 0] <- 1
Affairs$hadaffair[Affairs$affairs == 0] <- 0
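
The same recoding can be written as a single vectorized expression. The sketch below uses a small hypothetical data frame (toy) instead of the Affairs data so that it runs without {AER}; it only illustrates that the two forms give the same result.

```r
# Hypothetical stand-in for the Affairs data (values are illustrative only)
toy <- data.frame(affairs = c(0, 3, 0, 12, 1))

# Two-step recode, as in the example above
toy$two_step[toy$affairs > 0]  <- 1
toy$two_step[toy$affairs == 0] <- 0

# One-step recode: the comparison yields TRUE/FALSE, converted to 1/0
toy$one_step <- as.numeric(toy$affairs > 0)

identical(toy$two_step, toy$one_step)
```

The one-step form also avoids relying on the first assignment to create the new column implicitly.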

Then define the outcome and predictors.

outcome <- "hadaffair"
predictors <- c("gender", "age", "yearsmarried", "children", "religiousness", "education", "occupation", "rating")

And finally run the model:

results <- ssvs(data = Affairs, x = predictors, y = outcome, continuous = FALSE, progress = FALSE)

Now the results can be summarized or visualized in the same manner.

summary_results <- summary(results, interval = 0.9, ordered = TRUE)
Variable        MIP      Avg Beta   Avg Nonzero Beta   Lower CI (90%)   Upper CI (90%)
rating          1.0000   -0.5552    -0.5552            -0.7106          -0.3917
religiousness   0.4247   -0.1422    -0.3348            -0.4070           0.0000
yearsmarried    0.1035    0.0321     0.3099             0.0000           0.1024
children        0.0751    0.0204     0.2714             0.0000           0.0000
age             0.0111   -0.0024    -0.2146             0.0000           0.0000
gender          0.0093    0.0010     0.1067             0.0000           0.0000
occupation      0.0064    0.0008     0.1176             0.0000           0.0000
education       0.0050    0.0005     0.1066             0.0000           0.0000

plot(results)

Example 3 - SSVS with multiple imputation (MI)

First, we will use the mice() function from the {mice} package to perform multiple imputation.

library(mice)
#> 
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind

# Load the mtcars dataset
data <- mtcars

# Introduce random missingness in 10% of the data
set.seed(123)  
n <- nrow(data) * ncol(data)
missing_indices <- sample(n, size = 0.1 * n, replace = FALSE)

# Convert missing indices to row-column positions
rows <- (missing_indices - 1) %% nrow(data) + 1
cols <- (missing_indices - 1) %/% nrow(data) + 1

# Assign NA to the identified positions
for (i in seq_along(rows)) {
  data[rows[i], cols[i]] <- NA
}

# Perform multiple imputation using mice
imputed_data <- mice(data, m = 5, maxit = 50, seed = 123)

# Display the results of the imputation
summary(imputed_data)

# Stack all completed datasets in long format (the .imp column identifies
# the imputation) and show the first rows
imputed_mtcars <- complete(imputed_data, "long")
head(imputed_mtcars)
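
The index arithmetic used above to place the missing values can be checked against base R's arrayInd(), and the loop can be replaced by a logical mask. This is a sketch of an alternative, not a change to the example; the names rc and mask are hypothetical, and floor() is used here to make the sample size an explicit integer.

```r
data <- mtcars

set.seed(123)
n <- nrow(data) * ncol(data)
missing_indices <- sample(n, size = floor(0.1 * n))

# Column-major linear index -> (row, column), as in the example above
rows <- (missing_indices - 1) %% nrow(data) + 1
cols <- (missing_indices - 1) %/% nrow(data) + 1

# arrayInd() performs the same conversion
rc <- arrayInd(missing_indices, dim(data))
stopifnot(all(rc[, 1] == rows), all(rc[, 2] == cols))

# Loop-free alternative: a logical mask with the same shape as the data
mask <- matrix(FALSE, nrow(data), ncol(data))
mask[missing_indices] <- TRUE
data[mask] <- NA

# Number of cells set to missing
sum(is.na(data))
```

Because the indices are sampled without replacement, the number of missing cells equals the number of sampled indices.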

We then run SSVS on this multiply imputed data set with the ssvs_mi() function, passing the name of the column that identifies each imputation (".imp") through the imp argument.

outcome <- 'qsec'
predictors <- c('cyl', 'disp', 'hp', 'drat', 'wt', 'vs', 'am', 'gear', 'carb','mpg')
imputation <- '.imp'
results <- ssvs_mi(data = imputed_mtcars, y = outcome, x = predictors, imp = imputation)

The results of SSVS with MI can be summarized with summary() and visualized with plot(). For each predictor, the summary reports, across imputations, the average MIP and the mean, minimum, maximum, and average nonzero beta coefficients.
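
To illustrate what summarizing across imputations means, here is a toy sketch for a single predictor. The per-imputation numbers are made up for illustration and are not ssvs_mi() output; how the package computes its summaries internally is not shown here, so treat this as one plausible reading of the description above.

```r
# Hypothetical per-imputation results for one predictor (5 imputations).
# These numbers are invented for illustration only.
per_imp <- data.frame(
  imp  = 1:5,
  mip  = c(0.82, 0.79, 0.88, 0.85, 0.81),
  beta = c(1.01, 0.95, 1.10, 1.05, 0.99)
)

# Aggregate across imputations
pooled <- c(
  avg_mip   = mean(per_imp$mip),
  mean_beta = mean(per_imp$beta),
  min_beta  = min(per_imp$beta),
  max_beta  = max(per_imp$beta)
)
pooled
```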

Interactive version

You can launch an interactive (shiny) web application that lets you run SSVS analyses without programming. Simply install this package and run SSVS::launch() in an R console.

Install

install.packages('SSVS')

Monthly Downloads

293

Version

2.1.0

License

GPL-3

Maintainer

Sierra Bainter

Last Published

March 19th, 2025

Functions in SSVS (2.1.0)

plot.ssvs_mi
Plot SSVS-MI Estimates and Marginal Inclusion Probabilities (MIP)

ssvs_mi
Perform SSVS on Multiply Imputed Datasets

summary.ssvs
Summarize results of an SSVS model

summary.ssvs_mi
Calculate Summary Statistics for SSVS-MI Results

launch
Run an interactive analysis tool (Shiny app) that lets you perform SSVS in a browser

print.ssvs_mi_summary
Print the summary of ssvs_mi

plot.ssvs
Plot results of an SSVS model

ssvs
Perform SSVS for continuous and binary outcomes

print.ssvs_summary
Print the summary of an SSVS model

dat
Example dataset for the ssvs function: a data frame with 74 records and 76 variables

%>%
Pipe operator

imputed_mtcars
Imputed mtcars Dataset