
rbvs (version 1.0.2)

rbvs: Ranking-Based Variable Selection

Description

Performs Ranking-Based Variable Selection (RBVS) using various measures of the dependence between the predictors and the response.

Usage

rbvs(x, y, ...)
"rbvs"(x, y, m, B = 500, measure = c("pc", "dc", "lasso", "mcplus", "user"), fun = NULL, s.est = s.est.quotient, iterative = TRUE, use.residuals = TRUE, k.max, min.max.freq = 0, max.iter = 10, verbose = TRUE, ...)

Arguments

x
Matrix with n observations of p covariates in each row.
y
Response vector with n observations.
...
Other parameters that may be passed to fun and s.est.
m
Subsample size used in the RBVS algorithm.
B
Number of sample splits.
measure
Character with the name of the method used to measure the association between the response and the covariates. See Details below.
fun
Function used to evaluate the measure given in measure. It is required when measure=="user". Must have at least three arguments: x (covariates matrix), y (response vector), and subsamples (a matrix in which each row contains the indices of the observations to be used). It must return a vector of the same length as the number of covariates in x. See for example pearson.cor or lasso.coef.
s.est
Function used to estimate the number of important covariates based on the RBVS path. Must accept probs (a vector with probabilities) as an argument. See s.est.quotient and Details below.
iterative
Logical variable indicating the type of the procedure. If TRUE, an iterative extension of the RBVS algorithm is launched.
use.residuals
Logical. If TRUE, the impact of the previously detected variables is removed from the response in the iterative (IRBVS) procedure.
k.max
Maximum size of the subset of important variables.
min.max.freq
Positive integer. Optional parameter - the algorithm stops searching for the most frequent set when the frequencies reach this value.
max.iter
Maximum number of iterations for the IRBVS algorithm.
verbose
Logical indicating whether the progress of the algorithm should be reported.
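
The contract for fun described above can be illustrated with a minimal sketch. The function name spearman.measure and the choice of Spearman correlation are hypothetical and not part of the package. The sketch returns one column of scores per row of subsamples, which is an assumption about the expected layout; compare with pearson.cor before relying on it.

```r
# Hypothetical user-defined measure: absolute Spearman rank correlation
# between each covariate and the response, evaluated on every subsample.
spearman.measure <- function(x, y, subsamples, ...) {
  y <- as.numeric(y)
  # Each row of `subsamples` holds the indices of the observations to use.
  # Assumed output layout: p rows (covariates), one column per subsample.
  apply(subsamples, 1, function(idx) {
    as.vector(abs(cor(x[idx, , drop = FALSE], y[idx], method = "spearman")))
  })
}

# Small self-contained check on simulated data (no rbvs call involved).
set.seed(1)
x <- matrix(rnorm(20 * 5), 20, 5)
y <- x[, 1] + rnorm(20)
subsamples <- t(replicate(3, sample(20, 10)))  # 3 subsamples of size 10
scores <- spearman.measure(x, y, subsamples)
dim(scores)
```

If the layout matches what the package expects, such a function would be supplied as rbvs(x, y, measure = "user", fun = spearman.measure).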

Value

Object of class rbvs with the following fields
measure
Character indicating type of measure used.
score
List with scores at each iteration.
subsets
A list with subset candidates at each iteration.
frequencies
A list with observed frequencies at each iteration.
ranks
Rankings evaluated (for the last iteration if iterative=TRUE).
s.hat
Vector with the number of the covariates selected at each iteration.
active
Vector with the selected covariates.
timings
Vector reporting the amount of time the (I)RBVS algorithm took at each iteration.

Details

Currently supported measures are: Pearson correlation coefficient (measure="pc"), Distance Correlation (measure="dc"), the regression coefficients estimated via Lasso (measure="lasso"), and the regression coefficients estimated via MC+ (measure="mcplus"). With measure="user", the measure is computed by the function supplied in fun.
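
To give a feel for what a dependence measure such as measure="pc" quantifies, the base R sketch below ranks covariates by their absolute marginal Pearson correlation with the response on the full sample. It is only an illustration of the measure, not the package's implementation: rbvs evaluates the measure on subsamples and processes the resulting rankings further. The dimensions and variable names are chosen for the example.

```r
set.seed(1)
n <- 200
p <- 50

# Simulated design: only the first four covariates affect the response.
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:4] %*% c(3, 2.5, -1.7, -1)) + rnorm(n)

# Absolute marginal Pearson correlation of each covariate with y.
pc.scores <- abs(drop(cor(x, y)))

# Covariate indices ordered from most to least correlated with y.
ranking <- order(pc.scores, decreasing = TRUE)
head(ranking, 4)
```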

References

R. Baranowski, P. Fryzlewicz (2015), Ranking-Based Variable Selection, in submission (http://personal.lse.ac.uk/baranows/rbvs/rbvs.pdf).

Examples

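library(rbvs)

set.seed(1)

# Simulate n = 200 observations of p = 1000 covariates;
# only the first four covariates affect the response
x <- matrix(rnorm(200 * 1000), 200, 1000)
active <- 1:4
beta <- c(3, 2.5, -1.7, -1)
y <- 1 * rnorm(200) + x[, active] %*% beta

# RBVS algorithm (non-iterative)
rbvs.object <- rbvs(x, y, iterative = FALSE)
rbvs.object$active
rbvs.object$subsets[[1]][[4]]

# IRBVS algorithm (iterative = TRUE is the default)
rbvs.object <- rbvs(x, y)
rbvs.object$active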
