textTinyR (version 1.1.2)

select_predictors: Exclude highly correlated predictors

Description

Exclude highly correlated predictors

Usage

select_predictors(response_vector, predictors_matrix,
  response_lower_thresh = 0.1, predictors_upper_thresh = 0.75,
  threads = 1, verbose = FALSE)

Arguments

response_vector

a numeric vector (its length should be equal to the number of rows of the predictors_matrix parameter)

predictors_matrix

a numeric matrix (its number of rows should be equal to the length of the response_vector parameter)

response_lower_thresh

a numeric value. This parameter allows the user to keep all the predictors having a correlation with the response greater than the response_lower_thresh value.

predictors_upper_thresh

a numeric value. This parameter allows the user to keep all the predictors whose correlation with the other predictors is less than the predictors_upper_thresh value.

threads

a numeric value specifying the number of cores to run in parallel

verbose

either TRUE or FALSE. If TRUE then information will be printed out in the R session.

Value

a vector of column-indices

Details

The function works as follows: the correlation of each predictor with the response is calculated first, and the resulting correlations are sorted in decreasing order. Then, iteratively, predictors that are correlated with one another above the predictors_upper_thresh value are removed, favoring those predictors that are more correlated with the response variable. If the response_lower_thresh value is greater than 0.0, only predictors whose correlation with the response is higher than or equal to the response_lower_thresh value are kept; the rest are excluded. The function returns the indices of the retained predictors and is useful in cases of multicollinearity.
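
The procedure described above can be sketched in base R. This is only an illustration of the description, not the package's implementation: the helper name sketch_select_predictors is hypothetical, it relies on stats::cor with signed Pearson correlations (whether the package uses signed or absolute values is not stated here), and it omits the threads and verbose arguments.

```r
# Hypothetical helper: a base-R sketch of the procedure described in Details.
# It is NOT the textTinyR implementation; it assumes signed Pearson correlations.
sketch_select_predictors <- function(response_vector, predictors_matrix,
                                     response_lower_thresh = 0.1,
                                     predictors_upper_thresh = 0.75) {
  stopifnot(length(response_vector) == nrow(predictors_matrix))

  # Correlation of every predictor column with the response.
  resp_cor <- as.vector(cor(predictors_matrix, response_vector))

  # Candidate columns, sorted by decreasing correlation with the response.
  candidates <- order(resp_cor, decreasing = TRUE)

  # Optional filter on the correlation with the response.
  if (response_lower_thresh > 0.0) {
    candidates <- candidates[resp_cor[candidates] >= response_lower_thresh]
  }

  # Pairwise correlations between the predictors.
  pred_cor <- cor(predictors_matrix)

  kept <- integer(0)
  for (j in candidates) {
    # Keep column j only if it is not too correlated with a column already kept.
    if (length(kept) == 0 ||
        all(pred_cor[j, kept] <= predictors_upper_thresh)) {
      kept <- c(kept, j)
    }
  }
  kept
}

# Small deterministic data set (made up for illustration).
y  <- as.numeric(1:10)
x3 <- rep(c(1, -1), times = 5)   # weakly (negatively) correlated with y
x1 <- y                          # perfectly correlated with y
x2 <- y + 0.5 * x3               # strongly correlated with x1
m  <- cbind(x1, x2, x3)

res_default   <- sketch_select_predictors(y, m)
res_no_filter <- sketch_select_predictors(y, m, response_lower_thresh = 0)
```

Because candidates are visited in decreasing order of their correlation with the response, the first member of any highly correlated group is the one that survives, which is one way to read "favoring those predictors which are more correlated with the response variable". The returned indices can then be used to subset the matrix, for example m[, res_default, drop = FALSE].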

Examples

# NOT RUN {
library(textTinyR)

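# response: 100 uniform random values
set.seed(1)
resp = runif(100)

# base predictor; the remaining columns are powers of it,
# so the predictors are strongly correlated with one another
set.seed(2)
col = runif(100)

matr = matrix(c(col, col^4, col^6, col^8, col^10), nrow = 100, ncol = 5)

# column indices of the predictors that are kept
out = select_predictors(resp, matr, predictors_upper_thresh = 0.75)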
# }
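
To see why the example matrix is a multicollinearity case, its pairwise correlations can be inspected with base R alone. The snippet below is a sketch that rebuilds the same matrix without loading textTinyR; the names col_values and pairwise are introduced here for illustration.

```r
# Rebuild the example predictors with base R only.
set.seed(2)
col_values <- runif(100)

matr <- matrix(c(col_values, col_values^4, col_values^6,
                 col_values^8, col_values^10), nrow = 100, ncol = 5)

# Pairwise Pearson correlations between the five columns.
pairwise <- cor(matr)
print(round(pairwise, 2))

# Column pairs (upper triangle only) above the threshold used in the example.
print(which(pairwise > 0.75 & upper.tri(pairwise), arr.ind = TRUE))
```

Any pair listed by the last call is a pair of columns that cannot both be kept under predictors_upper_thresh = 0.75.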