textTinyR (version 1.1.8)

select_predictors: Exclude highly correlated predictors

Description

Exclude highly correlated predictors

Usage

select_predictors(
  response_vector,
  predictors_matrix,
  response_lower_thresh = 0.1,
  predictors_upper_thresh = 0.75,
  threads = 1,
  verbose = FALSE
)

Value

a vector of column-indices

Arguments

response_vector

a numeric vector (the length should be equal to the rows of the predictors_matrix parameter)

predictors_matrix

a numeric matrix (the rows should be equal to the length of the response_vector parameter)

response_lower_thresh

a numeric value. Only predictors whose correlation with the response is greater than the response_lower_thresh value are kept.

predictors_upper_thresh

a numeric value. Only predictors whose correlation with the other predictors is less than the predictors_upper_thresh value are kept.

threads

a numeric value specifying the number of cores to run in parallel

verbose

either TRUE or FALSE. If TRUE then information will be printed out in the R session.

Details

The function works as follows. First, the correlation of each predictor with the response is calculated and the resulting correlations are sorted in decreasing order. Then, iteratively, predictors whose correlation with another predictor is higher than the predictors_upper_thresh value are removed; within each such group the predictor that is more correlated with the response variable is favored. If the response_lower_thresh value is greater than 0.0, only predictors whose correlation with the response is higher than or equal to the response_lower_thresh value are kept; the rest are excluded. The function returns the indices of the retained predictors and is useful in the presence of multicollinearity.

If during computation the correlation between the response variable and a potential predictor is equal to NA or +/- Inf, then a correlation of 0.0 will be assigned to this particular pair.
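The procedure described above can be approximated in base R. The sketch below is not the package's implementation: the helper name sketch_select_predictors is hypothetical, and it assumes that correlations are Pearson correlations compared by absolute value, that the lower threshold uses >= (as in the Details text) and that the upper threshold uses a strict <. None of these assumptions is confirmed by this page.

```r
# Hypothetical helper, NOT part of textTinyR: a base-R sketch of the greedy
# filtering described in the Details section.
sketch_select_predictors <- function(response_vector, predictors_matrix,
                                     response_lower_thresh = 0.1,
                                     predictors_upper_thresh = 0.75) {
  # Correlation of every column with the response; non-finite values
  # (e.g. from constant columns) are replaced by 0.0
  resp_cor <- suppressWarnings(
    as.vector(stats::cor(predictors_matrix, response_vector))
  )
  resp_cor[!is.finite(resp_cor)] <- 0.0

  # Assumption: predictors are ranked by absolute correlation
  strength <- abs(resp_cor)
  candidates <- order(strength, decreasing = TRUE)

  # Optional lower threshold on the correlation with the response
  if (response_lower_thresh > 0.0) {
    candidates <- candidates[strength[candidates] >= response_lower_thresh]
  }

  # Pairwise (absolute) correlations between predictors
  pred_cor <- suppressWarnings(abs(stats::cor(predictors_matrix)))
  pred_cor[!is.finite(pred_cor)] <- 0.0

  # Greedy pass: accept a candidate only if it is not too strongly
  # correlated with any predictor that was accepted before it
  kept <- integer(0)
  for (j in candidates) {
    if (all(pred_cor[j, kept] < predictors_upper_thresh)) {
      kept <- c(kept, j)
    }
  }
  kept
}

# Small deterministic illustration: column 2 duplicates column 1
x1 <- 1:20
x3 <- rep(c(1, -1), 10)
toy_matrix <- cbind(x1, x1, x3)
toy_response <- x1 + 5 * x3
toy_kept <- sketch_select_predictors(toy_response, toy_matrix)
print(toy_kept)
```

Because candidates are visited in decreasing order of their correlation with the response, the stronger member of a correlated group is always seen first, which is one way to read "favoring those predictors which are more correlated with the response variable".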

Examples

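library(textTinyR)

# random response vector of length 100
set.seed(1)
resp <- runif(100)

# base predictor; the other columns are powers of the same values
set.seed(2)
col <- runif(100)

# 100 x 5 matrix with columns col, col^4, col^6, col^8, col^10
matr <- matrix(c(col, col^4, col^6, col^8, col^10), nrow = 100, ncol = 5)

# column indices that remain after the correlation filtering
out <- select_predictors(resp, matr, predictors_upper_thresh = 0.75)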
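A typical next step is to subset the predictor matrix with the returned indices. The snippet below is a sketch that assumes the textTinyR package is installed and that the returned values are 1-based column positions suitable for ordinary R subsetting; the page only states that the result is "a vector of column-indices". The variable name reduced is introduced here for illustration.

```r
# Runs only if textTinyR is available; otherwise the block is skipped
if (requireNamespace("textTinyR", quietly = TRUE)) {
  set.seed(1)
  resp <- runif(100)
  set.seed(2)
  col <- runif(100)
  matr <- matrix(c(col, col^4, col^6, col^8, col^10), nrow = 100, ncol = 5)

  out <- textTinyR::select_predictors(resp, matr,
                                      predictors_upper_thresh = 0.75)

  # Assumption: `out` holds 1-based column positions.
  # drop = FALSE keeps a matrix even when a single column is selected.
  reduced <- matr[, out, drop = FALSE]
  print(dim(reduced))
}
```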