modellingTools (version 0.1.0)

get_top_corrs: Get the correlation of variables in a dataset with a given response, sorted highest to lowest

Description

This function computes the correlation of each input variable in a data frame with a given response variable and returns a data frame listing the variables sorted from most to least correlated. NAs are removed from correlation computations, and only numeric variables are considered.

Usage

get_top_corrs(dat, response_var, parallel = FALSE)

Arguments

dat
a tbl
response_var
character string naming the variable in dat against which the correlations are computed, or an integer giving the column position of that variable
parallel
logical. If TRUE, parallel foreach is used to compute the correlations. If FALSE, single-threaded foreach is used, which is still highly efficient. Default is FALSE.

Value

a tbl with two columns: var_name gives the name of each variable and correlation gives its correlation with response_var.
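
The behaviour described above can be approximated in base R. The sketch below is not the modellingTools source: the function name top_corrs_sketch is hypothetical, it assumes Pearson correlation with pairwise-complete observations as the NA handling, and it assumes "highest to lowest" means sorting on the signed correlation rather than its absolute value.

```r
# Hypothetical stand-in for illustration only; not the package implementation.
top_corrs_sketch <- function(dat, response_var) {
  dat <- as.data.frame(dat)
  # Accept either a column name or an integer position, as documented.
  if (is.numeric(response_var)) response_var <- names(dat)[response_var]
  # Keep numeric columns only, excluding the response itself.
  is_num <- vapply(dat, is.numeric, logical(1))
  inputs <- setdiff(names(dat)[is_num], response_var)
  # Assumption: NAs are dropped per variable pair before computing Pearson r.
  corrs <- vapply(
    inputs,
    function(v) cor(dat[[v]], dat[[response_var]], use = "pairwise.complete.obs"),
    numeric(1)
  )
  out <- data.frame(var_name = inputs, correlation = unname(corrs),
                    stringsAsFactors = FALSE)
  # Assumption: sort on the signed correlation, highest first.
  out[order(out$correlation, decreasing = TRUE), , drop = FALSE]
}

res <- top_corrs_sketch(iris, "Petal.Length")
print(res)
```

With iris, the factor column Species is skipped and the three remaining numeric columns are ranked against Petal.Length. The real function may differ in correlation method, tie handling, or sort key.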

Details

Use this technique to screen variables in the initial stages of data analysis and to get familiar with how the individual input variables relate to the response variable of interest. It is not recommended as a formal variable selection technique, since it ignores interactions between inputs.
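
The caveat about interactions can be illustrated with a small constructed example in base R. The data below are made up for illustration and do not come from the package: the response is exactly the product of two inputs, yet each input on its own has zero correlation with it, so a marginal correlation ranking would discard both.

```r
# Toy data (hypothetical): y depends on x1 and x2 only through their product.
x1 <- c(-1, -1, 1, 1)
x2 <- c(-1, 1, -1, 1)
y  <- x1 * x2

# Marginal correlations of each input with the response.
marginal_x1 <- cor(x1, y)
marginal_x2 <- cor(x2, y)

# Correlation of the interaction term with the response.
interaction_corr <- cor(x1 * x2, y)

print(c(x1 = marginal_x1, x2 = marginal_x2, x1_x2 = interaction_corr))
```

Both marginal correlations are zero while the interaction term correlates perfectly with y, which is why a ranking of this kind is better suited to exploration than to final variable selection.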

See Also

Other descriptive: proc_freq

Examples

library(modellingTools)
# Species is a factor, so only the numeric columns are correlated
x <- iris
get_top_corrs(x, "Petal.Length")
