lares (version 4.10.6)

corr: Correlation table

Description

This function correlates every column of a dataframe against every other column, running One Hot Smart Encoding (ohse) to transform non-numerical features into numerical ones first. Note that it automatically drops columns with fewer than 3 non-missing values and warns the user.
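As a rough illustration of the steps described above, the following base R sketch drops sparse columns, one hot encodes a categorical column, and correlates the result. It is not how lares implements corr(): the toy data, the column names, and the hand-written encoding loop are assumptions made for this example only.

```r
# Toy data (made up): one numeric, one categorical, one nearly empty column
toy <- data.frame(
  age   = c(22, 38, 26, 35, 54, 2),
  sex   = c("male", "female", "female", "female", "male", "male"),
  trash = c(1, NA, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Drop columns with fewer than 3 non-missing values
enough <- vapply(toy, function(x) sum(!is.na(x)) >= 3, logical(1))
kept <- toy[, enough, drop = FALSE]

# One hot encode character/factor columns: one 0/1 indicator per observed value
is_cat <- vapply(kept, function(x) is.character(x) || is.factor(x), logical(1))
encoded <- kept[, !is_cat, drop = FALSE]
for (col in names(kept)[is_cat]) {
  for (lvl in unique(kept[[col]])) {
    encoded[[paste(col, lvl, sep = "_")]] <- as.numeric(kept[[col]] == lvl)
  }
}

# Square correlation matrix over the encoded columns, rounded like `dec = 6`
cors <- round(cor(encoded, method = "pearson", use = "pairwise.complete.obs"), 6)
cors
```

The two indicator columns derived from sex are perfectly anti-correlated, which shows why encoding can add redundant rows and columns to the result.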

Usage

corr(
  df,
  method = "pearson",
  pvalue = FALSE,
  dec = 6,
  ignore = NA,
  dummy = TRUE,
  logs = FALSE,
  limit = 10,
  top = NA,
  ...
)

Arguments

df

Dataframe. It doesn't matter if it has non-numerical columns: they will be encoded by ohse when dummy = TRUE, or filtered out otherwise.

method

Character. Any of: c("pearson", "kendall", "spearman")

pvalue

Boolean. If TRUE, returns a list with the correlations and the statistical significance (p-value) of each value

dec

Integer. Number of decimals to round correlations and p-values

ignore

Vector or character. Which column(s) should be ignored?

dummy

Boolean. Should One Hot (Smart) Encoding (ohse) be applied to categorical columns?

logs

Boolean. Calculate log(x)+1 for numerical columns?

limit

Integer. Limit one hot encoding to the n most frequent values of each column. Set to NA to ignore argument.

top

Integer. Select top N most relevant variables? Filtered and sorted by mean of each variable's correlations

...

Additional parameters to pass to ohse
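To make the pvalue and top arguments more concrete, here is a minimal base R sketch using the built-in mtcars data. The helper pvalue_matrix() is hypothetical and not part of lares, the list element names are chosen for this example, and ranking by the mean of absolute correlations is an assumption (the argument description only says "mean of each variable's correlations").

```r
# Hypothetical helper (not part of lares): matrix of cor.test() p-values
pvalue_matrix <- function(d, method = "pearson") {
  n <- ncol(d)
  p <- matrix(NA_real_, n, n, dimnames = list(names(d), names(d)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i != j) p[i, j] <- cor.test(d[[i]], d[[j]], method = method)$p.value
    }
  }
  p
}

d <- mtcars[, c("mpg", "cyl", "disp", "hp", "wt")]

# pvalue-style result: a list holding correlations and their p-values
res <- list(
  cor    = round(cor(d, method = "pearson"), 6),
  pvalue = round(pvalue_matrix(d), 6)
)

# top-style filter: rank variables by the mean of their absolute
# correlations with the other variables (assumption), keep the first 3
m <- res$cor
diag(m) <- NA
ranking <- sort(rowMeans(abs(m), na.rm = TRUE), decreasing = TRUE)
top_vars <- names(ranking)[1:3]
res$cor[top_vars, top_vars]
```

The diagonal is excluded before averaging so that each variable's perfect correlation with itself does not inflate its score.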

Value

data.frame. Square dimensions (n x n), holding the correlation between every pair of columns/variables in df. Notice that when ohse() is applied you may get more dimensions than df has columns, because each categorical column can expand into several encoded columns.
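The following sketch shows how encoding can change the size of the result, and how a limit on the most frequent values caps that growth. The helper top_levels_onehot() and the toy data are hypothetical; this is an illustration in base R, not the ohse() implementation.

```r
# Hypothetical helper (not part of lares): indicator columns for the
# `limit` most frequent values of a categorical vector
top_levels_onehot <- function(x, name, limit = 10) {
  counts <- sort(table(x), decreasing = TRUE)
  keep <- names(counts)[seq_len(min(limit, length(counts)))]
  out <- lapply(keep, function(lvl) as.numeric(x == lvl))
  names(out) <- paste(name, keep, sep = "_")
  data.frame(out, check.names = FALSE)
}

# Toy data (made up): 2 input columns, `port` has 3 distinct values
toy <- data.frame(
  fare = c(7.25, 71.28, 7.92, 53.10, 8.05, 8.46, 51.86, 21.08),
  port = c("S", "C", "S", "S", "S", "Q", "S", "C")
)

# Keep only the 2 most frequent values of `port` ("S" and "C")
encoded <- cbind(toy["fare"], top_levels_onehot(toy$port, "port", limit = 2))
cors <- cor(encoded)
dim(cors)
```

Here two input columns become three encoded columns, so the correlation table is 3 x 3 rather than 2 x 2; without the limit, the rare value "Q" would add a fourth.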

See Also

Other Calculus: dist2d(), model_metrics(), quants()

Other Correlations: corr_cross(), corr_var()

Examples

# NOT RUN {
data(dft) # Titanic dataset
df <- dft[, 2:5]

corr(df)

# Ignore specific column
corr(df, ignore = "Pclass")

# Keep redundant combinations
corr(df, redundant = TRUE)

# Calculate p-values as well
corr(df, pvalue = TRUE)

# Test a column with fewer than 3 non-missing values (it gets dropped with a warning)
df$trash <- c(1, rep(NA, nrow(df) - 1))
# and another method...
corr(df, method = "spearman")
# }