rcompanion (version 2.5.0)

correlation: Correlation and measures of association

Description

Produces measures of association for all pairs of variables in a data frame, with confidence intervals when available.

Usage

correlation(
  data = NULL,
  printClasses = FALSE,
  progress = TRUE,
  methodNum = "pearson",
  methodOrd = "kendall",
  methodNumOrd = "spearman",
  methodNumNom = "eta",
  methodNumBin = "pearson",
  testChisq = "chisq",
  ci = FALSE,
  conf = 0.95,
  R = 1000,
  correct = FALSE,
  reportIncomplete = TRUE,
  na.action = "na.omit",
  digits = 3,
  pDigits = 4,
  ...
)

Value

A data frame of variables, association statistics, p-values, and confidence intervals.

Arguments

data

A data frame.

printClasses

If TRUE, prints a table of classes for all variables.

progress

If TRUE, prints a progress bar when bootstrap methods are called.

methodNum

The method for the correlation for two numeric variables. The default is "pearson". Other options are "spearman" and "kendall".

methodOrd

The method for the correlation for two ordinal variables. The default is "kendall", with Kendall's tau-c used. The other option is "spearman".

methodNumOrd

The method for the correlation of a numeric and an ordinal variable. The default is "spearman", as shown in Usage. Other options are "pearson" and "kendall".

methodNumNom

The method for the correlation of a numeric and a nominal variable.

The default is "eta", which is the square root of the r-squared value from anova. The other option is "epsilon", which is the same, except with the numeric value rank-transformed.
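As a rough illustration of the description above, eta can be reproduced in base R as the square root of the r-squared from a one-way model, and epsilon as the same quantity computed on ranks. This is a sketch on made-up data (`y` and `g` are hypothetical names), not the package's internal code.

```r
# Hypothetical data: a numeric response and a nominal grouping factor
y <- c(0.29, 0.25, 0.40, 0.50, 0.57, 0.62, 0.88, 0.99, 0.90)
g <- factor(rep(c("Red", "Green", "Blue"), each = 3))

# eta: square root of r-squared from a one-way anova (fit here with lm)
eta <- sqrt(summary(lm(y ~ g))$r.squared)

# epsilon: same calculation after rank-transforming the numeric variable
epsilon <- sqrt(summary(lm(rank(y) ~ g))$r.squared)

print(c(eta = eta, epsilon = epsilon))
```

Both values fall between 0 and 1 and carry no sign, since a nominal variable has no direction.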

methodNumBin

The method for the correlation of a numeric and a binary variable. The default is "pearson". The other option is "glass", which uses the Glass rank biserial correlation.

testChisq

The method for the test of two nominal variables. The default is "chisq". The other option is "fisher".
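For orientation, the two tests named here correspond to the base R functions `chisq.test` and `fisher.test`. The sketch below runs both on a small made-up table. It assumes nothing about how `correlation` calls them internally, beyond `correct` being passed to `chisq.test` as documented under that argument.

```r
# Hypothetical nominal variables
Color    <- factor(rep(c("Red", "Green", "Blue"), c(4, 4, 2)))
Location <- factor(rep(c("Home", "Away", "Other"), c(2, 4, 4)))
tab <- table(Color, Location)

# Chi-square test; small expected counts trigger a warning, silenced here
p_chisq <- suppressWarnings(chisq.test(tab, correct = FALSE)$p.value)

# Fisher's exact test, often preferred for sparse tables
p_fisher <- fisher.test(tab)$p.value

print(c(chisq = p_chisq, fisher = p_fisher))
```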

ci

If TRUE, calculates confidence intervals for methods requiring bootstrap. If FALSE, returns only those confidence intervals from methods that do not require bootstrap.

conf

The confidence level for confidence intervals.

R

The number of replications to use for bootstrap confidence intervals for applicable methods.

correct

Passed to chisq.test.

reportIncomplete

If FALSE, NA is reported whenever the calculation of the statistic fails in one or more replicates of the bootstrap procedure.

na.action

If "na.omit", the function will use only complete cases, assessed on a bivariate basis. The other option is "na.pass".
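To make "complete cases, assessed on a bivariate basis" concrete, the following base R sketch keeps, for each pair of variables, only the rows where both are observed. The vectors `x`, `y`, and `z` are hypothetical, and the code illustrates the idea rather than the function's actual implementation.

```r
# Hypothetical data with missing values in different rows
x <- c(1.2, NA, 3.1, 4.8, 5.0, 6.3)
y <- c(2.0, 2.9, NA, 5.1, 4.7, 6.8)
z <- c(9.1, 7.4, 6.6, 5.0, NA, 2.2)

# Rows that are complete for a given pair of variables
keep_xy <- complete.cases(x, y)   # drops rows 2 and 3
keep_xz <- complete.cases(x, z)   # drops rows 2 and 5

# Each pair is evaluated on its own set of complete rows
r_xy <- cor(x[keep_xy], y[keep_xy])
r_xz <- cor(x[keep_xz], z[keep_xz])

print(c(n_xy = sum(keep_xy), n_xz = sum(keep_xz)))
```

A row with a missing value in one variable is therefore dropped only for the pairs that involve that variable.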

digits

The number of decimal places in the output of most statistics.

pDigits

The number of decimal places in the output for p-values.

...

Other arguments.

Author

Salvatore Mangiafico, mangiafico@njaes.rutgers.edu

Details

It’s important that variables are assigned the correct class to get an appropriate measure of association. That is, factor variables should be of class "factor", not "character". Ordered factors should be ordered factors (and have their levels in the correct order!).
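A minimal base R sketch of preparing classes before calling the function, using a hypothetical data frame `df` whose columns start out as character:

```r
# Hypothetical data frame read in with character columns
df <- data.frame(
  Color  = c("Red", "Green", "Blue", "Red"),
  Rating = c("Low", "High", "Medium", "Low"),
  stringsAsFactors = FALSE
)

# Nominal variable: convert character to factor
df$Color <- factor(df$Color)

# Ordinal variable: ordered factor with levels listed from lowest to highest
df$Rating <- factor(df$Rating,
                    levels = c("Low", "Medium", "High"),
                    ordered = TRUE)

# Check the first class of each column before computing associations
print(sapply(df, function(col) class(col)[1]))
```

Without the explicit `levels` argument, the levels would be sorted alphabetically ("High", "Low", "Medium"), which is the wrong order for this ordinal variable.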

Date variables are treated as numeric.
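In base R, a Date is stored as the number of days since 1970-01-01, so treating it as numeric presumably amounts to the conversion sketched below. The exact conversion used inside `correlation` is an assumption here, and `Length` is made-up data.

```r
# Monthly dates, as in the Examples section
Start <- seq(as.Date("2024-01-01"), by = "month", length.out = 10)

# Underlying numeric value: days since 1970-01-01
days <- as.numeric(Start)
print(days[1])

# A date can then enter a correlation like any other numeric variable
Length <- c(0.29, 0.25, 0.31, 0.40, 0.50, 0.57, 0.62, 0.88, 0.99, 0.90)
r <- cor(days, Length)
print(r)
```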

The default measures of association tend to be of the "parametric" type, for example Pearson correlation where appropriate.

Nonparametric measures of association will be reported with the options methodNum = "spearman", methodNumNom = "epsilon", methodNumBin = "glass".
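A hedged sketch of such a call follows. The data frame is made up, the argument names come from the Usage section, and the call is guarded so the code still runs when rcompanion is not installed. The layout of the returned data frame is not assumed here.

```r
# Small hypothetical data set: two numeric, one nominal, one binary variable
Data <- data.frame(
  Length = c(0.29, 0.25, 0.31, 0.40, 0.50, 0.57, 0.62, 0.88, 0.99, 0.90),
  Width  = c(1.10, 0.90, 1.30, 1.20, 1.60, 1.50, 1.90, 2.20, 2.00, 2.40),
  Color  = factor(rep(c("Red", "Green", "Blue"), c(4, 4, 2))),
  Answer = factor(rep(c("Yes", "No", "Yes"), c(4, 3, 3)))
)

res <- NULL
if (requireNamespace("rcompanion", quietly = TRUE)) {
  # Request rank-based measures for numeric-numeric, numeric-nominal,
  # and numeric-binary pairs
  res <- rcompanion::correlation(Data,
                                 methodNum    = "spearman",
                                 methodNumNom = "epsilon",
                                 methodNumBin = "glass",
                                 progress     = FALSE)
  print(res)
}
```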

References

https://rcompanion.org/handbook/I_14.html

See Also

phi, spearmanRho, cramerV, freemanTheta, wilcoxonRG

Examples

library(rcompanion)

# Numeric variable with one missing value
Length   = c(0.29, 0.25, NA, 0.40, 0.50, 0.57, 0.62, 0.88, 0.99, 0.90)
# Ordinal variable: ordered factor with levels in the intended order
Rating   = factor(ordered=TRUE, levels=c("Low", "Medium", "High"),
                  x = rep(c("Low", "Medium", "High"), c(3,3,4)))
# Nominal and binary variables stored as factors
Color    = factor(rep(c("Red", "Green", "Blue"), c(4,4,2)))
Flag     = factor(rep(c(TRUE, FALSE, TRUE), c(5,4,1)))
Answer   = factor(rep(c("Yes", "No", "Yes"), c(4,3,3)), levels=c("Yes", "No"))
Location = factor(rep(c("Home", "Away", "Other"), c(2,4,4)))
Distance = factor(ordered=TRUE, levels=c("Low", "Medium", "High"),
                  x = rep(c("Low", "Medium", "High"), c(5,2,3)))
# Date variable, treated as numeric by correlation()
Start    = seq(as.Date("2024-01-01"), by = "month", length.out = 10)
Data = data.frame(Length, Rating, Color, Flag, Answer, Location, Distance, Start)
# Measures of association for every pair of variables, using the defaults
correlation(Data)