analyzer (version 1.0.1)

association: Find association between variables

Description

association finds association among all the variables in the data.

Usage

association(
  tb,
  categorical = NULL,
  method1 = c("auto", "pearson", "kendall", "spearman"),
  method3 = c("auto", "parametric", "non-parametric"),
  methodMats = NULL,
  use = "everything",
  normality_test_method = c("ks", "anderson", "shapiro"),
  normality_test_pval = 0.05,
  ...
)

Arguments

tb

tabular data

categorical

a vector specifying the names of categorical (character, factor) columns

method1

Method for association between continuous-continuous variables. Values can be "auto", "pearson", "kendall", "spearman". See details for more information.
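
As a rough illustration of what the three named coefficients measure, the sketch below calls the base R functions cor and cor.test directly on one pair of mtcars columns. It does not call association, and it makes no assumption about how "auto" chooses among the methods; that is described in the CCassociation documentation.

```r
# Base R only: the three correlation coefficients named by method1,
# computed for one pair of continuous columns from mtcars.
x <- mtcars$mpg
y <- mtcars$disp

methods <- c("pearson", "kendall", "spearman")
coefs <- sapply(methods, function(m) cor(x, y, method = m))

# cor.test() gives the matching p-value; exact = FALSE avoids
# warnings about ties for the rank-based methods.
pvals <- sapply(methods, function(m) {
  cor.test(x, y, method = m, exact = FALSE)$p.value
})

print(round(coefs, 3))
print(signif(pvals, 3))
```

The three coefficients generally differ in magnitude for the same pair, which is why the choice of method1 can change the reported association.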

method3

Method for association between continuous-categorical variables. Values can be "auto", "parametric", "non-parametric". See details of CQassociation for more information. "parametric" runs a t-test, while "non-parametric" runs a Mann-Whitney test.
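
The two kinds of test can be compared in base R. The sketch below is an illustration only: it uses t.test and wilcox.test on mtcars, and the exact variant CQassociation runs internally (for example Welch versus pooled-variance t-test) is not stated here, so that part is an assumption.

```r
# Base R only: parametric vs non-parametric comparison of a continuous
# variable (mpg) across the two levels of a categorical variable (vs).
tb <- mtcars
tb$vs <- as.factor(tb$vs)

# Parametric: two-sample t-test (t.test defaults to the Welch variant)
p_parametric <- t.test(mpg ~ vs, data = tb)$p.value

# Non-parametric: Mann-Whitney (Wilcoxon rank-sum) test;
# exact = FALSE because mpg contains tied values
p_nonparametric <- wilcox.test(mpg ~ vs, data = tb, exact = FALSE)$p.value

print(c(parametric = p_parametric, non_parametric = p_nonparametric))
```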

methodMats

This parameter defines the method for calculating correlation or association for each individual pair of variables. The input is a square data.frame with one row and one column for every column of tb; its row names and column names are the column names of tb. The values in the data.frame can be:

between continuous-continuous variables

from parameter method1 - "auto", "pearson", "kendall", "spearman"

between continuous-categorical variables

from parameter method3 - "auto", "parametric", "non-parametric"

between categorical-categorical variables

can be anything

Default is NULL, in which case the methods are taken from the method1 and method3 parameters.

This parameter can also take some other values; see the Examples for details. However, it is advisable to use the values listed above.
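
A table of the required shape can also be built by hand. The sketch below uses base R only and follows the description above (square, named by the columns of tb). Whether association accepts a hand-built table with "auto" in every cell, including the diagonal, is an assumption; the package's own example starts from out$method_used instead.

```r
# Hypothetical hand-built methodMats for a four-column table.
tb <- mtcars[, c("mpg", "disp", "hp", "cyl")]
tb$cyl <- as.factor(tb$cyl)

vars <- names(tb)
methodMats <- as.data.frame(
  matrix("auto", nrow = length(vars), ncol = length(vars),
         dimnames = list(vars, vars)),
  stringsAsFactors = FALSE
)

# Continuous-continuous pair: a value from method1.
# Both cells are set so the table stays symmetric.
methodMats["mpg", "disp"] <- methodMats["disp", "mpg"] <- "spearman"

# Continuous-categorical pair: a value from method3.
methodMats["hp", "cyl"] <- methodMats["cyl", "hp"] <- "non-parametric"

print(methodMats)
```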

use

An optional character string giving a method for computing association in the presence of missing values. This must be (the complete string or an abbreviation of) one of "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs". If use is "everything", NAs propagate conceptually, i.e., a resulting value is NA whenever one of its contributing observations is NA. If use is "all.obs", the presence of missing observations produces an error. If use is "complete.obs", missing values are handled by casewise deletion (and if there are no complete cases, that gives an error). "na.or.complete" is the same as "complete.obs" unless there are no complete cases, in which case it gives NA.
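
These options mirror the use argument of the base R function cor. The sketch below shows their effect with cor directly on two small made-up vectors; it assumes association forwards use to cor unchanged for continuous pairs, which this page implies but does not spell out.

```r
# Two toy vectors; x has one missing value.
x <- c(1, 2, 3, 4, NA)
y <- c(2, 4, 6, 8, 10)

# "everything": the NA propagates into the result
r_everything <- cor(x, y, use = "everything")

# "complete.obs": rows containing an NA are dropped first
r_complete <- cor(x, y, use = "complete.obs")

# "pairwise.complete.obs": same result here, since there is only one pair
r_pairwise <- cor(x, y, use = "pairwise.complete.obs")

# "all.obs": any missing value is an error
all_obs_fails <- inherits(
  try(cor(x, y, use = "all.obs"), silent = TRUE),
  "try-error"
)

print(c(everything = r_everything, complete = r_complete,
        pairwise = r_pairwise))
print(all_obs_fails)
```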

normality_test_method

Method for the normality test of a variable. Values can be "shapiro" for the Shapiro-Wilk test, "anderson" for the Anderson-Darling test, or "ks" for the Kolmogorov-Smirnov test.

normality_test_pval

significance level for normality tests. Default is 0.05
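
To make the role of the significance level concrete, the sketch below runs two of the named tests in base R and compares each p-value with the threshold. It is an illustration under assumptions: the variable alpha stands in for normality_test_pval, how the package applies the result when a method is "auto" is not shown on this page, and the Anderson-Darling test is left out because it is not part of base R.

```r
x <- mtcars$mpg
alpha <- 0.05  # stands in for normality_test_pval

# Shapiro-Wilk test
p_shapiro <- shapiro.test(x)$p.value

# Kolmogorov-Smirnov test against a normal distribution with the
# sample mean and sd. Estimating the parameters from the same data
# makes this p-value approximate; ties in x trigger a warning.
p_ks <- suppressWarnings(
  ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$p.value
)

# A p-value above alpha means normality is not rejected
looks_normal <- c(shapiro = p_shapiro, ks = p_ks) > alpha
print(looks_normal)
```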

...

other parameters passed to cor, CCassociation, CQassociation and QQassociation

Value

A list containing the following tables:

continuous_corr

correlation among all the continuous variables

continuous_pvalue

Table containing p-value for the correlation test

categorical_cramers

Cramer's V value among all the categorical variables

categorical_pvalue

Chi-squared test p-value

continuous_categorical

association value among continuous and categorical variables

method_used

A data.frame showing the method used for all pairs of variables
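
For reference, the quantities reported in categorical_cramers and categorical_pvalue can be reproduced for a single pair in base R. This is a textbook computation of Cramer's V from the chi-squared statistic, not the package's own code; details such as continuity correction may differ in QQassociation.

```r
# Contingency table for two categorical columns of mtcars
tab <- table(mtcars$cyl, mtcars$vs)

# Chi-squared test without continuity correction; the warning about
# small expected counts is suppressed for this small data set.
chi <- suppressWarnings(chisq.test(tab, correct = FALSE))

n <- sum(tab)                          # number of observations
k <- min(nrow(tab), ncol(tab)) - 1     # smaller table dimension minus one

# Cramer's V = sqrt(chi-squared / (n * k)), a value between 0 and 1
cramers_v <- sqrt(unname(chi$statistic) / (n * k))
chisq_p <- chi$p.value

print(c(cramers_v = cramers_v, p_value = chisq_p))
```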

Details

This function calculates association values in three categories:

  • between continuous variables (using CCassociation function)

  • between categorical variables (using QQassociation function)

  • between continuous and categorical variables (using CQassociation function)

For more details, look at the individual documentation of CCassociation, QQassociation, CQassociation

See Also

CCassociation for Correlation between Continuous variables, QQassociation for Association between Categorical variables, CQassociation for Association between Continuous-Categorical variables

Examples

# NOT RUN {
tb <- mtcars
tb$cyl <- as.factor(tb$cyl)
tb$vs  <- as.factor(tb$vs)
out <- association(tb, categorical = c("cyl", "vs"))

# To use the methodMats parameter, create a matrix like this
methodMats <- out$method_used

# the values can be changed as per requirement
# NOTE: in addition to the values from parameters method1 and method3,
#       the values in methodMats can also be the values returned by
#       association function. But it's advisable to use the options from
#       method1 and method3 arguments
methodMats["mpg", "disp"] <- methodMats["disp", "mpg"] <- "spearman"
out <- association(tb, categorical = c("cyl", "vs"), methodMats = methodMats)
rm(tb)

# }