
HelpersMG (version 5.1)

IC_clean_data: Clean the dataframe before it is used with IC_threshold_matrix

Description

This function must be used if missing values are present in the dataset. It ensures that all correlations and partial correlations can be calculated: the columns of the dataframe are removed one by one until all of them can be calculated without error. One or more columns can be declared as retained (variable.retain) when they are of particular importance in the analysis. The use and method parameters are passed to the cor() function. By default, the function uses parallel computing on Unix and macOS systems. If progress is TRUE and the package pbmcapply is installed, a progress bar is displayed. If debug is TRUE, some information is shown during the process. See https://fr.wikipedia.org/wiki/Iconographie_des_corrélations
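The column-removal strategy can be sketched in base R as below. This is a hypothetical illustration, not the HelpersMG implementation: the helper name clean_for_correlation() is invented here, and the rule used to pick the column to drop (the non-retained column with the most NA correlations, the first one on a tie) is an assumption. Partial correlations are assumed to be computable whenever solve() can invert the correlation matrix.

```r
# Hypothetical helper (not part of HelpersMG): drop columns one at a time
# until every correlation, and optionally every partial correlation, can be
# computed. Assumed rule for the column to drop: the non-retained column
# with the most NA correlations (the first one on a tie).
clean_for_correlation <- function(df, retain = NULL,
                                  use = "pairwise.complete.obs",
                                  method = "pearson",
                                  partial = TRUE) {
  computable <- function(d) {
    r <- suppressWarnings(cor(d, use = use, method = method))
    if (anyNA(r)) return(FALSE)
    if (!partial) return(TRUE)
    # Partial correlations require an invertible correlation matrix
    !inherits(try(solve(r), silent = TRUE), "try-error")
  }
  repeat {
    if (ncol(df) < 2 || computable(df)) return(df)
    r <- suppressWarnings(cor(df, use = use, method = method))
    n_na <- colSums(is.na(r))
    candidates <- n_na[setdiff(names(n_na), retain)]
    if (length(candidates) == 0) stop("only retained columns are left")
    worst <- names(which.max(candidates))
    df <- df[, setdiff(names(df), worst), drop = FALSE]
  }
}

# Usage: column c has no observations, so no correlation involving it exists
set.seed(1)
d <- data.frame(a = rnorm(10), b = rnorm(10), c = rep(NA_real_, 10))
cleaned <- clean_for_correlation(d)
names(cleaned)
```

Retaining c forces the other columns to be dropped instead, which shows why variable.retain should be used sparingly.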

Usage

IC_clean_data(
  data = stop("A dataframe object is required"),
  use = c("pairwise.complete.obs", "everything", "all.obs", "complete.obs",
    "na.or.complete"),
  method = c("pearson", "kendall", "spearman"),
  variable.retain = NULL,
  test.partial.correlation = TRUE,
  progress = TRUE,
  debug = FALSE
)

Arguments

data

The data.frame to be cleaned

use

an optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs".
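The effect of use can be seen directly with base R cor() on two short vectors with one missing value. This only illustrates cor() itself; the vectors are made up for the example.

```r
# Base R: how the 'use' argument of cor() treats a missing value
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, NA, 8, 11)

r_everything <- cor(x, y, use = "everything")             # NA propagates to the result
r_pairwise   <- cor(x, y, use = "pairwise.complete.obs")  # computed from the 4 complete pairs
r_all_obs    <- try(cor(x, y, use = "all.obs"), silent = TRUE)  # an error is raised
```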

method

a character string indicating which correlation coefficient (or covariance) is to be computed. One of "pearson" (default), "kendall", or "spearman": can be abbreviated.
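As a base R illustration of method, the Mass and Age columns of the example dataset below can be correlated with each coefficient. Spearman's coefficient is the Pearson correlation of the ranks; with no ties in these data it equals 6/7, and Kendall's tau equals 5/7.

```r
# Mass and Age columns copied from the example dataset of this page
mass <- c(52, 59, 55, 58, 66, 62, 63, 69)
age  <- c(12, 12.5, 13, 14.5, 15.5, 16, 17, 18)

r_pearson  <- cor(mass, age, method = "pearson")
r_spearman <- cor(mass, age, method = "spearman")  # 6/7: correlation of the ranks
r_kendall  <- cor(mass, age, method = "kendall")   # 5/7: 24 concordant, 4 discordant pairs

# Spearman is Pearson applied to the ranks
all.equal(r_spearman, cor(rank(mass), rank(age)))
```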

variable.retain

A character vector with the names of the columns that must be retained.

test.partial.correlation

Should the partial correlations be tested?
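Why partial correlations can fail is easier to see from one standard way of computing them: from P, the inverse of the correlation matrix, the partial correlation of i and j given all other columns is -P[i, j] / sqrt(P[i, i] * P[j, j]). This sketch assumes that identity and a hypothetical helper partial_cor(); it is not necessarily how HelpersMG computes them. It needs solve() to succeed, which is exactly what breaks when the correlation matrix is singular.

```r
# Hypothetical helper: partial correlation of every pair of columns given
# all the other columns, from P = solve(R) with R the correlation matrix:
#   pcor[i, j] = -P[i, j] / sqrt(P[i, i] * P[j, j])
partial_cor <- function(df, use = "pairwise.complete.obs", method = "pearson") {
  r <- cor(df, use = use, method = method)
  p <- solve(r)                            # stops with an error if r is singular
  pc <- -p / sqrt(outer(diag(p), diag(p)))
  diag(pc) <- 1
  dimnames(pc) <- dimnames(r)
  pc
}

# Numeric columns of the example dataset of this page
es_num <- data.frame(Mass = c(52, 59, 55, 58, 66, 62, 63, 69),
                     Age  = c(12, 12.5, 13, 14.5, 15.5, 16, 17, 18),
                     Note = c(5, 5, 9, 5, 13.5, 18, 18, 18))
pc <- partial_cor(es_num)

# Same value as correlating the residuals of Mass and Age after regressing each on Note
pc["Mass", "Age"]
cor(resid(lm(Mass ~ Note, data = es_num)), resid(lm(Age ~ Note, data = es_num)))
```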

progress

If TRUE, show a progress bar (only when the package pbmcapply is installed).
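The combination of parallel computing on Unix and macOS with an optional progress bar can be sketched with parallel::mclapply() and pbmcapply::pbmclapply(). Both functions exist, but the helper run_parallel() and the core count used here are assumptions for illustration, not the HelpersMG code. mclapply() relies on forking, which Windows does not support, so a single core is used there.

```r
# Hypothetical helper: apply FUN to X in parallel where forking is available,
# with a progress bar when requested and pbmcapply is installed
run_parallel <- function(X, FUN, progress = TRUE) {
  cores <- if (.Platform$OS.type == "windows") 1L else 2L
  if (progress && requireNamespace("pbmcapply", quietly = TRUE)) {
    pbmcapply::pbmclapply(X, FUN, mc.cores = cores)
  } else {
    parallel::mclapply(X, FUN, mc.cores = cores)
  }
}

res <- run_parallel(1:4, function(i) i^2, progress = FALSE)
unlist(res)
```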

debug

If TRUE, information about the progress of the cleaning is shown.

Value

The cleaned dataframe.

Details

IC_clean_data checks and corrects the dataframe so that it can be used with IC_threshold_matrix.

References

Lesty, M., 1999. Une nouvelle approche dans le choix des régresseurs de la régression multiple en présence d’interactions et de colinéarités. Revue de Modulad 22, 41-77.

See Also

Other Iconography of correlations: IC_correlation_simplify(), IC_threshold_matrix(), plot.IconoCorel()

Examples

# NOT RUN {
library("HelpersMG")
# based on https://fr.wikipedia.org/wiki/Iconographie_des_corrélations
es <- structure(list(Student = c("e1", "e2", "e3", "e4", "e5", "e6", "e7", "e8"), 
                     Mass = c(52, 59, 55, 58, 66, 62, 63, 69), 
                     Age = c(12, 12.5, 13, 14.5, 15.5, 16, 17, 18), 
                     Assiduity = c(12, 9, 15, 5, 11, 15, 12, 9), 
                     Note = c(5, 5, 9, 5, 13.5, 18, 18, 18)), 
                     row.names = c(NA, -8L), class = "data.frame")
es

df_clean <- IC_clean_data(es, debug = TRUE)
cor_matrix <- IC_threshold_matrix(data=df_clean, threshold = NULL, progress=FALSE)
cor_threshold <- IC_threshold_matrix(data=df_clean, threshold = 0.3)
plot(cor_threshold, show.legend.strength=FALSE, show.legend.direction = FALSE)
cor_threshold_Note <- IC_correlation_simplify(matrix=cor_threshold, variable="Note")
plot(cor_threshold_Note, show.legend.strength=FALSE, show.legend.direction = FALSE)

cor_threshold <- IC_threshold_matrix(data=df_clean, threshold = 0.6)
plot(cor_threshold, 
     layout = matrix(data = c(53, 53, 55, 55, 
                              55, 53, 55, 53), ncol = 2, byrow = FALSE), 
     show.legend.direction = FALSE,
     show.legend.strength = FALSE, xlim = c(-2, 2), ylim = c(-2, 2))
# }
