
dataPreparation (version 0.4.3)

whichAreBijection: Identify bijections

Description

Find all the columns that are bijections of another column.

Usage

whichAreBijection(dataSet, keep_cols = NULL, verbose = TRUE)

Arguments

dataSet

Matrix, data.frame or data.table

keep_cols

List of columns not to drop (list of character, defaults to NULL)

verbose

Should the function log what it finds (logical, defaults to TRUE)

Value

A list of the indexes of the columns that have an exact bijection with another column in dataSet.

Details

A bijection here means that another column contains exactly the same information, possibly coded differently; for example col1: Men/Women, col2: M/W. The function searches by examining every pair of columns. For each pair it computes the number of unique elements in each column and the number of unique tuples of values; two columns are in bijection exactly when these three counts are equal. The computation is made by exponential search, so that the function is faster. If verbose is TRUE, the column logged is the one returned. For example, if columns i and j (with j > i) are bijections, the function returns j, except when j is a character column, in which case it returns i.
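The counting test described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the helper name is_bijection is hypothetical, it compares just two vectors, and it skips the exponential search that whichAreBijection uses for speed.

```r
# Hypothetical helper (not part of dataPreparation): TRUE when x and y
# carry the same information, i.e. each unique value of x maps to exactly
# one unique value of y and vice versa.
is_bijection <- function(x, y) {
  stopifnot(length(x) == length(y))
  n_x <- length(unique(x))   # unique elements in the first column
  n_y <- length(unique(y))   # unique elements in the second column
  # unique (x, y) tuples
  n_pairs <- nrow(unique(data.frame(x = x, y = y, stringsAsFactors = FALSE)))
  n_x == n_pairs && n_y == n_pairs
}

sex_long  <- c("Men", "Women", "Women", "Men")
sex_short <- c("M", "W", "W", "M")
group     <- c("a", "a", "b", "b")

is_bijection(sex_long, sex_short)  # TRUE: same information, coded differently
is_bijection(sex_long, group)      # FALSE: 4 unique tuples but 2 unique values
```

Comparing counts avoids building an explicit mapping between the two columns; the number of unique tuples can never be smaller than either column's unique count, so equality of all three is enough.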

Examples

# NOT RUN {
# First let's get a data set
data("adult")

# Now let's check which columns are equal
whichAreInDouble(adult)
# It doesn't give any result.

# Let's look for bijections
whichAreBijection(adult)
# Returns the index of education_num, because education_num and education
# contain the same information
# }
