
dataPreparation (version 0.4.3)

whichAreInDouble: Identify double columns

Description

Find all the columns that are exact duplicates of another column.

Usage

whichAreInDouble(dataSet, keep_cols = NULL, verbose = TRUE)

Arguments

dataSet

Matrix, data.frame or data.table

keep_cols

List of columns not to drop (list of character, defaults to NULL)

verbose

Should the function log what it finds (logical, defaults to TRUE)

Value

A list of indices of the columns that have an exact duplicate in the dataSet. For example, if column i and column j (with j > i) are equal, it returns j.
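The returned indices are typically used to drop the redundant columns. The snippet below is a minimal base R sketch of that step. The vector `dup_idx` is a hand-written stand-in for the function's return value (an assumption for illustration, not actual output), and `setdiff()` is used instead of negative indexing because `M[, -integer(0)]` would drop every column when no duplicates are found.

```r
# Small matrix whose three columns are all identical
M <- matrix(1, nrow = 4, ncol = 3)

# Stand-in for the indices returned for M (assumed for illustration)
dup_idx <- c(2L, 3L)

# Keep every column whose index is not flagged as a duplicate
keep <- setdiff(seq_len(ncol(M)), dup_idx)
M_clean <- M[, keep, drop = FALSE]

# The same code stays safe when no duplicate is found
no_dup <- integer(0)
M_same <- M[, setdiff(seq_len(ncol(M)), no_dup), drop = FALSE]
```

`drop = FALSE` keeps the result a matrix even when only one column remains.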

Details

This function searches by comparing every pair of columns. It first compares the first 10 rows of both columns. If they differ, the columns are not identical; otherwise it compares rows 11 to 100, then 101 to 1000, and so on. The function is therefore fast on a dataSet with a large number of rows and many columns that are not equal. If verbose is TRUE, the column logged is the one returned.
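The chunked comparison described above can be sketched in base R as follows. This is only an illustration of the idea under stated assumptions, not the package's actual implementation: the helper names `columns_equal_chunked` and `find_duplicate_columns` are hypothetical, the input is assumed to be a matrix, and `keep_cols` and `verbose` are not handled.

```r
# Hypothetical helper (not part of dataPreparation): compare two vectors
# chunk by chunk (rows 1-10, 11-100, 101-1000, ...) and stop at the first
# chunk that differs.
columns_equal_chunked <- function(x, y) {
  n <- length(x)
  if (length(y) != n) return(FALSE)
  start <- 1
  end <- min(10, n)
  while (start <= n) {
    idx <- start:end
    # identical() treats NA as equal to NA, and NA as different from a value
    if (!identical(x[idx], y[idx])) return(FALSE)
    start <- end + 1
    end <- min(end * 10, n)
  }
  TRUE
}

# Hypothetical helper: indices of the columns that duplicate an earlier column
find_duplicate_columns <- function(m) {
  dup <- integer(0)
  p <- ncol(m)
  if (p < 2) return(dup)
  for (i in seq_len(p - 1)) {
    if (i %in% dup) next              # already known to be a duplicate
    for (j in (i + 1):p) {
      if (j %in% dup) next
      if (columns_equal_chunked(m[, i], m[, j])) dup <- c(dup, j)
    }
  }
  sort(dup)
}

M <- matrix(1, nrow = 1000, ncol = 3)
res_all <- find_duplicate_columns(M)   # 2 3
M[1, 2] <- NA
res_na <- find_duplicate_columns(M)    # 3
M[1, 1] <- NA
res_na2 <- find_duplicate_columns(M)   # 2
```

Because most unequal columns already differ within their first few rows, the early exit means that only truly identical pairs are scanned in full.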

Examples

# NOT RUN {
# First, let's build a matrix with 3 columns and many rows, filled with 1s
M <- matrix(1, nrow = 1e6, ncol = 3)

# Now let's check which columns are equal
whichAreInDouble(M)
# It returns 2 and 3: you should only keep column 1.

# Let's set row 1 of column 2 to 0 and check again
M[1, 2] <- 0
whichAreInDouble(M)
# It only returns 3

# What about NA? NA vs. a non-NA value => not equal
M[1, 2] <- NA
whichAreInDouble(M)
# It only returns 3

# What about NA vs. NA? => they are treated as equal
M[1, 1] <- NA
whichAreInDouble(M)
# It only returns 2
# }
