⚠️ There's a newer version (1.1.1) of this package.

dataPreparation (version 0.4.3)

Automated Data Preparation

Description

Do most of the painful data preparation for a data science project with a minimum amount of code. Takes advantage of data.table efficiency and uses some algorithmic tricks to perform data preparation in a time- and RAM-efficient way.
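As a rough illustration of the "minimum amount of code" claim, the sketch below runs three of the column checks listed on this page against the bundled messy_adult data set. It assumes dataPreparation 0.4.3 is installed; other releases may expose different function names. Only the data set is passed, so every other argument stays at its default.

```r
# Assumption: dataPreparation 0.4.3 is installed (install.packages('dataPreparation')
# may fetch a newer release in which these function names could differ).
library(dataPreparation)

# messy_adult ships with the package: the Adult data set with some ugly columns added.
data(messy_adult)

# Each call takes the data set and reports which columns match the check.
constant_cols  <- whichAreConstant(messy_adult)   # columns holding a single value
double_cols    <- whichAreInDouble(messy_adult)   # columns duplicating another column
bijection_cols <- whichAreBijection(messy_adult)  # columns in one-to-one relation with another

print(constant_cols)
print(double_cols)
print(bijection_cols)
```

The results can then be used to decide which columns to drop before modelling.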

Install

install.packages('dataPreparation')

Monthly Downloads

1,023

Version

0.4.3

License

GPL-3 | file LICENSE

Maintainer

Emmanuel-Lin Toulemonde

Last Published

February 12th, 2020

Functions in dataPreparation (0.4.3)

build_bins

Compute bins
build_scales

Compute scales
build_encoding

Compute encoding
as.POSIXct_fast

Faster date transformation
fastDiscretization

Discretization
dateFormatUnifier

Unify dates format
dataPrepNews

Show the NEWS file
build_target_encoding

Build target encoding
fastFilterVariables

Filtering useless variables
description

Describe data set
setColAsNumeric

Set columns as numeric
remove_percentile_outlier

Percentile outlier filtering
remove_rare_categorical

Filter rare categoricals
aggregateByKey

Automatic data set aggregation by key
adult

Adult data set from the UCI repository
shapeSet

Final preparation before ML algorithm
findAndTransformDates

Identify and transform date columns
generateDateDiffs

Date difference
generateFactorFromDate

Generate factor from dates
findAndTransformNumerics

Identify numeric columns in a data set
remove_sd_outlier

Standard deviation outlier filtering
fastRound

Fast round
sameShape

Give same shape
fastScale

Scale
identifyDates

Identify date columns
whichAreConstant

Identify constant columns
setAsNumericMatrix

Numeric matrix preparation for machine learning
whichAreBijection

Identify bijections
messy_adult

Adult with some ugly columns added
setColAsCharacter

Set columns as character
setColAsDate

Set columns as POSIXct
one_hot_encoder

One hot encoder
prepareSet

Preparation pipeline
fastHandleNa

Handle NA values
fastIsEqual

Fast checks of equality
setColAsFactor

Set columns as factor
generateFromCharacter

Recode character
generateFromFactor

Recode factor
unFactor

Unfactor factor with too many values
target_encode

Target encode
whichAreInDouble

Identify duplicated columns (columns that appear in double)
whichAreIncluded

Identify columns that are included in others
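
To make a few of the checks named above concrete (constant columns, columns in double, bijections), here is a minimal base R sketch of what such checks mean on a toy data frame. It is not the package's implementation, which relies on data.table and its own optimizations; the data frame, its column names, and the helper is_bijection are hypothetical and exist only for illustration.

```r
# Toy data; all column names are hypothetical.
df <- data.frame(
  id      = 1:4,
  id_copy = 1:4,                     # exact copy of id
  const   = rep("a", 4),             # single distinct value
  code    = c("x", "y", "z", "w"),   # one-to-one with id
  value   = c(10, 20, 10, 20),
  stringsAsFactors = FALSE
)

# Constant columns: at most one distinct value.
is_constant   <- vapply(df, function(col) length(unique(col)) <= 1, logical(1))
constant_cols <- names(df)[is_constant]

# Columns "in double": content identical to an earlier column.
duplicate_cols <- names(df)[duplicated(as.list(df))]

# Bijection (hypothetical helper): every value of a pairs with exactly
# one value of b, and every value of b pairs with exactly one value of a.
is_bijection <- function(a, b) {
  n_pairs <- nrow(unique(data.frame(a = a, b = b)))
  n_pairs == length(unique(a)) && n_pairs == length(unique(b))
}

print(constant_cols)                  # "const"
print(duplicate_cols)                 # "id_copy"
print(is_bijection(df$id, df$code))   # TRUE
print(is_bijection(df$id, df$value))  # FALSE
```

Columns flagged by any of these checks carry no information beyond what another column already provides, which is why filtering them is a common first preparation step.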