dataPreparation (version 1.1.1)

Automated Data Preparation

Description

Do most of the painful data preparation for a data science project with a minimum amount of code. The package takes advantage of 'data.table' efficiency and uses some algorithmic tricks to perform data preparation in a time- and RAM-efficient way.

Install

install.packages('dataPreparation')

Monthly Downloads

1,023

Version

1.1.1

License

GPL-3 | file LICENSE

Maintainer

Emmanuel-Lin Toulemonde

Last Published

July 4th, 2023

Functions in dataPreparation (1.1.1)

compute_probability_ratio

Compute probability ratio
build_scales

Compute scales
aggregate_by_key

Automatic data set aggregation by key
build_date_factor

Date Factor
as.POSIXct_fast

Faster date transformation
build_bins

Compute bins
build_encoding

Compute encoding
data_preparation_news

Show the NEWS file
build_target_encoding

Build target encoding
date_format_unifier

Unify date formats
compute_weight_of_evidence

Compute weight of evidence
fast_is_equal

Fast checks of equality
fast_round

Fast round
generate_factor_from_date

Generate factor from dates
fast_scale

Fast scale
find_and_transform_dates

Identify date columns
find_and_transform_numerics

Identify numeric columns in a data set
description

Describe data set
fast_filter_variables

Filtering useless variables
fast_discretization

Discretization
identify_dates

Identify date columns
generate_date_diffs

Date difference
generate_from_character

Recode character
remove_percentile_outlier

Percentile outlier filtering
set_as_numeric_matrix

Numeric matrix preparation for machine learning
fast_handle_na

Handle NA values
messy_adult

Adult with some ugly columns added
remove_sd_outlier

Standard deviation outlier filtering
remove_rare_categorical

Filter rare categories
set_col_as_character

Set columns as character
one_hot_encoder

One hot encoder
prepare_set

Preparation pipeline
same_shape

Give same shape
which_are_included

Identify columns that are included in others
which_are_constant

Identify constant columns
which_are_in_double

Identify duplicated columns
un_factor

Unfactor factor with too many values
which_are_bijection

Identify bijections
set_col_as_date

Set columns as POSIXct
generate_from_factor

Recode factor
set_col_as_factor

Set columns as factor
get_most_frequent_element

Get most frequent element
set_col_as_numeric

Set columns as numeric
target_encode

Target encode
tiny_messy_adult

First 500 rows of messy_adult
shape_set

Final preparation before ML algorithm
adult

Adult data set from the UCI repository
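
To give a feel for how a few of the functions listed above fit together, here is a minimal sketch. It assumes the package is installed and that each function accepts the data set as its first argument with default settings for everything else; check each function's help page (for example ?fast_filter_variables) before relying on the defaults. Variable names such as n_before and constant_cols are chosen for this illustration only.

```r
# Minimal sketch; assumes dataPreparation is installed.
library(dataPreparation)

# Small example data set shipped with the package
# (the first 500 rows of messy_adult).
data(tiny_messy_adult)
n_before <- ncol(tiny_messy_adult)

# Find columns that carry no extra information.
constant_cols <- which_are_constant(tiny_messy_adult)   # single-valued columns
double_cols   <- which_are_in_double(tiny_messy_adult)  # copies of another column

# Drop the useless columns in one call. The result is assigned back
# because the package works on data.table objects, which may be
# modified by reference.
tiny_messy_adult <- fast_filter_variables(tiny_messy_adult)
n_after <- ncol(tiny_messy_adult)
```

Comparing n_before with n_after shows how many columns were filtered out. The which_are_* helpers only report columns, so they are useful for inspecting what would be dropped before calling fast_filter_variables.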