DataExplorer (version 0.8.3)

dummify: Dummify discrete features to binary columns

Description

Data dummification is also known as one-hot encoding or feature binarization. It turns each category of a discrete feature into a distinct column with binary (numeric) values.
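
As a rough illustration of the idea, the following base R sketch one-hot encodes a single factor column by hand. It is not how dummify() is implemented; the toy data frame and the `colour_<level>` naming scheme are assumptions made for this example only.

```r
# Hand-rolled one-hot encoding of one factor column (illustration only;
# this is not the dummify() implementation).
df <- data.frame(
  id = 1:4,
  colour = factor(c("red", "green", "red", "blue"))
)

# One indicator column per level: 1 where the row has that level, else 0.
lvls <- levels(df$colour)
dummies <- sapply(lvls, function(l) as.integer(df$colour == l))
colnames(dummies) <- paste("colour", lvls, sep = "_")  # assumed naming scheme

# Replace the original column with its indicator columns.
encoded <- cbind(df["id"], dummies)
print(encoded)
```

Each row ends up with exactly one 1 across the three `colour_*` columns, which is the defining property of this encoding.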

Usage

dummify(data, maxcat = 50L, select = NULL)

Value

Dummified dataset in which only the discrete features are converted to binary columns; the other original features are preserved. Column order may differ from the input.

Arguments

data

input data

maxcat

maximum categories allowed for each discrete feature. Default is 50.

select

names of selected features to be dummified. Default is NULL.

Details

Continuous features are ignored if they are included in select.

Features listed in select are ignored if their number of categories exceeds maxcat.
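
The two rules above can be sketched in base R. The helper `eligible_features()` below is hypothetical and not part of DataExplorer; it only mirrors the documented behaviour, under the assumption that "discrete" means factor or character columns and that "exceed" means strictly more than maxcat.

```r
# Hypothetical helper (not part of DataExplorer): names of the features that
# the rules above would leave eligible for dummification.
eligible_features <- function(data, maxcat = 50L, select = NULL) {
  # Assumption: "discrete" means factor or character columns.
  is_discrete <- vapply(data, function(x) is.factor(x) || is.character(x), logical(1))
  candidates <- names(data)[is_discrete]
  # Continuous features named in `select` drop out here.
  if (!is.null(select)) candidates <- intersect(candidates, select)
  # Features with more than `maxcat` categories are ignored.
  n_cat <- vapply(data[candidates], function(x) length(unique(x)), integer(1))
  candidates[n_cat <= maxcat]
}

toy <- data.frame(
  size = factor(c("S", "M", "L", "M")),
  city = c("Oslo", "Lima", "Rome", "Cairo"),
  price = c(1.5, 2.0, 3.5, 2.5)
)

r_default <- eligible_features(toy)                              # "size" "city"
r_maxcat  <- eligible_features(toy, maxcat = 3)                  # "size"
r_select  <- eligible_features(toy, select = c("size", "price")) # "size"
```

Here `city` has four categories and is dropped when maxcat is 3, and the continuous `price` is dropped even though it was named in select.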

Examples

## Dummify iris dataset
str(dummify(iris))

## Dummify diamonds dataset ignoring features with more than 5 categories
data("diamonds", package = "ggplot2")
str(dummify(diamonds, maxcat = 5))

## Dummify only the selected features of the diamonds dataset
str(dummify(diamonds, select = c("cut", "color")))