
expss (version 0.11.6)

as.dichotomy: Convert variable (possibly multiple choice question) to data.frame/matrix of dummy variables.

Description

This function converts a variable or multiple-response variable (vector/matrix/data.frame) with category encoding into a data.frame/matrix with dichotomy encoding (0/1), which suits most statistical analyses, e.g. clustering, factor analysis, linear regression and so on.

  • as.dichotomy returns a data.frame of class 'dichotomy' with 0, 1 and possibly NA.

  • dummy returns a matrix of class 'dichotomy' with 0, 1 and possibly NA.

  • dummy1 drops the last column of the dichotomy matrix. This is often useful because any column of such a matrix is usually a linear combination of the other columns (together with the intercept).
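
The linear dependency that motivates dummy1 can be illustrated with a small base-R sketch. It does not call expss: the indicator matrix is built by hand, and the names g, y, full and reduced are invented for the illustration. It assumes a single-choice variable, where every row of the full indicator matrix sums to one, so a model with an intercept cannot estimate all columns.

```r
# hypothetical single-choice variable with three categories
g = rep(1:3, times = 10)
set.seed(123)
y = rnorm(30)

# full 0/1 indicator matrix, one column per category (built by hand)
full = sapply(1:3, function(v) as.numeric(g == v))
colnames(full) = paste0("g", 1:3)

# every row sums to one, so the columns and the intercept are linearly dependent
all(rowSums(full) == 1)

# lm() cannot estimate one of the coefficients and reports NA for it
coef(lm(y ~ full))

# dropping the last column, as dummy1 is described to do, removes the dependency
reduced = full[, -ncol(full), drop = FALSE]
coef(lm(y ~ reduced))
```

For multiple-response data the rows need not sum to one, so the dependency is not guaranteed there.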

Usage

as.dichotomy(
  x,
  prefix = "v",
  keep_unused = FALSE,
  use_na = TRUE,
  keep_values = NULL,
  keep_labels = NULL,
  drop_values = NULL,
  drop_labels = NULL,
  presence = 1,
  absence = 0
)

dummy(
  x,
  keep_unused = FALSE,
  use_na = TRUE,
  keep_values = NULL,
  keep_labels = NULL,
  drop_values = NULL,
  drop_labels = NULL,
  presence = 1,
  absence = 0
)

dummy1(
  x,
  keep_unused = FALSE,
  use_na = TRUE,
  keep_values = NULL,
  keep_labels = NULL,
  drop_values = NULL,
  drop_labels = NULL,
  presence = 1,
  absence = 0
)

is.dichotomy(x)

Value

as.dichotomy returns a data.frame of class dichotomy with 0, 1 and possibly NA. Columns of this data.frame have variable labels taken from the value labels of the original data. If no label exists for a particular value, the value itself is used as the variable label.

dummy returns a matrix of class dichotomy. Column names of this matrix are the value labels of the original data.
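
A minimal sketch for inspecting the returned objects, assuming expss is installed. The variable x and its labels are made up for the illustration, and no output is shown because it may differ between package versions.

```r
library(expss)

# hypothetical labelled single-choice variable
x = c(1, 2, 1, NA)
val_lab(x) = c("Yes" = 1, "No" = 2)

df = as.dichotomy(x)  # data.frame of class 'dichotomy'
m = dummy(x)          # matrix of class 'dichotomy'

class(df)
class(m)
is.dichotomy(df)
is.dichotomy(m)

# column names of the matrix are expected to come from the value labels
colnames(m)

# variable labels of the data.frame columns
lapply(df, var_lab)
```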

Arguments

x

vector/factor/matrix/data.frame.

prefix

Character. Prefix for the names of the resulting columns. By default "v".

keep_unused

Logical. Should we create columns for unused value labels/factor levels? FALSE by default.

use_na

Logical. Should rows in which all values are NA be coded as NA (TRUE) or as 0's (FALSE)? TRUE by default.

keep_values

Numeric/character. Values that should be kept. By default all values will be kept.

keep_labels

Numeric/character. Labels/levels that should be kept. By default all labels/levels will be kept.

drop_values

Numeric/character. Values that should be dropped. By default all values will be kept. Ignored if keep_values/keep_labels are provided.

drop_labels

Numeric/character. Labels/levels that should be dropped. By default all labels/levels will be kept. Ignored if keep_values/keep_labels are provided.

presence

Numeric value that codes presence of the level. By default it is 1. Note that all table functions require presence and absence to be 1 and 0.

absence

Numeric value that codes absence of the level. By default it is 0. Note that all table functions require presence and absence to be 1 and 0.
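
The effect of the arguments above can be explored with a sketch like the following, assuming expss is installed. The variable x and its labels are hypothetical, and the comments describe the behaviour expected from the argument descriptions rather than verified output.

```r
library(expss)

# hypothetical variable: the label "Maybe" (3) never occurs in the data
x = c(1, 2, 1, NA)
val_lab(x) = c("Yes" = 1, "No" = 2, "Maybe" = 3)

dummy(x)                             # no column expected for the unused label
dummy(x, keep_unused = TRUE)         # also create a column for "Maybe"
dummy(x, use_na = FALSE)             # all-NA row coded as absence instead of NA
dummy(x, keep_values = 1)            # keep only value 1
dummy(x, drop_labels = "No")         # drop a category by its label
dummy(x, presence = 2, absence = 1)  # custom codes, not suitable for table functions
```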

See Also

as.category for the reverse conversion; mrset and mdset for using multiple-response variables with tables.

Examples

library(expss)
data.table::setDTthreads(2)
# toy example
# brands - multiple response question
# Which brands did you use during the last three months?
set.seed(123)
brands = as.sheet(t(replicate(20,sample(c(1:5,NA),4,replace = FALSE))))
# score - evaluation of tested product
score = sample(-1:1,20,replace = TRUE)
var_lab(brands) = "Used brands"
val_lab(brands) = autonum("
                              Brand A
                              Brand B
                              Brand C
                              Brand D
                              Brand E
                              ")

var_lab(score) = "Evaluation of tested brand"
val_lab(score) = make_labels("
                             -1 Dislike it
                              0 So-so
                              1 Like it    
                             ")

cro_cpct(as.dichotomy(brands), score)
# the same as
cro_cpct(mrset(brands), score)

# customer segmentation by used brands
kmeans(dummy(brands), 3)

# model of influence of used brands on evaluation of tested product 
summary(lm(score ~ dummy(brands)))

# prefixed data.frame 
as.dichotomy(brands, prefix = "brand_")
