caret (version 5.07-001)

dummyVars: Create A Full Set of Dummy Variables

Description

dummyVars creates a full set of dummy variables (i.e., a less than full rank parameterization).

Usage

dummyVars(formula, ...)

## S3 method for class 'default':
dummyVars(formula, data, sep = ".", levelsOnly = FALSE, ...)

## S3 method for class 'dummyVars':
predict(object, newdata, na.action = na.pass, ...)

contr.dummy(n, ...)

Arguments

Value

The output of dummyVars is a list of class 'dummyVars' with elements

  • call: the function call
  • form: the model formula
  • vars: names of all the variables in the model
  • facVars: names of all the factor variables in the model
  • lvls: levels of any factor variables
  • sep: NULL or a character separator
  • terms: the terms.formula object
  • levelsOnly: a logical

The predict function produces a data frame.

contr.dummy generates a matrix with n rows and n columns.

Details

Most of the contrasts functions in R produce full rank parameterizations of the predictor data. For example, contr.treatment creates a reference cell in the data and defines dummy variables for all factor levels except those in the reference cell. If a factor with 5 levels is used alone in a model formula, contr.treatment creates columns for the intercept and for all the factor levels except the first. For the data in the Examples section below, this would produce:

      (Intercept) dayTue dayWed dayThu dayFri daySat daySun
    1           1      1      0      0      0      0      0
    2           1      1      0      0      0      0      0
    3           1      1      0      0      0      0      0
    4           1      0      0      1      0      0      0
    5           1      0      0      1      0      0      0
    6           1      0      0      0      0      0      0
    7           1      0      1      0      0      0      0
    8           1      0      1      0      0      0      0
    9           1      0      0      0      0      0      0
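The reference-cell behaviour can be checked with base R alone. The following is a minimal sketch, assuming a made-up 5-level factor f (not part of the package examples); it requests treatment contrasts explicitly so the result does not depend on the session's options.

```r
# Hypothetical 5-level factor, used only for illustration
f <- factor(c("a", "b", "c", "d", "e"))

# Request treatment contrasts explicitly rather than relying on options("contrasts")
X <- model.matrix(~ f, contrasts.arg = list(f = "contr.treatment"))

colnames(X)   # the intercept plus one column per non-reference level
qr(X)$rank    # equals ncol(X), i.e. the parameterization is full rank
```

The first level ("a") has no column of its own; it is represented by the rows where every dummy column is zero.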

In some situations, there may be a need for dummy variables for all of the levels of the factor. For the same example:

      dayMon dayTue dayWed dayThu dayFri daySat daySun
    1      0      1      0      0      0      0      0
    2      0      1      0      0      0      0      0
    3      0      1      0      0      0      0      0
    4      0      0      0      1      0      0      0
    5      0      0      0      1      0      0      0
    6      1      0      0      0      0      0      0
    7      0      0      1      0      0      0      0
    8      0      0      1      0      0      0      0
    9      1      0      0      0      0      0      0
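For comparison, a full set of indicator columns can also be obtained in base R by passing an identity contrast matrix to model.matrix. This is only a sketch of the idea on a made-up 3-level factor f; it is not how dummyVars is implemented.

```r
# Hypothetical 3-level factor, used only for illustration
f <- factor(c("a", "b", "c", "a"))

# contrasts(f, contrasts = FALSE) is an identity matrix: one indicator per level
X <- model.matrix(~ f, contrasts.arg = list(f = contrasts(f, contrasts = FALSE)))

# Drop the intercept to leave only the full set of dummy variables
D <- X[, -1, drop = FALSE]
colnames(D)

# With the intercept included, the columns are linearly dependent
qr(X)$rank < ncol(X)
```

Each row of D contains exactly one 1, so the dummy columns sum to the intercept column. That dependence is what makes the parameterization less than full rank.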

Given a formula and initial data set, the class dummyVars gathers all the information needed to produce a full set of dummy variables for any data set. It uses contr.dummy as the base function to do this.
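The reason for storing this information is that a new data set may not contain every level seen in the original data. The base R sketch below illustrates the idea under stated assumptions: full_dummies is a hypothetical helper, and the data are invented. It is not the package's implementation.

```r
# Levels recorded once from the initial (hypothetical) data set
train_day <- factor(c("Mon", "Wed", "Fri"), levels = c("Mon", "Wed", "Fri"))
lvls <- levels(train_day)

# Hypothetical helper: rebuild the factor with the stored levels, then expand it
full_dummies <- function(x, lvls, prefix = "day") {
  f <- factor(x, levels = lvls)
  m <- model.matrix(~ f - 1)          # no intercept: one column per level
  colnames(m) <- paste0(prefix, lvls)
  m
}

# New data containing only one of the three levels
D <- full_dummies(c("Fri", "Fri"), lvls)
D
```

Because the levels come from the stored vector rather than from the new data, D still has a column for every original level, with zeros for the levels that do not occur.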

References

http://cran.r-project.org/doc/manuals/R-intro.html#Formulae-for-statistical-models

See Also

model.matrix, contrasts, formula

Examples

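library(caret)

## stringsAsFactors = TRUE is needed in R >= 4.0.0, where data.frame()
## no longer converts character columns to factors by default.
when <- data.frame(time = c("afternoon", "night", "afternoon",
                            "morning", "morning", "morning",
                            "morning", "afternoon", "afternoon"),
                   day = c("Mon", "Mon", "Mon",
                           "Wed", "Wed", "Fri",
                           "Sat", "Sat", "Fri"),
                   stringsAsFactors = TRUE)

## Note: levels<- relabels the existing (alphabetically sorted) levels in
## order and appends the remaining names as empty levels, so the rows
## entered as "Mon" appear under dayTue in the Details output.
levels(when$time) <- c("morning", "afternoon", "night")
levels(when$day) <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")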

## Default behavior:
model.matrix(~day, when)


mainEffects <- dummyVars(~ day + time, data = when)
mainEffects
predict(mainEffects, when[1:3,])

interactionModel <- dummyVars(~ day + time + day:time,
                              data = when,
                              sep = ".")
predict(interactionModel, when[1:3,])

noNames <- dummyVars(~ day + time + day:time,
                     data = when,
                     levelsOnly = TRUE)
predict(noNames, when)