dummy.code: Create dummy coded variables

Description

Given a variable x with n distinct values, create n new dummy coded variables, each coded 1 where x takes the corresponding value and 0 where it does not. A typical application is to create dummy coded variables from a vector of college majors. Categories can also be combined into a single dummy code by specifying group. By default, NA values of x are returned as NA (added 10/20/17).

Usage

dummy.code(x,group=NULL,na.rm=TRUE,top=NULL,min=NULL)

Value

A matrix of dummy coded variables

Arguments

x: A vector to be transformed into dummy codes
group: A vector of categories to be coded as 1, with all others coded as 0.
na.rm: If TRUE, return NA for every dummy code in rows where x is NA
top: If specified, then just dummy code the top values, and make the rest NA
min: If specified, then dummy code all values that occur at least min times

Author

William Revelle

Details

When coding demographic information, it is typical to create one variable with multiple categorical values (e.g., ethnicity, college major, occupation). dummy.code will convert these categories into n distinct dummy coded variables.
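
As a rough illustration of what dummy coding produces, here is a base R sketch that does not call dummy.code at all. The vector major and its values are hypothetical, and the column order and names that dummy.code itself returns may differ from what is shown here.

```r
# Hypothetical example data: college majors, with one missing value.
major <- c("psych", "bio", "psych", NA, "math", "bio")

# One column per distinct non-missing value, sorted alphabetically here.
lev <- sort(unique(major[!is.na(major)]))

# outer() compares every element of major with every level;
# multiplying by 1 turns TRUE/FALSE into 1/0, and a missing
# input yields a row of NA.
dummies <- outer(major, lev, "==") * 1
colnames(dummies) <- lev
dummies
```

Each non-missing row contains exactly one 1, in the column for that row's value.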

If there are many possible values (e.g., country in the SAPA data set), then specifying top will assign dummy codes to just the most frequent values rather than to every value.
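
The idea of restricting the codes to the most frequent values can be sketched in base R as follows. This is an illustration under assumptions, not the function's own code: the country vector and the cutoff k are hypothetical, and in this sketch rows holding a rarer value are simply all 0, whereas the Arguments section says dummy.code makes the rest NA.

```r
# Hypothetical data: countries with very unequal frequencies.
country <- c("US", "US", "US", "UK", "UK", "DE", "FR", "US", "UK", "DE")

# Count each value and order from most to least frequent.
counts <- sort(table(country), decreasing = TRUE)

# Keep only the names of the k most frequent values (k = 2 is arbitrary).
k <- 2
top_values <- names(counts)[seq_len(k)]

# Dummy code just those values; rarer values get no column of their own.
top_dummies <- outer(country, top_values, "==") * 1
colnames(top_dummies) <- top_values
top_dummies
```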

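If using dummy coded variables as predictors, remember to use n-1 variables. The n dummy codes for a row always sum to 1, so the full set is perfectly collinear with the intercept of a regression model.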
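
A small base R sketch of why n-1 variables are needed. The data are simulated and every name here (major, score, full, fit_all, fit_drop) is hypothetical; the dummy codes are built by hand rather than with dummy.code.

```r
# Hypothetical data: an outcome and a three-level categorical predictor.
set.seed(1)
major <- factor(rep(c("bio", "math", "psych"), each = 4))
score <- rnorm(length(major))

# All n = 3 dummy codes: each row sums to 1, duplicating the intercept.
full <- outer(as.character(major), levels(major), "==") * 1
colnames(full) <- levels(major)

# With an intercept plus all three dummies, lm() cannot estimate one
# coefficient and reports it as NA.
fit_all <- lm(score ~ full)
coef(fit_all)

# Dropping one column (here the first) leaves n - 1 = 2 predictors,
# and every coefficient is estimable.
fit_drop <- lm(score ~ full[, -1])
coef(fit_drop)
```

Which column to drop is a modelling choice: the omitted category becomes the reference level against which the remaining coefficients are interpreted.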

If group is specified, then all values of x that are in group are given the value of 1 and all other values are given 0. This is useful, for instance, for combining a range of science majors into a single STEM versus non-STEM code. The smoking example forms a dummy code for any smoking at all.
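
The grouping logic can be approximated in base R with %in%. This is a sketch under assumptions rather than the function's implementation: the major and stem vectors are hypothetical, and missing values are restored by hand to mirror the default NA handling described above.

```r
# Hypothetical data: majors to be collapsed into STEM versus not STEM.
major <- c("physics", "history", "biology", NA, "english", "math")
stem <- c("physics", "biology", "math", "chemistry")

# %in% returns TRUE for members of the group; as.numeric gives 1/0.
is_stem <- as.numeric(major %in% stem)

# %in% treats NA as "not in the group", so restore missingness by hand.
is_stem[is.na(major)] <- NA
is_stem
```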

Examples

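library(psych)
# Dummy code education level and correlate the codes with the other sat.act variables
new <- dummy.code(sat.act$education)
new.sat <- data.frame(new, sat.act)
round(cor(new.sat, use = "pairwise"), 2)
# Collapse smoking categories 2 to 9 into one dummy code for any smoking at all
# dum.smoke <- dummy.code(spi$smoke, group = 2:9)
# table(dum.smoke, spi$smoke)
# dum.age <- dummy.code(round(spi$age / 5) * 5, top = 5)  # the most frequent five year blocks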
