dummy.code: Create dummy coded variables

Description

Given a variable x with n distinct values, create n new dummy coded variables coded 0/1 for presence (1) or absence (0) of each variable. A typical application would be to create dummy coded college majors from a vector of college majors. Can also combine categories by group. By default, NA values of x are returned as NA (added 10/20/17)

Usage

dummy.code(x,group=NULL,na.rm=TRUE,top=NULL,min=NULL)

Arguments

A vector to be transformed into dummy codes

group

A vector of categories to be coded as 1, all others coded as 0.

na.rm

If TRUE, return NA for all codes with NA in x

top

If specified, then just dummy code the top values, and make the rest NA

min

If specified, then dummy code all values >= min

Value

A matrix of dummy coded variables

Details

When coding demographic information, it is typical to create one variable with multiple categorical values (e.g., ethnicity, college major, occupation). dummy.code will convert these categories into n distinct dummy coded variables.

If there are many possible values (e.g., country in the SAPA data set) then specifying top will assign dummy codes to just a subset of the data.

If using dummy coded variables as predictors, remember to use n-1 variables.

If group is specified, then all values of x that are in group are given the value of 1, otherwise, 0. (Useful for combining a range of science majors into STEM or not. The example forms a dummy code of any smoking at all.)

Examples

Run this code

# NOT RUN {
new <- dummy.code(sat.act$education)
new.sat <- data.frame(new,sat.act)
round(cor(new.sat,use="pairwise"),2)
#dum.smoke <- dummy.code(spi$smoke,group=2:9)
#table(dum.smoke,spi$smoke)
#dum.age <- dummy.code(round(spi$age/5)*5,top=5)  #the most frequent five year blocks
# }

Run the code above in your browser using DataLab