cj: Simple Conjoint Analyses and Visualization

Description

Simple analyses of conjoint (factorial) experiments and visualization of results.

Usage

cj(
  data,
  formula,
  id = ~0,
  weights = NULL,
  estimate = c("amce", "frequencies", "mm", "amce_differences", "mm_differences"),
  feature_order = NULL,
  feature_labels = NULL,
  level_order = c("ascending", "descending"),
  by = NULL,
  ...
)

Arguments

data

A data frame containing variables specified in formula. All RHS variables should be factors; the base level for each will be used in estimation and for AMCEs the base level's AMCE will be zero. Optionally, this can instead be an object of class “survey.design” returned by svydesign.
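
As a minimal sketch of the preparation this implies, the base R snippet below converts a character column to a factor and then changes its base level. The `profiles` data frame and its column names are hypothetical and serve only to illustrate the idea.

```r
# Hypothetical toy data; the column names are illustrative only
profiles <- data.frame(
  chosen = c(1, 0, 1, 0),
  education = c("High school", "College", "None", "College"),
  stringsAsFactors = FALSE
)

# Convert the feature to a factor with an explicit level order;
# the first level ("None") acts as the base level
profiles$education <- factor(
  profiles$education,
  levels = c("None", "High school", "College")
)
base_before <- levels(profiles$education)[1]

# Move a different level to the front to make it the base level
profiles$education <- relevel(profiles$education, ref = "College")
base_after <- levels(profiles$education)[1]
```

With AMCEs, the level stored in `base_after` would then be the one whose estimate is fixed at zero.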

formula

A formula specifying a model to be estimated. All levels across features should be unique. For estimate = "amce" in a constrained conjoint design, two-way interactions can be specified to handle constraints between factors in the design; these are detected automatically. Higher-order constraints are not allowed. Interactions are ignored for all other values of estimate, as constraints are irrelevant to those statistics.
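
To make the interaction syntax concrete, the sketch below uses only base R to show how a two-way interaction appears in a formula. It uses the standard `*` operator from R's formula language; this section does not state which operator cj expects, so treat that choice as an assumption. The variable names are borrowed from the examples further down.

```r
# A formula with one two-way interaction between Education and Job
f <- ChosenImmigrant ~ Gender + Education * Job

# terms() expands `*` into main effects plus the interaction term
tt <- terms(f)
term_labels <- attr(tt, "term.labels")
term_order <- attr(tt, "order")

# Terms of order 2 are the two-way interactions
two_way <- term_labels[term_order == 2]
```

Here `two_way` holds the single interaction term, which is the kind of term the description above says is used to handle constraints between two features.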

id

An RHS formula specifying a variable holding respondent identifiers, to be used for clustering standard errors.

weights

An (optional) RHS formula specifying a variable holding survey weights.

estimate

A character string specifying an estimate type. Current options are average marginal component effects (or AMCEs, “amce”, estimated via amce), display frequencies (“frequencies”, estimated via cj_freqs), marginal means (or MMs, “mm”, estimated via mm), differences in MMs (“mm_differences”, via mm_diffs), or differences in AMCEs (“amce_differences”, via amce_diffs). Additional options may be made available in the future. Non-ambiguous abbreviations are allowed.
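
The abbreviation rule can be illustrated with base R's `match.arg()`, which performs this kind of partial matching. Whether cj resolves the argument in exactly this way internally is an assumption; `resolve_estimate()` is a hypothetical helper written only for this sketch.

```r
# The documented choices for `estimate`
choices <- c("amce", "frequencies", "mm", "amce_differences", "mm_differences")

# Hypothetical helper: resolve a possibly abbreviated value
resolve_estimate <- function(x) match.arg(x, choices)

full_freq <- resolve_estimate("freq")   # unique prefix of "frequencies"
full_mm   <- resolve_estimate("mm")     # exact match wins over "mm_differences"
full_diff <- resolve_estimate("mm_d")   # unique prefix of "mm_differences"

# "am" is a prefix of two choices, so it is ambiguous and raises an error
ambiguous <- tryCatch(resolve_estimate("am"), error = function(e) NA_character_)
```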

feature_order

An (optional) character vector specifying the names of feature (RHS) variables in the order they should be encoded in the resulting data frame.

feature_labels

A named list of “fancy” feature labels to be used in output. By default, the function looks for a “label” attribute on each variable in formula and uses that for pretty printing. This argument overrides those attributes or otherwise provides fancy labels for this purpose. It should be a list whose names are the variables on the right-hand side of formula and whose values are character strings.
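
The two labelling routes described above can be sketched as follows. The toy `profiles` data frame and the `lang_skill` variable are hypothetical. The final call uses the `immigration` data from the examples below and runs only if cregg is installed; the label text passed there is made up for illustration.

```r
# Hypothetical toy data with one feature
profiles <- data.frame(
  lang_skill = factor(c("Fluent", "Broken", "Fluent"))
)

# Route 1: attach a "label" attribute to the variable itself
attr(profiles$lang_skill, "label") <- "Language Skills"
stored_label <- attr(profiles$lang_skill, "label")

# Route 2: a named list, with names equal to the RHS variable names
fancy <- list(lang_skill = "Language Skills")

# Passing such a list to cj() (skipped when cregg is unavailable)
if (requireNamespace("cregg", quietly = TRUE)) {
  data("immigration", package = "cregg")
  res <- cregg::cj(
    immigration, ChosenImmigrant ~ LanguageSkills,
    id = ~ CaseID, estimate = "mm",
    feature_labels = list(LanguageSkills = "Language ability")
  )
}
```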

level_order

A character string specifying whether levels (within each feature) should be ordered increasing or decreasing in the final output. This is mostly consequential only for plotting via plot.cj_mm, etc.

by

A formula containing only RHS variables, specifying grouping factors over which to perform estimation.

…

Additional arguments to amce, cj_freqs, mm, mm_diffs, or amce_diffs.

Value

A data frame with a special class (e.g., “cj_amce” or “cj_mm”) to facilitate plotting.

Details

The main function cj is a convenience wrapper around the underlying estimation functions: amce for average marginal component effects (AMCEs, the default), mm for marginal means (MMs), and cj_freqs and cj_props for display frequencies. Additional estimands may be supported in the future through their own functions and through the cj interface. Plotting is provided via ggplot2 for all types of estimates.

The only additional functionality provided by cj over the underlying functions is the by argument, which will perform operations on subsets of data, returning a single data frame. This can be useful, for example, for evaluating profile spillover effects and subgroup results, or in any situation where one might be inclined to use a for loop or lapply, calling cj repeatedly on subgroups.
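
As a rough sketch of what by saves, the code below compares a single call using by with a hand-written split-and-stack loop. It assumes cregg is installed and uses the `immigration` data from the examples below; the manual version approximates the idea and is not a description of how cj is implemented.

```r
if (requireNamespace("cregg", quietly = TRUE)) {
  data("immigration", package = "cregg")
  immigration$contest_no <- factor(immigration$contest_no)
  f <- ChosenImmigrant ~ Gender + LanguageSkills

  # One call: estimate MMs separately within each level of contest_no
  stacked <- cregg::cj(immigration, f, id = ~ CaseID,
                       estimate = "mm", by = ~ contest_no)

  # Roughly the same result by hand: split, estimate, tag, and stack
  pieces <- lapply(split(immigration, immigration$contest_no), function(d) {
    out <- cregg::cj(d, f, id = ~ CaseID, estimate = "mm")
    out$contest_no <- d$contest_no[1]
    out
  })
  manual <- do.call(rbind, pieces)
}
```

The by version returns one data frame that already carries the grouping variable, which is what the grouped plotting calls in the examples rely on.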

Note: Some features of cregg (namely, the amce_diffs function, or estimate = "amce_differences" here) only work with full factorial conjoint experiments. Designs involving two-way constraints between features are supported simply by expressing interactions between constrained terms in formula (again, except for amce_diffs). Higher-order constraints may be supported in the future.

Examples

# NOT RUN {
# load data
requireNamespace("ggplot2")
data("immigration")
immigration$contest_no <- factor(immigration$contest_no)
data("taxes")

# calculate MMs
f1 <- ChosenImmigrant ~ Gender + Education + 
         LanguageSkills + CountryOfOrigin + Job + JobExperience + 
         JobPlans + ReasonForApplication + PriorEntry
d1 <- cj(immigration, f1, id = ~ CaseID, estimate = "mm", h0 = 0.5)
# plot MMs
plot(d1, vline = 0.5)

# calculate MMs for survey-weighted data
d1 <- cj(taxes, chose_plan ~ taxrate1 + taxrate2 + taxrate3 +
         taxrate4 + taxrate5 + taxrate6 + taxrev, id = ~ ID,
         weights = ~ weight, estimate = "mm", h0 = 0.5)
# plot MMs
plot(d1, vline = 0.5)

# MMs split by contest number
stacked <- cj(immigration, f1, id = ~ CaseID,
              estimate = "mm", by = ~ contest_no)

## plot with grouping
plot(stacked, group = "contest_no", vline = 0.5, feature_headers = FALSE)

## plot with facetting
plot(stacked) + ggplot2::facet_wrap(~ contest_no, nrow = 1L)

# estimate AMCEs
d2 <- cj(immigration, f1, id = ~ CaseID)

# plot AMCEs
plot(d2)

## subgroup analysis
immigration$ethnosplit <- cut(immigration$ethnocentrism, 2)
x <- cj(na.omit(immigration), ChosenImmigrant ~ Gender + Education + LanguageSkills,
        id = ~ CaseID, estimate = "mm", h0 = 0.5, by = ~ ethnosplit)
plot(x, group = "ethnosplit", vline = 0.5)

# combinations of/interactions between features
immigration$language_entry <- 
  interaction(immigration$LanguageSkills, immigration$PriorEntry, sep = "_")

## higher-order MMs for feature combinations
cj(immigration, ChosenImmigrant ~ language_entry,
   id = ~CaseID, estimate = "mm", h0 = 0.5)

## constrained designs
## in a constrained design, some cells are unobserved:
subset(cj_props(immigration, ~ Job + Education), Proportion == 0)
## MMs and AMCEs only use data from observed cells
## In `immigration`, this means while the MM for `Job == "Janitor"` is an average 
## across all levels of Education:
mm(subset(immigration, Job == "Janitor"), ChosenImmigrant ~ Education)
## the MM for `Job == "Doctor"` is an average across only 3 levels of education:
mm(subset(immigration, Job == "Doctor"), ChosenImmigrant ~ Education)
## Use `cj_props()` to see constraints:
subset(cj_props(immigration, ~ Job + Education), Job == "Doctor" & Proportion != 0)

## Substantively, the MM of "Doctor" might be higher than other levels of `Job`
## this could be due to the feature itself or due to the fact that it is constrained
## with a different subset of other feature levels than alternative levels of `Job`
## this may mean analysts want to report MMs (or AMCEs) only for the unconstrained levels:
elev <- c("Two-Year College", "College Degree", "Graduate Degree")
jlev <- c("Financial Analyst", "Computer Programmer", "Research Scientist", "Doctor")
mm(subset(immigration, Education %in% elev), ChosenImmigrant ~ Job)
mm(subset(immigration, Job %in% jlev), ChosenImmigrant ~ Education)
## or, present estimates excluding constrained levels:
mm(subset(immigration, !Education %in% elev), ChosenImmigrant ~ Job)
mm(subset(immigration, !Job %in% jlev), ChosenImmigrant ~ Education)
# }
