FindIt (version 1.2.0)

CausalANOVA: Estimating the AMEs and AMIEs with CausalANOVA

Description

CausalANOVA estimates the coefficients of the specified ANOVA model with regularization. By taking differences in coefficients, the function recovers the Average Marginal Effects (AMEs) and the Average Marginal Interaction Effects (AMIEs).
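The idea of recovering effects from differences in ANOVA coefficients can be illustrated with base R alone. The sketch below is not the CausalANOVA implementation: it fits an unregularized additive model with lm() and sum-to-zero contrasts on simulated, balanced data. The factors A and B, their levels, and the outcome y are all hypothetical.

```r
# A minimal base-R sketch (no regularization, hypothetical data).
set.seed(1)

# Balanced two-factor design: 3 x 2 cells, 50 replicates each.
d <- expand.grid(A = factor(c("a1", "a2", "a3")),
                 B = factor(c("b1", "b2")),
                 rep = 1:50)
d$y <- 0.5 + 0.2 * (d$A == "a2") - 0.1 * (d$A == "a3") +
  0.3 * (d$B == "b2") + rnorm(nrow(d), sd = 0.1)

# Sum-to-zero contrasts: each coefficient is a deviation from the grand mean.
fit <- lm(y ~ A + B, data = d,
          contrasts = list(A = "contr.sum", B = "contr.sum"))
cf <- coef(fit)

# The last level's coefficient is implied by the sum-to-zero constraint.
effA <- c(cf[["A1"]], cf[["A2"]], -(cf[["A1"]] + cf[["A2"]]))
names(effA) <- levels(d$A)

# Effects relative to a chosen baseline level are differences in coefficients.
effA_vs_a1 <- effA - effA[["a1"]]
print(effA)
print(effA_vs_a1)
```

In this balanced design, each element of effA equals the corresponding level mean of y minus the grand mean, which is why differencing coefficients yields effects relative to any baseline.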

Usage

CausalANOVA(
  formula,
  int2.formula = NULL,
  int3.formula = NULL,
  data,
  nway = 1,
  pair.id = NULL,
  diff = FALSE,
  screen = FALSE,
  screen.type = "fixed",
  screen.num.int = 3,
  collapse = FALSE,
  collapse.type = "fixed",
  collapse.cost = 0.3,
  family = "binomial",
  cluster = NULL,
  maxIter = 50,
  eps = 1e-05,
  fac.level = NULL,
  ord.fac = NULL,
  select.prob = FALSE,
  boot = 100,
  seed = 1234,
  verbose = TRUE
)

Arguments

formula

A formula that specifies outcome and treatment variables.

int2.formula

(optional). A formula that specifies two-way interactions.

int3.formula

(optional). A formula that specifies three-way interactions.

data

An optional data frame, list or environment (or object coercible by 'as.data.frame' to a data frame) containing the variables in the model. If not found in 'data', the variables are taken from 'environment(formula)', typically the environment from which 'CausalANOVA' is called.

nway

With nway=1, the function estimates the Average Marginal Effects (AMEs) only. With nway=2, the function estimates the AMEs and the two-way Average Marginal Interaction Effects (AMIEs). With nway=3, the function estimates the AMEs, the two-way and three-way AMIEs. Default is 1.

pair.id

(optional). Unique identifiers for each comparison pair. This option is used when diff=TRUE.

diff

A logical indicating whether the outcome is the choice between a pair. If diff=TRUE, pair.id should identify each comparison pair. Default is FALSE.

screen

A logical indicating whether to select significant factor interactions with glinternet. When users specify interactions using int2.formula or int3.formula, this option is ignored. screen should be used only when users want data-driven selection of factor interactions. With screen.type, users can specify how to screen factor interactions. We recommend using this option when the number of factors is large, e.g., more than 6. Default is FALSE.

screen.type

Type of screening for factor interactions. (1) "fixed" selects a fixed number (specified by screen.num.int) of factor interactions. (2) "cv.min" selects factor interactions with the tuning parameter giving the minimum cross-validation error. (3) "cv.1Std" selects factor interactions with the tuning parameter giving a cross-validation error that is within 1 standard deviation of the minimum cross-validation error. Default is "fixed".

screen.num.int

(optional). The number of factor interactions to select. This option is used when screen=TRUE and screen.type="fixed". Default is 3.

collapse

A logical indicating whether to collapse insignificant levels within factors. With collapse.type, users can specify how to collapse levels within factors. We recommend using this option when the number of levels is large, e.g., more than 6. Default is FALSE.

collapse.type

Type of collapsing for levels within factors. (1) "fixed" collapses levels with a fixed cost parameter (specified by collapse.cost). (2) "cv.min" collapses levels with the cost parameter giving the minimum cross-validation error. This option might take time. (3) "cv.1Std" collapses levels with the cost parameter giving a cross-validation error that is within 1 standard deviation of the minimum cross-validation error. This option might take time. Default is "fixed".

collapse.cost

(optional). A cost parameter ranging from 0 to 1. A value of 1 corresponds to no collapsing; the closer the value is to 0, the stronger the regularization. Default is 0.3.

family

The family of the outcome variable: "gaussian" for continuous outcomes or "binomial" for binary outcomes. Default is "binomial".

cluster

Unique identifiers with which clustered standard errors are computed.

maxIter

The maximum number of iterations for glinternet. Default is 50.

eps

A tolerance parameter in the internal optimization algorithm.

fac.level

(optional). A vector containing the number of levels in each factor. The order of fac.level should match the order of columns in the data. For example, when the first and second columns of the design matrix are "Education" and "Race", the first and second elements of fac.level should be the number of levels in "Education" and "Race", respectively.

ord.fac

(optional). A logical vector indicating whether each factor has ordered (TRUE) or unordered (FALSE) levels. When levels are ordered, the function uses the order given by levels() and places penalties on the differences between adjacent levels. When levels are unordered, the function places penalties on the differences for every pairwise comparison of levels.

select.prob

(optional). A logical indicating whether selection probabilities are computed. This option might take time.

boot

The number of bootstrap replicates for select.prob. Default is 100.

seed

Seed for bootstrap.

verbose

A logical indicating whether to print the value of the cost parameter used. Default is TRUE.
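To make the fac.level and ord.fac arguments concrete, the following base-R sketch derives both vectors from a toy data frame and lists which level differences would be penalized under the ordered and unordered rules described above. The data frame, its column names, and the helper penalized_pairs() are hypothetical and for illustration only; they are not part of FindIt.

```r
# Hypothetical toy data; columns appear in the order the factors are used.
dat <- data.frame(
  Education = factor(c("low", "mid", "high", "mid"),
                     levels = c("low", "mid", "high"), ordered = TRUE),
  Race = factor(c("A", "B", "C", "D"))
)

# Number of levels per factor, in column order.
fac.level <- unname(sapply(dat, nlevels))
# TRUE for ordered factors, FALSE for unordered ones.
ord.fac <- unname(sapply(dat, is.ordered))

# Illustrative helper (not a FindIt function): pairs of levels whose
# difference is penalized. Adjacent pairs for ordered factors, all
# pairwise combinations for unordered factors.
penalized_pairs <- function(f) {
  lv <- levels(f)
  if (is.ordered(f)) {
    cbind(head(lv, -1), tail(lv, -1))
  } else {
    t(combn(lv, 2))
  }
}

print(fac.level)
print(ord.fac)
print(penalized_pairs(dat$Education))
print(penalized_pairs(dat$Race))
```

An ordered factor with K levels yields K - 1 penalized differences, whereas an unordered factor yields choose(K, 2), which is why the order set through levels() matters when collapsing.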

Value

intercept

The intercept of the estimated ANOVA model. If diff=TRUE, this should be close to 0.5.

formula

The formula used in the function.

coefs

A named vector of coefficients of the estimated ANOVA model.

vcov

The variance-covariance matrix for coefs. Returned only when screen=FALSE and collapse=FALSE.

CI.table

The summary of AMEs and AMIEs with confidence intervals. Returned only when screen=FALSE and collapse=FALSE.

AME

The estimated AMEs with the grand-mean as baselines.

AMIE2

The estimated two-way AMIEs with the grand-mean as baselines.

AMIE3

The estimated three-way AMIEs with the grand-mean as baselines.

...

Arguments passed to the function or arguments for internal use only.

Details

Regularization: screen and collapse.

Users can apply regularization to reduce the false discovery rate and facilitate interpretation. This is particularly useful when analyzing factorial experiments with a large number of factors, each having many levels.

  • When screen=TRUE, the function selects significant factor interactions with glinternet (Lim and Hastie 2015) before estimating the AMEs and AMIEs. This option is recommended when there are many factors, e.g., more than 6 factors. Alternatively, users can pre-specify interactions of interest using int2.formula and int3.formula.

  • When collapse=TRUE, the function collapses insignificant levels within each factor by GASH-ANOVA (Post and Bondell 2013) before estimating the AMEs and AMIEs. This option is recommended when there are many levels within some factors, e.g., more than 6 levels.

Inference after Regularization:

  • When screen=TRUE or collapse=TRUE, we recommend using the test.CausalANOVA function to make valid inferences after regularization. It takes the output of CausalANOVA, estimates the AMEs and AMIEs with newdata, and provides confidence intervals. Ideally, users should split the sample in two: use one half for regularization with CausalANOVA and the other half for inference with test.CausalANOVA.

  • If users do not need regularization, specify screen=FALSE and collapse=FALSE. The function then estimates the AMEs and AMIEs and computes confidence intervals with the full sample.
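The sample split recommended above can be sketched in base R. The version below splits at the cluster (respondent) level so that no respondent contributes rows to both halves. The toy data frame and its columns resp and y are hypothetical; with real data, the two halves would be passed to CausalANOVA and test.CausalANOVA respectively.

```r
# A minimal sketch of a cluster-level half split (hypothetical toy data).
set.seed(1234)  # fix the seed so the split is reproducible

# 10 respondents with 4 rows each.
toy <- data.frame(resp = rep(1:10, each = 4),
                  y = rbinom(40, 1, 0.5))

# Sample half of the respondent identifiers, not half of the rows.
ids <- unique(toy$resp)
train.ids <- sample(ids, size = floor(length(ids) / 2), replace = FALSE)
test.ids <- setdiff(ids, train.ids)

toy.train <- toy[toy$resp %in% train.ids, ]  # half used for regularization
toy.test <- toy[toy$resp %in% test.ids, ]    # half used for inference

print(table(toy.train$resp))
print(table(toy.test$resp))
```

Splitting on identifiers rather than rows keeps all observations from one respondent in the same half, which matters when standard errors are clustered by respondent.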

Suggested Workflow: (See Examples below as well)

  1. Specify the order of levels within each factor using levels(). When collapse=TRUE and levels are ordered, the function places penalties on the differences between adjacent levels, so it is crucial to specify the order of levels within each factor carefully.

  2. Run CausalANOVA.

    1. Specify formula to indicate outcomes and treatment variables and nway to indicate the order of interactions.

    2. Specify diff=TRUE and pair.id if the outcome is the choice between a pair.

    3. Specify screen. Use screen=TRUE for data-driven selection of factor interactions, or screen=FALSE to specify interactions by hand through int2.formula and int3.formula.

    4. Specify collapse. Use collapse=TRUE for data-driven collapsing of insignificant levels, or collapse=FALSE to keep the original number of levels.

  3. Run test.CausalANOVA when screen=TRUE or collapse=TRUE.

  4. Run summary and plot to explore the AMEs and AMIEs.

  5. Estimate conditional effects using the ConditionalEffect function and visualize them using the plot function.

References

Egami, Naoki and Kosuke Imai. 2019. Causal Interaction in Factorial Experiments: Application to Conjoint Analysis, Journal of the American Statistical Association. http://imai.fas.harvard.edu/research/files/int.pdf

Lim, M. and Hastie, T. 2015. Learning interactions via hierarchical group-lasso regularization. Journal of Computational and Graphical Statistics 24, 3, 627--654.

Post, J. B. and Bondell, H. D. 2013. Factor selection and structural identification in the interaction anova model. Biometrics 69, 1, 70--79.

See Also

cv.CausalANOVA

Examples

# NOT RUN {
data(Carlson)
## Specify the order of each factor
Carlson$newRecordF<- factor(Carlson$newRecordF,ordered=TRUE,
                            levels=c("YesLC", "YesDis","YesMP",
                                     "noLC","noDis","noMP","noBusi"))
Carlson$promise <- factor(Carlson$promise,ordered=TRUE,levels=c("jobs","clinic","education"))
Carlson$coeth_voting <- factor(Carlson$coeth_voting,ordered=FALSE,levels=c("0","1"))
Carlson$relevantdegree <- factor(Carlson$relevantdegree,ordered=FALSE,levels=c("0","1"))

## ####################################### 
## Without Screening and Collapsing
## ####################################### 
#################### only AMEs ####################
fit1 <- CausalANOVA(formula=won ~ newRecordF + promise + coeth_voting + relevantdegree,
                    data=Carlson, pair.id=Carlson$contestresp, diff=TRUE,
                    cluster=Carlson$respcodeS, nway=1)
summary(fit1)
plot(fit1)

#################### AMEs and two-way AMIEs ####################
fit2 <- CausalANOVA(formula=won ~ newRecordF + promise + coeth_voting + relevantdegree,
                    int2.formula = ~ newRecordF:coeth_voting,
                    data=Carlson, pair.id=Carlson$contestresp,diff=TRUE,
                    cluster=Carlson$respcodeS, nway=2)
summary(fit2)
plot(fit2, type="ConditionalEffect", fac.name=c("newRecordF","coeth_voting"))
ConditionalEffect(fit2, treat.fac="newRecordF", cond.fac="coeth_voting")

# }
# NOT RUN {
#################### AMEs and two-way and three-way AMIEs ####################
## Note: All pairs within three-way interactions should appear in int2.formula (Strong Hierarchy).
fit3 <- CausalANOVA(formula=won ~ newRecordF + promise + coeth_voting + relevantdegree,
                    int2.formula = ~ newRecordF:promise + newRecordF:coeth_voting
                                       + promise:coeth_voting,
                    int3.formula = ~ newRecordF:promise:coeth_voting,
                    data=Carlson, pair.id=Carlson$contestresp,diff=TRUE,
                    cluster=Carlson$respcodeS, nway=3)
summary(fit3)
plot(fit3, type="AMIE", fac.name=c("newRecordF","promise", "coeth_voting"),space=25,adj.p=2.2)
# }
# NOT RUN {
## ####################################### 
## With Screening and Collapsing
## #######################################
## Sample Splitting
train.ind <- sample(unique(Carlson$respcodeS), 272, replace=FALSE)
test.ind <- setdiff(unique(Carlson$respcodeS), train.ind)
Carlson.train <- Carlson[is.element(Carlson$respcodeS,train.ind), ]
Carlson.test <- Carlson[is.element(Carlson$respcodeS,test.ind), ]
 
#################### AMEs and two-way AMIEs ####################
fit.r2 <- CausalANOVA(formula=won ~ newRecordF + promise + coeth_voting + relevantdegree,
                      data=Carlson.train, pair.id=Carlson.train$contestresp,diff=TRUE,
                      screen=TRUE, collapse=TRUE,
                      cluster=Carlson.train$respcodeS, nway=2)
summary(fit.r2)

## refit with test.CausalANOVA
fit.r2.new <- test.CausalANOVA(fit.r2, newdata=Carlson.test, diff=TRUE,
                               pair.id=Carlson.test$contestresp, cluster=Carlson.test$respcodeS)

summary(fit.r2.new)
plot(fit.r2.new)
plot(fit.r2.new, type="ConditionalEffect", fac.name=c("newRecordF","coeth_voting"))
ConditionalEffect(fit.r2.new, treat.fac="newRecordF", cond.fac="coeth_voting")

# }