sparsereg: Sparse regression for experimental and observational data.

Description

Fits a Bayesian LASSOplus model, a sparse regression with uncertainty estimates that facilitates the discovery of various types of interactions. The function takes a dependent variable, an optional matrix of (pre-treatment) covariates, and an optional matrix of categorical treatment variables. It computes uncertainty estimates, including for data with repeated observations.

Usage

sparsereg(y, X, treat=NULL, EM=FALSE, gibbs=200, burnin=200, thin=10,  
type="linear", scale.type="none", baseline.vec=NULL, 
id=NULL, id2=NULL, id3=NULL, save.temp=FALSE, conservative=TRUE)

Arguments

y

Dependent variable.

X

Covariates. These are typically referred to as "pre-treatment" covariates.

treat

Matrix of categorical treatment variables. Use a one-column matrix when there is only one treatment variable.

EM

Whether to fit the model via EM or MCMC. EM is much quicker, but only returns point estimates. MCMC is slower, but returns posterior intervals and approximate confidence intervals.

gibbs

Number of posterior samples to save. Between each saved sample, thin samples are drawn.

burnin

Number of burnin samples. Between each burnin sample, thin samples are drawn. These iterations will not be included in the resulting analysis.

thin

Extent of thinning of the MCMC chain. Between each posterior sample, whether burnin or saved, thin draws are made.

type

Type of regression model to fit. Allowed types are linear or probit.

baseline.vec

Optional vector with one entry for each column of the treatment matrix. Each entry gives the baseline condition for that treatment. During pre-processing this condition is omitted from estimation, so it serves as the excluded category.

id, id2, id3

Vectors of the same length as the sample, denoting clustering in the data. In a conjoint experiment with repeated observations, these correspond to respondent IDs. Up to three different sets of random effects are allowed.

scale.type

Indicates the types of interactions that will be created and used in estimation. scale.type="none" generates no interactions and corresponds to simply running LASSOplus with no interactions between variables. scale.type="TX" creates interactions between each X variable and each level of the treatment variables. scale.type="TT" creates interactions between each level of separate treatment variables. scale.type="TTX" interacts each X variable with all values generated by scale.type="TT". Users can also create their own interactions of interest and select scale.type="none" to return the sparse version of the user-specified model.

save.temp

Whether to save intermediate output in a file named temp_sparsereg. Useful for very long runs.

conservative

Experimental. If set to FALSE, the estimate is less conservative in selecting a variable.
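As a rough guide to run time under MCMC, the descriptions of gibbs, burnin, and thin above imply a total draw count that can be worked out by hand. The sketch below uses the defaults from the Usage section and assumes that each burn-in or saved sample costs exactly thin draws; the variable names draws_burnin, draws_saved, and draws_total are illustrative only and are not part of the package.

```r
# Default sampler settings taken from the Usage section.
gibbs  <- 200   # number of saved posterior samples
burnin <- 200   # number of discarded burn-in samples
thin   <- 10    # draws made per burn-in or saved sample

# Assumption: every burn-in or saved sample costs `thin` draws.
draws_burnin <- burnin * thin
draws_saved  <- gibbs * thin
draws_total  <- draws_burnin + draws_saved

cat("burn-in draws: ", draws_burnin, "\n")
cat("sampling draws:", draws_saved, "\n")
cat("total draws:   ", draws_total, "\n")
```

Under this assumption, doubling thin doubles the run time while leaving the number of saved samples unchanged.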

Value

beta.mode: Matrix of sparse (mode) estimates with rows equal to number of effects and columns for posterior samples.
beta.mean: Matrix of mean estimates with rows equal to number of effects and columns for posterior samples. These estimates are not sparse, but they do predict better than the mode.
beta.ci: Matrix of effects used to calculate approximate confidence intervals.
sigma.sq: Vector of posterior estimate of error variance.
X: Matrix of covariates fit. Includes interaction terms, depending on scale.type.
varmat: Matrix showing which lower-order terms correspond with which effects. Used in producing figures.
baseline: Vector of baseline categories for treatments.
modeltype: Type of sparsereg model fit. In this case, onestage. Used by summary functions.

Details

The function sparsereg allows for estimation of a broad range of sparse regressions. The method allows for continuous, binary, and censored outcomes. In experimental data, it can be used for subgroup analysis. It pre-processes lower-order terms to generate higher-order interaction terms that are uncorrelated with their lower-order components; the form of this pre-processing is controlled by scale.type. In observational data, it can be used in place of a standard regression, especially in the presence of a large number of variables. The method also adjusts uncertainty estimates for repeated observations by using random effects. For example, a conjoint design may have the same people make several comparisons, or a panel data regression may have multiple observations on the same unit.
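To make the idea of treatment-by-covariate interactions concrete, here is a minimal base-R sketch that builds such columns by hand for a toy data set. It is only an illustration: the data, the object names (dat, T_ind, TX, X_user), and the column labels are hypothetical, and the sketch does not reproduce the pre-processing that sparsereg applies internally to make interactions uncorrelated with their lower-order terms.

```r
set.seed(1)
n <- 9

# Hypothetical toy data: one covariate and one three-level treatment.
dat <- data.frame(
  x1    = rnorm(n),
  treat = factor(rep(c("a", "b", "c"), length.out = n))
)

# One indicator column per treatment level (no intercept, so no level is dropped).
T_ind <- model.matrix(~ treat - 1, data = dat)

# Multiply the covariate into every indicator column, row by row.
TX <- T_ind * dat$x1
colnames(TX) <- paste0("x1:", colnames(T_ind))

# User-built design matrix: covariate, treatment indicators, interactions.
X_user <- cbind(x1 = dat$x1, T_ind, TX)
print(colnames(X_user))
```

A matrix built this way could be supplied as X together with scale.type="none", which is the route the scale.type entry describes for user-specified interactions.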

The object contains the estimated posterior for all of the modeled effects, and analyzing the object is facilitated by the functions plot, summary, violinplot, and difference.
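For intuition about what a summary of the posterior involves, the sketch below computes means and 95% quantile intervals from a matrix laid out as the Value section describes (one row per effect, one column per posterior sample). The matrix here is simulated as a stand-in, not real sparsereg output, and the names beta_draws and summary_tab are hypothetical; in practice the package's own summary function is the intended tool.

```r
set.seed(42)
n_effects <- 3
n_samples <- 200

# Hypothetical stand-in for a beta.mean-style matrix:
# rows are effects, columns are posterior samples.
beta_draws <- matrix(
  rnorm(n_effects * n_samples,
        mean = rep(c(2, 0, -1), times = n_samples),
        sd = 0.3),
  nrow = n_effects
)
rownames(beta_draws) <- c("effect_1", "effect_2", "effect_3")

# Posterior mean and 95% quantile interval for each effect (row).
post_mean <- rowMeans(beta_draws)
post_ci   <- t(apply(beta_draws, 1, quantile, probs = c(0.025, 0.975)))

summary_tab <- cbind(mean = post_mean, post_ci)
print(round(summary_tab, 2))
```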

References

Ratkovic, Marc and Tingley, Dustin. 2015. "Sparse Estimation with Uncertainty: Subgroup Analysis in Large Dimensional Design." Working paper.

Egami, Naoki and Imai, Kosuke. 2015. "Causal Interaction in High-Dimension." Working paper.

Examples


## Not run: 
#  library(MASS)       # provides mvrnorm()
#  library(sparsereg)
#  set.seed(1)
#  n<-500
#  k<-5
#  treat<-sample(c("a","b","c"),n,replace=TRUE,prob=c(.5,.25,.25))
#  treat2<-sample(c("a","b","c","d"),n,replace=TRUE,prob=c(.25,.25,.25,.25))
#  Sigma<-diag(k)
#  Sigma[Sigma==0]<-.5
#  X<-mvrnorm(n,mu=rep(0,k),Sigma=Sigma)
#  y.true<-3+X[,2]*2+(treat=="a")*2 +(treat=="b")*(-2)+X[,2]*(treat=="b")*(-2)+
#   X[,2]*(treat2=="c")*2
#  y<-y.true+rnorm(n,sd=2)
# 
# ##Fit a linear model.
# s1<-sparsereg(y, X, cbind(treat,treat2), scale.type="TX")
# s1.EM<-sparsereg(y, X, cbind(treat,treat2), EM=TRUE, scale.type="TX")
# 
# ##Summarize results from MCMC fit
# summary(s1)
# plot(s1)
# violinplot(s1)
# 
# ##Summarize results from EM fit
# summary(s1.EM)
# plot(s1.EM)
# 
# ##Extension using a baseline category
# s1.base<-sparsereg(y, X, treat, scale.type="TX", baseline.vec="a")
# 
# summary(s1.base)
# plot(s1.base)
# violinplot(s1.base)
# 
# ## End(Not run)
