sparsereg (version 1.2)

sparsereg: Sparse regression for experimental and observational data.

Description

Fits a Bayesian LASSOplus model, a sparse regression with uncertainty estimates, facilitating the discovery of various types of interactions. The function takes a dependent variable, an optional matrix of (pre-treatment) covariates, and an optional matrix of categorical treatment variables. It computes uncertainty estimates, including for data with repeated observations.

Usage

sparsereg(y, X, treat=NULL, EM=FALSE, gibbs=200, burnin=200, thin=10, type="linear", scale.type="none", baseline.vec=NULL, id=NULL, id2=NULL, id3=NULL, save.temp=FALSE, conservative=TRUE)

Arguments

y
Dependent variable.
X
Covariates. These are typically referred to as "pre-treatment" covariates.
treat
Matrix of categorical treatment variables. When there is only one treatment variable, this may be a matrix with one column.
EM
Whether to fit model via EM or MCMC. EM is much quicker, but only returns point estimates. MCMC is slower, but returns posterior intervals and approximate confidence intervals.
gibbs
Number of posterior samples to save. Between each saved sample, thin samples are drawn.
burnin
Number of burnin samples. Between each burnin sample, thin samples are drawn. These iterations will not be included in the resulting analysis.
thin
Extent of thinning of the MCMC chain. Between each posterior sample, whether burnin or saved, thin draws are made.
type
Type of regression model to fit. Allowed values are "linear" and "probit".
baseline.vec
Optional vector with one entry for each column of the treatment matrix. Each entry gives the baseline condition for that treatment. The baseline is dropped during pre-processing, so it serves as the excluded category in estimation.
id, id2, id3
Vectors of the same length as the sample, denoting clustering in the data. In a conjoint experiment with repeated observations, these correspond to respondent IDs. Up to three different sets of random effects are allowed.
scale.type
Indicates the types of interactions that will be created and used in estimation. scale.type="none" generates no interactions and corresponds to simply running LASSOplus with no interactions between variables. scale.type="TX" creates interactions between each X variable and each level of the treatment variables. scale.type="TT" creates interactions between each level of separate treatment variables. scale.type="TTX" interacts each X variable with all values generated by scale.type="TT". Note that users can create their own interactions of interest and select scale.type="none" to return the sparse version of the user-specified model.
save.temp
Whether to save intermediate output in a file named temp_sparsereg. Useful for very long runs.
conservative
Experimental. If set to FALSE, the estimate is less conservative in selecting a variable.
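
The gibbs, burnin, and thin arguments together determine how long an MCMC fit runs. The following is a minimal sketch of that arithmetic in base R, using the defaults from the Usage line. It assumes, based on the argument descriptions above, that each burn-in or saved sample costs thin sampler draws; the exact count inside the package may differ slightly.

```r
# Defaults taken from the Usage line.
gibbs  <- 200   # posterior samples that are saved
burnin <- 200   # burn-in samples that are discarded
thin   <- 10    # draws made per burn-in or saved sample

# Assumption: every burn-in or saved sample costs `thin` sampler draws.
total_draws <- (burnin + gibbs) * thin
kept_share  <- gibbs / total_draws

total_draws  # 4000
kept_share   # 0.05
```

Under this reading, the default settings keep 200 of roughly 4000 draws, so doubling thin roughly doubles run time without changing the number of saved samples.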

Value

beta.mode
Matrix of sparse (mode) estimates with rows equal to number of effects and columns for posterior samples.
beta.mean
Matrix of mean estimates with rows equal to number of effects and columns for posterior samples. These estimates are not sparse, but they do predict better than the mode.
beta.ci
Matrix of effects used to calculate approximate confidence intervals.
sigma.sq
Vector of posterior estimates of the error variance.
X
Matrix of covariates fit. Includes interaction terms, depending on scale.type.
varmat
Matrix showing which lower-order terms correspond with which effects. Used in producing figures.
baseline
Vector of baseline categories for treatments.
modeltype
Type of sparsereg model fit. In this case, onestage. Used by summary functions.
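
Because beta.mode and beta.mean are described as effects-by-samples matrices, row-wise summaries give one number per effect. The sketch below uses a simulated matrix named draws as a hypothetical stand-in for such a component, so it runs without the package. The effect names and values are invented for illustration, and the package's own summary function remains the intended way to inspect a fit.

```r
set.seed(1)

# Hypothetical stand-in for a matrix such as beta.mean:
# 3 effects (rows) by 200 saved posterior samples (columns).
draws <- matrix(rnorm(3 * 200, mean = c(2, 0, -1), sd = 0.3), nrow = 3)
rownames(draws) <- c("effect1", "effect2", "effect3")

# One posterior mean per effect.
post_mean <- rowMeans(draws)

# Equal-tailed 95% interval per effect; t() puts effects back in rows.
post_int <- t(apply(draws, 1, quantile, probs = c(0.025, 0.975)))

cbind(mean = post_mean, post_int)
```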

Details

The function sparsereg allows for estimation of a broad range of sparse regressions. The method allows for continuous, binary, and censored outcomes. In experimental data, it can be used for subgroup analysis. It pre-processes lower-order terms to generate higher-order interaction terms that are uncorrelated with their lower-order components, with the form of pre-processing selected through scale.type. In observational data, it can be used in place of a standard regression, especially in the presence of a large number of variables. The method also uses random effects to adjust uncertainty estimates when there are repeated observations. For example, a conjoint design may have the same people make several comparisons, or a panel data regression may have multiple observations on the same unit.
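
To make the idea of covariate-by-treatment interactions concrete, here is a small base R sketch that builds the raw product of every X column with every treatment-level indicator, which is the kind of term scale.type="TX" refers to. It is an illustration only: the variable names are hypothetical, and it does not reproduce the package's pre-processing that makes higher-order terms uncorrelated with their lower-order components.

```r
set.seed(1)
n <- 6
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
treat <- factor(c("a", "b", "c", "a", "b", "c"))

# One 0/1 indicator column per treatment level (no intercept, no baseline dropped).
T_ind <- model.matrix(~ 0 + treat)

# Multiply each X column by each indicator column and label the result.
TX <- do.call(cbind, lapply(colnames(X), function(xn) {
  out <- X[, xn] * T_ind
  colnames(out) <- paste(xn, colnames(T_ind), sep = ":")
  out
}))

dim(TX)       # 6 rows, 6 interaction columns
colnames(TX)
```

Each interaction column is zero for observations outside the corresponding treatment level, which is why such terms can be read as subgroup-specific slopes.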

The object contains the estimated posterior for all of the modeled effects, and analyzing the object is facilitated by the functions plot, summary, violinplot, and difference.

References

Ratkovic, Marc and Tingley, Dustin. 2015. "Sparse Estimation with Uncertainty: Subgroup Analysis in Large Dimensional Design." Working paper.

Egami, Naoki and Imai, Kosuke. 2015. "Causal Interaction in High-Dimension." Working paper.

See Also

plot.sparsereg, summary.sparsereg, violinplot, difference, print.sparsereg

Examples

## Not run: 
library(MASS)       # provides mvrnorm()
library(sparsereg)

set.seed(1)
n <- 500
k <- 5

## Two categorical treatments
treat <- sample(c("a","b","c"), n, replace=TRUE, prob=c(.5,.25,.25))
treat2 <- sample(c("a","b","c","d"), n, replace=TRUE, prob=c(.25,.25,.25,.25))

## Covariates with unit variance and pairwise correlation 0.5
Sigma <- diag(k)
Sigma[Sigma==0] <- .5
X <- mvrnorm(n, mu=rep(0,k), Sigma=Sigma)

## True outcome: main effects plus covariate-by-treatment interactions
y.true <- 3+X[,2]*2+(treat=="a")*2 +(treat=="b")*(-2)+X[,2]*(treat=="b")*(-2)+
  X[,2]*(treat2=="c")*2
y <- y.true+rnorm(n,sd=2)

## Fit a linear model via MCMC (the default) and via EM
s1 <- sparsereg(y, X, cbind(treat,treat2), scale.type="TX")
s1.EM <- sparsereg(y, X, cbind(treat,treat2), EM=TRUE, scale.type="TX")

## Summarize results from MCMC fit
summary(s1)
plot(s1)
violinplot(s1)

## Summarize results from EM fit
summary(s1.EM)
plot(s1.EM)

## Extension using a baseline category
s1.base <- sparsereg(y, X, treat, scale.type="TX", baseline.vec="a")

summary(s1.base)
plot(s1.base)
violinplot(s1.base)

## End(Not run)