effectFusion: Bayesian effect fusion for categorical predictors

Description

This function performs Bayesian variable selection and effect fusion for categorical predictors in linear and logistic regression models. Effect fusion aims at the question which categories of an ordinal or nominal predictor have a similar effect on the response and therefore can be fused to obtain a sparser representation of the model. Effect fusion and variable selection can be obtained either with a prior that has an interpretation as spike and slab prior on the level effect differences or with a sparse finite mixture prior on the level effects. The regression coefficients are estimated with a flat uninformative prior after model selection or by using model averaged results. For posterior inference, a MCMC sampling scheme is used that involves only Gibbs sampling steps. The sampling schemes for linear and logistic regression are almost identical as in the case of logistic regression a data augmentation strategy (Polson et al. (2013)) is used that requires only one additional step to sample from the Polya-Gamma distribution.

Usage

effectFusion(
  y,
  X,
  types,
  method,
  prior = list(),
  mcmc = list(),
  mcmcRefit = list(),
  family = "gaussian",
  modelSelection = "binder",
  returnBurnin = FALSE
)

Arguments

a vector of the response observations (continuous if family = 'gaussian' and 0/1 if family = 'binomial')

a data frame with covariates, each column representing one covariate. Ordinal and nominal covariates should be of class factor

types

a character vector to specify the type of each covariate; 'c' indicates continuous or metric predictors, 'o' ordinal predictors and 'n' nominal predictors

method

controls the main prior structure that is used for effect fusion. For the prior that has an interpretation as spike and slab prior on the level effect differences choose method = 'SpikeSlab' and for the sparse finite mixture prior on the level effects choose method = 'FinMix'. See details for a description of the two approaches and their advantages and drawbacks. For comparison purposes it is also possible to fit a full model without performing any effect fusion (method = NULL)

prior

an (optional) list of prior settings and hyper-parameters for the prior (see details). The specification of this list depends on the chosen method and the selected family

mcmc

an (optional) list of MCMC sampling options (see details)

mcmcRefit

an (optional) list of MCMC sampling options for the refit of the selected model (see details)

family

indicates whether linear (default, family = 'gaussian') or logistic regression (family = 'binomial') should be performed

modelSelection

if modelSelection = 'binder' the final model is selected by minimising the expected posterior binder's loss using an algorithm of Lau and Green (2008) for the spike and slab model and an algorithm of Rastelli and Friel (2016) for the finite mixture approach. Alternatively, modelSelection = 'pam' can be specified for the sparse finite mixture model. In that case, the final model is selected by using pam clustering and the silhouette coefficient (see Malsiner-Walli et al., 2018 for details). If modelSelection is NULL no final model selection is performed and parameter estimates are model averaged results. modelSelection = 'binder' is the default value. modelSelection = 'pam' is only available for method = 'FinMix'. If method = 'SpikeSlab' and modelSelection = 'pam', modelSelection is automatically set to 'binder'. For the finite mixture approach we recommend to use modelSelection = 'binder', as this algorithm provides - in contrast to pam clustering and the the silhouette coefficient - the opportunity to exclude whole covariates.

returnBurnin

if TRUE (default is FALSE) the burn-in iterations of the MCMC sampling process are returned as well. This can be for example used to check convergence. Returning the burn-in does not influence the results of dic, model, plot, print and summary.

Value

The function returns an object of class fusion with methods dic, model, print, summary and plot.

An object of class fusion is a named list containing the following elements:

fit

beta

regression coefficients \(\beta_0\) (intercept) and \(\beta\)

delta

indicator variable \(\delta\) for slab component when method = 'SpikeSlab'. The differences of the level effects are assigned either to the spike (delta = 0) or the slab component (delta = 1). If an effect difference is assigned to the spike component, the difference is almost zero and the corresponding level effects should be fused.

tau2

variance \(\tau^2\) of slab component when method = 'SpikeSlab'. If no hyperprior on \(\tau^2\) is specified, tau2 contains the fixed values for \(\tau^2\).

S

latent allocation variable \(S\) for mixture components when method = 'FinMix'

eta

mixture component weights \(\eta\) when method = 'FinMix'

eta0

weights of components located at zero \(\eta_0\) when method = 'FinMix'

mu

mixture component means \(\mu\) when method = 'FinMix'

sgma2

error variance \(\sigma^2\) of the model (only for family = 'gaussian')

fit_burnin

refit

beta

sgma2

family = 'gaussian')

X_dummy_fused

model

method

see arguments

family

see arguments

data

model

a named list containing information on the full, initial model

categories

number of categories for categorical predictors

diff

number of pairwise level effect differences

n_cont

number of metric predictors

n_ord

number of ordinal predictors

n_nom

number of nominal predictors

prior

see details for prior

mcmc

see details for mcmc

mcmcRefit

see details for mcmcRefit

modelSelection

see arguments

returnBurnin

see arguments

numbCoef

number of estimated regression coefficients (based on the refitted model if effect fusion and final model selection is performed, otherwise based on model averaged results or the full model, respectively)

call

function call

Details

This function provides identification of categories (of ordinal and nominal predictors) with the same effect on the response and their automatic fusion.

Two different prior versions for effect fusion and variable selection are available. The first prior version allows a priori for almost perfect as well as almost zero dependence between level effects. This prior has also an interpretation as independent spike and slab prior on all pairwise differences of level effects and correction for the linear dependence of the effect differences. Even though the prior is mainly designed for fusion of level effects, excluding some categories from the model as well as the whole covariate (variable selection) can also be easily accomplished. Excluding a category from the model corresponds to fusion of this category to the baseline and excluding the whole covariate consequently to fusion of all categories to the baseline.

The second prior is a modification of the usual spike and slab prior for the regression coefficients by combining a spike at zero with a finite location mixture of normal components. It enables detection of categories with similar effects on the response by clustering the regression effects. Categories with effects that are allocated to the same cluster are fused. Due to the specification with one component located at zero also automatic exclusion of levels and whole covariates without any effect on the outcome is provided. However, when using modelSelection = 'pam', it is not possible to exclude whole covariates as the Silhouette coefficient does not allow for one cluster solutions. Therefore, we recommend to use modelSelection = 'binder'.

In settings with large numbers of categories, we recommend to use the sparse finite mixture prior for computational reasons. It is important to note that the sparse finite mixture prior on the level effects does not take into account the ordering information of ordinal predictors and treats them like nominal predictors, whereas in the spike and slab case fusion is restricted to adjacent categories for ordinal predictors.

Metric predictors can be included in the model as well and variable selection will be performed also for these predictors.

If modelSelection = NULL, no final model selection is performed and model averaged results are returned. When modelSelection = 'binder' the final model is selected by minimizing the expected posterior Binder loss for each covariate separately. Additionally, there is a second option for the finite mixture prior (modelSelection = 'pam') which performs model selection by identifying the optimal partition of the effects using PAM clustering and the silhouette coefficient.

For comparison purposes it is also possible to fit a full model instead of performing effect fusion (method = NULL). All other functions provided in this package, such as dic or summary, do also work for the full model.

Details for the model specification (see arguments):

prior

r

variance ratio of slab to spike component; default to 50000 if family = 'gaussian' and 5000000 if family = 'binomial'. r should be chosen not too small but still small enough to avoid stickiness of MCMC. We recommend a value of at least 20000.

g0

shape parameter of inverse gamma distribution on \(\tau^2\) when tau2_fix = NULL and method = 'SpikeSlab'; default to 5. The default value is a standard choice in variable selection where the tails of spike and slab component are not too thin to cause mixing problems in MCMC.

G0

scale parameter of inverse gamma distribution on \(\tau^2\) when tau2_fix = NULL and method = 'SpikeSlab'; default to 25. G0 controls to some extend the sparsity of the model. Smaller values for G0 help to detect also small level effect differences, but result in less fusion of categories.

tau2_fix

If tau2_fix = NULL, an inverse gamma hyper-prior is specified on \(\tau^2\). However, the value of the slab variance can also be fixed for each covariate instead of using a hyperprior. tau2_fix is only of interest if method = 'SpikeSlab'. Default to NULL. Similar to the scale parameter G0, the fixed variance of the slab component tau2_fix can control to some extend the sparsity.

e0

parameter of Dirichlet hyper-prior on mixture weights when method = 'FinMix'; default to 0.01. e0 should be chosen smaller than 1 in order to encourage empty components. Small values such as 0.01 help to concentrate the model space on sparse solutions.

p

prior parameter to control mixture component variances when method = 'FinMix'; values between 100 and 100000 led to good results in simulation studies; default to 100 if family = 'gaussian' and 1000 if family = 'binomial'. We recommend to try different values for this prior parameter and compare the models using dic. When hyperprior = FALSE, larger values of p lead to less sparsity and it should be chosen not smaller than 100. If a hyper-prior on the mixture component variances is used (hyperprior = TRUE), p has almost no effect on the sparsity of the model and it should again be not smaller than 100.

hyperprior

logical value if inverse gamma hyper-prior on component variance should be specified when method = 'FinMix'; default to FALSE. The hyper-prior leads to robust results concerning the specification of p but also to very sparse solutions.

s0

hyper-parameter (shape) of inverse gamma distribution on error variance, used for both versions of method, but only for family = 'gaussian'; default to 0.

S0

hyper-parameter (scale) of inverse gamma distribution on error variance, used for both versions of method, but only for family = 'gaussian'; default to 0.

mcmc

M

number of MCMC iterations after the burn-in phase; default to 20000 for effect fusion models and 3000 for full models.

burnin

number of MCMC iterations discarded as burn-in; default to 5000 for effect fusion models and 1000 for full models.

startsel

number of MCMC iterations drawn from the model without performing effect fusion; default to 1000 for effect fusion models and 0 for full models.

mcmcRefit

M_refit

number of MCMC iterations after the burn-in phase for the refit of the selected model; default to 3000.

burnin_refit

number of MCMC iterations discarded as burn-in for the refit of the selected model; default to 1000.

References

Pauger, D., and Wagner, H. (2019). Bayesian Effect Fusion for Categorical Predictors. Bayesian Analysis, 14(2), 341-369. 10.1214/18-BA1096

Malsiner-Walli, G., Pauger, D., and Wagner, H. (2018). Effect Fusion Using Model-Based Clustering. Statistical Modelling, 18(2), 175-196. 10.1177/1471082X17739058

Polson, N.G., Scott, J.G., and Windle, J. (2013). Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association, 108(504), 1339-1349. 10.1080/01621459.2013.829001

Examples

Run this code

# NOT RUN {
# ----------- Load simulated data set 'sim1' for linear regression
data(sim1)
y = sim1$y
X = sim1$X
types = sim1$types

# ----------- Bayesian effect fusion for simulated data set with spike and slab prior
m1 <- effectFusion(y, X, types, method = 'SpikeSlab')

# print, summarize and plot results
print(m1)
summary(m1)
plot(m1)

# evaluate model and model criteria
model(m1)
dic(m1)

# ----------- Use finite mixture prior for comparison
m2 <- effectFusion(y, X, types, method = 'FinMix')

# summarize and plot results
print(m2)
summary(m2)
plot(m2)
model(m2)
dic(m2)

# change prior parameter specification
m3 <- effectFusion(y, X, types, prior= list(p = 10^3), method = 'FinMix')
plot(m3)

# ------------  Use model averaged coefficient estimates
m4 <- effectFusion(y, X, types, method = 'SpikeSlab', modelSelection = NULL)
summary(m4)

# ------------  Estimate full model for comparison purposes
m5 <- effectFusion(y, X, types, method = NULL)
summary(m5)
plot(m5)
dic(m5)

# ----------- Load simulated data set 'sim3' for logistic regression
data(sim3)
y = sim3$y
X = sim3$X
types = sim3$types

# ----------- Bayesian effect fusion for simulated data set with finite mixture prior
m6 <- effectFusion(y, X, types, method = 'FinMix', prior = list(p = 10^4), family = 'binomial')

# look at the results
print(m6)
summary(m6)
plot(m6)
model(m6)
dic(m6)

# ----------- Use spike and slab prior for comparison
m7 <- effectFusion(y, X, types, method = 'SpikeSlab', family = 'binomial', returnBurnin = TRUE)

# summarize and evaluate results
print(m7)
summary(m7)
plot(m7)
model(m7)
dic(m7)
# }

Run the code above in your browser using DataLab