Using the methodology of Bayesian Model Averaging in the BMA package, the variable selection problem is applied to multinomial logit models in which coefficients can be estimated relative to a base alternative.
bic.mlogit(f, data, choices = NULL, base.choice = 1,
varying = NULL, sep = ".", approx=TRUE,
include.intercepts = TRUE, verbose = FALSE, ...)
The function returns an object of class bic.mlogit
containing the following components:
Object of class bic.glm
which results from applying BMA on the binary logistic data.
List with results from the mlogit2logit
function.
Object of class mnl.spec
containing the MNL specification of the full model.
List of objects of class mnl.spec
containing specifications for each selected model.
Value of the approx
argument.
Formula as described in Details of mnl.spec
.
Data frame containing the variables of the model. There should be one record for each individual. Alternative-specific variables occupy single column per alternative.
Vector of names of alternatives. If it is not given, it is determined from the response column of the data frame. Values of this vector should match or be a subset of those in the response column. If it is a subset, data
is reduced to contain only observations whose choice is contained in choices
.
Index of the base alternative within the vector choices
.
Indices of variables within data
that are alternative-specific.
Separator of variable name and alternative name in the ‘varying’ variables.
Logical. If TRUE
, the function uses approximate likelihoods as they come out of the Begg & Gray approximation. If FALSE
, the MNL maximum likelihood estimation is used in the last step of the model selection procedure. Note that this can significantly increase the run-time, see Details below.
Logical controlling if alternative specific constants should always be included in the selected models. It only has an effect if the formula f
contains the intercept, i.e. it does not contain ‘-1’. See Details below.
Logical switching log messages on and off.
Additional arguments passed to the bic.glm
function of the BMA package.
Hana Sevcikova, Adrian Raftery
The function converts the given multinomial data into a combination of binary logistic data, as proposed in Yeung et al. (2005). It requires that the model can be specified as a set of equations of which one is considered as the base equation. If variables are included that vary over alternatives, they are normalized by subtracting the values corresponding to the base alternative. Details of the conversion algorithm are described in the vignette of this package, see vignette('conversion')
.
The function then applies the bic.glm
function of the BMA package on the converted data by using the Begg & Gray (1984) approximation. In the last step of the variable selection procedure, if approx
is FALSE
, the maximum likelihood estimation (MLE) is applied to all selected models and the Bayesian Information Criterium (BIC) is recomputed using the log-likelihood of the full multinomial logistic regression model. Note that this step can be computationally very expensive. We suggest when using this option, set the verbose
argument to TRUE
to follow the computation progress. Note that one can use the estimate.mlogit
function on the resulting object which performs the MLE on selected models only.
The BMA functions always include the intercept which in the MNL settings corresponds to the alternative specific constant (asc) of the second alternative (relative to the base alternative). If include.intercepts=TRUE
(default), asc for all the remaining alternatives are also always included in the selected models. If it is set to FALSE
, the asc of the remaining alternatives (i.e. third and higher) are treated as ordinary variables, i.e candidates for selection as well as exclusion.
Begg, C.B., Gray, R. (1984) Calculation of polychotomous logistic regression parameters using individualized regressions. Biometrika 71, 11--18.
Yeung, K.Y., Bumgarner, R.E., Raftery, A.E. (2005) Bayesian model averaging: development of an improved multi-class, gene selection and classification tool for microarray data. Bioinformatics 21 (10), 2394--2402.
bic.glm
, summary.bic.mlogit
, imageplot.mlogit
, estimate.mlogit
.
data('heating')
res <- bic.mlogit(depvar ~ ic + oc + income + rooms, heating, choices=1:5,
varying=3:12, verbose=TRUE, approx=FALSE, sep='')
summary(res)
imageplot.mlogit(res)
plot(res)
# use approximate BMA and estimate the models afterwards
res <- bic.mlogit(depvar ~ ic + oc | income + rooms, heating, choices=1:5,
varying=3:12, verbose=TRUE, approx=TRUE, sep='')
summary(res)
estimate.mlogit(res, heating)
Run the code above in your browser using DataLab