Learn R Programming

gmnl (version 1.1-3.2)

gmnl: Estimate Multinomial Logit Models with Observed and Unobserved Individual Heterogeneity.

Description

Estimate different types of multinomial logit models with observed and unobserved individual heterogneity, such as MIXL, S-MNL, G-MNL, LC and MM-MNL models. These models are estimated using Maximum Simulated Likelihood. It supports both cross-sectional and panel data.

Usage

gmnl(
  formula,
  data,
  subset,
  weights,
  na.action,
  model = c("mnl", "mixl", "smnl", "gmnl", "lc", "mm"),
  start = NULL,
  ranp = NULL,
  R = 40,
  Q = 2,
  haltons = NA,
  mvar = NULL,
  seed = 12345,
  correlation = FALSE,
  bound.err = 2,
  panel = FALSE,
  hgamma = c("direct", "indirect"),
  reflevel = NULL,
  init.tau = 0.1,
  init.gamma = 0.1,
  notscale = NULL,
  print.init = FALSE,
  gradient = TRUE,
  typeR = TRUE,
  ...
)

# S3 method for gmnl print( x, digits = max(3, getOption("digits") - 3), width = getOption("width"), ... )

# S3 method for gmnl summary(object, ...)

# S3 method for summary.gmnl print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), ... )

# S3 method for gmnl update(object, new, ...)

# S3 method for gmnl coef(object, ...)

# S3 method for gmnl model.matrix(object, ...)

# S3 method for gmnl residuals(object, outcome = TRUE, ...)

# S3 method for gmnl df.residual(object, ...)

# S3 method for gmnl fitted(object, outcome = TRUE, ...)

# S3 method for gmnl logLik(object, ...)

# S3 method for gmnl nObs(x, ...)

Arguments

formula

a symbolic description of the model to be estimated. The formula is divided in five parts, each of them separated by the symbol |. The first part is reserved for alternative-specific variables with a generic coefficient. The second part corresponds to individual-specific variables with an alternative specific coefficients. The third part corresponds to alternative-specific variables with an alternative-specific coefficident. The fourth part is reserved for time-invariant variables that modify the mean of the random parameters. Finally, the fifth part is reserved for time-invariant variables that enter in the scale coefficient or in the probability assignment in models with latent classes.

data

the data of class mlogit.data.

subset

an optional vector specifying a subset of observations.

weights

an optional vector of weights. Default to 1.

na.action

a function wich indicated what should happen when the data contains NA's.

model

a string indicating which model is estimated. The options are "mnl" for the Multinomial Logit Model, "mixl" for the Mixed Logit Model, "smnl" for the Scaled Multinomial Logit Model, "gmnl" for the Generalized Multinomial Logit Model, "lc" for the Latent Class Multinomial Logit Model, and "mm" for the Mixed-Mixed Multinomial Logit Model.

start

a vector of starting values.

ranp

a named vector whose names are the random parameters and values the distribution: "n" for normal, "ln" for log-normal, "cn" for truncated normal, "u" for uniform, "t" for triangular, "sb" for Sb Johnson.

R

the number of draws of pseudo-random numbers if ranp is not NULL.

Q

number of classes for LC or MM-MNL models.

haltons

only relevant if ranp is not NULL. If haltons = NULL, pseudo-random numbers are used instead of Halton sequences. If haltons=NA, the first \(K\) primes are used to generates the Halton draws, where \(K\) is the number of random parameters, and 15 of the initial sequence of elements are dropped. Otherwise, haltons should be a list with elements prime and drop.

mvar

only valid if ranp is not NULL. This is a named list, where the names correspond to the variables with random parameters, and the values correspond to the variables that enter in the mean of each random parameters.

seed

seed for the random number generator. Default is seed = 12345.

correlation

only relevant if ranp is not NULL. If true, the correlation across random parameters is taken into account.

bound.err

only relevenat if model is smnl or gmnl. It indicates at which values the draws for the scale parameter are truncated. By default bound.err = 2, therefore a truncated normal distribution with truncation at -2 and +2 is used.

panel

if TRUE a panel data model is estimated.

hgamma

a string indicated how to estimate the parameter gamma. If "direct", then \(\gamma\) is estimated directly, if "indirect" then \(\gamma ^*\) is estimated, where \(\gamma = \exp(\gamma^*)/(1 + \exp(\gamma^*))\).

reflevel

the base alternative.

init.tau

initial value for the \(\tau\) parameter.

init.gamma

initial value for \(\gamma\).

notscale

only relevant if model is smnl or gmnl. It is a vector indicating which variables should not be scaled.

print.init

if TRUE, the initial values for the optimization procedure are printed.

gradient

if TRUE, analytical gradients are used for the optimization procedure.

typeR

if TRUE, truncated normal draws are used for the scale parameter, if FALSE the procedure suggested by Greene (2010) is used.

...

additional arguments to be passed to maxLik, which depend in the maximization routine.

x, object

and object of class gmnl.

digits

the number of digits.

width

width.

new

an updated formula for the update method.

outcome

if TRUE, then the fitted and residuals methods return a vector that corresponds to the chosen alternative, otherwise it returns a matrix where each column corresponds to each alternative.

Value

An object of class ``gmnl'' with the following elements

coefficients

the named vector of coefficients,

logLik

a set of values of the maximum likelihood procedure,

mf

the model framed used,

formula

the formula (a gFormula object),

time

proc.time() minus the start time,

freq

frequency of dependent variable,

draws

type of draws used,

model

the fitted model,

R

number of draws used,

ranp

vector indicating the variables with random parameters and their distribution,

residuals

the residuals,

correlation

whether the model is fitted assuming that the random parameters are correlated,

bi

matrix of conditional expectation of random parameters,

Q

number of classes,

call

the matched call.

Details

Let the utility to person \(i\) from choosing alternative \(j\) on choice occasion \(t\) be: $$U_{ijt} = \beta_{i}x_{ijt} + \epsilon_{ijt}$$ where \(\epsilon_{ijt}\) is i.i.d extreme value, and \(\beta_i\) vary across individuals. Each model estimated by gmnl depends on how \(\beta_i\) is specified. The options are the following:

  1. S-MNL if \(\beta_i=\sigma_i\beta\), where the scale \(\sigma_i\) varies across individuals.

  2. MIXL if \(\beta_i=\beta + s\eta_i\), where \(\eta_i\) is a draw from some distribution. For example, if \(\beta_i\sim N(\beta, s^2)\), then \(\eta_i\sim N(0, 1)\).

  3. GMNL if \(\beta_i=\sigma_i\beta + \gamma s\eta_i + \sigma_i(1-\gamma)s\eta_i\), where \(\sigma_i\) is the scale parameter, and \(\gamma\) is a parameter that controls how the variance of residual taste heterogeneity varies with scale.

  4. LC if \(\beta_i=\beta_q\) with probability \(w_{iq}\) for \(q = 1,...,Q\), where \(Q\) is the total number of classes.

  5. MM-MIXL if \(\beta_i\sim f(\beta_q, \Sigma_q)\) with probability \(w_{iq}\) for \(q = 1,...,Q\), where \(Q\) is the total number of classes.

Observed heterogeneity can be also accommodated in the random parameters when the MIXL is estimated by including individual-specific covariates. Specifically, the vector of random coefficients is $$\beta_i=\beta +\Pi z_i + L\eta_i$$ where \(z_i\) is a set of characteristics of individual \(i\) that influence the mean of the taste parameters; and \(\Pi\) is matrix of parameters. To estimate this model, the fourth part of the formula should be specified along with the mvar argument.

One can also allow the mean of the scale to differ across individuals by including individual-specific characteristics. Thus, the scale parameters can be written as $$\exp(\bar{\sigma} + \delta h_i + \tau \upsilon_i)$$ where \(h_i\) is a vector of attributes of individual \(i\). To estimate this model, the fifth part of the formula should include the variables that enter \(h_i\).

For models with latent classes, the class assignment is modeled as a semi-parametric multinomial logit format $$w_{iq}= \frac{\exp(\gamma_q)}{\sum_{q=1}^Q\exp(\gamma_q)}$$ for \(q = 1,...,Q, \gamma_1 = 0\). Latent class models (LC and MM-MIXL) requires at least that a constant should be specified in the fifth part of the formula. If the class assignment, \(w_{iq}\), is also determined by socio-economic characteristics, these variables can be also included in the fifth part.

Models that involve random parameters are estimated using Maximum Simulated Likelihood using the maxLik function of maxLik package.

References

  • Keane, M., & Wasi, N. (2013). Comparing alternative models of heterogeneity in consumer choice behavior. Journal of Applied Econometrics, 28(6), 1018-1045.

  • Fiebig, D. G., Keane, M. P., Louviere, J., & Wasi, N. (2010). The generalized multinomial logit model: accounting for scale and coefficient heterogeneity. Marketing Science, 29(3), 393-421.

  • Greene, W. H., & Hensher, D. A. (2010). Does scale heterogeneity across individuals matter? An empirical assessment of alternative logit models. Transportation, 37(3), 413-428.

  • Train, K. (2009). Discrete choice methods with simulation. Cambridge University Press.

See Also

mlogit, mlogit.data, maxLik, Rchoice

Examples

Run this code
# NOT RUN {
## Examples using the Fishing data set from the AER package
data("TravelMode", package = "AER")
library(mlogit)
TM <- mlogit.data(TravelMode, choice = "choice", shape = "long", 
                 alt.levels = c("air", "train", "bus", "car"), chid.var = "individual")
# }
# NOT RUN {
## S-MNL model, ASCs not scaled
smnl <- gmnl(choice ~ wait + vcost + travel + gcost| 1, data = TM, 
             model = "smnl", R = 100, 
             notscale = c(1, 1, 1, rep(0, 4)))
summary(smnl)

## MIXL model with observed heterogeneity
mixl.hier <- gmnl(choice ~ vcost + gcost + travel + wait | 1 | 0 | income + size - 1,
                 data = TM,
                 model = "mixl",
                 ranp = c(travel = "t", wait = "n"),
                 mvar = list(travel = c("income","size"), wait = c("income")),
                 R = 30,
                 haltons = list("primes"= c(2, 17), "drop" = rep(19, 2)))
summary(mixl.hier)

## Examples using the Electricity data set from the mlogit package
data("Electricity", package = "mlogit")
Electr <- mlogit.data(Electricity, id.var = "id", choice = "choice",
                     varying = 3:26, shape = "wide", sep = "")
                     
## Estimate a MIXL model with correlated random parameters
Elec.cor <- gmnl(choice ~ pf + cl + loc + wk + tod + seas| 0, data = Electr,
                 subset = 1:3000,
                 model = 'mixl',
                 R = 10,
                 panel = TRUE,
                 ranp = c(cl = "n", loc = "n", wk = "n", tod = "n", seas = "n"),
                 correlation = TRUE)
summary(Elec.cor)
cov.gmnl(Elec.cor)
se.cov.gmnl(Elec.cor)
se.cov.gmnl(Elec.cor, sd = TRUE)
cor.gmnl(Elec.cor)

## Estimate a G-MNL model, where ASCs are also random
Electr$asc2 <- as.numeric(Electr$alt == 2)
Electr$asc3 <- as.numeric(Electr$alt == 3)
Electr$asc4 <- as.numeric(Electr$alt == 4)

Elec.gmnl <- gmnl(choice ~ pf + cl + loc + wk + tod + seas + asc2 + asc3 + asc4 | 0,
                 data = Electr,
                 subset = 1:3000,
                 model = 'gmnl',
                 R = 30,
                 panel = TRUE,
                 notscale = c(rep(0, 6), 1, 1, 1),
                 ranp = c(cl = "n", loc = "n", wk = "n", tod = "n", seas = "n",
                 asc2 = "n", asc3 = "n", asc4 = "n"))
summary(Elec.gmnl)

## Estimate a LC model with 2 classes
Elec.lc <- gmnl(choice ~ pf + cl + loc + wk + tod + seas| 0 | 0 | 0 | 1,
               data = Electr,
               subset = 1:3000,
               model = 'lc',
               panel = TRUE,
               Q = 2)
summary(Elec.lc)

## Estimate a MM-MIXL model
Elec.mm <- gmnl(choice ~ pf + cl + loc + wk + tod + seas| 0 | 0 | 0 | 1,
                 data = Electr,
                 subset = 1:3000,
                 model = 'mm',
                 R = 30,
                 panel = TRUE,
                 ranp = c(pf = "n", cl = "n", loc = "n", wk = "n", tod = "n",
                 seas = "n"),
                 Q = 2,
                 iterlim = 500)
summary(Elec.mm)
# }

Run the code above in your browser using DataLab