catspec (version 0.93)

mclgen: Restructure a data-frame as a person-choice file

Description

mclgen restructures a data-frame into a person-choice file so that a multinomial logit model can be estimated as a conditional logit model.

Usage

mclgen(datamat, catvar)

Arguments

datamat
A data-frame to be transformed into a person-choice file
catvar
A factor representing the response variable (i.e. the dependent variable in a multinomial logistic model)

Value

  • A data-frame is returned, restructured as a person-choice file.

Details

A multinomial logit model can be estimated using a program for conditional logit regression. This will produce the same coefficients and standard errors but allows greater flexibility for imposing restrictions on the dependent variable. To estimate the multinomial logistic model as a conditional logit model, the data must be restructured as a person-choice file. mclgen performs this operation such that:
  • Each record of the data-frame is duplicated ncat times, where ncat is the number of categories of the response variable
  • A new variable id is created to index respondents. This variable is used as the stratifying variable in clogit
  • A new variable newy is created to index response options for each respondent
  • A new variable depvar is created which is equal to 1 for the record corresponding with the respondent's actual choice and is 0 otherwise
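
The four steps above can be sketched in base R. This is a minimal illustration, not the catspec implementation: person_choice is a hypothetical helper, it takes the response as a column name rather than a bare variable, and it overwrites the response column with the option index on the assumption that this is what allows catvar to appear on the right-hand side of the model.

```r
# Hypothetical helper for illustration only; not part of catspec.
person_choice <- function(datamat, catvar_name) {
  stopifnot(is.data.frame(datamat))
  y <- factor(datamat[[catvar_name]])
  ncat <- nlevels(y)
  n <- nrow(datamat)

  # Step 1: duplicate each record ncat times
  out <- datamat[rep(seq_len(n), each = ncat), , drop = FALSE]

  # Step 2: id indexes respondents (the strata variable for clogit)
  out$id <- rep(seq_len(n), each = ncat)

  # Step 3: newy indexes the response options within each respondent
  out$newy <- rep(seq_len(ncat), times = n)

  # Step 4: depvar is 1 on the row matching the actual choice, else 0
  out$depvar <- as.integer(out$newy == rep(as.integer(y), each = ncat))

  # Assumption: the response column now holds the option, not the choice
  out[[catvar_name]] <- factor(levels(y)[out$newy], levels = levels(y))

  rownames(out) <- NULL
  out
}

# Toy data: three respondents choosing among options "a", "b", "c"
toy <- data.frame(choice = factor(c("a", "c", "b")), x = c(1.5, 2.0, 0.3))
pc_toy <- person_choice(toy, "choice")
pc_toy
```

Each respondent contributes ncat rows, and exactly one of them has depvar equal to 1.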
depvar is the dependent variable in clogit. The main effects of catvar correspond with the intercept term of a multinomial logit model, and interactions of catvar with predictor variables correspond with the effects of these variables in a multinomial logit model. Since catvar is now on the right-hand side of the model equation, restrictions can be imposed in the usual fashion. For example, by using contr.sdif from the MASS package as the contrast for catvar, an adjacent logit model is obtained (Agresti 1990: 318). By adding the dummy variables for two categories of catvar, an equality constraint can be imposed on those categories. These equality constraints can then be imposed on the effects of some predictor variables but not others.

Another use is to include a mobility model in a multinomial logistic regression model. Mobility models are loglinear models for square tables. They lie in the space between a model of independence and a saturated model, which is accomplished by imposing restrictions on the interaction effect of the row and column variable. A number of these special models have been developed; see Hout (1983) or Goodman (1984) for an overview.

These loglinear mobility models can be seen as multinomial logistic regression models with special restrictions on the dependent variable. The nature of the restriction depends on the category of the predictor variable. In practice, mobility models can be included in an MCL model using the same specification as for a loglinear model. R functions for several common mobility models can be found in sqtab.
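
The equality constraint and the adjacent-logit contrast described in the Details can be sketched on a toy factor. This is only an illustration under stated assumptions: the level names, occ_X, occ_eq, and occ_adj are made up for the example, and the MASS step is skipped if that package is not installed.

```r
# Toy response factor; the level names are made up for illustration.
occ <- factor(c("farm", "manual", "service", "farm", "service"))

# Treatment-coded dummies for occ, without the intercept column.
# The remaining columns are named occmanual and occservice.
occ_X <- model.matrix(~ occ)[, -1, drop = FALSE]

# Equality constraint: adding the two dummies gives a single column,
# so "manual" and "service" share one coefficient wherever it is used.
occ_eq <- occ_X[, "occmanual"] + occ_X[, "occservice"]

# Adjacent-category coding: successive-difference contrasts from MASS.
if (requireNamespace("MASS", quietly = TRUE)) {
  occ_adj <- occ
  contrasts(occ_adj) <- MASS::contr.sdif(nlevels(occ_adj))
  print(contrasts(occ_adj))
}
```

In a clogit formula on a person-choice file, a term such as occ_eq:educ would then constrain the effect of educ to be equal for the two categories, while other predictors could keep the unconstrained occ_X interaction.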

References

Agresti, Alan. (1990). Categorical data analysis. New York: John Wiley & Sons.
Allison, Paul D. and Nicholas Christakis. (1994). Logit models for sets of ranked items. Pp. 199-228 in Peter V. Marsden (ed.), Sociological Methodology. Oxford: Basil Blackwell.
Breen, Richard. (1994). Individual Level Models for Mobility Tables and Other Cross-Classifications. Sociological Methods & Research 33: 147-173.
Goodman, Leo A. (1984). The analysis of cross-classified data having ordered categories. Cambridge, Mass.: Harvard University Press.
Hendrickx, John. (2000). Special restrictions in multinomial logistic regression. Stata Technical Bulletin 56: 18-26.
Hendrickx, John and Harry B.G. Ganzeboom. (1998). Occupational Status Attainment in the Netherlands, 1920-1990. A Multinomial Logistic Analysis. European Sociological Review 14: 387-403.
Hout, Michael. (1983). Mobility Tables. Sage Publication 07-031.
Logan, John A. (1983). A Multivariate Model for Mobility Tables. American Journal of Sociology 89: 324-349.
http://www.xs4all.nl/~jhckx/

See Also

clogit and coxph in the survival package, multinom in the nnet package, sqtab

Examples

## Example 1
# Data from the 1972-78 GSS used by Logan (1983)
data(logan)

# create the "person-choice" file
pc<-mclgen(logan,occ)
summary(pc)
attach(pc)

library(survival)
# The following specification will work:
# cl.lr<-clogit(depvar~occ+occ:educ+occ:black+strata(id),data=pc)
# However, R won't drop the first category of "occ"
# in the interaction effects. The last category will be omitted
# instead due to linear dependence within strata.
# To fix the problem, create dummies manually for "occ"
occ.X<-model.matrix(~pc$occ)
occ.X<-occ.X[,attributes(occ.X)$assign==1]
cl.lr<-clogit(depvar~occ.X+occ.X:educ+occ.X:black+strata(id),data=pc)
summary(cl.lr)

# Estimate a "quasi-uniform association" loglinear model for "focc" and "occ"
# with "educ" and "black" as covariates at the respondent level
cl.qu<-clogit(depvar~occ.X+occ.X:educ+occ.X:black+
  mob.qi(focc,occ)+mob.unif(focc,occ)+strata(id),data=pc)
summary(cl.qu)

## Example 2
# housing data from the MASS package
data(housing,package="MASS")
housing.prsch<-mclgen(housing,Sat)
library(survival)
# clogit doesn't support the weights argument at present;
# a workaround is to call coxph directly.
# coxph warns that X is singular because the main
# effects of Infl, Type, and Cont are dropped.
coxph.prsch<-coxph(Surv(rep(1, NROW(housing.prsch)), depvar) ~
  Sat+Sat*Infl+Sat*Type+Sat*Cont+strata(id),
  weights = housing.prsch$Freq, data = housing.prsch)
summary(coxph.prsch)

# the same model using multinomial logistic regression
library(nnet)
house.mult<- multinom(Sat ~ Infl + Type + Cont, weights = Freq,
                      data = housing)
summary(house.mult,correlation=FALSE)

# compare the coefficients
m1<-coef(coxph.prsch)
m1<-m1[!is.na(m1)]
dim(m1)<-c(2,7)
m2<-coef(house.mult)
m1
m2
m1-m2
max(abs(m1-m2))
mean(abs(m1-m2))
