dpm
This R package implements the dynamic panel data modeling framework described by Allison, Williams, and Moral-Benito (2017). This approach allows fitting models with fixed effects that do not assume strict exogeneity of predictors. That means you can simultaneously get the robustness to time-invariant confounding offered by fixed effects models and account for reciprocal causation between the predictors and the outcome variable. The estimation approach from Allison et al. provides better finite sample performance in terms of both bias and efficiency than other popular methods (e.g., the Arellano-Bond estimator).
These models are fit as structural equation models (SEM) using maximum
likelihood estimation, which offers the missing data handling and
flexibility afforded by SEM. This package will reshape your data,
specify the model properly, and fit it with lavaan.
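For orientation, a typical model of this kind can be written as follows. The notation is ours, not taken from this README: i indexes units, t indexes waves, x is a time-varying predictor, w is a time-invariant predictor, α_i is the unit-specific fixed effect, and μ_t is a wave-specific intercept. The outcome depends on its own lag and on the lagged predictor.

```latex
y_{it} = \mu_t + \beta_1 x_{i,t-1} + \beta_2 y_{i,t-1} + \delta w_i + \alpha_i + \varepsilon_{it}
```

In the SEM formulation, α_i is treated as a latent variable with its loadings fixed to 1 at every wave and allowed to correlate with the time-varying predictors, which is what provides robustness to time-invariant confounding.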
If a result doesn’t seem right, it would be a good idea to
cross-reference it with xtdpdml for Stata. Go to
https://www3.nd.edu/~rwilliam/dynamic/ to learn about xtdpdml and
the underlying method. You may also be interested in the article by Paul
Allison, Richard Williams, and Enrique Moral-Benito in Socius.
Installation
dpm will soon be on CRAN. In the meantime, you can get it from GitHub.
install.packages("devtools")  # only needed if devtools is not installed yet
devtools::install_github("jacob-long/dpm")  # install dpm from GitHub
Usage
This package assumes your data are in long format, with each row
representing a single observation of a single participant. Contrast this
with wide format, in which each row contains all observations of a
single participant. For help on converting data from wide to long
format, check out the tutorial that accompanies the panelr package.
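The panelr tutorial covers that package’s own conversion tools. As a hedged alternative that needs only base R, stats::reshape() can perform the same wide-to-long conversion. The tiny data set and its column names (id, z, y_1, y_2) below are hypothetical and only illustrate the two layouts.

```r
# Hypothetical wide-format data: one row per person, with the
# outcome y measured at waves 1 and 2 in separate columns.
wide <- data.frame(id = c(1, 2), z = c("a", "b"),
                   y_1 = c(10, 20), y_2 = c(11, 21))

# Convert to long format: one row per person per wave.
long <- reshape(wide, direction = "long",
                varying = c("y_1", "y_2"), v.names = "y",
                timevar = "wave", times = c(1, 2), idvar = "id")

# Sort by person, then by wave, so each person's rows are together.
long <- long[order(long$id, long$wave), ]
print(long)
```

The time-invariant column z is simply repeated on each of a person’s rows, which is how time-invariant predictors appear in long data.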
First we load the package and the WageData data set from panelr.
library(dpm)
data("WageData", package = "panelr")
This next line of code converts the data to class panel_data, a class
from the panelr package that helps simplify the treatment of long-format
panel data. You don’t have to do this, but it saves you from providing
id and wave arguments to the model fitting function each time you use it.
wages <- panel_data(WageData, id = id, wave = t)
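Whatever tool you use, long panel data should have one row per combination of id and wave. The sketch below does not use panelr; it is a base R illustration of that structure. The data values and the is_valid_panel() helper are hypothetical.

```r
# Hypothetical long-format data: two people, each observed at three waves.
long <- data.frame(id = rep(1:2, each = 3),
                   t  = rep(1:3, times = 2),
                   y  = c(4, 5, 6, 7, 8, 9))

# Hypothetical helper: TRUE when every (id, wave) pair appears at most once.
is_valid_panel <- function(df, id, wave) {
  anyDuplicated(df[, c(id, wave)]) == 0
}

# Sort by person, then by wave.
long <- long[order(long$id, long$t), ]
is_valid_panel(long, "id", "t")
```

A duplicated id-wave pair usually signals a data entry or merging problem that should be fixed before fitting any panel model.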
Basic formula syntax
The formula syntax used in this package is meant to be as similar to a typical regression formula as possible.
The most basic model can be specified like any other: y ~ x, where y
is the dependent variable and x is a time-varying predictor. If you
would like to include time-invariant predictors, make the formula
consist of two parts separated by a bar (|), like so:
y ~ x | z
where z is a time-invariant predictor, like ethnicity.
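The bar works as a separator because R gives | higher precedence than ~, so y ~ x | z parses as y ~ (x | z) and the right-hand side is a single call to |. The split_two_part() helper below is hypothetical (it is not dpm’s internal parser); it uses only base R to show how the two parts can be pulled apart.

```r
# Hypothetical helper (not part of dpm): split "y ~ varying | invariant"
# into the names of time-varying and time-invariant predictors.
split_two_part <- function(f) {
  rhs <- f[[3]]  # right-hand side of the formula
  if (is.call(rhs) && identical(rhs[[1]], as.name("|"))) {
    # Left of the bar: time-varying; right of the bar: time-invariant.
    list(varying = all.vars(rhs[[2]]), invariant = all.vars(rhs[[3]]))
  } else {
    # No bar: every predictor is treated as time-varying.
    list(varying = all.vars(rhs), invariant = character(0))
  }
}

parts <- split_two_part(y ~ x | z)
str(parts)
```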
One of the innovations of the method is the notion of pre-determined,
or sequentially exogenous, predictors. To specify a model with a
pre-determined variable, wrap the variable in the pre function:
y ~ pre(x1) + x2 | z. This tells the function that x1 is pre-determined,
while x2 is assumed to be strictly exogenous. You can have multiple
pre-determined predictors as well (e.g., y ~ pre(x1) + pre(x2) | z).
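The difference between the two assumptions can be stated in terms of the error term ε_it for unit i at wave t (standard notation, not taken from this README). Strict exogeneity rules out any correlation between the predictor and the errors at any wave; sequential exogeneity only rules it out for current and earlier values of the predictor, which leaves room for feedback from the outcome to later values of x.

```latex
% Strict exogeneity: x is uncorrelated with the error at every pair of waves
\operatorname{E}(x_{is}\,\varepsilon_{it}) = 0 \quad \text{for all } s, t

% Sequential exogeneity (pre-determined x): only current and past x are restricted
\operatorname{E}(x_{is}\,\varepsilon_{it}) = 0 \quad \text{for } s \le t,
\qquad x_{is} \text{ may be correlated with } \varepsilon_{it} \text{ for } s > t
```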
You may also fit models with lagged predictors. Simply apply the lag
function to those predictors in the formula:
y ~ pre(lag(x1)) + lag(x2) | z. To use a lag longer than one wave,
provide the lag order as an argument. For instance,
y ~ pre(lag(x1, 2)) + lag(x2) | z will use the x1 variable at lag 2.
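Conceptually, a lag of order n pairs each wave with the predictor’s value from n waves earlier for the same person, so a person’s first n waves have no lagged value. The lag_within() helper below is hypothetical (it is neither dpm’s nor panelr’s lag) and assumes rows are sorted by id and then wave, with no gaps in the waves.

```r
# Hypothetical helper: lag x by n waves within each id.
# Assumes rows are sorted by id, then wave, with no missing waves.
lag_within <- function(x, id, n = 1) {
  ave(x, id, FUN = function(v) {
    # Too few waves to lag: every value is missing.
    if (n >= length(v)) return(rep(NA, length(v)))
    # Shift values down by n positions, padding the start with NA.
    c(rep(NA, n), head(v, -n))
  })
}

d <- data.frame(id = c(1, 1, 1, 2, 2, 2), wave = c(1, 2, 3, 1, 2, 3),
                x1 = c(5, 6, 7, 1, 2, 3))
d$x1_lag1 <- lag_within(d$x1, d$id, 1)
d$x1_lag2 <- lag_within(d$x1, d$id, 2)
print(d)
```

Because the shift happens within each id, person 2’s first wave never borrows a value from person 1’s last wave.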
Socius article example
This will replicate the analysis of the wages data in the Socius article that describes these models.
Note that to get matching standard errors, set information = "observed"
to override lavaan’s default, information = "expected".
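# union enters as a lagged pre-determined predictor, lwage as a lagged
# strictly exogenous predictor, and ed as a time-invariant predictor
fit <- dpm(wks ~ pre(lag(union)) + lag(lwage) | ed, data = wages,
           error.inv = TRUE, information = "observed")
summary(fit)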
MODEL INFO:
Dependent variable: wks
Total observations: 595
Complete observations: 595
Time periods: 2 - 7
MODEL FIT: