This function is intended for use with large datasets with multiple group effects of large cardinality. If dummy-encoding the group effects results in a manageable number of coefficients, you are probably better off using lm.
felm(formula, data, iv=NULL, clustervar=NULL, exactDOF=FALSE,
subset, na.action, contrasts=NULL, ...)
exactDOF=TRUE causes felm to attempt to compute it, bu[...]
[...] NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The 'factory-fr[...]
[...] contrasts.arg of model.matrix.default.
[...] clustervar and iv arguments will be removed from the argument list at a later time, but will continue to be supported in this field. Currently, the only argument supported in this field is the 'logi[...]
felm returns an object of class "felm". It is quite similar to an "lm" object, but not entirely compatible. The generic summary-method will yield a summary which may be print'ed.
The object has some resemblance to an lm object, and some postprocessing methods designed for lm may happen to work. It may however be necessary to coerce the object to succeed with this.
The "felm" object is a list containing the following fields:
[...] 'felm' objects for the IV first step(s), if used.
[...] felm(keepX=TRUE) is specified. Must be included if bccorr is to be used for correcting limited mobility bias.
[...] y ~ x1 + x2 | f1 + f2 | (Q|W ~ x3+x4) | clu1 + clu2 where y is the response, x1,x2 are ordinary covariates, f1,f2 are factors to be projected out, Q and W are covariates which are instrumented by x3 and x4, and clu1,clu2 are factors to be used for computing cluster robust standard errors. Parts that are not used should be specified as 0, except if they are at the end of the formula, where they can be omitted. The parentheses are needed in the third part since | has higher precedence than ~.
Interactions between a covariate x and a factor f can be projected out with the syntax x:f.
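The need for parentheses around the IV specification can be checked in base R by looking at how the parser groups an unevaluated formula. This is only a sketch of the parsing behaviour; the variable names are placeholders and nothing here calls felm.

```r
# Without parentheses, the last ~ captures everything to its left.
bare <- quote(y ~ x1 | f1 | Q | W ~ x3 + x4)
# With parentheses, the IV specification stays a self-contained part.
wrapped <- quote(y ~ x1 | f1 | (Q | W ~ x3 + x4))

# Element 2 of a two-sided ~ call is its left-hand side.
bare_lhs <- bare[[2]]
wrapped_lhs <- wrapped[[2]]

print(bare_lhs)     # the whole "y ~ x1 | f1 | Q | W" ends up on the left
print(wrapped_lhs)  # just the response y
```

In the unparenthesised form the response is no longer y alone, so the formula cannot be split into the intended parts.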
The terms in the second and fourth parts are not treated as ordinary formulas. In particular, it is not possible to write things like y ~ x1 | x*f; rather, one would specify y ~ x1 + x | x:f + f. Note that f:x also works, since R's parser does not keep the order. This means that in interactions, the factor must be a factor, whereas a non-interacted factor will be coerced to a factor. I.e. in y ~ x1 | x:f1 + f2, the f1 must be a factor, whereas it will work as expected if f2 is an integer vector.
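The rewrite of x*f follows from how an ordinary R formula expands the * operator. A minimal base R check, using placeholder names x and f and no felm call, might look like this:

```r
# terms() lists the terms an ordinary formula expands to.
labels_star <- attr(terms(~ x * f), "term.labels")
print(labels_star)
```

The expansion contains the main effects x and f as well as the interaction x:f, which is why the covariate x goes in the first part of the felm formula while x:f and f go in the second.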
In older versions of [...] felm(y ~ x1 + x2 + G(f1) + G(f2), iv=list(Q ~ x3+x4, W ~ x3+x4), clustervar=c('clu1','clu2')). This syntax still works.
The standard errors are adjusted for the reduced degrees of freedom coming from the dummies which are implicitly present. In the case of two factors, the exact number of implicit dummies is easy to compute. If there are more factors, the number of dummies is estimated by assuming there's one reference-level for each factor; this may be a slight over-estimation, leading to slightly too large standard errors. Setting exactDOF='rM' computes the exact degrees of freedom with rankMatrix() in package Matrix. [...] rankMatrix() for sparse matrices which may cause it to return the wrong value. A fix is underway.
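For the two-factor case, one way to see the count of implicit dummies is to build the dummy matrix explicitly on a small example and compute its rank. The sketch below uses only base R and does not show how felm computes it. The toy factors are hypothetical, and the relation it illustrates (rank equals the total number of levels minus the number of connected components of the two-factor graph) is a standard result assumed here, not something stated in this page.

```r
# Hypothetical toy factors: all levels are linked through shared
# observations, so the factor graph has one connected component.
f1 <- factor(c(1, 1, 2, 2, 3, 3))
f2 <- factor(c("a", "b", "b", "c", "c", "a"))

# Full dummy encoding of both factors, without an intercept.
D <- cbind(model.matrix(~ f1 - 1), model.matrix(~ f2 - 1))
rank_connected <- qr(D)$rank

# Two disconnected groups: {1, a} and {2, b}.
g1 <- factor(c(1, 1, 2, 2))
g2 <- factor(c("a", "a", "b", "b"))
D2 <- cbind(model.matrix(~ g1 - 1), model.matrix(~ g2 - 1))
rank_disconnected <- qr(D2)$rank

print(c(connected = rank_connected, disconnected = rank_disconnected))
```

In the first case the six dummy columns have rank 6 - 1 = 5; in the second the four columns have rank 4 - 2 = 2. A dense qr() like this is only practical for small examples.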
For the iv-part of the formula, it is only necessary to include the instruments on the right hand side. The other explanatory covariates, from the first and second part of formula, are added automatically in the first stage regressions. See the examples.
The contrasts argument is similar to the one in lm(); it is used for factors in the first part of the formula. The factors in the second part are analyzed as part of a possible subsequent getfe() call.
The old syntax with a single part formula with the G() syntax for the factors to transform away is still supported, as well as the clustervar and iv arguments, but users are encouraged to move to the new multi part formulas as described here. In an upcoming version of [...] clustervar and iv arguments will be moved to the ... argument list. In the event that you use these arguments, and rewriting to the new syntax is impractical, you should make sure to name them (i.e. not use them as positional arguments). felm will issue a warning if these two arguments are not named.
Note that the way missing values (NAs) in IV estimations are handled in [...]
An alternative to clustered standard errors is to project out the cluster factors (put them in the second part of the formula) and use heteroskedastic standard errors.
Note that the F-test which is computed by summary.felm is unreliable for robust standard errors.
getfe
summary.felm
library(lfe)
oldopts <- options(lfe.threads=1)
## create covariates
x <- rnorm(1000)
x2 <- rnorm(length(x))
## individual and firm
id <- factor(sample(20,length(x),replace=TRUE))
firm <- factor(sample(13,length(x),replace=TRUE))
## effects for them
id.eff <- rnorm(nlevels(id))
firm.eff <- rnorm(nlevels(firm))
## left hand side
u <- rnorm(length(x))
y <- x + 0.5*x2 + id.eff[id] + firm.eff[firm] + u
## estimate and print result
est <- felm(y ~ x+x2| id + firm)
summary(est)
## compare with lm
summary(lm(y ~ x + x2 + id + firm-1))
# make an example with 'reverse causation'
# Q and W are instrumented by x3 and the factor x4. Report robust s.e.
x3 <- rnorm(length(x))
x4 <- sample(12,length(x),replace=TRUE)
Q <- 0.3*x3 + x + 0.2*x2 + id.eff[id] + 0.3*log(x4) - 0.3*y + rnorm(length(x),sd=0.3)
W <- 0.7*x3 - 2*x + 0.1*x2 - 0.7*id.eff[id] + 0.8*cos(x4) - 0.2*y+ rnorm(length(x),sd=0.6)
# add them to the outcome
y <- y + Q + W
ivest <- felm(y ~ x + x2 | id+firm | (Q|W ~ x3+factor(x4)))
summary(ivest,robust=TRUE)
# compare with the not instrumented fit:
summary(felm(y ~ x + x2 +Q + W |id+firm))
options(oldopts)