The purpose of check_dag() is to build, check and visualize
your model based on directed acyclic graphs (DAG). The function checks if a
model is correctly adjusted for identifying specific relationships of
variables, especially directed (maybe also "causal") effects for given
exposures on an outcome. In case of incorrect adjustments, the function
suggests the minimally required variables that should be adjusted for (sometimes
also called "controlled for"), i.e. variables that at least need to be
included in the model. Depending on the goal of the analysis, it is still
possible to add more variables to the model than just the minimally required
adjustment sets.
check_dag() is a convenient wrapper around ggdag::dagify(),
dagitty::adjustmentSets() and dagitty::adjustedNodes() to check correct
adjustment sets. It returns a dagitty object that can be visualized with
plot(). as.dag() is a small convenience function to return the
dagitty-string, which can be used for the online-tool from the
dagitty-website.
check_dag(
  ...,
  outcome = NULL,
  exposure = NULL,
  adjusted = NULL,
  latent = NULL,
  effect = c("all", "total", "direct"),
  coords = NULL
)

as.dag(x, ...)
An object of class check_dag, which can be visualized with plot().
The returned object also inherits from class dagitty and thus can be used
with all functions from the ggdag and dagitty packages.
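A minimal sketch of a typical call, assuming the performance, ggdag and dagitty packages are installed. The variable names y, x, b and c are placeholders chosen for illustration, not names from any real data set.

```r
library(performance)

# Hypothetical DAG: x, b and c point to y, and b also points to x
dag <- check_dag(
  y ~ x + b + c,
  x ~ b,
  outcome = "y",
  exposure = "x"
)

# Printing reports whether the model is correctly adjusted
dag

# The object carries both classes described above
inherits(dag, "check_dag")
inherits(dag, "dagitty")

# Being a dagitty object, it can be passed to dagitty functions
dagitty::adjustmentSets(dag, effect = "total")

# Return the dagitty string, e.g. for pasting into the dagitty online tool
as.dag(dag)
```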
...: One or more formulas, which are converted into dagitty syntax.
The first element may also be a model object. If a model object is provided, its
formula is used as the first formula, and all independent variables will be used
for the adjusted argument. See 'Details' and 'Examples'.
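The following sketch illustrates passing a model object as the first element. It assumes the performance package is installed and uses the built-in mtcars data; the additional paths (wt ~ disp + cyl, wt ~ am) are assumptions made up for illustration, not substantive claims about the data.

```r
library(performance)

data(mtcars)
m <- lm(mpg ~ wt + gear + disp + cyl, data = mtcars)

# The model formula acts as the first formula; the model's independent
# variables are used for the `adjusted` argument.
dag <- check_dag(
  m,
  wt ~ disp + cyl,
  wt ~ am
)
dag
```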
outcome: Name of the dependent variable (outcome), as character string
or as formula. Must be a valid name from the formulas provided in .... If
not set, the first dependent variable from the formulas is used.
exposure: Name of the exposure variable (as character string or
formula), for which the direct and total causal effect on the outcome
should be checked. Must be a valid name from the formulas provided in ....
If not set, the first independent variable from the formulas is used.
adjusted: A character vector or formula with names of variables that
are adjusted for in the model, e.g. adjusted = c("x1", "x2") or
adjusted = ~ x1 + x2. If a model object is provided in ..., any values in
adjusted will be overwritten by the model's independent variables.
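As a brief sketch of the two ways to specify adjusted variables (assuming the performance and dagitty packages are installed; y, x, b and c are placeholder names), the character and formula interfaces can be used interchangeably:

```r
library(performance)

# Character interface
dag_chr <- check_dag(
  y ~ x + b + c,
  x ~ b,
  outcome = "y",
  exposure = "x",
  adjusted = c("b", "c")
)

# Formula interface for outcome, exposure and adjusted
dag_fml <- check_dag(
  y ~ x + b + c,
  x ~ b,
  outcome = ~y,
  exposure = ~x,
  adjusted = ~ b + c
)

# Inspect which nodes are marked as adjusted in each DAG
dagitty::adjustedNodes(dag_chr)
dagitty::adjustedNodes(dag_fml)
```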
latent: A character vector with names of latent variables in the model.
effect: Character string, indicating which effect to check. Can be
"all" (default), "total", or "direct".
coords: Coordinates of the variables when plotting the DAG. The coordinates can be provided in three different ways:
a list with two elements, x and y, which both are named vectors of
numerics. The names correspond to the variable names in the DAG, and the
values for x and y indicate the x/y coordinates in the plot.
a list with elements that correspond to the variables in the DAG. Each element is a numeric vector of length two with x- and y-coordinate.
a data frame with three columns: x, y and name (which contains the
variable names).
See 'Examples'.
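The three coordinate formats can be sketched as follows. The variable names (score, exp, b, c) and the coordinate values are arbitrary placeholders; all three objects describe the same layout. The sketch assumes the performance package is installed.

```r
library(performance)

# 1) A list with two named numeric vectors, x and y
coords_xy <- list(
  x = c(score = 5, exp = 4, b = 3, c = 3),
  y = c(score = 3, exp = 3, b = 2, c = 4)
)

# 2) A list with one c(x, y) vector per variable
coords_by_node <- list(
  score = c(5, 3),
  exp = c(4, 3),
  b = c(3, 2),
  c = c(3, 4)
)

# 3) A data frame with the columns x, y and name
coords_df <- data.frame(
  x = c(5, 4, 3, 3),
  y = c(3, 3, 2, 4),
  name = c("score", "exp", "b", "c")
)

# Any of the three objects can be passed to `coords`
dag <- check_dag(
  score ~ exp + b + c,
  exp ~ b,
  outcome = "score",
  exposure = "exp",
  coords = coords_xy
)
```

Calling plot(dag) afterwards should place the nodes at these positions; plotting may require additional packages to be installed.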
x: An object of class check_dag, as returned by check_dag().
The formulas have the following syntax:
One-directed paths: On the left-hand-side is the name of the variable
that causal effects point to (direction of the arrows, in dagitty-language).
On the right-hand-side are all variables where causal effects are assumed
to come from. For example, for the formula Y ~ X1 + X2, paths directed from
both X1 and X2 to Y are assumed.
Bi-directed paths: Use ~~ to indicate bi-directed paths. For example,
Y ~~ X indicates that the path between Y and X is bi-directed, and
the arrow points in both directions. Bi-directed paths often indicate
an unmeasured cause, or unmeasured confounding, of the two involved variables.
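The formula syntax described above can be sketched with a small hypothetical DAG that combines one-directed and bi-directed paths. The variable names are placeholders, and the sketch assumes the performance package is installed.

```r
library(performance)

dag <- check_dag(
  # one-directed paths: x, b and c point to y
  y ~ x + b + c,
  # one-directed path: b points to x
  x ~ b,
  # bi-directed path: b and c are assumed to share an unmeasured cause
  b ~~ c,
  outcome = "y",
  exposure = "x"
)
dag
```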
The function checks if the model is correctly adjusted for identifying the direct and total effects of the exposure on the outcome. If the model is correctly specified, no adjustment is needed to estimate the direct effect. If the model is not correctly specified, the function suggests the minimally required variables that should be adjusted for. The function distinguishes between direct and total effects, and checks if the model is correctly adjusted for both. If the model is cyclic, the function stops and suggests removing the cycles from the model.
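To illustrate the behaviour for cyclic models, the following sketch wraps the call in tryCatch() so that the script continues after the function stops. The model is a made-up example in which y depends on b and b depends on y; the sketch assumes the performance package is installed, and the exact wording of the message is not shown here because it may differ between versions.

```r
library(performance)

# Hypothetical cyclic model: b -> y and y -> b
result <- tryCatch(
  check_dag(
    y ~ x + b,
    b ~ y,
    outcome = "y",
    exposure = "x"
  ),
  # capture the message instead of aborting the script
  error = function(e) conditionMessage(e)
)

# If the call stopped as described, `result` holds the error message
result
```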
Note that it sometimes could be necessary to try out different combinations
of suggested adjustments, because check_dag() cannot always detect whether
at least one of several variables is required, or whether adjustments should
be done for all listed variables. It can be useful to copy the dagitty-code
(using as.dag(), which prints the dagitty-string into the console) into
the dagitty-website and play around with different adjustments.
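One possible way to try out several combinations of adjustments is to loop over candidate sets and compare the printed results. This is only a sketch: the candidate sets and variable names are hypothetical, and the performance package is assumed to be installed.

```r
library(performance)

# Candidate adjustment sets to compare (hypothetical variable names)
candidates <- list("b", "c", c("b", "c"))

results <- lapply(candidates, function(adj) {
  check_dag(
    y ~ x + b + c,
    x ~ b,
    outcome = "y",
    exposure = "x",
    adjusted = adj
  )
})

# Print each check to compare the reported adjustment status
for (res in results) print(res)
```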
The direct effect of an exposure on an outcome is the effect that is not mediated by any other variable in the model. The total effect is the sum of the direct and indirect effects. The function checks if the model is correctly adjusted for identifying the direct and total effects of the exposure on the outcome.
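The distinction can be sketched with a simple, hypothetical mediation structure in which x affects y both directly and through a mediator m. Since m lies on the causal path from x to y, adjusting for it is appropriate when the direct effect is of interest, but not when the total effect is. The sketch assumes the performance and dagitty packages are installed.

```r
library(performance)

# Hypothetical mediation structure: x -> m -> y and x -> y
dag_total <- check_dag(
  y ~ x + m,
  m ~ x,
  outcome = "y",
  exposure = "x",
  effect = "total"
)
dag_total

# Same DAG with the mediator adjusted for, checking the direct effect only
dag_direct <- check_dag(
  y ~ x + m,
  m ~ x,
  outcome = "y",
  exposure = "x",
  adjusted = "m",
  effect = "direct"
)
dag_direct

# dagitty can list the minimal adjustment sets for each effect type
dagitty::adjustmentSets(dag_total, effect = "total")
dagitty::adjustmentSets(dag_total, effect = "direct")
```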
Correctly thinking about and identifying the relationships between variables is important when it comes to reporting coefficients from regression models that mutually adjust for "confounders" or include covariates. Different coefficients might have different interpretations, depending on their relationship to other variables in the model. Sometimes, a regression coefficient represents the direct effect of an exposure on an outcome, but sometimes it must be interpreted as a total effect, due to the involvement of mediating effects. This problem is also called the "Table 2 fallacy" (Westreich and Greenland 2013). DAGs help to visualize, and thereby focus on, the relationships of variables in a regression model to detect missing adjustments or over-adjustment.
Rohrer, J. M. (2018). Thinking clearly about correlations and causation: Graphical causal models for observational data. Advances in Methods and Practices in Psychological Science, 1(1), 27–42. doi:10.1177/2515245917745629
Westreich, D., & Greenland, S. (2013). The Table 2 Fallacy: Presenting and Interpreting Confounder and Modifier Coefficients. American Journal of Epidemiology, 177(4), 292–298. doi:10.1093/aje/kws412