The default is to draw samples from the posterior distribution (analytic = FALSE). The samples are
required for computing edge differences (see ggm_compare_estimate), the Bayesian R2 introduced in
Gelman et al. (2019) (see predictability), etc. If the goal is
to *only* determine the non-zero effects, this can be accomplished by setting analytic = TRUE.
This is particularly useful when a fast solution is needed (see the examples in ggm_compare_ppc).
Controlling for Variables:
When controlling for variables, it is assumed that Y includes only
the nodes in the GGM and the control variables. Internally, only the predictors
that are included in formula are removed from Y. This is not the behavior of, say,
lm, but was adopted to ensure users do not have to write out each variable that
should be included in the GGM. An example is provided below.
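The column bookkeeping described above can be sketched in base R. This is only an illustration of the idea, not the package's internal code; the toy data frame, its column names, and the objects form, controls, and node_names are made up for the example.

```r
# Toy data: three nodes plus two control variables (all names are hypothetical).
set.seed(1)
Y <- data.frame(
  item1 = rnorm(10),
  item2 = rnorm(10),
  item3 = rnorm(10),
  age   = sample(20:60, 10),
  sex   = rbinom(10, 1, 0.5)
)

# One-sided formula naming the control variables.
form <- ~ age + sex

# Variables named in the formula are treated as predictors ...
controls <- all.vars(form)

# ... and every remaining column of Y is taken to be a node in the GGM.
node_names <- setdiff(colnames(Y), controls)
node_names
```

The point of the design is that only the control variables have to be named; the nodes follow from whatever else is in Y.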
Mixed Type:
The term "mixed" is somewhat of a misnomer, because the method can be used for data including only
continuous or only discrete variables. The method is based on the rank likelihood, which requires sampling
the ranks for each variable (i.e., the data are not merely transformed to ranks). This is computationally
expensive when there are many levels. For example, with continuous data, there are as many ranks
as data points!
The option mixed_type allows the user to determine which variables should be treated as ranks;
the "empirical" distribution is used otherwise (Hoff, 2007). This is
accomplished by specifying an indicator vector of length p. A one indicates that the ranks are used,
whereas a zero indicates to "ignore" that variable. By default all integer variables are treated as ranks.
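As a minimal sketch, such an indicator vector could be built in base R as follows. The toy data and column names are hypothetical, and flagging columns by their storage type is just one possible rule, chosen here because it mirrors the default described above.

```r
# Toy data with p = 4 columns: two ordinal items and two continuous scores
# (column names are hypothetical).
set.seed(1)
Y <- data.frame(
  likert1 = sample(1:5, 50, replace = TRUE),
  likert2 = sample(1:5, 50, replace = TRUE),
  score1  = rnorm(50),
  score2  = rnorm(50)
)

# 1 = use the ranks for this variable, 0 = do not.
# Here a column is flagged when it is stored as an integer.
mixed_type <- as.integer(vapply(Y, is.integer, logical(1)))
names(mixed_type) <- colnames(Y)
mixed_type

# The vector needs one entry per column of Y.
stopifnot(length(mixed_type) == ncol(Y))
```

Setting an entry to zero for a continuous variable avoids sampling one rank per observation for that column.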
Dealing with Errors:
An error is most likely to arise when type = "ordinal". There are two common errors (although still rare):
The first is due to sampling the thresholds, especially when the data are heavily skewed.
This can result in an ill-defined matrix. If this occurs, we recommend first
decreasing prior_sd (i.e., a more informative prior). If that does not work, then
change the data type to type = "mixed", which then estimates a copula GGM
(this method can be used for data containing only ordinal variables). This should
work without a problem.
The second is due to how the ordinal data are categorized. For example, if the error states
that the index is out of bounds, this indicates that the first category is a zero. This is not allowed, as
the first category must be one. This is addressed by adding one (e.g., Y + 1) to the data matrix.
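The recoding can be checked and applied in base R along these lines. The matrix below is made-up data, and the guard on the minimum is an added convenience so that the shift is not applied to data already coded from one.

```r
# Toy ordinal data coded 0-3 (hypothetical).
set.seed(1)
Y <- matrix(sample(0:3, 40, replace = TRUE), nrow = 10)
Y[1, 1] <- 0  # make sure the zero category is present

# Shift the coding so the lowest category is one,
# but only if the lowest category is currently zero.
if (min(Y, na.rm = TRUE) == 0) {
  Y <- Y + 1
}

range(Y)
```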
Imputing Missing Values:
Missing values are imputed with the approach described in Hoff (2009).
The basic idea is to impute the missing values with the respective posterior predictive distribution,
given the observed data, as the model is being estimated. Note that the default is TRUE,
but this is ignored when there are no missing values. If set to FALSE, and there are missing
values, list-wise deletion is performed with na.omit.
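To see how much data list-wise deletion would discard, na.omit can be applied to the data directly in base R. The data frame below is a made-up example.

```r
# Toy data with missing values in two of three columns (hypothetical).
Y <- data.frame(
  a = c(1, 2, NA, 4, 5),
  b = c(2, NA, 3, 4, 5),
  c = c(1, 2, 3, 4, 5)
)

# Number of missing values per column.
colSums(is.na(Y))

# List-wise deletion: any row with at least one NA is dropped.
Y_complete <- na.omit(Y)
nrow(Y_complete)
```

Rows 2 and 3 each contain one missing value, so both are removed in full, including their observed entries.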