ggm_compare_ppc: GGM Compare: Posterior Predictive Check

Description

Compare GGMs with a posterior predicitve check gelman1996posteriorBGGM. This method was introduced in williams2020comparing;textualBGGM. Currently, there is a global (the entire GGM) and a nodewise test. The default is to compare GGMs with respect to the posterior predictive distribution of Kullback Leibler divergence and the sum of squared errors. It is also possible to compare the GGMs with a user defined test-statistic.

Usage

ggm_compare_ppc(
  ...,
  test = "global",
  iter = 5000,
  FUN = NULL,
  custom_obs = NULL,
  loss = TRUE,
  progress = TRUE
)

Value

The returned object of class ggm_compare_ppc contains a lot of information that is used for printing and plotting the results. For users of BGGM, the following are the useful objects:

test = "global"

ppp_jsd posterior predictive p-values (JSD).
ppp_sse posterior predictive p-values (SSE).
predictive_jsd list containing the posterior predictive distributions (JSD).
predictive_sse list containing the posterior predictive distributions (SSE).
obs_jsd list containing the observed error (JSD).
obs_sse list containing the observed error (SSE).

test = "nodewise"

ppp_jsd posterior predictive p-values (JSD).
predictive_jsd list containing the posterior predictive distributions (JSD).
obs_jsd list containing the observed error (JSD).

FUN = f()

ppp_custom posterior predictive p-values (custom).
predictive_custom posterior predictive distributions (custom).
obs_custom observed error (custom).

Arguments

...: At least two matrices (or data frames) of dimensions n (observations) by p (variables).
test: Which test should be performed (defaults to "global") ? The options include global and nodewise.
iter: Number of replicated datasets used to construct the predictivie distribution (defaults to 5000).
FUN: An optional function for comparing GGMs that returns a number. See Details.
custom_obs: Number corresponding to the observed score for comparing the GGMs. This is required if a function is provided in FUN. See Details.
loss: Logical. If a function is provided, is the measure a "loss function" (i.e., a large score is bad thing). This determines how the p-value is computed. See Details.
progress: Logical. Should a progress bar be included (defaults to TRUE) ?

Details

The FUN argument allows for a user defined test-statisic (the measure used to compare the GGMs). The function must include only two agruments, each of which corresponds to a dataset. For example, f <- function(Yg1, Yg2), where each Y is dataset of dimensions n by p. The groups are then compare within the function, returning a single number. An example is provided below.

Further, when using a custom function care must be taken when specifying the argument loss. We recommended to visualize the results with plot to ensure the p-value was computed in the right direction.

References

Examples

Run this code


# \donttest{
# note: iter = 250 for demonstrative purposes

# data
Y <- bfi

#############################
######### global ############
#############################


# males
Ym <- subset(Y, gender == 1,
             select = - c(gender, education))

# females

Yf <- subset(Y, gender == 2,
             select = - c(gender, education))


global_test <- ggm_compare_ppc(Ym, Yf,
                               iter = 250)

global_test


#############################
###### custom function ######
#############################
# example 1

# maximum difference van Borkulo et al. (2017)

f <- function(Yg1, Yg2){

# remove NA
x <- na.omit(Yg1)
y <- na.omit(Yg2)

# nodes
p <- ncol(Yg1)

# identity matrix
I_p <- diag(p)

# partial correlations

pcor_1 <- -(cov2cor(solve(cor(x))) - I_p)
pcor_2 <- -(cov2cor(solve(cor(y))) - I_p)

# max difference
max(abs((pcor_1[upper.tri(I_p)] - pcor_2[upper.tri(I_p)])))

}

# observed difference
obs <- f(Ym, Yf)

global_max <- ggm_compare_ppc(Ym, Yf,
                              iter = 250,
                              FUN = f,
                              custom_obs = obs,
                              progress = FALSE)

global_max


# example 2
# Hamming distance (squared error for adjacency)

f <- function(Yg1, Yg2){

# remove NA
x <- na.omit(Yg1)
y <- na.omit(Yg2)

# nodes
p <- ncol(x)

# identity matrix
I_p <- diag(p)

fit1 <-  estimate(x, analytic = TRUE)
fit2 <-  estimate(y, analytic = TRUE)

sel1 <- select(fit1)
sel2 <- select(fit2)

sum((sel1$adj[upper.tri(I_p)] - sel2$adj[upper.tri(I_p)])^2)

}

# observed difference
obs <- f(Ym, Yf)

global_hd <- ggm_compare_ppc(Ym, Yf,
                            iter = 250,
                            FUN = f,
                            custom_obs  = obs,
                            progress = FALSE)

global_hd


#############################
########  nodewise ##########
#############################

nodewise <- ggm_compare_ppc(Ym, Yf, iter = 250,
                           test = "nodewise")

nodewise

# }

Run the code above in your browser using DataLab