ebnm_group: Solve the EBNM problem for grouped data

Description

Solves the empirical Bayes normal means (EBNM) problem for observations belonging to distinct groups.

Usage

ebnm_group(
  x,
  s = 1,
  group,
  prior_family = "point_normal",
  mode = 0,
  scale = "estimate",
  g_init = NULL,
  fix_g = FALSE,
  output = ebnm_output_default(),
  ...
)

Value

An ebnm object. Depending on the argument to output, the object is a list containing elements:

data: A data frame containing the observations x and standard errors s.
posterior: A data frame of summary results (posterior means, standard deviations, second moments, and local false sign rates).
fitted_g: The fitted prior $\hat{g}$ (an object of class normalmix, laplacemix, gammamix, unimix, tnormalmix, or horseshoe).
log_likelihood: The optimal log likelihood attained, $L(\hat{g})$.
posterior_sampler: A function that can be used to produce samples from the posterior. For all prior families other than the horseshoe, the sampler takes a single parameter nsamp, the number of posterior samples to return per observation. Since ebnm_horseshoe returns an MCMC sampler, it additionally takes parameter burn, the number of burn-in samples to discard.

S3 methods coef, confint, fitted, logLik,

nobs, plot, predict, print, quantile,

residuals, simulate, summary, and vcov

have been implemented for ebnm objects. For details, see the respective help pages, linked below under See Also.

Arguments

x: A vector of observations. Missing observations (NAs) are not allowed.
s: A vector of standard errors (or a scalar if all are equal). Standard errors may not be exactly zero, and missing standard errors are not allowed. Two prior families have additional restrictions: when horseshoe priors are used, errors must be homoskedastic; and since function deconv in package deconvolveR takes $z$-scores, the "deconvolver" family requires that all standard errors be equal to 1.
group: A vector of character strings that gives the group to which each observation belongs. It must have the same length as argument x. For an example of usage, see Examples below.
prior_family: A named vector that specifies the prior family $G$ for each group. If the same prior family is to be used for all groups, then a character string may be used instead.
mode: A named list that specifies, for each group, the mode of the respective prior $g$, or "estimate" if the mode is to be estimated from the data. If the mode is the same across groups, then a scalar may be used instead. If all modes are to be estimated, then mode = "estimate" may be used.
scale: A named list that specifies, for each group, the scale parameter(s) of the respective prior, or "estimate" if the scale parameters are to be estimated from the data. If the scale parameter is the same across groups, then a scalar may be used instead. If all scales are to be estimated, then scale = "estimate" may be used.
g_init: The prior distributions $g$. Usually this is left unspecified (NULL) and estimated from the data. However, it can be used in conjuction with fix_g = TRUE to fix the prior (useful, for example, to do computations with the "true" $g$ in simulations). If g_init is specified but fix_g = FALSE, g_init specifies the initial value of $g$ used during optimization. If g_init is supplied, it should be a named list that specifies, for each group, a prior of the appropriate class (normalmix for normal, point-normal, scale mixture of normals, and deconvolveR prior families, as well as for the NPMLE; class laplacemix for point-Laplace families; class gammamix for point-exponential families; class horseshoe for horseshoe families; and class unimix for unimodal_ families).
fix_g: If TRUE, fix the prior $g$ at g_init instead of estimating it.
output: A character vector indicating which values are to be returned. Function ebnm_output_default() provides the default return values, while ebnm_output_all() lists all possible return values. See Value below.
...: Additional parameters. When a unimodal_ prior family is used, these parameters are passed to function ash in package ashr. Although it does not call into ashr, the scale mixture of normals family accepts parameter gridmult for purposes of comparison. When gridmult is set, an ashr-style grid will be used instead of the default ebnm grid. When the "deconvolver" family is used, additional parameters are passed to function deconv in package deconvolveR. Families of generalized binary priors take several additional parameters; see ebnm_generalized_binary. In all other cases, additional parameters are ignored.

Details

The EBNM model for grouped data, with observations $x_j$ belonging to groups $k = 1, ..., K$, is $$x_j | \theta_j, s_j \sim N(\theta_j, s_j^2)$$ $$\theta_j \sim g_{k(j)} \in G_{k(j)}.$$

Solving the EBNM problem for grouped data is equivalent to solving a separate EBNM problem for each group $k = 1, ..., K$, with the optimal log likelihood equal to the sum of the optimal log likelihoods for each separate problem.

Examples

Run this code

group <- c(rep("small_sd", 100), rep("large_sd", 100))
theta <- c(rnorm(100, sd = 1), rnorm(100, sd = 10))
s <- 1
x <- theta + rnorm(200, 0, s)

ebnm.group.res <- ebnm_group(x, s, group)

# Use different prior families for each group:
ebnm.group.res <- ebnm_group(
  x, s, group,
  prior_family = list(small_sd = "normal", large_sd = "normal_scale_mixture")
)

# Different modes and scales can be set similarly:
ebnm.group.res <- ebnm_group(
  x, s, group,
  mode = list(small_sd = 0, large_sd = "estimate"),
  scale = list(small_sd = 1, large_sd = "estimate")
)

Run the code above in your browser using DataLab