ebnm_generalized_binary: Solve the EBNM problem using generalized binary priors

Description

Solves the empirical Bayes normal means (EBNM) problem using the family of nonnegative distributions consisting of mixtures where one component is a point mass at zero and the other is a truncated normal distribution with lower bound zero and nonzero mode. Typically, the mode is positive, with the ratio of the mode to the standard deviation taken to be large, so that posterior estimates are strongly shrunk towards one of two values (zero or the mode of the normal component). Identical to function ebnm with argument prior_family = "generalized_binary". For details, see Liu et al. (2023), cited in References below.
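
To make the shape of the prior family concrete, the following base R sketch simulates means from one generalized binary prior and then adds normal noise. The values of w, mu, and the standard errors are assumptions chosen for illustration only; they are not package defaults (only scale = 0.1 matches the default shown under Usage).

```r
# Sketch: draw true means from a generalized binary prior, then add noise.
set.seed(1)
n     <- 1000
w     <- 0.3         # assumed weight of the truncated normal component
mu    <- 2           # assumed mode of the truncated normal component
scale <- 0.1         # ratio of the (untruncated) sd to the mode
sigma <- scale * mu  # untruncated standard deviation

# Inverse-CDF sampling from N(mu, sigma^2) truncated to [0, Inf).
p_lower <- pnorm(0, mean = mu, sd = sigma)
u       <- runif(n, min = p_lower, max = 1)
tnorm   <- qnorm(u, mean = mu, sd = sigma)

# Mix the truncated normal draws with a point mass at zero.
is_nonnull <- rbinom(n, size = 1, prob = w) == 1
theta      <- ifelse(is_nonnull, tnorm, 0)

# Normal means observations: x_i ~ N(theta_i, s_i^2).
s <- rep(0.5, n)
x <- theta + rnorm(n, sd = s)
```

Because sigma is small relative to mu, the nonzero means cluster tightly around mu, which is why posterior estimates under this family tend to be pulled towards either zero or the mode.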

Usage

ebnm_generalized_binary(
  x,
  s = 1,
  mode = "estimate",
  scale = 0.1,
  g_init = NULL,
  fix_g = FALSE,
  output = ebnm_output_default(),
  control = NULL,
  ...
)

Value

An ebnm object. Depending on the argument to output, the object is a list containing elements:

data: A data frame containing the observations x and standard errors s.
posterior: A data frame of summary results (posterior means, standard deviations, second moments, and local false sign rates).
fitted_g: The fitted prior \(\hat{g}\).
log_likelihood: The optimal log likelihood attained, \(L(\hat{g})\).
posterior_sampler: A function that can be used to produce samples from the posterior. The sampler takes a single parameter nsamp, the number of posterior samples to return per observation.

S3 methods coef, confint, fitted, logLik, nobs, plot, predict, print, quantile, residuals, simulate, summary, and vcov have been implemented for ebnm objects. For details, see the respective help pages, linked below under See Also.
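
As a rough illustration of the returned object, the sketch below fits the model to simulated data and reads back the elements listed above. It assumes the ebnm package is installed; the simulated data and all numeric values are made up for the example.

```r
library(ebnm)

set.seed(1)
# Illustrative data: roughly 30% of the true means equal 2, the rest are zero.
theta <- ifelse(runif(500) < 0.3, 2, 0)
s     <- 0.5
x     <- theta + rnorm(500, sd = s)

# Request every available output, including the posterior sampler.
fit <- ebnm_generalized_binary(x, s, output = ebnm_output_all())

fit$fitted_g         # the fitted prior
fit$log_likelihood   # log likelihood attained at the fitted prior
head(fit$posterior)  # per-observation posterior summaries

# Draw 5 posterior samples per observation.
samp <- fit$posterior_sampler(nsamp = 5)
```

With the default output = ebnm_output_default(), only a subset of these elements is returned, so the sampler in particular has to be requested explicitly.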

Arguments

x

A vector of observations. Missing observations (NAs) are not allowed.

s

A vector of standard errors (or a scalar if all are equal). Standard errors may not be exactly zero, and missing standard errors are not allowed.

mode

A scalar specifying the mode of the truncated normal component, or "estimate" if the mode is to be estimated from the data (the location of the point mass is fixed at zero).

scale

A scalar specifying the ratio of the (untruncated) standard deviation of the normal component to its mode. This ratio must be fixed in advance (i.e., it is not possible to set scale = "estimate" when using generalized binary priors).

g_init

The prior distribution \(g\). Usually this is left unspecified (NULL) and estimated from the data. However, it can be used in conjunction with fix_g = TRUE to fix the prior (useful, for example, to do computations with the "true" \(g\) in simulations). If g_init is specified but fix_g = FALSE, g_init specifies the initial value of \(g\) used during optimization. When supplied, g_init should be an object of class tnormalmix or an ebnm object in which the fitted prior is an object of class tnormalmix.

fix_g

If TRUE, fix the prior \(g\) at g_init instead of estimating it.
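
One way g_init and fix_g might be combined is to estimate the prior on one set of observations and then hold it fixed when computing posteriors for another. The sketch below assumes the ebnm package is installed; the split into x_train and x_new and all numeric values are hypothetical.

```r
library(ebnm)

set.seed(1)
# Illustrative data, split into two halves.
theta   <- ifelse(runif(400) < 0.3, 2, 0)
x_train <- theta[1:200]   + rnorm(200, sd = 0.5)
x_new   <- theta[201:400] + rnorm(200, sd = 0.5)

# Step 1: estimate the prior from the first half.
fit_train <- ebnm_generalized_binary(x_train, s = 0.5)

# Step 2: reuse that prior, without re-estimating it, on the second half.
fit_new <- ebnm_generalized_binary(
  x_new,
  s      = 0.5,
  g_init = fit_train$fitted_g,  # an object of class tnormalmix
  fix_g  = TRUE
)
```

Dropping fix_g = TRUE in the second call would instead use the first fit only as a starting point for optimization.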

output

A character vector indicating which values are to be returned. Function ebnm_output_default() provides the default return values, while ebnm_output_all() lists all possible return values. See Value below.

control

A list of control parameters to be passed to function optim, where method has been set to "L-BFGS-B".

...

The following additional arguments act as control parameters for the outer EM loops in the fitting algorithm. Each loop iteratively updates parameters \(w\) (the mixture proportion corresponding to the truncated normal component) and \(\mu\) (the mode of the truncated normal component):

wlist: A vector defining intervals of \(w\) for which optimal solutions will separately be found. For example, if wlist = c(0, 0.5, 1), then two optimal priors will be found: one such that \(w\) is constrained to be less than 0.5 and one such that it is constrained to be greater than 0.5.
maxiter: A scalar specifying the maximum number of iterations to perform in each outer EM loop.
tol: A scalar specifying the convergence tolerance parameter for each outer EM loop.
mu_init: A scalar specifying the initial value of \(\mu\) to be used in each outer EM loop.
mu_range: A vector of length two specifying lower and upper bounds for possible values of \(\mu\).
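
The outer-loop parameters are passed by name through the ... argument. The call below is a sketch that assumes the ebnm package is installed; every value shown is illustrative rather than a recommended or default setting.

```r
library(ebnm)

set.seed(1)
# Illustrative data.
theta <- ifelse(runif(500) < 0.3, 2, 0)
x     <- theta + rnorm(500, sd = 0.5)

fit <- ebnm_generalized_binary(
  x,
  s        = 0.5,
  wlist    = c(0, 0.5, 1),  # search w < 0.5 and w > 0.5 separately
  maxiter  = 100,           # cap on iterations per outer EM loop
  tol      = 1e-4,          # convergence tolerance per outer EM loop
  mu_init  = 2,             # starting value for mu
  mu_range = c(0, 5)        # bounds on mu
)

fit$fitted_g
```

Here mu_init is chosen to lie inside mu_range, which seems the safest choice when both are set by hand.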

References

Yusha Liu, Peter Carbonetto, Jason Willwerscheid, Scott A Oakes, Kay F Macleod, and Matthew Stephens (2023). Dissecting tumor transcriptional heterogeneity from single-cell RNA-seq data by generalized binary covariance decomposition. bioRxiv 2023.08.15.553436.