Test on device-events using William DuMouchel's Empirical Bayes Gamma-Poisson Shrinker (GPS). GPS belongs to the family of disproportionality analyses (DPA) used to generate signals of disproportionate reporting (SDRs).
gps(df, ...)
# S3 method for mds_ts
gps(df, ts_event = c(Count = "nA"), analysis_of = NA, ...)
# S3 method for default
gps(
df,
analysis_of = NA,
eval_period = 1,
null_ratio = 1,
cred_interval = 0.9,
init_prior = c(0.2, 0.02, 2, 4, 1/3),
gamma_lower = 1e-05,
gamma_upper = 20,
quantiles = c(0.05, 0.95),
cont_adj = 0,
...
)
Required input data frame of class mds_ts or, for generic usage, any data frame with the following columns:
Unique times of class Date
Cell A count (class numeric) of the 2x2 table: device/event of interest.
Cell B count (class numeric) of the 2x2 table: device/non-event of interest.
Cell C count (class numeric) of the 2x2 table: non-device/event of interest.
Cell D count (class numeric) of the 2x2 table: non-device/non-event of interest.
Further arguments passed on to gps methods
Required if df is of class mds_ts. Named string indicating the variable corresponding to the event count (cell A in the 2x2 contingency table). In most cases, the default is the appropriate setting. See details for alternative options.
Default: c("Count"="nA") corresponding to the event count column in mds_ts objects. Name is generated from mds_ts metadata.
Optional string indicating the English description of what was analyzed. If specified, this will override the name of the ts_event string parameter.
Default: NA indicates no English description for plain df data frames, or ts_event English description for df data frames of class mds_ts.
Example: "Count of bone cement leakages"
Required positive integer indicating the number of unique times, counting in reverse chronological order, to sum over to create the 2x2 contingency table.
Default: 1 considers only the most recent time in df.
Example: 12 sums over the last 12 time periods to create the 2x2 contingency table.
Numeric value representing the null relative reporting ratio (RR), used with cred_interval to establish the signal status. This null_ratio is saved in the output as the signal threshold. See details for more.
Default: 1 indicates a null RR of 1 and tests if the lower bound of the cred_interval exceeds 1.
Numeric value between 0 and 1 representing the width of the Bayesian posterior credible interval, where the lower bound of the interval is assessed against the null_ratio. The interval bounds are returned as the lcl and ucl. See details for more.
Default: 0.90 indicates a 90% credible interval with bounds at 5% and 95%. The signal test is against the lower 5% bound, effectively creating the EB05 test.
A numeric vector of length 5 representing the initialization parameters for the prior gamma mixture distribution in this order: alpha1, beta1, alpha2, beta2, p. See details for more.
Default: c(.2, .02, 2, 4, 1/3) as suggested in openEBGM package v0.7.0.
Positive numeric value representing the lower bound for the two alphas and betas of the prior during PORT optimization.
Default: 1e-5 is a value suggested in openEBGM package v0.7.0.
Positive numeric value representing the upper bound for the two alphas and betas of the prior during PORT optimization.
Default: 20 is a value suggested in openEBGM package v0.7.0.
Vector of quantiles between 0 and 1. gps() will return an equal length vector of estimated empirical Bayes quantiles from the posterior distribution. Specify quantiles=NULL if no quantiles are desired.
Default: c(.05, .95) corresponds to the 5% (EB05) and 95% (EB95) quantiles.
Positive integer representing the continuity adjustment to be added to each cell of the 2x2 contingency table. A value greater than 0 allows the algorithm to run on contingency tables with 0 cells. Adding a continuity adjustment will adversely affect the algorithm estimates; user discretion is advised. See details for more.
Default: 0 adds zero to each cell, thus an unadjusted table.
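The eval_period description above can be illustrated with a small base R sketch. The helper `collapse_2x2()` and the toy data frame are hypothetical and are not part of the package; the sketch assumes the columns are named `time`, `nA`, `nB`, `nC`, and `nD`, and only shows one plausible way of summing the most recent unique times into a single set of 2x2 cell counts.

```r
# Hypothetical helper (not part of mdsstat): collapse the most recent
# `eval_period` unique times of `df` into one set of 2x2 cell counts.
collapse_2x2 <- function(df, eval_period = 1) {
  times <- sort(unique(df$time), decreasing = TRUE)
  keep <- df$time %in% head(times, eval_period)
  colSums(df[keep, c("nA", "nB", "nC", "nD")])
}

# Toy data with four unique times (assumed column names).
df <- data.frame(
  time = as.Date("2020-01-01") + 0:3,
  nA = c(1, 2, 3, 4),
  nB = c(10, 10, 10, 10),
  nC = c(5, 6, 7, 8),
  nD = c(100, 100, 100, 100)
)

cells_1 <- collapse_2x2(df, eval_period = 1)  # most recent time only
cells_3 <- collapse_2x2(df, eval_period = 3)  # last three times summed
```

With eval_period = 1 only the last row contributes; with eval_period = 3 the counts of the three most recent times are added cell by cell.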
A named list object of class mdsstat_test, as follows:
Name of the test run
English description of what was analyzed
Named boolean of whether the test was run. The name contains the run status.
A standardized list of test run results: statistic for the test statistic, lcl and ucl for the set confidence bounds, p for the p-value, signal status, and signal_threshold.
The test parameters
The data on which the test was run
mds_ts: GPS on mds_ts data
default: GPS on general data
null_ratio and cred_interval are used together to establish the signal criteria. The null_ratio is conceptually similar to the relative reporting ratio under a null hypothesis of no signal. Common values are 1 and, more conservatively (fewer false signals), 2. The cred_interval is the posterior credibility interval used to test for a signal. A value of 0.90 returns the 5% and 95% bounds and tests if the lower bound exceeds null_ratio. Effectively, cred_interval=0.90 conducts the well-known EB05 test.
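The signal rule described above can be sketched in base R. This is an illustration under stated assumptions, not the package's internal code: the observed count `n`, expected count `E`, and hyperparameters `theta` are hypothetical values, and the posterior is taken to be the two-component gamma mixture from DuMouchel (1999). The lower credible bound is found by numerically inverting the posterior CDF and is then compared with null_ratio.

```r
# Hypothetical inputs: observed count n, expected count E, and prior
# hyperparameters theta = c(alpha1, beta1, alpha2, beta2, p).
n <- 25
E <- 10
theta <- c(0.2, 0.02, 2, 4, 1/3)
null_ratio <- 1
cred_interval <- 0.90

# Posterior mixing weight of the first gamma component: each marginal
# density of n given E is negative binomial (DuMouchel 1999).
f1 <- dnbinom(n, size = theta[1], prob = theta[2] / (theta[2] + E))
f2 <- dnbinom(n, size = theta[3], prob = theta[4] / (theta[4] + E))
Q <- theta[5] * f1 / (theta[5] * f1 + (1 - theta[5]) * f2)

# Posterior CDF of the relative reporting ratio: a gamma mixture.
post_cdf <- function(x) {
  Q * pgamma(x, shape = theta[1] + n, rate = theta[2] + E) +
    (1 - Q) * pgamma(x, shape = theta[3] + n, rate = theta[4] + E)
}

# Invert the CDF numerically to get a posterior quantile. The mixture
# quantile cannot exceed the larger component quantile, so twice that
# value is a safe upper search limit.
post_quantile <- function(q) {
  comp_q <- qgamma(q, shape = theta[c(1, 3)] + n, rate = theta[c(2, 4)] + E)
  uniroot(function(x) post_cdf(x) - q,
          lower = 0, upper = 2 * max(comp_q), tol = 1e-10)$root
}

tail_prob <- (1 - cred_interval) / 2   # 0.05 when cred_interval = 0.90
lcl <- post_quantile(tail_prob)        # EB05 when cred_interval = 0.90
ucl <- post_quantile(1 - tail_prob)    # EB95 when cred_interval = 0.90
signal <- lcl > null_ratio
```

Raising null_ratio to 2 in this sketch makes the same lower bound harder to exceed, which is the sense in which 2 is the more conservative choice.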
init_prior specifies the initial guess for the 5 parameters of the prior gamma mixture distribution as described in DuMouchel (1999, Eqs. 4, 7) in the sequence: alpha1, beta1, alpha2, beta2, p. gamma_lower specifies the optimization lower bound for the two alphas and betas. gamma_upper specifies similarly the upper bound. The initial guess, upper and lower bounds are fed into PORT optimization using the stats::nlminb() routine.
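How an initial guess and box constraints feed into stats::nlminb() can be sketched as follows. This is a minimal illustration on simulated data, not the package's implementation: the vectors `N` and `E`, the function `neg_loglik()`, and the bounds placed on the mixing fraction p are all assumptions made for the example. The likelihood is the two-component negative binomial mixture implied by the gamma mixture prior in DuMouchel (1999).

```r
set.seed(42)

# Simulated (hypothetical) data: expected counts E and observed counts N
# drawn from a gamma mixture of Poisson rates.
E <- runif(200, min = 1, max = 20)
lambda <- ifelse(runif(200) < 1/3,
                 rgamma(200, shape = 0.2, rate = 0.1),
                 rgamma(200, shape = 2, rate = 4))
N <- rpois(200, lambda * E)

# Negative log marginal likelihood. Given E, each N follows a
# two-component negative binomial mixture.
neg_loglik <- function(theta, N, E) {
  f1 <- dnbinom(N, size = theta[1], prob = theta[2] / (theta[2] + E))
  f2 <- dnbinom(N, size = theta[3], prob = theta[4] / (theta[4] + E))
  -sum(log(theta[5] * f1 + (1 - theta[5]) * f2))
}

init_prior <- c(0.2, 0.02, 2, 4, 1/3)
gamma_lower <- 1e-05
gamma_upper <- 20

# Box-constrained PORT optimization. The bounds on p (fifth element)
# are an assumption of this sketch.
fit <- nlminb(
  start = init_prior,
  objective = neg_loglik,
  N = N, E = E,
  lower = c(rep(gamma_lower, 4), 0.001),
  upper = c(rep(gamma_upper, 4), 0.999)
)

fit$par          # fitted alpha1, beta1, alpha2, beta2, p
fit$convergence  # 0 indicates successful convergence
```

If an estimate sits exactly on gamma_lower or gamma_upper, the bound may be driving the fit, which is one reason to inspect `fit$par` before relying on posterior quantities.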
cont_adj provides the option to allow gps() to proceed running when the contingency table contains 0 cells. However, this is done at the user's discretion because there are adverse effects of adding a positive integer to every cell of the contingency table. By default, gps() runs with a 0 count in the C cell only, not in A, B, or D. It has been suggested that 0.5 may be an appropriate value. However, values less than 1 have been shown to be unstable using box-constrained PORT optimization, which is the only optimization considered in this release. Overall, posterior distribution estimates have been shown to be unstable with very low or 0 count cells.
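The effect of a continuity adjustment on the raw table can be seen in a short base R sketch. The cell counts are hypothetical, and the crude ratio computed here (observed count over the expected count under independence) is only a stand-in for illustration; it is not the shrunken estimate that gps() returns.

```r
# Hypothetical 2x2 cell counts with an empty B cell.
cells <- c(nA = 12, nB = 0, nC = 40, nD = 150)
cont_adj <- 1

# The adjustment is added to every cell, not only the empty one.
adjusted <- cells + cont_adj

# Expected count for cell A under independence of device and event:
# (row total * column total) / grand total.
expected_A <- function(x) {
  unname((x["nA"] + x["nB"]) * (x["nA"] + x["nC"]) / sum(x))
}

# Crude (unshrunken) relative reporting ratio before and after adjusting.
rr_raw <- unname(cells["nA"]) / expected_A(cells)
rr_adj <- unname(adjusted["nA"]) / expected_A(adjusted)
```

In this toy table the adjusted ratio is smaller than the unadjusted one, a simple instance of the distortion the adjustment introduces, and the distortion grows as the counts get smaller.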
For parameter ts_event, in the uncommon case where the device-event count (Cell A) variable is not "nA", the name of the variable may be specified here. Note that the remaining 3 cells of the 2x2 contingency table (Cells B, C, D) must be the variables "nB", "nC", and "nD" respectively in df. A named character vector may be used where the name is the English description of what was analyzed. Note that if the parameter analysis_of is specified, it will override this name. Example: ts_event=c("Count of Bone Cement Leakages"="event_count")
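The roles of the name and the value in a named ts_event vector can be shown with plain base R. The variables `event_column` and `description` are hypothetical names used only for this sketch, and the precedence rule simply mirrors the text above rather than the package's source.

```r
# Name = English description, value = column holding the cell A counts.
ts_event <- c("Count of Bone Cement Leakages" = "event_count")
analysis_of <- NA

event_column <- unname(ts_event)

# Assumed precedence, following the text: analysis_of wins when given.
description <- if (is.na(analysis_of)) names(ts_event) else analysis_of
```

Setting analysis_of to a string in this sketch would replace the description taken from the vector name while leaving the column lookup unchanged.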
DuMouchel W. Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. The American Statistician, 53(3):177-190, August 1999.
Ahmed I, Poncet A. PhViD: PharmacoVigilance Signal Detection, 2016. R package version 1.0.8.
Ihrie J, Canida T. openEBGM: EBGM Scores for Mining Large Contingency Tables, 2018. R package version 0.7.0.
# NOT RUN {
# Basic Example
data <- data.frame(time=c(1:25),
                   nA=as.integer(stats::rnorm(25, 25, 5)),
                   nB=as.integer(stats::rnorm(25, 50, 5)),
                   nC=as.integer(stats::rnorm(25, 100, 25)),
                   nD=as.integer(stats::rnorm(25, 200, 25)))
a1 <- gps(data)
# Example using an mds_ts object
a2 <- gps(mds_ts[[3]])
# }