This may be used to summarize either new, unobserved instances of
\(\textbf{y}\) (called \(\textbf{y}^{new}\)) or
replicates of \(\textbf{y}\) (called
\(\textbf{y}^{rep}\)). Either \(\textbf{y}^{new}\) or
\(\textbf{y}^{rep}\) is summarized, depending on the data
supplied to predict.vb.
# S3 method for vb.ppc
summary(object, Categorical, Rows, Discrep,
     d, Quiet, ...)
object: An object of class vb.ppc is required.
Categorical: Logical. If TRUE, then y and yhat are considered to be
categorical (such as y=0 or y=1), rather than continuous.
Rows: An optional vector of row numbers, for example c(1:10). All
rows will be estimated, but only these rows will appear in the summary.
Discrep: A character string indicating a discrepancy test. Discrep
defaults to NULL. Valid character strings when y is continuous are:
"Chi-Square", "Chi-Square2", "Kurtosis", "L-criterion", "MASE", "MSE",
"PPL", "Quadratic Loss", "Quadratic Utility", "RMSE", "Skewness",
"max(yhat[i,]) > max(y)", "mean(yhat[i,]) > mean(y)",
"mean(yhat[i,] > d)", "mean(yhat[i,] > mean(y))",
"min(yhat[i,]) < min(y)", "round(yhat[i,]) = d", and
"sd(yhat[i,]) > sd(y)". The only valid character string when y is
categorical is "p(yhat[i,] != y[i])". Kurtosis and skewness are not
discrepancies, but are included here for convenience.
d: This is an optional integer to be used with the Discrep argument
above, and it defaults to d=0.
Quiet: This logical argument defaults to FALSE, in which case results
are printed to the console. When TRUE, results are not printed.
...: Additional arguments are unused.
This function returns a list with the following components:
The Bayesian Predictive Information Criterion (BPIC) was introduced by Ando (2007). BPIC is a variation of the Deviance Information Criterion (DIC) that has been modified for predictive distributions. For more information on DIC (Spiegelhalter et al., 2002), see the accompanying vignette entitled "Bayesian Inference". \(BPIC = Dbar + 2pD\). The goal is to minimize BPIC.
This is the percentage of the records of y that are
within the 95% quantile-based probability interval (see p.interval)
of \(\textbf{y}^{rep}\). Gelfand's suggested goal is to achieve 95%
predictive concordance. Lower percentages indicate too many outliers
and a poor fit of the model to the data, and higher percentages may
suggest overfitting. Concordance is reported only when
\(\textbf{y}\) is continuous.
This is the mean of the record-level lifts, and is reported only
when \(\textbf{y}\) is specified as categorical with
Categorical=TRUE.
This is only reported if the Discrep argument receives a valid
discrepancy measure as listed above. The Discrep applies to each
record of \(\textbf{y}\), and the Discrepancy.Statistic reports
the results of the discrepancy measure on the entire data set. For
example, if Discrep="min(yhat[i,]) < min(y)", then the
overall result is the proportion of records in which the minimum
sample of yhat was less than the overall minimum
\(\textbf{y}\). This is Pr(min(yhat[i,]) < min(y) | y, Theta),
where Theta is the parameter set.
The L-criterion (Laud and Ibrahim, 1995) was
developed for model and variable selection. It is a sum of two
components: one involves the predictive variance and the other
includes the accuracy of the means of the predictive
distribution. The L-criterion measures model performance with a
combination of how close its predictions are to the observed data
and variability of the predictions. Better models have smaller
values of L. L is measured in the same units as
the response variable, and measures how close the data vector
\(\textbf{y}\) is to the predictive distribution. In addition
to the value of L, there is a value for S.L, which is
the calibration number of L, and is useful in determining how
much of a decrease is necessary between models to be noteworthy.
This is an \(N \times 5\) matrix, where \(N\) is the number of monitored variables, with the following 5 columns: Mean, SD, LB (the 2.5% quantile), Median, and UB (the 97.5% quantile).
When \(\textbf{y}\) is continuous, this is an
\(N \times 8\) matrix, where \(N\) is the number of
records of \(\textbf{y}\), with the following 8 columns:
y, Mean, SD, LB (the 2.5% quantile), Median, UB (the 97.5%
quantile), PQ (the predictive quantile, which is
\(Pr(\textbf{y}^{rep} \ge \textbf{y})\)), and
Test, which shows the record-level result of a test, if
specified. When \(\textbf{y}\) is categorical, this matrix has
a number of columns equal to the number of categories of
\(\textbf{y}\) plus 3, also including y, Lift, and Discrep.
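The BPIC formula given above, \(BPIC = Dbar + 2pD\), can be illustrated with a minimal sketch. The deviance values are made up, and the estimator pD = var(D)/2 is an assumption made for illustration; the original text does not state how this function estimates pD.

```r
# Hypothetical posterior deviance samples (not taken from a fitted model).
dev <- c(10, 12, 14)

Dbar <- mean(dev)       # posterior mean deviance
pD   <- var(dev) / 2    # ASSUMED estimator of the effective number of parameters
BPIC <- Dbar + 2 * pD   # BPIC = Dbar + 2pD, as stated in the text

BPIC
```

Because the goal is to minimize BPIC, values computed this way would be compared across candidate models fit to the same data.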
This function summarizes an object of class vb.ppc, which
consists of posterior predictive checks on either
\(\textbf{y}^{new}\) or \(\textbf{y}^{rep}\),
depending respectively on whether unobserved instances of
\(\textbf{y}\) or the model sample of \(\textbf{y}\) was
used in the predict.vb function. The deviance and
monitored variables are also summarized.
The purpose of a posterior predictive check is to assess how well (or poorly) the model fits the data, or to assess discrepancies between the model and the data. For more information on posterior predictive checks, see https://web.archive.org/web/20150215050702/http://www.bayesian-inference.com/posteriorpredictivechecks.
When \(\textbf{y}\) is continuous and known, this function estimates the predictive concordance between \(\textbf{y}\) and \(\textbf{y}^{rep}\) as per Gelfand (1996), and the predictive quantile (PQ), a record-level outlier-detection measure that is used to calculate Gelfand's predictive concordance.
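As a rough sketch of these two quantities, the code below computes PQ as \(Pr(\textbf{y}^{rep} \ge \textbf{y})\) for each record and concordance as the share of records whose observed value falls inside the 95% quantile-based interval of its replicates. The data are hypothetical, yhat is assumed to be a matrix with one row per record and one column per replicate, and the function itself may derive concordance from PQ rather than from the interval.

```r
# Hypothetical data: 4 records (rows) with 5 replicates each (columns).
y <- c(3, 10, 2.5, 5)
yhat <- rbind(c(1, 2, 3, 4, 5),
              c(1, 2, 3, 4, 5),
              c(2, 2, 3, 3, 4),
              c(3, 4, 5, 6, 7))

# 95% quantile-based interval of the replicates for each record.
lb <- apply(yhat, 1, quantile, probs = 0.025)
ub <- apply(yhat, 1, quantile, probs = 0.975)

# Predictive quantile: proportion of replicates at or above y[i].
PQ <- rowMeans(yhat >= y)

# Predictive concordance: share of records with y[i] inside the interval.
concordance <- mean(y >= lb & y <= ub)

PQ
concordance
```

In this toy example the second record (y = 10) lies far above all of its replicates, so it is the only record outside its interval.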
When \(\textbf{y}\) is categorical and known, this function
estimates the record-level lift, which is
p(yhat[i,] = y[i]) / [p(y = j) / n], or
the number of correctly predicted samples over the rate of that
category of \(\textbf{y}\) in vector \(\textbf{y}\).
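One reading of the lift formula can be sketched as follows: the proportion of replicates that match the observed category, divided by the rate of that category in \(\textbf{y}\). The data are hypothetical, and this interpretation of the denominator is an assumption rather than a statement of the function's internals.

```r
# Hypothetical binary outcome with 4 records and 4 replicates per record.
y <- c(0, 1, 1, 1)
yhat <- rbind(c(0, 0, 1, 0),
              c(1, 1, 1, 0),
              c(1, 0, 0, 0),
              c(1, 1, 1, 1))

# Proportion of replicates that equal the observed category.
p_correct <- rowMeans(yhat == y)

# Rate of each record's observed category within y (ASSUMED denominator).
base_rate <- sapply(y, function(v) mean(y == v))

lift <- p_correct / base_rate
mean_lift <- mean(lift)   # mean of the record-level lifts

lift
mean_lift
```

A lift above 1 means the replicates predict the observed category more often than its base rate alone would.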
A discrepancy measure is an approach to studying discrepancies between the model and data (Gelman et al., 1996). Below is a list of discrepancy measures, followed by a brief introduction to discrepancy analysis:
The "Chi-Square" discrepancy measure is the chi-square
goodness-of-fit test that is recommended by Gelman. For each record
i=1:N, this returns (y[i] - E(y[i]))^2 / var(yhat[i,]).
The "Chi-Square2" discrepancy measure returns the
following for each record: Pr(chisq.rep[i,] > chisq.obs[i,]), where
chisq.obs[i,] <- (y[i] - E(y[i]))^2 / E(y[i]), and chisq.rep[i,] <-
(yhat[i,] - E(yhat[i,]))^2 / E(yhat[i,]). The overall
discrepancy is the percent of records that were outside of the 95%
quantile-based probability interval (see p.interval).
The "Kurtosis" discrepancy measure returns the kurtosis
of \(\textbf{y}^{rep}\) for each record, and the discrepancy
statistic is the mean for all records. This does not measure
discrepancies between the model and data, but is useful for finding
kurtotic replicate distributions.
The "L-criterion" discrepancy measure of Laud and Ibrahim
(1995) provides the record-level combination of two components (see
below), and the discrepancy statistic is the sum, L, as well as
a calibration number, S.L. For more information on the
L-criterion, see the accompanying vignette entitled "Bayesian
Inference".
The "MASE" (Mean Absolute Scaled Error) is a
discrepancy measure for the accuracy of time-series forecasts,
estimated as (|y - yhat|) / mean(abs(diff(y))). The discrepancy
statistic is the mean of the record-level values.
The "MSE" (Mean Squared Error) discrepancy measure
provides the MSE for each record across all replicates, and the
discrepancy statistic is the mean of the record-level MSEs. MSE and
quadratic loss are identical.
The "PPL" (Posterior Predictive Loss) discrepancy
measure of Gelfand and Ghosh (1998) provides the record-level
combination of two components: one involves the predictive variance
and the other includes the accuracy of the means of the predictive
distribution. The d argument, which defaults to d=0, applies the
following weight to the accuracy component, which is then added to
the variance component: \(d/(d+1)\). For \(\textbf{y}^{new}\),
use \(d=0\). For \(\textbf{y}^{rep}\) and model comparison,
\(d\) is commonly set to 1, 10, or 100000. Larger values of \(d\)
put more stress on fit and downgrade the precision of the estimates.
The "Quadratic Loss" discrepancy measure provides the
mean quadratic loss for each record across all replicates, and the
discrepancy statistic is the mean of the record-level mean quadratic
losses. Quadratic loss and MSE are identical, and quadratic loss is
the negative of quadratic utility.
The "Quadratic Utility" discrepancy measure provides
the mean quadratic utility for each record across all replicates, and
the discrepancy statistic is the mean of the record-level mean
quadratic utilities. Quadratic utility is the negative of quadratic
loss.
The "RMSE" (Root Mean Squared Error) discrepancy
measure provides the RMSE for each record across all replicates, and
the discrepancy statistic is the mean of the record-level RMSEs.
The "Skewness" discrepancy measure returns the skewness
of \(\textbf{y}^{rep}\) for each record, and the discrepancy
statistic is the mean for all records. This does not measure
discrepancies between the model and data, but is useful for finding
skewed replicate distributions.
The "max(yhat[i,]) > max(y)" discrepancy measure
returns a record-level indicator when a record's maximum
\(\textbf{y}^{rep}_i\) exceeds the maximum of
\(\textbf{y}\). The discrepancy statistic is the mean of the
record-level indicators, reporting the proportion of records with
replications that exceed the maximum of \(\textbf{y}\).
The "mean(yhat[i,]) > mean(y)" discrepancy measure
returns a record-level indicator when the mean of a record's
\(\textbf{y}^{rep}_i\) is greater than the mean of
\(\textbf{y}\). The discrepancy statistic is the mean of the
record-level indicators, reporting the proportion of records with
mean replications that exceed the mean of \(\textbf{y}\).
The "mean(yhat[i,] > d)" discrepancy measure returns a
record-level proportion of \(\textbf{y}^{rep}_i\) that
exceeds a specified value, d. The discrepancy statistic is the
mean of the record-level proportions.
The "mean(yhat[i,] > mean(y))" discrepancy measure
returns a record-level proportion of
\(\textbf{y}^{rep}_i\) that exceeds the mean of
\(\textbf{y}\). The discrepancy statistic is the mean of the
record-level proportions.
The "min(yhat[i,]) < min(y)" discrepancy measure
returns a record-level indicator when a record's minimum
\(\textbf{y}^{rep}_i\) is less than the minimum of
\(\textbf{y}\). The discrepancy statistic is the mean of the
record-level indicators, reporting the proportion of records with
replications less than the minimum of \(\textbf{y}\).
The "round(yhat[i,]) = d" discrepancy measure returns a
record-level proportion of \(\textbf{y}^{rep}_i\) that,
when rounded, is equal to a specified discrete value, d. The
discrepancy statistic is the mean of the record-level proportions.
The "sd(yhat[i,]) > sd(y)" discrepancy measure returns a
record-level indicator when the standard deviation of replicates is
larger than the standard deviation of all of \(\textbf{y}\). The
discrepancy statistic is the mean of the record-level indicators,
reporting the proportion of records with larger standard deviations
than \(\textbf{y}\).
The "p(yhat[i,] != y[i])" discrepancy measure returns
the record-level probability that \(\textbf{y}^{rep}_i\)
is not equal to \(\textbf{y}_i\). This is valid when
\(\textbf{y}\) is categorical and yhat is the predicted
category. The probability is estimated as the proportion of
replicates that differ from y[i].
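Several of the measures listed above reduce to row-wise operations on the replicates. The sketch below is illustrative only: y and yhat are hypothetical, yhat is assumed to hold one row per record and one column per replicate, the PPL line follows the verbal description (variance component plus \(d/(d+1)\) times the squared error of the predictive mean), and none of this is taken from the function's source.

```r
# Hypothetical data: 3 records (rows), 3 replicates each (columns).
y <- c(1, 2, 4)
yhat <- rbind(c(0, 1, 2),
              c(2, 2, 2),
              c(3, 5, 7))
d <- 1

err  <- yhat - y            # y[i] is subtracted from every replicate in row i
mse  <- rowMeans(err^2)     # "MSE", identical to "Quadratic Loss"
rmse <- sqrt(mse)           # "RMSE"
qu   <- -mse                # "Quadratic Utility" is the negative of the loss

# "PPL": variance component plus weighted accuracy component (ASSUMED form).
ppl <- apply(yhat, 1, var) + (d / (d + 1)) * (rowMeans(yhat) - y)^2

# Indicator-style and proportion-style measures.
min_lt <- apply(yhat, 1, min) < min(y)   # "min(yhat[i,]) < min(y)"
gt_d   <- rowMeans(yhat > d)             # "mean(yhat[i,] > d)"

# Discrepancy statistics described in the text as means of record-level values.
c(MSE = mean(mse), RMSE = mean(rmse),
  min_lt = mean(min_lt), gt_d = mean(gt_d))
```

Each record-level vector corresponds to what would appear per record, and each mean corresponds to the single overall statistic.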
After observing a discrepancy statistic, the user attempts to improve the model by revising the model to account for discrepancies between data and the current model. This approach to model revision relies on an analysis of the discrepancy statistic. Given a discrepancy measure that is based on model fit, such as the L-criterion, the user may correlate the record-level discrepancy statistics with the dependent variable, independent variables, and interactions of independent variables. The discrepancy statistic should not correlate with the dependent and independent variables. Interaction variables may be useful for exploring new relationships that are not in the current model. Alternatively, a decision tree may be applied to the record-level discrepancy statistics, given the independent variables, in an effort to find relationships in the data that may be helpful in the model. Model revision may involve the addition of a finite mixture component to account for outliers in discrepancy, or specifying the model with a distribution that is more robust to outliers. There are too many suggestions to include here, and discrepancy analysis varies by model.
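The correlation step described above might look like the following sketch. Everything here is simulated and hypothetical: the variables x1 and x2, the data-generating model, and the use of squared residuals from lm as a stand-in for a record-level discrepancy. In practice the record-level values would come from the Test column of the summary.

```r
set.seed(42)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + 1.5 * x1 * x2 + rnorm(n)   # true model has an interaction

# Stand-in for a record-level discrepancy: squared residuals from a model
# that omits the x1:x2 interaction (purely illustrative).
fit  <- lm(y ~ x1 + x2)
disc <- residuals(fit)^2

# Correlate the record-level discrepancy with the dependent variable,
# the independent variables, and their interaction.
cors <- c(y     = cor(disc, y),
          x1    = cor(disc, x1),
          x2    = cor(disc, x2),
          x1.x2 = cor(disc, x1 * x2))
round(cors, 2)
```

Correlations that stand out from zero would point to structure the current model does not capture and that a revised model might include.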
Ando, T. (2007). "Bayesian Predictive Information Criterion for the Evaluation of Hierarchical Bayesian and Empirical Bayes Models". Biometrika, 94(2), p. 443--458.
Gelfand, A. (1996). "Model Determination Using Sampling Based Methods". In Gilks, W., Richardson, S., Spiegelhalter, D., Chapter 9 in Markov Chain Monte Carlo in Practice. Chapman and Hall: Boca Raton, FL.
Gelfand, A. and Ghosh, S. (1998). "Model Choice: A Minimum Posterior Predictive Loss Approach". Biometrika, 85, p. 1--11.
Gelman, A., Meng, X.L., and Stern, H. (1996). "Posterior Predictive Assessment of Model Fitness via Realized Discrepancies". Statistica Sinica, 6, p. 733--807.
Laud, P.W. and Ibrahim, J.G. (1995). "Predictive Model Selection". Journal of the Royal Statistical Society, B 57, p. 247--262.
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., and van der Linde, A. (2002). "Bayesian Measures of Model Complexity and Fit (with Discussion)". Journal of the Royal Statistical Society, B 64, p. 583--639.
predict.vb, p.interval, and VariationalBayes.
# NOT RUN {
### See the VariationalBayes function for an example.
# }