ht
carries out hypothesis tests for models defined implicitly by a random simulator. It takes as input estimates of the log likelihood obtained via simulations of the model. Tests are carried out using a simulation meta model. See Park (2025) for more details on the method.
# S3 method for simll
ht(
simll,
null.value,
test = c("parameter", "MESLE", "moments"),
case = NULL,
type = NULL,
weights = NULL,
autoAdjust = FALSE,
K1_est_method = "batch",
batch_size = NULL,
max_lag = NULL,
plot_acf = FALSE,
MCcorrection = "none",
...
)
A list consisting of the following components are returned.
regression_estimates: point estimates for the meta model parameters, a, b, c, and sigma^2. Given only when test="MESLE" or "parameter".
meta_model_MLE_for_*: point estimate for the tested quantity under a normal meta model
Hypothesis_Tests: a data frame of the null values and the corresponding p-values. When test
="moments" and type
="regression", each null value is given in the form of c(a,b,c,sigma^2) where a, b, c, sigma^2 are first, second, third, and fourth entries of the given null value.
pvalue_numerical_error_size: When test
="moments", approximate size of error in numerical evaluation of p-values (automatically set to approximately 0.01 or 0.001). For these case, p-values are found using the SCL distributions, whose cumulative distribution functions are numerically evaluated using random number generations. Thus p-values have some stochastic error. The size of the numerical error is automatically set to approximately 0.01, but if any of the p-values found is less than 0.01, more computations are carried out to reduce the numerical error size to approximately 0.001. Note that when test
="MESLE" or "parameter", the (standard) F distribution is used, so this list component is omitted.
max_lag: if test
="parameter" and case
="stationary", the maximum lag for computing the autocovariance in estimating K1 is shown.
pval_cubic: The p-value of the test about whether the cubic term in the cubic polynomial regression is significant. If pval_cubic
is small, the result of the ht
function may be biased. The test on the cubic term is carried out only when the number of simulated log likelihoods is greater than \((d+1)*(d+2)*(d+3)/6\) where \(d\) is the dimension of the parameter vector. When autoAdjust
is TRUE, pval_cubic
is computed with the adjusted weights.
updated_weights: When autoAdjust
is TRUE, the modified weights for the simulation points are returned.
A class simll
object, containing simulated log likelihoods, the parameter values at which simulations are made (may be omitted if all simulations are made at the same parameter value), and the weights for those simulations for regression (optional). See help(simll).
The null value(s) for the hypothesis test. The expected format depends on which teset will be carried out. See the Details section for more information.
A character string indicating which is to be tested about. One of "moments", "MESLE", or "parameter". See Details.
When test
is "parameter", case
needs to be either "iid" or "stationary". case
= "iid" means that the observations are iid, and case
= "stationary" means that the observations form a stationary sequence. The case
argument affects how the variance of the slope of the mean function (=K_1 in Park (2025)) is estimated. The default value is "stationary".
When test
is "moments", the type
argument needs to be specified. type
= "point" means that the test about the mean and the variance of simulated log likelihoods at a given parameter point is considered. type
= "regression" means that the test about the mean function and the variance of simulated log likelihoods at various parameter values is considered. See Details.
An optional argument. The un-normalized weights of the simulated log likelihoods for regression. A numeric vector of length equal to the params
attribute of the simll
object. See Details below.
logical. If TRUE, simulation points at which the third order term is statistically significant in the cubic approximation to the simulated log-likelihooods have discounted weights for metamodel fitting. The weights of the points relatively far from the estimated MESLE are more heavily discounted. These weight discount factors are multiplied to the originally given weights for parameter estimation. If autoAdjust
is FALSE, the weight discount step is skipped. Defaults to FALSE. See ?optDesign and Park (2025) for more details.
Either "batch" or "autocov". Used when test
is "parameter" and case
is "stationary". The default is "batch". See Details for more information.
Numeric. The size of the batch when K1_est_method
is "batch". If not supplied, the default value is round(n^0.4)
where n
is the number of observations in the data.
When test
is "parameter" and case
is "stationary", the value of max_lag
gives the truncation point for lagged autocovariance when estimating K1 as a sum of lagged autocovariances of estimates slopes. If not supplied, default is the maximum lag for which at least one of the entries of the matrix of lagged autocorrelation has absolute value greater than 4/sqrt(nobs), where the lagged autocorrelation is found up to lag 10*log10(nobs/d)
. Here nobs
is the number of observations and d
is the dimension of the parameter space.
Logical. Should the autocorrelation plot be generated when estimating K1 for the case where test
is "parameter" and case
is "stationary"?
For tests on the simulation based parameter surrogate (test
="parameter"), MCcorrection
determines if and how the sampling distribution of the test statistic will be corrected by a Monte Carlo method to account for the variability in the estimate of K1. Possible values are "none" (default) and "Wishart". See the Details section and Park (2025) for more details.
Other optional arguments, not currently used.
This is a generic function, taking a class simll
object as the first argument.
Hypothesis tests are carried out under a normal metamodel--that is, the simulated log likelihoods (whose values are given in the simll
object) are normally distributed.
If test
= "moments", the type
argument needs to be either "point" or "regression".
If type
= "point", a test about the mean and the variance of the simulated log likelihood at a single parameter value is conducted.
If type
= "regression", the simll
object should contain simulated log likelihoods obtained at more than one parameter values, specified by the params
attribute of the simll
object. A (weighted) quadratic regression for the simulated log likelihoods will be used for hypothesis tests, where the x-axis values are given by the params
values of the simll
object and the y-axis values are the corresponding simulated log likelihoods.
The test is about the quadruple \(a, b, c, sigma^2\) where \(a, b, c\) are coefficients of the polynomial describing the mean of the simulated log likelihood (i.e., \(l(\theta) = a + b \theta + c \theta^2\)) and \(\sigma^2\) is the variance of the simulated log likelihood.
If test
= "moments" and type
is not specified, type
defaults to "point" if the params
attribute of the simll
object is not supplied or has length one, and defaults to "regression" otherwise.
When test
= "MESLE" or "parameter", the simll
object should have the params
attribute.
If test
= "MESLE", the test is about the location of the maximum expected simulated log likelihood estimate.
If test
= "parameter", inference on the simulation based surrogate will be carried out under the local asymptotic normality for simulated log likelihood (see Park (2025) for more information.)
The default value for test
is "parameter".
When quadratic regression is carried out, the weights for the simulation based likelihood estimates can be specified. The length of weights
should be equal to that of the params
attribute of the simll
, which is equal to the number of rows in the simulated log likelihood matrix in the simll
object. It is important to note that the weights are not supposed to be normalized (i.e., sum to one). Multiplying all weights by the same constant changes the estimation outputs. If not supplied, the weights
attribute of the simll
object is used. If neither is supplied, weights
defaults to the vector of all ones.
When test
is "moments" and type
is "point", null.value
is either a vector of length two (one entry for the mean and the other for the variance of the simulated log likelihoods), a matrix of two columns (one for the mean and the other for the variance), or a list of vectors of length two (each entry of the list gives a null value consisting of the mean and the variance.)
When test
is "moments" and type
is "regression", null.value
can be a list of length four, or a list of lists of length four. The first case corresponds to when a single null hypothesis is tested. The four components are a) the constant term in the quadratic mean function (scalar), b) the linear coefficient term in the mean function (vector of length \(d\) where \(d\) is the dimension of the parameter vector), c) the quadratic coefficient term in the mean function (symmetric matrix of dimension \(d \times d\)), and d) the variance of the simulated log likelihood (scalar). The second case is when more than one null values are tested. In this case each component of the list is a list having four entries as described for the case of a single null value.
When test
is "MESLE" or "parameter", null.value
is a vector of length \(d\) (a single null value), a matrix having \(d\) columns (each row giving a vector for a null value), or a list of vectors of length \(d\) (more than one null values).
Park, J. (2025). Scalable simulation-based inference for implicitly defined models using a metamodel for Monte Carlo log-likelihood estimator tools:::Rd_expr_doi("10.48550/arxiv.2311.09446")
# State process: X_i ~ N(theta0, tau^2), Observation process: Y_i ~ N(X_i, 1)
theta0 <- 0 # true parameter
n <- 200 # number of observations
xhidden <- rnorm(n, theta0, 30) # hidden x values
ydata <- rnorm(n, xhidden, 1) # observed y values
theta_sim <- runif(300, -10, 10) # simulation points
ll <- sapply(theta_sim, function(t) { # simulation-based log-likelihood estimates (except constant)
x <- rnorm(n, t, 30)
-(x-ydata)^2/2
})
plot(theta_sim, apply(ll, 2, sum)) # display the log-likelihood estimates
s <- simll(ll, params=theta_sim) # create a `simll` object
ht(s, null.value=list(-1,0,1), test="parameter", case="iid")
Run the code above in your browser using DataLab