Produces nsim
simulated realizations of the likelihood
ratio statistic, either parametrically or semi-parametrically,
and calculates the corresponding \(p\)-value of the test.
ncMcTest(x, y, data=NULL, ncomp=2, ncincr=1, intercept=TRUE, nsim=99,
seed=NULL, ts1=NULL, ts2=NULL, semiPar=FALSE,
conditional=semiPar, verb=FALSE, progRep=TRUE, …)
Either a formula specifying the regression model to be fitted
or a predictor or matrix of predictors for the regression model.
If x
is a matrix it should NOT include an initial
column of 1
s. If an intercept is desired then the
intercept
argument should be left equal to TRUE
.
The vector of responses for the regression models. Ignored if
x
is a formula.
A list or data frame containing the variables in the regression
model. Such variables will be looked for first in data
and then in the global environment.
The null-hypothesized number of components in the mixture.
The increment from the null-hypothesized number of components in the mixture
to the number under the alternative hypothesis; i.e. the number
of components under the alternative hypothesis is ncomp + ncincr
.
Logical argument indicating whether the regression models in the mixture
should have intercept terms. Ignored if x
is a formula.
The number of simulated replicates of the log likelihood ratio statistic to be produced.
Positive integer scalar. The seed for random number generation.
If left NULL
it is obtained by sampling from 1:1e5
.
Starting values for fitting the ncomp
component model. If ts1 is
null, random starting values are used. (This is not recommended.)
Starting values for fitting the ncomp+nincr
component model. If ts2 is
null, random starting values are used. (This is not recommended.)
Logical scalar; should semi-parametric bootstrapping should be used?
Logical scalar; should the component-selection probabilities be determined conditionally upon the observations?
Logical argument indicating whether the fitting processes should be verbose (i.e. whether details should be printed out at each step of the EM algorithm). If TRUE a huge amount of screen output is produced.
Logical argument indicating whether the index, of the simulated statistic just constructed, should be printed out, to give an idea of how the process is progressing.
Further arguments to be passed to mixreg()
to control the
fitting procedure.
A list with components:
the likelihood ratio statistic for the test
the (Monte Carlo) \(p\)-value of the test
a vector of the values of the likelihood ratio statistics of the simulated data sets
a vector of the aic values for the
ncomp
models fitted to the simulated data sets
a vector of the aic values for the
ncomp+nincr
models fitted to the simulated data sets.
(Note that in any given instance ncomp
and nincr
are
replaced by the actual numeric values that are used in that instance.)
the degrees of freedom that would be appropriate if the
test statistic actually had a chi-squared distribution.
Explicitly df
is equal to the number of parameters in
the alternative model minus the number of parameters in the
null model
a data frame with columns seed
,
i
and type
, containing respectively the random number
generator seed, the index and the type of each screw-up. Note that
if there were in fact no screw-ups then the screwUps
entry
of the returned value will not be present. Note also that it is possible
for values in the i
column of screwUps
to be repeated.
The entries of the type
column take values in the set
{1,2,3,4,5}
. These values have the interpretation:
1: singularity in the likelihood surface for the
ncomp
model
2: the algorithm failed to converge when fitting the
ncomp
component model
3: singularity in the likelihood surface for the
ncomp+nincr
model
4: the algorithm converged for the
ncomp
component model but failed to converge for the
ncomp + nincr
model
5: the likelihood ratio statistic for the ncomp
model was greater than that for the ncomp+nincr
model (which is theoretically impossible)
Note that if a screw-up does occur, the replicate is redone
completely so that the returned value contains results for a
full nsim
simulations.
The returned value has an attribute "seed" which is the (initial) value of the random number generation seed that was used. This is either the value of the argument seed, or, if this was NULL, a randomly generated value.
In this context the “parametric” procedure is to
simulate data sets by generating data from the fitted
ncomp
model parameters, using Gaussian errors. In contrast,
under the
semiparametric bootstrapping procedure, the errors are generated by
resampling from the residuals. Since at each predictor value there
are ncomp
residuals, one for each component of the model,
the errors are selected at random from these ncomp
possibilities.
If the argument conditional
is TRUE
then the selection
probabilities at this step are the conditional probabilities, of
the observation being generated by each component of the model,
given that observation. If conditional
is FALSE
then
these probabilities are the corresponding entries of lambda
(see Value. The residuals are sampled independently in
either case. The procedure is termed semi-parametric since
the sampling probabilities depend on the parameters of the model.
Note that it makes no sense to specify conditional=TRUE
if semiPar
is FALSE
. Doing so will generate an error.
It is important to be aware that the test conducted by this function is a Monte Carlo test and that the \(p\)-value produced is a Monte Carlo \(p\)-value. It is consequently an exact \(p\)-value in a sense which must be carefully understood. See for example Baddeley et al. 2015 (section 10.6) and Turner and Jeffs 2017 for explanation of the interpretation of Monte Carlo \(p\)-values and for some general discussion of Monte Carlo tests and of their advantages. Such tests effect substantial savings on computational costs with only marginal diminishment of power.
T. Rolf Turner (2000). Estimating the rate of spread of a viral infection of potato plants via mixtures of regressions. Applied Statistics 49 Part 3, pp. 371 -- 384.
Adrian Baddeley, Ege Rubak and Rolf Turner (2015). Spatial Point Patterns: Methodology and Applications with R. London: Chapman and Hall/CRC Press.
Rolf Turner and Celeste Jeffs (2017). A note on exact Monte Carlo hypothesis tests. Communications in Statistics: Simulation and Computation 46, pp. 6545 -- 6558.
cband()
, covMix()
,
mixreg()
, plot.cband()
,
plot.mixresid()
, qqMix()
,
residuals.mixreg()
# NOT RUN {
# }
# NOT RUN {
tst12 <- ncMcTest(plntsInf ~ aphRel,ncomp=1,data=aphids,seed=42)
# }
# NOT RUN {
# Monte Carlo p-value is 0.01; mixture model is called for.
TS1 <- list(list(beta=c(3.0,0.1),sigsq=16,lambda=0.5),
list(beta=c(0.0,0.0),sigsq=16,lambda=0.5))
TS2 <- list(list(beta=c(3.0,0.1),sigsq=9,lambda=1/3),
list(beta=c(1.5,0.05),sigsq=9,lambda=1/3),
list(beta=c(0.0,0.0),sigsq=9,lambda=1/3))
x <- aphids$aphRel
y <- aphids$plntsInf
# }
# NOT RUN {
nsim <- 999
# }
# NOT RUN {
# }
# NOT RUN {
tst23 <- ncMcTest(x,y,nsim=nsim,ts1=TS1,ts2=TS2)
# }
Run the code above in your browser using DataLab