Given a (generalized) linear model, the (pseudo) Score statistic tests for the existence of one breakpoint.
pscore.test(obj, seg.Z, k = 10, alternative = c("two.sided", "less", "greater"),
values=NULL, dispersion=NULL, df.t=NULL, more.break=FALSE, n.break=1,
only.term=FALSE, break.type=c("break","jump"))
A list with class 'htest
' containing the following components:
title (character)
the regression model and the segmented variable being tested
the empirical value of the statistic
number of evaluation points
the p-value
the alternative hypothesis set
a fitted model typically returned by glm
or lm
. Even an object returned by
segmented
can be set. Offset and weights are allowed.
a formula with no response variable, such as seg.Z=~x1
, indicating the (continuous) segmented variable being tested. Only a single variable may be tested and an error is printed when seg.Z
includes two or more terms. seg.Z
can be omitted if i)obj
is a segmented fit with a single segmented covariate (and that variable is taken), or ii)if it is a "lm" or "glm" fit with a single covariate (and that variable is taken).
optional. Number of points (equi-spaced from the min to max) used to compute the pseudo Score statistic. See Details.
a character string specifying the alternative hypothesis (relevant to the slope difference parameter).
optional. The evaluation points where the Score test is computed. See Details for default values.
optional. the dispersion parameter for the family to be used to compute the test statistic.
When NULL
(the default), it is inferred from obj
. Namely it is taken as 1
for the
Binomial and Poisson families, and otherwise estimated by the residual Chi-squared statistic in the model obj
(calculated from cases with non-zero weights divided by the residual degrees of freedom).
optional. The degress-of-freedom used to compute the p-value. When NULL
, the df extracted from obj
are used.
optional, logical. If obj
is a 'segmented' fit, more.break=FALSE
tests for the actual breakpoint for the variable 'seg.Z',
while more.break=TRUE
tests for an additional breakpoint(s) for the variable 'seg.Z'. Ignored when obj
is not a segmented fit.
optional. Number of breakpoints postuled under the alternative hypothesis.
logical. If TRUE
, only the pseudo covariate(s) relevant to the testing for the breakpoint is returned, and no test is computed.
The kind of breakpoint being tested. "break"
is for piecewise-linear relationships, "jump"
means piecewise-constant, i.e. a step-function, relationships.
Vito M.R. Muggeo
pscore.test
tests for a non-zero difference-in-slope parameter of a segmented
relationship. Namely, the null hypothesis is \(H_0:\beta=0\), where \(\beta\) is the difference-in-slopes,
i.e. the coefficient of the segmented function \(\beta(x-\psi)_+\). The hypothesis of interest
\(\beta=0\) means no breakpoint. Simulation studies have shown that such Score test is more powerful than the Davies test (see reference) when the alternative hypothesis is `one changepoint'. If there are two or more breakpoints (for instance, a sinusoidal-like relationships), pscore.test
can have lower power, and davies.test
can perform better.
The dispersion
value, if unspecified, is taken from obj
. If obj
represents the fit under the null hypothesis (no changepoint), the dispersion parameter estimate will be usually larger, leading to a (potentially severe) loss of power.
The k
evaluation points are k
equally spaced values in the range of the segmented covariate. k
should not be small.
Specific values can be set via values
, although I have found no important difference due to number and location of the evaluation points, thus default is k=10
equally-spaced points. However, when the possible breakpoint is believed to lie into a specified narrower range, the user can specify k
values in that range leading to higher power in detecting it, i.e. typically lower p-value.
If obj
is a (segmented) lm object, the returned p-value comes from the t-distribution with appropriate degrees of freedom. Otherwise, namely if obj
is a (segmented) glm object, the p-value is computed wrt the Normal distribution.
Muggeo, V.M.R. (2016) Testing with a nuisance parameter present only under the alternative: a score-based approach with application to segmented modelling. J of Statistical Computation and Simulation, 86, 3059--3067.
See also davies.test
.
if (FALSE) {
set.seed(20)
z<-runif(100)
x<-rnorm(100,2)
y<-2+10*pmax(z-.5,0)+rnorm(100,0,3)
o<-lm(y~z+x)
#testing for one changepoint
#use the simple null fit
pscore.test(o,~z) #compare with davies.test(o,~z)..
#use the segmented fit
os<-segmented(o, ~z)
pscore.test(os,~z) #smaller p-value, as it uses the dispersion under the alternative (from 'os')
#test for the 2nd breakpoint in the variable z
pscore.test(os,~z, more.break=TRUE)
}
Run the code above in your browser using DataLab