pscore.test: Testing for existence of one breakpoint

Description

Given a (generalized) linear model, the (pseudo) Score statistic tests for the existence of one breakpoint.

Usage

pscore.test(obj, seg.Z, k = 10, alternative = c("two.sided", "less", "greater"), 
    values=NULL, dispersion=NULL, df.t=NULL, more.break=FALSE, n.break=1, 
    only.term=FALSE, break.type=c("break","jump"))

Value

A list with class 'htest' containing the following components:

method: title (character)
data.name: the regression model and the segmented variable being tested
statistic: the empirical value of the statistic
parameter: number of evaluation points
p.value: the p-value
process: the alternative hypothesis set

Arguments

obj: a fitted model typically returned by glm or lm. Even an object returned by segmented can be set. Offset and weights are allowed.
seg.Z: a formula with no response variable, such as seg.Z=~x1, indicating the (continuous) segmented variable being tested. Only a single variable may be tested and an error is printed when seg.Z includes two or more terms. seg.Z can be omitted if i)obj is a segmented fit with a single segmented covariate (and that variable is taken), or ii)if it is a "lm" or "glm" fit with a single covariate (and that variable is taken).
k: optional. Number of points (equi-spaced from the min to max) used to compute the pseudo Score statistic. See Details.
alternative: a character string specifying the alternative hypothesis (relevant to the slope difference parameter).
values: optional. The evaluation points where the Score test is computed. See Details for default values.
dispersion: optional. the dispersion parameter for the family to be used to compute the test statistic. When NULL (the default), it is inferred from obj. Namely it is taken as 1 for the Binomial and Poisson families, and otherwise estimated by the residual Chi-squared statistic in the model obj (calculated from cases with non-zero weights divided by the residual degrees of freedom).
df.t: optional. The degress-of-freedom used to compute the p-value. When NULL, the df extracted from obj are used.
more.break: optional, logical. If obj is a 'segmented' fit, more.break=FALSE tests for the actual breakpoint for the variable 'seg.Z', while more.break=TRUE tests for an additional breakpoint(s) for the variable 'seg.Z'. Ignored when obj is not a segmented fit.
n.break: optional. Number of breakpoints postuled under the alternative hypothesis.
only.term: logical. If TRUE, only the pseudo covariate(s) relevant to the testing for the breakpoint is returned, and no test is computed.
break.type: The kind of breakpoint being tested. "break" is for piecewise-linear relationships, "jump" means piecewise-constant, i.e. a step-function, relationships.

Author

Vito M.R. Muggeo

Details

pscore.test tests for a non-zero difference-in-slope parameter of a segmented relationship. Namely, the null hypothesis is \(H_0:\beta=0\), where \(\beta\) is the difference-in-slopes, i.e. the coefficient of the segmented function \(\beta(x-\psi)_+\). The hypothesis of interest \(\beta=0\) means no breakpoint. Simulation studies have shown that such Score test is more powerful than the Davies test (see reference) when the alternative hypothesis is `one changepoint'. If there are two or more breakpoints (for instance, a sinusoidal-like relationships), pscore.test can have lower power, and davies.test can perform better.

The dispersion value, if unspecified, is taken from obj. If obj represents the fit under the null hypothesis (no changepoint), the dispersion parameter estimate will be usually larger, leading to a (potentially severe) loss of power.

The k evaluation points are k equally spaced values in the range of the segmented covariate. k should not be small. Specific values can be set via values, although I have found no important difference due to number and location of the evaluation points, thus default is k=10 equally-spaced points. However, when the possible breakpoint is believed to lie into a specified narrower range, the user can specify k values in that range leading to higher power in detecting it, i.e. typically lower p-value.

If obj is a (segmented) lm object, the returned p-value comes from the t-distribution with appropriate degrees of freedom. Otherwise, namely if obj is a (segmented) glm object, the p-value is computed wrt the Normal distribution.

References

Muggeo, V.M.R. (2016) Testing with a nuisance parameter present only under the alternative: a score-based approach with application to segmented modelling. J of Statistical Computation and Simulation, 86, 3059--3067.

Examples

Run this code

if (FALSE) {
set.seed(20)
z<-runif(100)
x<-rnorm(100,2)
y<-2+10*pmax(z-.5,0)+rnorm(100,0,3)

o<-lm(y~z+x)

#testing for one changepoint
#use the simple null fit
pscore.test(o,~z) #compare with davies.test(o,~z)..

#use the segmented fit
os<-segmented(o, ~z)
pscore.test(os,~z) #smaller p-value, as it uses the dispersion under the alternative (from 'os') 

#test for the 2nd breakpoint in the variable z
pscore.test(os,~z, more.break=TRUE) 

  }

Run the code above in your browser using DataLab