segmented: Segmented relationships in regression models

Description

Fits regression models with segmented relationships between the response and one or more explanatory variables. Break-point estimates are provided.

Usage

segmented(obj, seg.Z, psi, npsi, control = seg.control(), 
    model = TRUE, ...)
# S3 method for default
segmented(obj, seg.Z, psi, npsi, control = seg.control(),
    model = TRUE, keep.class=FALSE, ...)
# S3 method for lm
segmented(obj, seg.Z, psi, npsi, control = seg.control(),
    model = TRUE, keep.class=FALSE, ...)
# S3 method for glm
segmented(obj, seg.Z, psi, npsi, control = seg.control(),
    model = TRUE, keep.class=FALSE, ...)
# S3 method for Arima
segmented(obj, seg.Z, psi, npsi, control = seg.control(),
    model = TRUE, keep.class=FALSE, ...)

Arguments

obj

standard `linear' model of class "lm" or "glm". Since version 0.5.0-0 any regression fit may be supplied (see 'Details').

seg.Z

the segmented variables(s), i.e. the continuous covariate(s) understood to have a piecewise-linear relationship with response. It is a formula with no response variable, such as seg.Z=~x or seg.Z=~x1+x2. It can be missing when obj (a "lm" or "glm" fit) includes only one covariate which is taken as segmented variable. Currently, formulas involving functions, such as seg.Z=~log(x1) or even seg.Z=~sqrt(x1), or selection operators, such as seg.Z=~d[,"x1"] or seg.Z=~d$x1, are not allowed.

psi

starting values for the breakpoints to be estimated. If there is a single segmented variable specified in seg.Z, psi is a numeric vector, and it can be missing when 1 breakpoint has to be estimated (and the median of the segmented variable is used as a starting value). If seg.Z includes several covariates, psi has be specified as a named list of vectors whose names have to match the variables in the seg.Z argument. Each vector of such list includes starting values for the break-point(s) for the corresponding variable in seg.Z. A NA value means that `K' quantiles (or equally spaced values) are used as starting values; K is fixed via the seg.control auxiliary function.

npsi

A named vector or list meaning the number (and not locations) of breakpoints to be estimated. The starting values will be internally computed via the quantiles or equally spaced values, see argument quant in seg.control. If there is a single term in seg.Z, npsi can be missing and npsi=1 is assumed, but in general one of psi or npsi has to be specified. If psi is provided, npsi is ignored.

control

a list of parameters for controlling the fitting process. See the documentation for seg.control for details.

model

logical value indicating if the model.frame should be returned.

keep.class

logical value indicating if the final fit returned by segmented.default should keep the class 'segmented' (along with the class of the original fit obj). Ignored by the segmented methods.

…

optional arguments.

Value

The returned object depends on the last component returned by seg.control. If last=TRUE, the default, segmented returns an object of class "segmented" which inherits from the class "lm" or "glm" depending on the class of obj. Otherwise a list is returned, where the last component is the fitted model at the final iteration, see seg.control.

An object of class "segmented" is a list containing the components of the original object obj with additionally the followings:

psi

estimated break-points and relevant (approximate) standard errors

number of iterations employed

epsilon

difference in the objective function when the algorithm stops

model

the model frame

psi.history

a list or a vector including the breakpoint estimates at each step

seed

the integer vector containing the seed just before the bootstrap resampling. Returned only if bootstrap restart is employed

Other components are not of direct interest of the user

Warning

It is well-known that the log-likelihood function for the break-point may be not concave, especially for poor clear-cut kink-relationships. In these circumstances the initial guess for the break-point, i.e. the psi argument, must be provided with care. For instance visual inspection of a, possibly smoothed, scatter-plot is usually a good way to obtain some idea on breakpoint location. However bootstrap restarting, implemented since version 0.2-9.0, is relatively more robust to starting values specified in psi. Alternatively an automatic procedure may be implemented by specifying psi=NA and fix.npsi=FALSE in seg.control: experience suggests to increase the number of iterations via it.max in seg.control(). This automatic procedure, however, is expected to overestimate the number of breakpoints.

Details

Given a linear regression model (usually of class "lm" or "glm"), segmented tries to estimate a new model having broken-line relationships with the variables specified in seg.Z. A segmented (or broken-line) relationship is defined by the slope parameters and the break-points where the linear relation changes. The number of breakpoints of each segmented relationship is fixed via the psi argument, where initial values for the break-points must be specified. The model is estimated simultaneously yielding point estimates and relevant approximate standard errors of all the model parameters, including the break-points.

Since version 0.2-9.0 segmented implements the bootstrap restarting algorithm described in Wood (2001). The bootstrap restarting is expected to escape the local optima of the objective function when the segmented relationship is flat and the log likelihood can have multiple local optima.

Since version 0.5-0.0 the default method segmented.default has been added to estimate segmented relationships in general (besides "lm" and "glm" fits) regression models, such as Cox regression or quantile regression (for a single percentile). The objective function to be minimized is the (minus) value extracted by the logLik function or it may be passed on via the fn.obj argument in seg.control. See example below. While the default method is expected to work with any regression fit (where the usual coef(), update(), and logLik() returns appropriate results), it is not recommended for "lm" or "glm" fits (as segmented.default is slower than the specific methods segmented.lm and segmented.glm), although final results are the same. However the object returned by segmented.default is not of class "segmented", as currently the segmented methods are not guaranteed to work for `generic' (i.e., besides "lm" and "glm") regression fits. The user could try each "segmented" method on the returned object by calling it explicitly (e.g. via plot.segmented() or confint.segmented()).

References

Muggeo, V.M.R. (2003) Estimating regression models with unknown break-points. Statistics in Medicine 22, 3055--3071.

Muggeo, V.M.R. (2008) Segmented: an R package to fit regression models with broken-line relationships. R News 8/1, 20--25.

Examples

Run this code

# NOT RUN {
set.seed(12)
xx<-1:100
zz<-runif(100)
yy<-2+1.5*pmax(xx-35,0)-1.5*pmax(xx-70,0)+15*pmax(zz-.5,0)+rnorm(100,0,2)
dati<-data.frame(x=xx,y=yy,z=zz)
out.lm<-lm(y~x,data=dati)

#simple example: 1 segmented variable, 1 breakpoint: you do not need to specify 
# the starting value for psi
o<-segmented(out.lm,seg.Z=~z)

#1 segmented variable, 2 breakpoints: you have to specify starting values (vector) for psi:
o<-segmented(out.lm,seg.Z=~x,psi=c(30,60), control=seg.control(display=FALSE))
slope(o)
#or by specifying just the *number* of breakpoints
#o<-segmented(out.lm,seg.Z=~x, npsi=2, control=seg.control(display=FALSE)) 


#2 segmented variables: starting values requested via a named list
out.lm<-lm(y~z,data=dati)
o1<-update(o,seg.Z=~x+z,psi=list(x=c(30,60),z=.3))
#or by specifying just the *number* of breakpoints
#o1<-update(o,seg.Z=~x+z, npsi=c(x=2,z=1))



#the default method leads to the same results (but it is slower)
#o1<-segmented.default(out.lm,seg.Z=~x+z,psi=list(x=c(30,60),z=.3))
#o1<-segmented.default(out.lm,seg.Z=~x+z,psi=list(x=c(30,60),z=.3), 
#    control=seg.control(fn.obj="sum(x$residuals^2)"))


#automatic procedure to estimate breakpoints in the covariate x (starting from K quantiles)
# Hint: increases number of iterations. Notice: bootstrap restart is not allowed!
o<-segmented.lm(out.lm,seg.Z=~x+z,psi=list(x=NA,z=.3), 
    control=seg.control(fix.npsi=FALSE, n.boot=0, tol=1e-7, it.max = 50, K=5, display=TRUE))

#assess the progress of the breakpoint estimates throughout the iterations
# }
# NOT RUN {
par(mfrow=c(1,2))
draw.history(o, "x")
draw.history(o, "z")
# }
# NOT RUN {
#try to increase the number of iterations and re-assess the 
#convergence diagnostics 

# An example using the Arima method:
n<-50
idt <-1:n #the time index

mu<-50-idt +1.5*pmax(idt-30,0)
set.seed(696)
y<-mu+arima.sim(list(ar=.5),n)*3

o<-arima(y, c(1,0,0), xreg=idt)
os1<-segmented(o, ~idt, control=seg.control(display=TRUE)) 



#An example using the default method:
# Cox regression with a segmented relationship  
# }
# NOT RUN {
library(survival)
data(stanford2)

o<-coxph(Surv(time, status)~age, data=stanford2)
os<-segmented(o, ~age, psi=40) #estimate the breakpoint in the age effect
summary(os) #actually it means summary.coxph(os)
plot(os) #it does not work
plot.segmented(os) #call explicitly plot.segmented() to plot the fitted piecewise line
# }
# NOT RUN {
            
# }

Run the code above in your browser using DataLab