Learn R Programming

segmented (version 2.1-2)

segmented: Segmented relationships in regression models

Description

Fits regression models with segmented relationships between the response and one or more explanatory variables. Break-point estimates are provided.

Usage

segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(), 
    model = TRUE, ...)

# S3 method for default segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(), model = TRUE, keep.class=FALSE, ...)

# S3 method for lm segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(), model = TRUE, keep.class=FALSE, ...)

# S3 method for glm segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(), model = TRUE, keep.class=FALSE, ...)

# S3 method for Arima segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(), model = TRUE, keep.class=FALSE, ...) # S3 method for numeric segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(), model = TRUE, keep.class=FALSE, adjX=FALSE, weights=NULL, ...)

Value

segmented returns an object of class "segmented" which inherits from the class of obj, for instance "lm" or "glm".

An object of class "segmented" is a list containing the components of the original object obj with additionally the followings:

psi

estimated break-points (sorted) and relevant (approximate) standard errors

it

number of iterations employed

epsilon

difference in the objective function when the algorithm stops

model

the model frame

psi.history

a list or a vector including the breakpoint estimates at each step

seed

the integer vector containing the seed just before the bootstrap resampling. Returned only if bootstrap restart is employed

..

Other components are not of direct interest of the user

Arguments

obj

standard `linear' model of class "lm", "glm" or "Arima", or potentially any regression fit may be supplied since version 0.5-0 (see 'Details'). obj can include any covariate understood to have a linear (i.e. no break-points) effect on the response. If obj also includes the segmented covariate specified in seg.Z, then all the slopes of the fitted segmented relationship will be estimated. On the other hand, if obj misses the segmented variable, then the 1st (the leftmost) slope is assumed to be zero. Since version 1.5.0, obj can be a simple numeric or ts object but with only a single segmented variable (segmented.numeric) see examples below.

seg.Z

the segmented variable(s), i.e. the continuous covariate(s) understood to have a piecewise-linear relationship with response. It is a formula with no response variable, such as seg.Z=~x or seg.Z=~x1+x2. It can be missing when obj includes only one covariate which is taken as segmented variable. Currently, formulas involving functions, such as seg.Z=~log(x1), or selection operators, such as seg.Z=~d[,"x1"] or seg.Z=~d$x1, are not allowed. Also, variable names formed by U or V only (with or without numbers) are not permitted.

psi

starting values for the breakpoints to be estimated. If there is a single segmented variable specified in seg.Z, psi is a numeric vector, and it can be missing when 1 breakpoint has to be estimated (and the median of the segmented variable is used as a starting value). If seg.Z includes several covariates, psi has be specified as a named list of vectors whose names have to match the variables in the seg.Z argument. Each vector of such list includes starting values for the break-point(s) for the corresponding variable in seg.Z. A NA value means that `K' quantiles (or equally spaced values) are used as starting values; K is fixed via the seg.control auxiliary function. See npsi as an alternative to specify just the number of breakpoints.

npsi

A named vector or list meaning the number (and not locations) of breakpoints to be estimated. The starting values will be internally computed via the quantiles or equally spaced values, as specified in argument quant in seg.control. npsi can be missing and npsi=1 is assumed for all variables specified in seg.Z. If psi is provided, npsi is ignored.

fixed.psi

An optional named list meaning the breakpoints to be kept fixed during the estimation procedure. The names should be a subset of (or even the same) variables specified in seg.Z. If there is a single variable in seg.Z, a simple numeric vector can be specified. Note that, in addition to the values specified here, segmented will estimate additional breakpoints. To keep fixed all breakpoints (to be specified in psi) use it.max=0 in seg.control

control

a list of parameters for controlling the fitting process. See the documentation for seg.control for details.

model

logical value indicating if the model.frame should be returned.

keep.class

logical value indicating if the final fit returned by segmented.default should keep the class 'segmented' (along with the class of the original fit obj). Ignored by the segmented methods.

...

optional arguments (to be ignored safely). Notice specific arguments relevant to the original call (via lm or glm for instance), such as weights or offet, have to be included in the starting model obj

adjX

if obj is a ts, the segmented variable (if not specified in seg.Z) is computed by taking information from the time series (e.g., years starting from 2000, say). If adjX=TRUE, the segmented variable is shifted such that its min equals zero. Default is using the unshifted values, but if there are several breakpoints to be estimated , it is strongly suggested to set adjX=TRUE.

weights

the weights if obj is a vector or a ts object, otherwise the weights should be specified in the starting fit obj.

Warning

At convergence, if the estimated breakpoints are too close each other or at the boundaries, the parameter point estimate could be returned, but without finite standard errors. To avoid that, segmented revises the final breakpoint estimates to allow that at least min.nj are within each interval of the segmented covariate. A warning message is printed if such adjustment is made. See min.nj in seg.control.

Author

Vito M. R. Muggeo, vito.muggeo@unipa.it

Details

Given a linear regression model usually of class "lm" or "glm" (or even a simple numeric/ts vector), segmented tries to estimate a new regression model having broken-line relationships with the variables specified in seg.Z. A segmented (or broken-line) relationship is defined by the slope parameters and the break-points where the linear relation changes. The number of breakpoints of each segmented relationship is fixed via the psi (or npsi) argument, where initial values for the break-points (or simply their number via npsi) must be specified. The model is estimated simultaneously yielding point estimates and relevant approximate standard errors of all the model parameters, including the break-points.

Since version 0.2-9.0 segmented implements the bootstrap restarting algorithm described in Wood (2001). The bootstrap restarting is expected to escape the local optima of the objective function when the segmented relationship is flat and the log likelihood can have multiple local optima.

Since version 0.5-0.0 the default method segmented.default has been added to estimate segmented relationships in general (besides "lm" and "glm" fits) regression models, such as Cox regression or quantile regression (for a single percentile). The objective function to be minimized is the (minus) value extracted by the logLik function or it may be passed on via the fn.obj argument in seg.control. See example below. While the default method is expected to work with any regression fit (where the usual coef(), update(), and logLik() returns appropriate results), it is not recommended for "lm" or "glm" fits (as segmented.default is slower than the specific methods segmented.lm and segmented.glm), although final results are the same. However the object returned by segmented.default is not of class "segmented", as currently the segmented methods are not guaranteed to work for `generic' (i.e., besides "lm" and "glm") regression fits. The user could try each "segmented" method on the returned object by calling it explicitly (e.g. via plot.segmented() or confint.segmented() wherein the regression coefficients and relevant covariance matrix have to be specified, see .coef and .vcov in plot.segmented(), confint.segmented(), slope()).

References

Muggeo, V.M.R. (2003) Estimating regression models with unknown break-points. Statistics in Medicine 22, 3055--3071.

Muggeo, V.M.R. (2008) Segmented: an R package to fit regression models with broken-line relationships. R News 8/1, 20--25.

See Also

segmented.glm for segmented GLM and segreg to fit the models via a formula interface. segmented.lme fits random changepoints (segmented mixed) models.

Examples

Run this code

set.seed(12)
xx<-1:100
zz<-runif(100)
yy<-2+1.5*pmax(xx-35,0)-1.5*pmax(xx-70,0)+15*pmax(zz-.5,0)+rnorm(100,0,2)
dati<-data.frame(x=xx,y=yy,z=zz)
out.lm<-lm(y~x,data=dati)

#the simplest example: the starting model includes just 1 covariate 
#.. and 1 breakpoint has to be estimated for that
o<-segmented(out.lm) #1 breakpoint for x

#the single segmented variable is not in the starting model, and thus..
#... you need to specify it via seg.Z, but no starting value for psi
o<-segmented(out.lm, seg.Z=~z)
#note the leftmost slope is constrained to be zero (since out.lm does not include z)

#2 segmented variables, 1 breakpoint each (again no need to specify npsi or psi)
o<-segmented(out.lm,seg.Z=~z+x)


#1 segmented variable, but 2 breakpoints: you have to specify starting values (vector) for psi:
o<-segmented(out.lm,seg.Z=~x,psi=c(30,60), control=seg.control(display=FALSE))

#.. or you can specify just the *number* of breakpoints
#o<-segmented(out.lm,seg.Z=~x, npsi=2, control=seg.control(display=FALSE)) 

slope(o) #the slopes of the segmented relationship


#2 segmented variables: starting values requested via a named list
out.lm<-lm(y~z,data=dati)
o1<-update(o,seg.Z=~x+z,psi=list(x=c(30,60),z=.3))
#..or by specifying just the *number* of breakpoints
#o1<-update(o,seg.Z=~x+z, npsi=c(x=2,z=1))



#the default method leads to the same results (but it is slower)
#o1<-segmented.default(out.lm,seg.Z=~x+z,psi=list(x=c(30,60),z=.3))
#o1<-segmented.default(out.lm,seg.Z=~x+z,psi=list(x=c(30,60),z=.3), 
#    control=seg.control(fn.obj="sum(x$residuals^2)"))


#automatic procedure to estimate breakpoints in the covariate x (starting from K quantiles)
# Hint: increases number of iterations. Notice: bootstrap restart is not allowed!
# However see ?selgmented for a better approach
#o<-segmented.lm(out.lm,seg.Z=~x+z,psi=list(x=NA,z=.3), 
#    control=seg.control(fix.npsi=FALSE, n.boot=0, tol=1e-7, it.max = 50, K=5, display=TRUE))

#assess the progress of the breakpoint estimates throughout the iterations
if (FALSE) {
par(mfrow=c(1,2))
draw.history(o, "x")
draw.history(o, "z")
}
#try to increase the number of iterations and re-assess the 
#convergence diagnostics 


# A simple segmented model with continuous responses and no linear covariates
# No need to fit the starting lm model:
segmented(yy, npsi=2) #NOTE: subsetting the vector works ( segmented(yy[-1],..) ) 
#only a single segmented covariate is allowed in seg.Z, and if seg.Z is unspecified, 
#   the segmented variable is taken as 1:n/n 


# An example using the Arima method:
if (FALSE) {
n<-50
idt <-1:n #the time index

mu<-50-idt +1.5*pmax(idt-30,0)
set.seed(6969)
y<-mu+arima.sim(list(ar=.5),n)*3.5

o<-arima(y, c(1,0,0), xreg=idt)
os1<-segmented(o, ~idt, control=seg.control(display=TRUE))

#note using the .coef argument is mandatory!
slope(os1, .coef=os1$coef)
plot(y)
plot(os1, add=TRUE, .coef=os1$coef, col=2)

}

################################################################
################################################################
######Four examples using the default method:
################################################################
################################################################


################################################################
#==> 1. Cox regression with a segmented relationship  
################################################################
if (FALSE) {
library(survival)
data(stanford2)

o<-coxph(Surv(time, status)~age, data=stanford2)
os<-segmented(o, ~age, psi=40) #estimate the breakpoint in the age effect
summary(os) #actually it means summary.coxph(os)
plot(os) #it does not work
plot.segmented(os) #call explicitly plot.segmented() to plot the fitted piecewise lines


################################################################
# ==> 2. Linear mixed model via the nlme package
################################################################

dati$g<-gl(10,10) #the cluster 'id' variable
library(nlme)
o<-lme(y~x+z, random=~1|g, data=dati)
os<-segmented.default(o, ~x+z, npsi=list(x=2, z=1))

#summarizing results (note the '.coef' argument)
slope(os, .coef=fixef(os))
plot.segmented(os, "x", .coef=fixef(os), conf.level=.95)
confint.segmented(os, "x", .coef=fixef(os))
dd<-data.frame(x=c(20,50),z=c(.2,.6), g=1:2)
predict.segmented(os, newdata=dd, .coef=fixef(os)) 


################################################################
# ==> 3. segmented quantile regression via the quantreg  package
################################################################

library(quantreg)
data(Mammals)
y<-with(Mammals, log(speed))
x<-with(Mammals, log(weight))
o<-rq(y~x, tau=.9)
os<-segmented.default(o, ~x) #it does NOT work. It cannot compute the vcov matrix..

#Let's define the vcov.rq function.. (I don't know if it is the best option..)
vcov.rq<-function(x,...) {
  V<-summary(x,cov=TRUE,se="nid",...)$cov
  rownames(V)<-colnames(V)<-names(x$coef)
V}

os<-segmented.default(o, ~x) #now it does work
 plot.segmented(os, res=TRUE, col=2, conf.level=.95)


################################################################
# ==> 4. segmented regression with the svyglm() (survey  package)   
################################################################

library(survey)
data(api)
dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)

o<-svyglm(api00~ell, design=dstrat)

#specify as a string the objective function to be minimized. It can be obtained via svyvar() 

fn.x<- 'as.numeric(svyvar(resid(x, "pearson"), x$survey.design, na.rm = TRUE))'
os<-segmented.default(o, ~ell, control=seg.control(fn.obj=fn.x, display=TRUE))
slope(os)
plot.segmented(os, res=TRUE, conf.level=.9, shade=TRUE)
}
            

Run the code above in your browser using DataLab