bj
fits the Buckley-James distribution-free least squares multiple
regression model to a possibly right-censored response variable.
This model reduces to ordinary least squares if
there is no censoring. By default, model fitting is done after
taking logs of the response variable.
bj
uses the rms
class
for automatic anova
, fastbw
, validate
, Function
, nomogram
,
summary
, plot
, bootcov
, and other functions. The bootcov
function may be worth using with bj
fits, as the properties of the
Buckley-James covariance matrix estimator are not fully known for
strange censoring patterns. For the print
method, format of output is controlled by the
user previously running options(prType="lang")
where
lang
is "plain"
(the default), "latex"
, or
"html"
.
The residuals.bj
function exists mainly to compute
residuals and to censor them (i.e., return them as
Surv
objects) just as the original
failure time variable was censored. These residuals are useful for
checking to see if the model also satisfies certain distributional assumptions.
To get these residuals, the fit must have specified y=TRUE
.
The bjplot
function is a special plotting function for objects
created by bj
with x=TRUE, y=TRUE
in effect. It produces three
scatterplots for every covariate in the model: the first plots the
original situation, where censored data are distingushed from
non-censored data by a different plotting symbol. In the second plot,
called a renovated plot, vertical lines show how censored data were
changed by the procedure, and the third is equal to the second, but
without vertical lines. Imputed data are again distinguished from the
non-censored by a different symbol.
The validate
method for bj
validates the Somers' Dxy
rank
correlation between predicted and observed responses, accounting for censoring.
The primary fitting function for bj
is bj.fit
, which does not
allow missing data and expects a full design matrix as input.
bj(formula=formula(data), data, subset, na.action=na.delete, link="log", control, method='fit', x=FALSE, y=FALSE, time.inc)
"print"(x, digits=4, long=FALSE, coefs=TRUE,
title="Buckley-James Censored Data Regression", ...)
"residuals"(object, type=c("censored","censored.normalized"),...)
bjplot(fit, which=1:dim(X)[[2]])
"validate"(fit, method="boot", B=40, bw=FALSE,rule="aic",type="residual",sls=.05,aics=0, force=NULL, estimates=TRUE, pr=FALSE,
tol=1e-7, rel.tolerance=1e-3, maxiter=15, ...)
bj.fit(x, y, control)
Surv
object.
bj
, required for all functions except bj
.
bj.fit
. All models will have an intercept. For
print.bj
is a result of bj
. For bj
, set
x=TRUE
to include the design matrix in the fit object.
Surv
object to pass to bj.fit
as the two-column response
variable. Only right censoring is allowed, and there need not be any
censoring. For bj
, set y
to TRUE
to include the
two-column response matrix, with the
event/censoring indicator in the second column. The first column will
be transformed according to link
, and depending on
na.action
, rows with missing data in the predictors or the
response will be deleted.
"log"
(the default) to model the log of the
response, or "identity"
to model the untransformed response.
iter.max
(maximum number of iterations allowed, default is 20),
eps
(convergence criterion: concergence is assumed when the ratio of
sum of squared errors from one iteration to the next is between
1-eps
and 1+eps
), trace
(set to TRUE
to monitor iterations),
tol
(matrix singularity criterion, default is 1e-7), and 'max.cycle'
(in case of nonconvergence the program looks for a cycle that repeats itself,
default is 30).
"model.frame"
or "model.matrix"
to return one of those
objects rather than the model fit.
units="Day"
, 1 otherwise, unless
maximum follow-up time $< 1$. Then max time/10 is used as time.inc
.
If time.inc
is not given and max time/default time.inc
is
$> 25$, time.inc
is increased.
TRUE
to print the correlation matrix for parameter estimates
coefs=FALSE
to suppress printing the table
of model coefficients, standard errors, etc. Specify coefs=n
to print only the first n
regression coefficients in the
model.prModFit
bj
type="censored.normalized"
to divide the residuals by the estimate
of sigma
.
bjplot
make plots of only the variables listed in which
(names or numbers).
predab.resample
print
; passed through to
predab.resample
for validate
bj
returns a fit object with similar information to what survreg
,
psm
, cph
would store as
well as what rms
stores and units
and time.inc
.
residuals.bj
returns a Surv
object. One of the components of the
fit
object produced by bj
(and bj.fit
) is a vector called
stats
which contains the following names elements:
"Obs", "Events", "d.f.","error d.f.","sigma","g"
. Here
sigma
is the estimate of the residual standard deviation.
g
is the $g$-index. If the link function is "log"
,
the $g$-index on the anti-log scale is also returned as gr
.
Miller RG, Halpern J. Regression with censored data. Biometrika 1982; 69: 521--31.
James IR, Smith PJ. Consistency results for linear regression with censored data. Ann Statist 1984; 12: 590--600.
Lai TL, Ying Z. Large sample theory of a modified Buckley-James estimator for regression analysis with censored data. Ann Statist 1991; 19: 1370--402.
Hillis SL. Residual plots for the censored data linear regression model. Stat in Med 1995; 14: 2023--2036.
Jin Z, Lin DY, Ying Z. On least-squares regression with censored data. Biometrika 2006; 93:147--161.
rms
, psm
, survreg
,
cph
, Surv
,
na.delete
,
na.detail.response
, datadist
,
rcorr.cens
, GiniMd
,
prModFit
, dxy.cens
set.seed(1)
ftime <- 10*rexp(200)
stroke <- ifelse(ftime > 10, 0, 1)
ftime <- pmin(ftime, 10)
units(ftime) <- "Month"
age <- rnorm(200, 70, 10)
hospital <- factor(sample(c('a','b'),200,TRUE))
dd <- datadist(age, hospital)
options(datadist="dd")
f <- bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, x=TRUE, y=TRUE)
# add link="identity" to use a censored normal regression model instead
# of a lognormal one
anova(f)
fastbw(f)
validate(f, B=15)
plot(Predict(f, age, hospital))
# needs datadist since no explicit age,hosp.
coef(f) # look at regression coefficients
coef(psm(Surv(ftime, stroke) ~ rcs(age,5) + hospital, dist='lognormal'))
# compare with coefficients from likelihood-based
# log-normal regression model
# use dist='gau' not under R
r <- resid(f, 'censored.normalized')
survplot(npsurv(r ~ 1), conf='none')
# plot Kaplan-Meier estimate of
# survival function of standardized residuals
survplot(npsurv(r ~ cut2(age, g=2)), conf='none')
# may desire both strata to be n(0,1)
options(datadist=NULL)
Run the code above in your browser using DataLab