An implementation of the random forest and bagging ensemble algorithms utilizing LTRC conditional inference trees (LTRCIT) as base learners for left-truncated right-censored survival data with time-invariant covariates. It also allows for (left-truncated) right-censored survival data with time-varying covariates.
ltrccif(
formula,
data,
id,
mtry = NULL,
ntree = 100L,
bootstrap = c("by.sub", "by.root", "by.user", "none"),
samptype = c("swor", "swr"),
sampfrac = 0.632,
samp = NULL,
na.action = "na.omit",
stepFactor = 2,
trace = TRUE,
applyfun = NULL,
cores = NULL,
control = partykit::ctree_control(teststat = "quad", testtype = "Univ", minsplit =
max(ceiling(sqrt(nrow(data))), 20), minbucket = max(ceiling(sqrt(nrow(data))), 7),
minprob = 0.01, mincriterion = 0, saveinfo = FALSE)
)
formula: a formula object, with the response being a Surv object of the form Surv(tleft, tright, event).
data: a data frame containing n rows of left-truncated right-censored observations. For time-varying data, this should be a data frame containing pseudo-subject observations based on the Andersen-Gill reformulation.
id: variable name of subject identifiers. If present, it is searched for in the data data frame. Each group of rows in data with the same subject id represents the covariate path through time of a single subject. If id is not specified, the algorithm assumes that data contains left-truncated and right-censored survival data with time-invariant covariates.
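To make the pseudo-subject layout concrete, the base-R sketch below builds a small hypothetical data set in the counting-process (Andersen-Gill) form that is expected when id is given: one row per interval (Start, Stop], with each covariate value holding on that interval. The column names ID, Start, Stop, Event and x, and all values, are illustrative assumptions chosen to match the formula in the example; only Surv() from the survival package is used.

```r
library(survival)

# Hypothetical toy data: two subjects with a covariate x measured at visits.
# Subject 1 is followed from time 0 and has the event at time 5.
# Subject 2 enters late (left truncation at time 1) and is censored at time 4.
toy <- data.frame(
  ID    = c(1, 1, 1, 2, 2),
  Start = c(0, 2, 3, 1, 2.5),
  Stop  = c(2, 3, 5, 2.5, 4),
  Event = c(0, 0, 1, 0, 0),   # event flag only on the last row of subject 1
  x     = c(1.2, 1.5, 2.1, 0.7, 0.9)
)

# Within each subject the intervals should be contiguous ...
contiguous <- tapply(seq_len(nrow(toy)), toy$ID, function(rows) {
  all(toy$Start[rows][-1] == toy$Stop[rows][-length(rows)])
})
# ... and only a subject's final row may carry an event.
event_last <- tapply(seq_len(nrow(toy)), toy$ID, function(rows) {
  all(toy$Event[rows][-length(rows)] == 0)
})

# The response used in the formula: a counting-process Surv object.
y <- Surv(toy$Start, toy$Stop, toy$Event)
```

Passing such a data frame with id = ID tells ltrccif that the five rows describe two subjects rather than five independent observations.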
mtry: number of input variables randomly sampled as candidates at each node for random forest algorithms. By default (mtry = NULL), mtry is tuned by tune.ltrccif.
ntree: an integer, the number of trees to grow for the forest. ntree = 100L is set by default.
bootstrap: bootstrap protocol.
(1) If id is present, the choices are "by.sub" (the default), which bootstraps subjects, and "by.root", which bootstraps pseudo-subjects. Both can be done with or without replacement (by default, sampling is without replacement; see the option samptype below).
(2) If id is not specified, data is bootstrapped by sampling with or without replacement.
Regardless of the presence of id, if "none" is chosen, data is not bootstrapped at all and is used in every individual tree. If "by.user" is chosen, the bootstrap specified by samp is used.
samptype: choices are "swor" (sampling without replacement) and "swr" (sampling with replacement). The default is sampling without replacement.
sampfrac: a fraction determining the proportion of subjects to draw without replacement when samptype = "swor". The default value is 0.632. More specifically, if id is present, 0.632 * N subjects (N denotes the number of subjects) are drawn without replacement together with all of their pseudo-subject observations; otherwise, 0.632 * n is the requested sample size.
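The difference between "by.sub" and "by.root" can be sketched in base R as per-row inclusion counts for a single tree. The helper boot_rows() is hypothetical and not part of LTRCforests; the floor() rounding of sampfrac * N and the use of N (or n) draws under sampling with replacement are assumptions, not documented behaviour.

```r
# Hypothetical helper (not part of LTRCforests): per-row inclusion counts for
# one tree under the "by.sub" / "by.root" protocols.
# ASSUMPTIONS: floor() rounding of sampfrac * N, and N (or n) draws when replace = TRUE.
boot_rows <- function(ids, by = c("by.sub", "by.root"),
                      sampfrac = 0.632, replace = FALSE) {
  by <- match.arg(by)
  if (by == "by.sub") {
    subjects <- unique(ids)
    N <- length(subjects)
    size <- if (replace) N else floor(sampfrac * N)
    # draw subjects, then copy each subject's count to all of its rows
    subj_counts <- tabulate(sample.int(N, size, replace = replace), nbins = N)
    subj_counts[match(ids, subjects)]
  } else {
    n <- length(ids)
    size <- if (replace) n else floor(sampfrac * n)
    # draw pseudo-subject rows directly, ignoring subject membership
    tabulate(sample.int(n, size, replace = replace), nbins = n)
  }
}

set.seed(42)
ids <- rep(1:5, times = c(3, 2, 4, 1, 2))   # 5 subjects, 12 pseudo-subject rows
w_sub  <- boot_rows(ids, "by.sub")          # floor(0.632 * 5) = 3 whole subjects
w_root <- boot_rows(ids, "by.root")         # floor(0.632 * 12) = 7 single rows
w_swr  <- boot_rows(ids, "by.sub", replace = TRUE)
```

Under "by.sub" every row of a subject shares the same count, so a subject's covariate path is either kept whole or left out; under "by.root" rows of the same subject can be split between the in-bag and out-of-bag sets.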
samp: bootstrap specification when bootstrap = "by.user". An array of dim n x ntree specifying how many times each record appears in each bootstrap sample.
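A samp array of the documented shape (n records by ntree trees) can be built with base R. The sketch below assumes ordinary with-replacement resampling of records; the sizes n and ntree are illustrative only.

```r
# Sketch: a user-supplied samp matrix of with-replacement bootstrap counts.
# Rows = records of data (n), columns = trees (ntree).
set.seed(1)
n <- 10
ntree <- 5
samp <- sapply(seq_len(ntree), function(b) {
  # number of times each of the n records is drawn for tree b
  tabulate(sample.int(n, n, replace = TRUE), nbins = n)
})
dim(samp)      # 10 5
colSums(samp)  # every column sums to n = 10
```

Such a matrix would be passed as bootstrap = "by.user", samp = samp, with ntree equal to ncol(samp) and n equal to nrow(data).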
na.action: action taken if the data contain NA values. The default "na.omit" removes the entire record if any of its entries is NA (for x-variables this applies only to those specifically listed in formula). See function cforest for other available options.
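The rule that NA values matter only for variables listed in formula is the usual behaviour of na.omit applied through a model frame. The sketch below illustrates it with stats::model.frame() on hypothetical data; it shows the general R convention, not ltrccif's internal code.

```r
library(survival)

# Hypothetical data: NAs in age (used), chol and unused (not in the formula).
d <- data.frame(
  Start  = c(0, 0, 1, 0),
  Stop   = c(2, 3, 4, 5),
  Event  = c(1, 0, 1, 0),
  age    = c(50, NA, 61, 47),
  chol   = c(200, 180, NA, 210),
  unused = c(NA, 1, 2, 3)
)

# Only variables in the formula are checked for NA.
mf <- model.frame(Surv(Start, Stop, Event) ~ age, data = d, na.action = na.omit)
nrow(mf)  # 3: only row 2 (missing age) is dropped
```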
stepFactor: at each iteration, mtry is inflated (or deflated) by this value; used when mtry is not specified (see tune.ltrccif). The default value is 2.
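How stepFactor drives the search can be sketched with a tuneRF-style loop: starting from an initial mtry, it deflates by stepFactor, then inflates, and stops in each direction once the relative out-of-bag error improvement falls below a threshold. This is an assumption about the general strategy, not the tune.ltrccif source; search_mtry(), oob_error, improve and the toy error curve are all hypothetical.

```r
# Hypothetical sketch (not the tune.ltrccif source) of a stepFactor-driven search.
search_mtry <- function(p, mtry_start, oob_error, stepFactor = 2, improve = 0.05) {
  start_err <- oob_error(mtry_start)
  best_mtry <- mtry_start
  best_err <- start_err
  for (direction in c("down", "up")) {
    cur <- mtry_start
    cur_err <- start_err
    repeat {
      if (direction == "down") {
        nxt <- max(1, ceiling(cur / stepFactor))   # deflate, never below 1
      } else {
        nxt <- min(p, floor(cur * stepFactor))     # inflate, never above p
      }
      if (nxt == cur) break                        # reached a boundary
      err <- oob_error(nxt)
      if ((cur_err - err) / cur_err < improve) break  # too little gain: stop
      cur <- nxt
      cur_err <- err
      if (err < best_err) {
        best_mtry <- nxt
        best_err <- err
      }
    }
  }
  best_mtry
}

# Toy error curve standing in for the OOB error of forests with mtry = 1..5.
toy_err <- function(m) c(0.40, 0.30, 0.28, 0.27, 0.26)[m]
search_mtry(p = 5, mtry_start = 2, oob_error = toy_err, stepFactor = 2)  # 4
search_mtry(p = 5, mtry_start = 2, oob_error = toy_err, stepFactor = 3)  # 5
```

A larger stepFactor takes bigger jumps and fits fewer forests, at the cost of a coarser grid of candidate mtry values.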
trace: whether to print the progress of the search for the optimal value of mtry when mtry is not specified (see tune.ltrccif). trace = TRUE is set by default.
applyfun: an optional lapply-style function with arguments function(X, FUN, ...). It is used for computing the variable selection criterion. The default is to use the basic lapply function unless the cores argument is specified (see below). See ctree_control.
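cores: numeric, the number of cores used for computing the variable selection criterion when applyfun is not given. See ctree_control.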
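An lapply-style function with the required signature can wrap parallel::mclapply(). The sketch below is one possible choice, not a default of ltrccif; the name my_applyfun and the core count are assumptions, and a single core is used on Windows because mclapply() relies on forking.

```r
library(parallel)

# Hypothetical applyfun with the signature function(X, FUN, ...).
# mclapply() forks worker processes, which Windows does not support,
# so fall back to one core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
my_applyfun <- function(X, FUN, ...) {
  mclapply(X, FUN, ..., mc.cores = n_cores)
}

# It behaves like lapply() and returns a list.
res <- my_applyfun(1:4, function(i) i^2)
unlist(res)  # 1 4 9 16
```

It could then be supplied as applyfun = my_applyfun in a call to ltrccif.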
control: a list of control parameters; see ctree_control. The control parameters minsplit and minbucket have been adjusted from the cforest defaults. The remaining entries are those shown in the usage above; any parameter not listed there takes its ctree_control default.
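The adjusted defaults in the usage above, minsplit = max(ceiling(sqrt(n)), 20) and minbucket = max(ceiling(sqrt(n)), 7), can be evaluated for a few sample sizes. Here n = nrow(data), which counts pseudo-subject rows when id is supplied; the helper name default_min_sizes is hypothetical.

```r
# Evaluate the adjusted minsplit / minbucket defaults for n = nrow(data).
default_min_sizes <- function(n) {
  c(minsplit  = max(ceiling(sqrt(n)), 20),
    minbucket = max(ceiling(sqrt(n)), 7))
}

default_min_sizes(25)    # minsplit 20, minbucket  7  (sqrt(25) = 5)
default_min_sizes(312)   # minsplit 20, minbucket 18  (ceiling(17.66) = 18)
default_min_sizes(1000)  # minsplit 32, minbucket 32  (ceiling(31.62) = 32)
```

For small data sets the fixed floors 20 and 7 dominate; for larger ones both limits grow like sqrt(n), which keeps individual trees from growing arbitrarily deep as n increases.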
An object of class ltrccif, a subclass of cforest.
This function extends the conditional inference survival forest algorithm in cforest to fit left-truncated and right-censored data, and it also allows for time-varying covariates.
Andersen, P. and Gill, R. (1982). Cox's regression model for counting processes, a large sample study. Annals of Statistics, 10:1100-1120.
Fu, W. and Simonoff, J.S. (2016). Survival trees for left-truncated and right-censored data, with application to time-varying covariate data. Biostatistics, 18(2):352-369.
See predictProb for prediction and tune.ltrccif for mtry tuning.
#### Example with time-varying data pbcsample
library(LTRCforests)
library(survival)
Formula = Surv(Start, Stop, Event) ~ age + alk.phos + ast + chol + edema
## Fit an LTRCCIF on the time-varying data (subjects identified by ID),
## with mtry tuned with stepFactor = 3.
LTRCCIFobj = ltrccif(formula = Formula, data = pbcsample, id = ID, ntree = 20L, stepFactor = 3)