
LTRCforests (version 0.7.0)

ltrccif: Fit an LTRC conditional inference forest

Description

An implementation of the random forest and bagging ensemble algorithms that uses LTRC conditional inference trees (LTRCIT) as base learners for left-truncated right-censored (LTRC) survival data with time-invariant covariates. It also handles (left-truncated) right-censored survival data with time-varying covariates.

Usage

ltrccif(
  formula,
  data,
  id,
  mtry = NULL,
  ntree = 100L,
  bootstrap = c("by.sub", "by.root", "by.user", "none"),
  samptype = c("swor", "swr"),
  sampfrac = 0.632,
  samp = NULL,
  na.action = "na.omit",
  stepFactor = 2,
  trace = TRUE,
  applyfun = NULL,
  cores = NULL,
  control = partykit::ctree_control(teststat = "quad", testtype = "Univ", minsplit =
    max(ceiling(sqrt(nrow(data))), 20), minbucket = max(ceiling(sqrt(nrow(data))), 7),
    minprob = 0.01, mincriterion = 0, saveinfo = FALSE)
)

Value

An object of class ltrccif, which is a subclass of cforest.

Arguments

formula

a formula object whose response is a Surv object of the form

Surv(tleft, tright, event).
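To see what kind of response this produces, the sketch below builds a Surv object in the three-argument form on made-up times. In this form tleft is the entry (left-truncation) time and tright the event or censoring time; the survival package records such objects as counting-process responses. The vectors are illustrative only.

```r
library(survival)

# Made-up LTRC records: entry (left-truncation) time, exit time, event indicator
tleft  <- c(0, 2, 5)
tright <- c(4, 9, 7)
event  <- c(1, 0, 1)

y <- Surv(tleft, tright, event)
print(y)                # censored exit times are marked with "+"
attr(y, "type")         # "counting": the (start, stop] form used by ltrccif
```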

data

a data frame containing n rows of left-truncated right-censored observations. For time-varying data, this should be a data frame containing pseudo-subject observations based on the Andersen-Gill reformulation.

id

name of the variable holding subject identifiers. If present, it is looked up in data. Each group of rows in data sharing the same id represents the covariate path of a single subject through time. If id is not specified, the algorithm assumes that data contains left-truncated right-censored survival data with time-invariant covariates.
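When id is supplied, data is expected in the Andersen-Gill long format: one row per (Start, Stop] interval per subject. The hypothetical helper check_ag_format below (not part of LTRCforests) is a minimal sanity check under one assumption: each subject's intervals are contiguous, as when a covariate path is split at its change points. Gaps are not necessarily invalid in general. The column names Start, Stop and Event follow the example formula; the ID column and the data are made up.

```r
# Hypothetical helper: sanity-checks long-format (pseudo-subject) data.
# This sketch expects each interval to start where the previous one stopped
# and an event to appear only on a subject's last interval.
check_ag_format <- function(data, id, start = "Start", stop = "Stop",
                            event = "Event") {
  if (!all(data[[start]] < data[[stop]])) return(FALSE)
  per_subject <- split(data, data[[id]])
  all(vapply(per_subject, function(d) {
    d <- d[order(d[[start]]), , drop = FALSE]
    contiguous <- all(head(d[[stop]], -1) == tail(d[[start]], -1))
    event_last <- all(head(d[[event]], -1) == 0)
    contiguous && event_last
  }, logical(1)))
}

# Made-up data: subject 1 enters at time 0 and its cholesterol changes twice;
# subject 2 enters late (left truncation at time 1) and is censored at time 6
toy <- data.frame(ID    = c(1, 1, 1, 2, 2),
                  Start = c(0, 3, 7, 1, 4),
                  Stop  = c(3, 7, 9, 4, 6),
                  Event = c(0, 0, 1, 0, 0),
                  chol  = c(250, 280, 310, 190, 200))

check_ag_format(toy, "ID")      # TRUE

toy_bad <- toy
toy_bad$Event[1] <- 1           # event recorded before the last interval
check_ag_format(toy_bad, "ID")  # FALSE
```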

mtry

the number of input variables randomly sampled as candidates at each node for the random forest algorithm. If mtry is not specified, it is tuned by tune.ltrccif.

ntree

an integer giving the number of trees to grow. The default is ntree = 100L.

bootstrap

bootstrap protocol. (1) If id is present, the choices are "by.sub" (the default), which bootstraps subjects, and "by.root", which bootstraps pseudo-subjects. Both can be done with or without replacement (by default, sampling is without replacement; see samptype below). (2) If id is not specified, the data are bootstrapped by sampling rows with or without replacement. Regardless of whether id is present, if "none" is chosen, the data are not bootstrapped at all and the full data are used for every tree. If "by.user" is chosen, the bootstrap specified by samp is used.

samptype

the choices are "swor" (sampling without replacement) and "swr" (sampling with replacement). The default is sampling without replacement.

sampfrac

a fraction determining the proportion of subjects to draw without replacement when samptype = "swor". The default value is 0.632. Specifically, if id is present, 0.632 * N subjects are drawn without replacement together with all of their pseudo-subject observations (N denotes the number of subjects); otherwise, 0.632 * n is the requested sample size.
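The difference between drawing subjects ("by.sub") and drawing pseudo-subject rows ("by.root") under samptype = "swor" can be sketched in base R. This illustrates the idea only and is not the package's internal code; the helpers draw_by_subject and draw_by_row are hypothetical, and rounding 0.632 * N up to an integer is an assumption.

```r
# Made-up long-format data: subjects 1, 2, 3 contribute 2, 3 and 1 rows
toy <- data.frame(ID    = c(1, 1, 2, 2, 2, 3),
                  Start = c(0, 5, 0, 2, 6, 0),
                  Stop  = c(5, 9, 2, 6, 8, 4),
                  Event = c(0, 1, 0, 0, 1, 0))

# "by.sub"-style draw: sample subjects without replacement and keep all their rows
draw_by_subject <- function(data, id, sampfrac = 0.632) {
  subjects <- unique(data[[id]])
  size <- ceiling(sampfrac * length(subjects))   # rounding up is an assumption
  keep <- subjects[sample.int(length(subjects), size)]
  data[data[[id]] %in% keep, , drop = FALSE]
}

# "by.root"-style draw: sample pseudo-subject rows directly, ignoring id
draw_by_row <- function(data, sampfrac = 0.632) {
  size <- ceiling(sampfrac * nrow(data))
  data[sort(sample.int(nrow(data), size)), , drop = FALSE]
}

set.seed(1)
sub_draw <- draw_by_subject(toy, "ID")  # 2 of 3 subjects, each with all its rows
row_draw <- draw_by_row(toy)            # 4 of 6 rows, subjects may be split
```

Subject-level drawing keeps each sampled covariate path intact, whereas row-level drawing can keep only part of a subject's history.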

samp

bootstrap specification used when bootstrap = "by.user": an array of dimension n x ntree specifying how many times each record appears in each bootstrap sample.
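One way to build such an array is to tabulate, for each tree, which records were drawn. The sketch below uses an illustrative n = 10 records and 5 trees; the sizes and the two sampling schemes are assumptions chosen to mirror samptype, not values required by ltrccif.

```r
n     <- 10L   # number of records (rows of data); illustrative
ntree <- 5L

set.seed(2023)
# Each column counts how often each record enters one tree's sample.
# Without replacement: ceiling(0.632 * n) = 7 records per tree, counts are 0 or 1
samp_swor <- replicate(ntree,
  tabulate(sample.int(n, size = ceiling(0.632 * n)), nbins = n))

# With replacement: n draws per tree, so a record can appear more than once
samp_swr <- replicate(ntree,
  tabulate(sample.int(n, size = n, replace = TRUE), nbins = n))

dim(samp_swor)  # 10 5
```

With n equal to nrow(data), either matrix would then be passed as samp together with bootstrap = "by.user".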

na.action

action taken if the data contain NAs. The default "na.omit" removes an entire record if any of its entries is NA (for x-variables, this applies only to those listed in formula). See cforest for the other available options.
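The rule that only variables in formula matter can be mimicked in base R on a made-up data frame. This illustrates the described behaviour of "na.omit"; it is not how ltrccif implements it.

```r
library(survival)

# Made-up data: 'notes' is not used in the formula
toy <- data.frame(Start = c(0, 0, 1), Stop = c(4, 5, 6), Event = c(1, 0, 1),
                  age = c(50, NA, 61), notes = c(NA, "a", "b"))
f <- Surv(Start, Stop, Event) ~ age

vars <- all.vars(f)                         # "Start" "Stop" "Event" "age"
kept <- toy[complete.cases(toy[, vars]), ]  # check NAs only in formula variables
nrow(kept)  # 2: row 2 is dropped (age is NA); row 1 stays despite NA in notes
```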

stepFactor

the factor by which mtry is inflated (or deflated) at each iteration of the search; used only when mtry is not specified (see tune.ltrccif). The default value is 2.
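To get a feel for what stepFactor controls, the hypothetical helper mtry_grid below (not part of LTRCforests) lists the mtry values reachable from a starting value by repeatedly multiplying or dividing by stepFactor, capped to [1, p]. The starting value of 3 for five predictors (ceiling(sqrt(5))) and the floor/ceiling rounding are assumptions; the actual search in tune.ltrccif may stop earlier, so treat this only as the grid reachable from the start.

```r
# Hypothetical helper: mtry values reachable by multiplicative steps of stepFactor
mtry_grid <- function(start, p, stepFactor = 2) {
  up <- start
  cur <- start
  while (cur < p) {                     # inflate mtry up to p
    nxt <- min(p, floor(cur * stepFactor))
    if (nxt == cur) break               # stepFactor too small to move further
    up <- c(up, nxt)
    cur <- nxt
  }
  down <- numeric(0)
  cur <- start
  while (cur > 1) {                     # deflate mtry down to 1
    nxt <- max(1, ceiling(cur / stepFactor))
    if (nxt == cur) break
    down <- c(nxt, down)
    cur <- nxt
  }
  c(down, up)
}

# Five predictors, as in the Examples formula; start = 3 is an assumption
mtry_grid(3, 5, stepFactor = 2)  # 1 2 3 5
mtry_grid(3, 5, stepFactor = 3)  # 1 3 5
```

A larger stepFactor reaches the bounds in fewer steps but skips intermediate values.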

trace

whether to print the progress of the search for the optimal value of mtry when mtry is not specified (see tune.ltrccif). The default is trace = TRUE.

applyfun

an optional lapply-style function with arguments function(X, FUN, ...). It is used for computing the variable selection criterion. The default is to use the basic lapply function unless the cores argument is specified (see below). See ctree_control.

cores

numeric. See ctree_control.
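An lapply-style function for applyfun only needs the signature function(X, FUN, ...) and must return what lapply would. The sketch below wraps parallel::mclapply; the wrapper name my_applyfun and the choice of 2 workers are hypothetical. Windows does not support mc.cores > 1, so the sketch falls back to one core there.

```r
library(parallel)

# Hypothetical worker count: forking is unavailable on Windows
n_workers <- if (.Platform$OS.type == "windows") 1L else 2L

# lapply-style function with the required signature function(X, FUN, ...)
my_applyfun <- function(X, FUN, ...) {
  mclapply(X, FUN, ..., mc.cores = n_workers)
}

# It must return the same list that lapply would
res <- my_applyfun(1:4, function(x) x^2)
```

The wrapper would be passed as applyfun = my_applyfun; alternatively, setting cores lets ctree_control choose the apply function.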

control

a list of control parameters; see ctree_control. The parameters minsplit and minbucket have been adjusted from the cforest defaults; the remaining settings are those shown in Usage.
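The default node-size thresholds in Usage grow with the square root of the number of rows n. The sketch below evaluates them for a few n (default_node_sizes is a hypothetical helper) and, if partykit is installed, builds a custom control with larger nodes. The values 60 and 30 are illustrative, not recommendations.

```r
# Defaults from the Usage section, as a function of the number of rows n
default_node_sizes <- function(n) {
  c(minsplit  = max(ceiling(sqrt(n)), 20),
    minbucket = max(ceiling(sqrt(n)), 7))
}

default_node_sizes(25)    # minsplit 20, minbucket 7
default_node_sizes(100)   # minsplit 20, minbucket 10
default_node_sizes(1000)  # minsplit 32, minbucket 32

# A custom control with larger nodes; only built if partykit is available
if (requireNamespace("partykit", quietly = TRUE)) {
  ctrl <- partykit::ctree_control(teststat = "quad", testtype = "Univ",
                                  minsplit = 60, minbucket = 30,
                                  minprob = 0.01, mincriterion = 0,
                                  saveinfo = FALSE)
}
```

ctrl would then be passed to ltrccif as control = ctrl.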

Details

This function extends the conditional inference survival forest algorithm in cforest to left-truncated right-censored data, including data with time-varying covariates.

References

Andersen, P. and Gill, R. (1982). Cox's regression model for counting processes, a large sample study. Annals of Statistics, 10:1100-1120.

Fu, W. and Simonoff, J.S. (2016). Survival trees for left-truncated and right-censored data, with application to time-varying covariate data. Biostatistics, 18(2):352–369.

See Also

predictProb for prediction and tune.ltrccif for mtry tuning.

Examples

#### Example with time-varying data pbcsample
library(survival)
library(LTRCforests)
Formula = Surv(Start, Stop, Event) ~ age + alk.phos + ast + chol + edema
## Fit an LTRCCIF without id, so each row is treated as a time-invariant observation;
## mtry is tuned with stepFactor = 3.
LTRCCIFobj = ltrccif(formula = Formula, data = pbcsample, ntree = 20L, stepFactor = 3)
