plot-predict-simulate: Plots, Predictions and Samples from mlt Objects

Description

Plot, predict and sample from objects of class mlt

Usage

# S3 method for ctm
plot(x, newdata, type = c(
         "distribution", "logdistribution", 
         "survivor", "logsurvivor", 
         "density", "logdensity", 
         "hazard", "loghazard", 
         "cumhazard", "logcumhazard", 
         "odds", "logodds", 
         "quantile", "trafo"),
     q = NULL, prob = 1:(K - 1) / K, K = 50, col = rgb(.1, .1, .1, .1), lty = 1, 
     add = FALSE, ...)
# S3 method for mlt
plot(x, ...)
# S3 method for ctm
predict(object, newdata, type = c("trafo", 
         "distribution", "logdistribution", 
         "survivor", "logsurvivor", 
         "density", "logdensity", 
         "hazard", "loghazard", 
         "cumhazard", "logcumhazard", 
         "odds", "logodds", 
         "quantile"), 
         terms = c("bresponse", "binteracting", "bshifting"), 
         q = NULL, prob = NULL, K = 50, interpolate = FALSE, ...)
# S3 method for mlt
predict(object, newdata = object$data, ...)
# S3 method for ctm
simulate(object, nsim = 1, seed = NULL, newdata, K = 50, q = NULL,
         interpolate = FALSE, bysim = TRUE, ...)
# S3 method for mlt
simulate(object, nsim = 1, seed = NULL, newdata = object$data, bysim = TRUE, ...)

Arguments

object: a fitted conditional transformation model as returned by mlt or an unfitted conditional transformation model as returned by ctm
x: a fitted conditional transformation model as returned by mlt
newdata: an optional data frame of observations
type: type of prediction or plot to generate
q: quantiles at which to evaluate the model
prob: probabilities for the evaluation of the quantile function (type = "quantile")
terms: terms to evaluate for the predictions, corresponds to the argument response, interacting and shifting in ctm
K: number of grid points to generate (in the absence of q)
col: color for the lines to plot
lty: line type for the lines to plot
add: logical indicating if a new plot shall be generated (the default)
interpolate: logical indicating if quantiles shall be interpolated linearily. This unnecessary option is no longer implemented (starting with 1.2-1).
nsim: number of samples to generate
seed: optional seed for the random number generator
bysim: logical, if TRUE a list with nsim elements is returned, each element is of length nrow(newdata) and contains one sample from the conditional distribution for each row of newdata. If FALSE, a list of length nrow(newdata) is returned, its ith element of length nsim contains nsim samples from the conditional distribution given newdata[i,].
...: additional arguments

Details

plot evaluates the transformation function over a grid of q values for all observations in newdata and plots these functions (according to type). predict evaluates the transformation function over a grid of q values for all observations in newdata and returns the result as a matrix (where _columns_ correspond to _rows_ in newdata, see examples). Lack of type = "mean" is a feature and not a bug.

Argument type defines the scale of the plots or predictions: type = "distribution" means the cumulative distribution function, type = "survivor" is the survivor function (one minus distribution function), type = "density" the absolute continuous or discrete density (depending on the response), type = "hazard", type = "cumhazard", and type = "odds" refers to the hazard (absolute continuous or discrete), cumulative hazard (defined as minus log-survivor function in both the absolute continuous and discrete cases), and odds (distribution divided by survivor) functions. The quantile function can be evaluated for probabilities prob by type = "quantile".

Note that the predict method for ctm objects requires all model coefficients to be specified in this unfitted model. simulate draws samples from object by numerical inversion of the quantile function.

Note that offsets are ALWAYS IGNORED when computing predictions. If you want the methods to pay attention to offsets, specify them as a variable in the model with fixed regression coefficient using the fixed argument in mlt.

More examples can be found in Hothorn (2018).

References

Torsten Hothorn (2020), Most Likely Transformations: The mlt Package, Journal of Statistical Software, 92(1), 1--68, tools:::Rd_expr_doi("10.18637/jss.v092.i01")

Examples

Run this code


  library("survival")
  op <- options(digits = 2)

  ### GBSG2 dataset
  data("GBSG2", package = "TH.data")

  ### right-censored response
  GBSG2$y <- with(GBSG2, Surv(time, cens))

  ### define Bernstein(log(time)) parameterisation
  ### of transformation function. The response
  ### is bounded (log(0) doesn't work, so we use log(1))
  ### support defines the support of the Bernstein polynomial
  ### and add can be used to make the grid wider (see below)
  rvar <- numeric_var("y", bounds = c(0, Inf), 
                      support = c(100, 2000))
  rb <- Bernstein_basis(rvar, order = 6, ui = "increasing")
  ### dummy coding of menopausal status
  hb <- as.basis(~ 0 + menostat, data = GBSG2)
  ### treatment contrast of hormonal treatment
  xb <- as.basis(~ horTh, data = GBSG2, remove_intercept = TRUE)

  ### set-up and fit Cox model, stratified by menopausal status
  m <- ctm(rb, interacting = hb, shifting = xb, todistr = "MinExtrVal")
  fm <- mlt(m, data = GBSG2)

  ### generate grid for all three variables
  ### note that the response grid ranges between 1 (bounds[1])
  ### and 2000 (support[2])
  (d <- mkgrid(m, n = 10))
  ### data.frame of menopausal status and treatment
  nd <- do.call("expand.grid", d[-1])

  ### plot model on different scales, for all four combinations
  ### of menopausal status and hormonal treatment
  typ <- c("distribution", "survivor", "density", "hazard", 
           "cumhazard", "odds")
  layout(matrix(1:6, nrow = 2))
  nl <- sapply(typ, function(tp) 
      ### K = 500 makes densities and hazards smooth
      plot(fm, newdata = nd, type = tp, col = 1:nrow(nd), K = 500))
  legend("topleft", lty = 1, col = 1:nrow(nd), 
         legend = do.call("paste", nd), bty = "n")

  ### plot calls predict, which generates a grid with K = 50
  ### response values
  ### note that a K x nrow(newdata) matrix is returned
  ### (for reasons explained in the next example)
  predict(fm, newdata = nd, type = "survivor")

  ### newdata can take a list, and evaluates the survivor
  ### function on the grid defined by newdata 
  ### using a linear array model formulation and is 
  ### extremely efficient (wrt computing time and memory)
  ### d[1] (the response grid) varies fastest
  ### => the first dimension of predict() is always the response,
  ### not the dimension of the predictor variables (like one 
  ### might expect)
  predict(fm, newdata = d, type = "survivor")

  ### owing to this structure, the result can be quickly stored in 
  ### a data frame as follows
  cd <- do.call("expand.grid", d)
  cd$surv <- c(S <- predict(fm, newdata = d, type = "survivor"))

  ### works for distribution functions
  all.equal(1 - S, predict(fm, newdata = d, type = "distribution"))
  ### cumulative hazard functions
  all.equal(-log(S), predict(fm, newdata = d, type = "cumhazard"))
  ### log-cumulative hazard functions (= trafo, for Cox models)
  all.equal(log(-log(S)), predict(fm, newdata = d, type = "logcumhazard"))
  all.equal(log(-log(S)), predict(fm, newdata = d, type = "trafo"))
  ### densities, hazards, or odds functions
  predict(fm, newdata = d, type = "density")
  predict(fm, newdata = d, type = "hazard")
  predict(fm, newdata = d, type = "odds")
  ### and quantiles (10 and 20%)
  predict(fm, newdata = d[-1], type = "quantile", prob = 1:2 / 10)

  ### note that some quantiles are only defined as intervals
  ### (> 2000, in this case). Intervals are returned as an "response" 
  ### object, see ?R. Unfortunately, these can't be stored as array, so
  ### a data.frame is returned where the quantile varies first
  p <- c(list(prob = 1:9/10), d[-1])
  np <- do.call("expand.grid", p)
  (Q <- predict(fm, newdata = d[-1], type = "quantile", prob = 1:9 / 10))
  np$Q <- Q
  np

  ### simulating from the model works by inverting the distribution 
  ### function; some obs are right-censored at 2000
  (s <- simulate(fm, newdata = nd, nsim = 3))
  ### convert to Surv
  sapply(s, as.Surv)

  ### generate 3 parametric bootstrap samples from the model
  tmp <- GBSG2[, c("menostat", "horTh")]
  s <- simulate(fm, newdata = tmp, nsim = 3)
  ### refit the model using the simulated response
  lapply(s, function(y) {
    tmp$y <- y
    coef(mlt(m, data = tmp))
  })

  options(op)

Run the code above in your browser using DataLab