forest_plot: Forest plots for survival analysis.

Description

Creates a forest plot from SurvivalAnalysisResult objects. Both univariate (analyse_survival) results, typically with use_one_hot=TRUE, and multivariate (analyse_multivariate) results are acceptable.

Usage

forest_plot(
  ...,
  use_one_hot = FALSE,
  factor_labeller = identity,
  endpoint_labeller = identity,
  orderer = identity_order,
  categorizer = NULL,
  relative_widths = c(1, 1, 1),
  ggtheme = theme_bw(),
  labels_displayed = c("endpoint", "factor"),
  label_headers = c(endpoint = "Endpoint", factor = "Subgroup", n = "n"),
  values_displayed = c("HR", "CI", "p"),
  value_headers = c(HR = "HR", CI = "CI", p = "p", n = "n", subgroup_n = "n"),
  HRsprintfFormat = "%.2f",
  psprintfFormat = "%.3f",
  p_lessthan_cutoff = 0.001,
  log_scale = TRUE,
  HR_x_breaks = seq(0, 10),
  HR_x_limits = NULL,
  factor_id_sep = ":",
  na_rm = TRUE,
  title = NULL,
  title_relative_height = 0.1,
  title_label_args = list(),
  base_papersize = dinA(4)
)
forest_plot.df(
  .df,
  factor_labeller = identity,
  endpoint_labeller = identity,
  orderer = identity_order,
  categorizer = NULL,
  relative_widths = c(1, 1, 1),
  ggtheme = theme_bw(),
  labels_displayed = c("endpoint", "factor"),
  label_headers = c(endpoint = "Endpoint", factor = "Subgroup", n = "n"),
  values_displayed = c("HR", "CI", "p"),
  value_headers = c(HR = "HR", CI = "CI", p = "p", n = "n", subgroup_n = "n"),
  HRsprintfFormat = "%.2f",
  psprintfFormat = "%.3f",
  p_lessthan_cutoff = 0.001,
  log_scale = TRUE,
  HR_x_breaks = seq(0, 10),
  HR_x_limits = NULL,
  factor_id_sep = ":",
  na_rm = TRUE,
  title = NULL,
  title_relative_height = 0.1,
  title_label_args = list(),
  base_papersize = dinA(4)
)

Value

A ggplot2 plot object

Arguments

...

The SurvivalAnalysisResult objects. You can also pass one list of such objects, or use explicit splicing (!!! operator). If use_one_hot = FALSE, a list of coxph objects, or a mix of both, is also acceptable.

use_one_hot

If use_one_hot = FALSE (the default), forest_plot takes univariate or multivariate results and plots hazard ratios against the reference level (as provided to the analyse_survival or analyse_multivariate function or, by default, the first factor level), resulting in k-1 values for k levels. If use_one_hot = TRUE, only univariate results from analyse_survival are accepted, and the HR of each factor level vs. the remaining cohort is plotted, resulting in k values for k levels.
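The difference between k-1 and k values mirrors two ways of coding a categorical covariate. This base R sketch illustrates the two coding schemes only; it is not the package's internal code. The factor levels are borrowed from the colon example. Treatment coding drops the reference level, while one-hot coding keeps one indicator per level, each of which is compared against the rest of the cohort.

```r
# Three-level factor; the first level ("Obs") acts as the reference.
rx <- factor(c("Obs", "Lev", "Lev+5FU", "Obs", "Lev"),
             levels = c("Obs", "Lev", "Lev+5FU"))

# Treatment (reference-level) coding: the intercept column is dropped,
# leaving k - 1 = 2 columns, each comparing one level against "Obs".
treatment <- model.matrix(~ rx)[, -1, drop = FALSE]

# One-hot coding: k = 3 indicator columns, each comparing one level
# against everybody else.
one_hot <- sapply(levels(rx), function(lvl) as.integer(rx == lvl))

ncol(treatment)  # 2
ncol(one_hot)    # 3
```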

factor_labeller, endpoint_labeller

Either

A function which returns labels for its input. The first argument is a vector of factor.ids (factor_labeller) or endpoints (endpoint_labeller). If the function takes ... or two arguments, the second argument is a data frame with (at least) the columns survivalResult, endpoint, factor.id, factor.name, factor.value, HR, Lower_CI, Upper_CI, p, n, where survivalResult is the corresponding result object passed to forest_plot. Note that the function must be vectorized; if you have a non-vectorized function taking single arguments, you may want to have a look at purrr::map_chr or purrr::pmap_chr.
a dictionary-like list, looked up by endpoints or factor.ids. The factor.id value is: for continuous factors, the factor name (the column name in the data frame); for categorical factors, the factor name, factor_id_sep, and the factor level value. (Note: if use_one_hot = FALSE, the HR is the factor level value vs. the Cox reference given to analyse_survival or analyse_multivariate; if use_one_hot = TRUE, the HR is the factor level value vs. the remaining population.)
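A sketch of both labeller forms. strip_factor_name and factor_labels are hypothetical names, and the ids are examples in the format described above. The function assumes the default factor_id_sep = ":" and deliberately takes exactly one argument, so it would receive only the vector of factor.ids (a second argument would receive the data frame described above).

```r
# Function form: vectorized, one argument (the vector of factor.ids).
# Drops the "<factor name>:" prefix of categorical ids; continuous ids
# (no separator) are returned unchanged.
strip_factor_name <- function(factor_ids) {
  sep <- ":"  # assumes the default factor_id_sep
  pos <- as.vector(regexpr(sep, factor_ids, fixed = TRUE))  # -1 if absent
  ifelse(pos > 0, substring(factor_ids, pos + nchar(sep)), factor_ids)
}

strip_factor_name(c("age", "rx:Lev", "rx:Lev+5FU"))
# returns c("age", "Lev", "Lev+5FU")

# Dictionary form: a named list keyed by factor.id (or by endpoint).
factor_labels <- list(
  "age"        = "Age (years)",
  "rx:Lev"     = "Levamisole",
  "rx:Lev+5FU" = "Levamisole + 5-FU"
)
factor_labels[["rx:Lev"]]  # "Levamisole"
```

Either would then be passed as factor_labeller = strip_factor_name or factor_labeller = factor_labels.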

orderer

A function which returns an integer ordering vector for the input:

if the supplied function takes exactly one argument, a data frame with (at least) the columns survivalResult, endpoint, factor.id, factor.name, factor.value, HR, Lower_CI, Upper_CI, p, n, subgroup_n where survivalResult is the corresponding result object passed to forest_plot;
or, if the function takes more than one argument or its arguments include ..., the nine vectors (endpoint, factor.name, factor.value, HR, Lower_CI, Upper_CI, p, n, subgroup_n): a vector of endpoints (as given to Surv(endpoint, ...) in coxph), a vector of factors (as given to the right-hand side of the coxph formula), a vector of factor values, and numeric vectors of the HR, lower CI, upper CI, p value, n and subgroup_n.
You can create a function from ordered vectors via orderer_function_from_sorted_vectors, or call order() with one or more of these vectors.
Alternatively, you can provide a quosure or a right-hand-side formula; it will be evaluated such that the above nine vectors are available as symbols.

Example:

orderer = quo(order(endpoint, HR))
equivalent to orderer = ~order(endpoint, HR)
equivalent to orderer = function(df) df %$% order(endpoint, HR)
equivalent to orderer = function(df) { order(df$endpoint, df$HR) }
equivalent to orderer = function(endpoint, factor.name, factor.value, HR, ...) order(endpoint, HR)
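The two function forms can be checked against each other without the package. This sketch builds a small hypothetical data frame with the nine documented columns (survivalResult omitted) and calls a one-argument orderer on the whole data frame and a multi-argument orderer on its columns. Passing the columns via do.call is an assumption for illustration, not necessarily how forest_plot calls the function.

```r
# Hypothetical data with the columns an orderer receives.
df <- data.frame(
  endpoint     = c("PFS", "OS", "OS", "PFS"),
  factor.name  = c("rx", "rx", "sex", "sex"),
  factor.value = c("Lev", "Lev", NA, NA),
  HR           = c(0.9, 1.3, 0.7, 1.1),
  Lower_CI     = c(0.7, 1.0, 0.5, 0.9),
  Upper_CI     = c(1.2, 1.7, 1.0, 1.4),
  p            = c(0.40, 0.04, 0.05, 0.30),
  n            = 100,
  subgroup_n   = NA
)

# One-argument form: receives the whole data frame.
orderer_df <- function(df) order(df$endpoint, df$HR)

# Multi-argument form: receives the columns as named vectors;
# `...` swallows the columns it does not use.
orderer_vec <- function(endpoint, factor.name, factor.value, HR, ...) order(endpoint, HR)

o1 <- orderer_df(df)
o2 <- do.call(orderer_vec, as.list(df))
o1                 # 3 2 1 4: OS rows by HR, then PFS rows by HR
identical(o1, o2)  # TRUE
```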

categorizer

A function which returns a logical value for each entry, TRUE if a separating line should be inserted _above_ that entry. Same semantics as for orderer. Please note: the data passed to the categorizer is not yet sorted by your orderer! If your calculation depends on order, first order the data with your own orderer function. A proper implementation is easy using sequential_duplicates, for example categorizer = ~!sequential_duplicates(endpoint, ordering = order(endpoint, HR))
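The ordering caveat is easiest to see in code. Below, seq_dup_sketch is a hypothetical base R stand-in for sequential_duplicates; the package's actual implementation may differ. It marks entries whose value repeats the preceding entry in the given ordering, but returns the flags in the original, unsorted order, as a categorizer must.

```r
seq_dup_sketch <- function(x, ordering = seq_along(x)) {
  xo <- x[ordering]                                  # values in plot order
  dup_sorted <- c(FALSE, xo[-1] == xo[-length(xo)])  # same as previous entry?
  dup <- logical(length(x))
  dup[ordering] <- dup_sorted                        # map back to input order
  dup
}

endpoint <- c("OS", "PFS", "OS", "PFS")
HR       <- c(1.2, 0.8, 0.9, 1.5)

# TRUE marks the first entry of each endpoint block once sorted by
# (endpoint, HR), i.e. where a separating line would be drawn.
!seq_dup_sketch(endpoint, ordering = order(endpoint, HR))
# FALSE TRUE TRUE FALSE
```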

relative_widths

Relative widths of the three columns: labels, plot, values. Default is 1:1:1.

ggtheme

ggplot2 theme to use

labels_displayed

Combination of "endpoint", "factor", "n", determining what is shown on the left-hand table and in which order.

label_headers

Named vector with name=<allowed values of labels_displayed>, value=<your heading>.

values_displayed

Combination of "HR", "CI", "p", "subgroup_n", determining what is shown on the right-hand table and in which order. Note: subgroup_n is only applicable if use_one_hot = TRUE.

value_headers

Named vector with name=<allowed values of values_displayed>, value=<your heading>.

HRsprintfFormat, psprintfFormat

sprintf() format strings for hazard ratio and p value

p_lessthan_cutoff

The threshold below which p values are displayed as "less than". With p_lessthan_cutoff = 0.001, a p value of 0.002 is displayed as is, while 0.0005 becomes "p < 0.001".
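The formatting rules can be sketched with sprintf(). format_p is a hypothetical helper that mimics the documented cutoff behaviour with the default psprintfFormat and p_lessthan_cutoff; the exact string the package renders may differ.

```r
# Hypothetical helper: p values below the cutoff become "p < <cutoff>",
# all others are formatted with the given sprintf() format.
format_p <- function(p, fmt = "%.3f", cutoff = 0.001) {
  ifelse(p < cutoff,
         paste("p <", sprintf(fmt, cutoff)),
         sprintf(fmt, p))
}

format_p(c(0.002, 0.0005))  # "0.002" "p < 0.001"

# The default HRsprintfFormat rounds hazard ratios to two decimals.
sprintf("%.2f", 0.8567)     # "0.86"
```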

log_scale

Plot on a log scale, which is common and makes the CI bars symmetric around the HR. Note that HRs of 0 (did not converge) will not be plotted in this case.
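Why the bars become symmetric: a Wald confidence interval for a Cox coefficient is symmetric on the log-hazard scale, HR * exp(±z * se), so its two arms differ in length on the HR scale but not on a log axis. The coefficient and standard error below are made-up values for illustration.

```r
beta <- log(1.5)   # hypothetical log hazard ratio
se   <- 0.2        # hypothetical standard error
z    <- qnorm(0.975)

hr    <- exp(beta)
lower <- exp(beta - z * se)
upper <- exp(beta + z * se)

upper - hr > hr - lower                                 # TRUE: lopsided on the HR scale
all.equal(log(upper) - log(hr), log(hr) - log(lower))   # TRUE: symmetric on a log axis
log(0)                                                  # -Inf: an HR of 0 has no position on a log axis
```

With log_scale = TRUE, log-spaced breaks such as HR_x_breaks = 2^(-2:2) (0.25, 0.5, 1, 2, 4) may read better than the default seq(0, 10).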

HR_x_breaks

Breaks of the x scale for plotting HR and CI

HR_x_limits

Limits of the x scale for plotting HR and CI. The default (HR_x_limits = NULL) depends on log_scale and the existing limits. Pass NA to use the existing minimum and maximum values without adjustment. Pass a vector of length 2 to specify (min, max) manually.

factor_id_sep

Separator between the factor name and the level value in the factor id; see the documentation of factor_labeller.

na_rm

Only used in the multivariate case (use_one_hot = FALSE). Whether degenerate coefficients (NA, 0 or Inf) should be removed.

title, title_relative_height, title_label_args

A title on top of the plot, taking up the fraction title_relative_height of the returned plot's height. The title is drawn using draw_label; you can pass any arguments to this function via title_label_args. By default, font attributes are taken from the "title" entry of the given ggtheme, and the label is drawn centered as per the draw_label defaults.

base_papersize

Numeric vector of length 2, c(width, height), in inches. forest_plot stores a suggested "papersize" attribute in the return value, computed from base_papersize and the number of entries in the plot (in particular, the height is adjusted). The attribute is read by save_pdf. It also stores a "forestplot_entries" attribute which you can use for your own calculations.

.df

Data frame containing the columns survivalResult, endpoint, factor.id, factor.name, factor.value, HR, Lower_CI, Upper_CI, p, n, subgroup_n giving the information that is to be presented in the forest plot

df

For the variant taking a data frame: A data frame which must contain (at least) the columns: endpoint, factor.id, factor.name, factor.value, HR, Lower_CI, Upper_CI, p, n, subgroup_n
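A data frame with these columns can be assembled by hand from a plain coxph fit. This is a sketch under assumptions: it uses only the survival package, builds factor.id with the default ":" separator, leaves subgroup_n as NA (no one-hot analysis), and omits survivalResult as in the column list just above. It is not how the package computes its own results, and if your version of forest_plot.df requires the survivalResult column, it would have to be added.

```r
library(survival)

# Univariate Cox model: treatment arm vs. the reference level ("Obs").
fit  <- coxph(Surv(time, status) ~ rx, data = survival::colon)
s    <- summary(fit)
lvls <- levels(survival::colon$rx)[-1]  # non-reference levels, in coefficient order

hr_df <- data.frame(
  endpoint     = "time",
  factor.id    = paste("rx", lvls, sep = ":"),
  factor.name  = "rx",
  factor.value = lvls,
  HR           = unname(s$conf.int[, "exp(coef)"]),
  Lower_CI     = unname(s$conf.int[, "lower .95"]),
  Upper_CI     = unname(s$conf.int[, "upper .95"]),
  p            = unname(s$coefficients[, "Pr(>|z|)"]),
  n            = fit$n,
  subgroup_n   = NA_integer_
)
hr_df
```

The result would then be passed as forest_plot.df(hr_df).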

Functions

forest_plot.df: Creates a forest plot from the given data frame

Details

The plot has a left column containing the labels (covariate name, levels for categorical variables, optionally subgroup size), the actual line plot in the middle column, and a right column displaying the hazard ratios, their confidence intervals and, by default, p values. A rich set of parameters allows extensive customization to create publication-ready plots.

Examples


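library(magrittr)
library(dplyr)             # provides vars()
library(survivalAnalysis)  # provides analyse_multivariate() and forest_plot()
# Multivariate Cox model on the colon cancer data from the survival package
survival::colon %>%
   analyse_multivariate(vars(time, status),
                        vars(rx, sex, age, obstruct, perfor, nodes, differ, extent)) %>%
   forest_plot()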
