plotp.Predict: Plot Effects of Variables Estimated by a Regression Model Fit Using plotly

Description

Uses plotly graphics (without using ggplot2) to plot the effect of one or two predictors on the linear predictor or X beta scale, or on some transformation of that scale. The first argument specifies the result of the Predict function. The predictor is always plotted in its original coding. Hover text shows point estimates, confidence intervals, and on the leftmost x-point, adjustment variable settings.

If Predict was run with no variable settings, so that each predictor is varied one at a time, the result of plotp.Predict is a list with two elements. The first, named Continuous, is a plotly object containing a single graphic with all the continuous predictors varying. The second, named Categorical, is a plotly object containing a single graphic with all the categorical predictors varying. If there are no categorical predictors, the value returned by plotp.Predict is a single plotly object and not a list of objects.
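
Because the return type depends on the model, code that renders the result may need to branch on it. The following is a minimal sketch in base R, not part of rms: the helper name plotp_parts is hypothetical, and it assumes that a single plotly object carries the S3 class "plotly" (as objects built by the plotly package do) while the two-element result is a plain list. The stand-in objects only illustrate the branching and are not real graphics.

```r
# Hypothetical helper (not part of rms): normalize the value returned by
# plotp.Predict into a named list of plotly objects.
# Assumption: a single plotly object inherits from class "plotly",
# whereas the two-element result is a plain list.
plotp_parts <- function(p) {
  if (inherits(p, "plotly")) {
    # Single graphic returned (e.g., no categorical predictors)
    return(list(Plot = p))
  }
  if (is.list(p) && all(c("Continuous", "Categorical") %in% names(p))) {
    return(p[c("Continuous", "Categorical")])
  }
  stop("unrecognized result; expected a plotly object or a list of two")
}

# Stand-ins for real plotly objects, used only to illustrate the branching
fake <- structure(list(), class = c("plotly", "htmlwidget"))
names(plotp_parts(fake))
names(plotp_parts(list(Continuous = fake, Categorical = fake)))
```

With a real fit, something like lapply(plotp_parts(plotp(Predict(fit))), print) would then render whatever was returned, under the same assumption.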

If rdata is given, a spike histogram is drawn showing the location/density of data values for the \(x\)-axis variable. If there is a superposition variable that generated separate curves, the data density specific to each class of points is shown. The histograms are drawn by histSpikeg.

To plot effects instead of estimates (e.g., treatment differences as a function of interacting factors) see contrast.rms and summary.rms.

Unlike ggplot.Predict, plotp.Predict does not handle groups, anova, or perim arguments.

Usage

# S3 method for Predict
plotp(data, subset, xlim, ylim, xlab, ylab, 
     rdata=NULL, nlevels=3, vnames=c('labels','names'),
     histSpike.opts=list(frac=function(f) 0.01 + 
         0.02 * sqrt(f - 1)/sqrt(max(f, 2) - 1), side=1, nint=100),
     ncols=3, width=800, ...)

Value

a plotly object or a list containing two elements, each one a plotly object

Arguments

data: a data frame created by Predict
subset: a subsetting expression for restricting the rows of data that are used in plotting. For example, predictions may have been requested for males and females but one wants to plot only females.
xlim: ignored unless predictors were specified to Predict. Specifies the x-axis limits of the single plot produced.
ylim: Range for plotting on response variable axis. Computed by default and includes the confidence limits.
xlab: Label for x-axis when a single plot is made, i.e., when a predictor is specified to Predict. Default is one given to asis, rcs, etc., which may have been the "label" attribute of the variable.
ylab: Label for y-axis. If fun is not given, default is "log Odds" for lrm, "log Relative Hazard" for cph, name of the response variable for ols, T or log(T) for psm, or "X * Beta" otherwise. Specify ylab=NULL to omit y-axis labels.
rdata: a data frame containing the original raw data on which the regression model was based, or at least containing the \(x\)-axis and grouping variables. If rdata is present and contains the needed variables, the original data are added to the graph in the form of a spike histogram using histSpikeg in the Hmisc package.
nlevels: A non-numeric x-axis variable with nlevels or fewer unique values will cause a horizontal dot plot to be drawn instead of an x-y plot.
vnames: applies to the case where multiple plots are produced separately by predictor. Set to 'names' to use variable names instead of labels for these small plots.
histSpike.opts: a list containing named elements that specifies parameters to histSpikeg when rdata is given. The col parameter is usually derived from other plotting information and not specified by the user.
ncols: number of columns of plots to use when plotting multiple continuous predictors
width: width in pixels for plotly graphics
...: ignored
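
The default frac element of histSpike.opts is easier to read with numbers plugged in. The sketch below copies the function from the Usage section and evaluates it on a few made-up frequencies. It assumes that f is the vector of bin frequencies and that the result is a relative spike height; consult the histSpikeg documentation in Hmisc for the authoritative meaning.

```r
# Default spike-height function, copied verbatim from the Usage section
frac <- function(f) 0.01 + 0.02 * sqrt(f - 1) / sqrt(max(f, 2) - 1)

# Hypothetical bin frequencies, for illustration only
f <- c(1, 5, 10)
heights <- frac(f)
round(heights, 4)

# max(f, 2) keeps the denominator positive when every frequency is 1
frac(c(1, 1))
```

Under that reading, a bin with one observation gets height 0.01, the most frequent bin gets 0.03, and heights in between grow with the square root of the frequency.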

Author

Frank Harrell
Department of Biostatistics, Vanderbilt University
fh@fharrell.com

References

Fox J, Hong J (2009): Effect displays in R for multinomial and proportional-odds logit models: Extensions to the effects package. J Stat Software 32 No. 1.

Examples


if (FALSE) {
n <- 350     # define sample size
set.seed(17) # so can reproduce the results
age            <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol    <- rnorm(n, 200, 25)
sex            <- factor(sample(c('female','male'), n,TRUE))
label(age)            <- 'Age'      # label is in Hmisc
label(cholesterol)    <- 'Total Cholesterol'
label(blood.pressure) <- 'Systolic Blood Pressure'
label(sex)            <- 'Sex'
units(cholesterol)    <- 'mg/dl'   # uses units.default in Hmisc
units(blood.pressure) <- 'mmHg'

# Specify population model for log odds that Y=1
L <- .4*(sex=='male') + .045*(age-50) +
    (log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male')) +
    .01 * (blood.pressure - 120)
# Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)

ddist <- datadist(age, blood.pressure, cholesterol, sex)
options(datadist='ddist')

fit <- lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol,4)),
           x=TRUE, y=TRUE)

p <- plotp(Predict(fit))
p$Continuous
p$Categorical
# When using Rmarkdown html notebook, best to use
# prList(p) to render the two objects
plotp(Predict(fit), rdata=llist(blood.pressure, age))$Continuous
# spike histogram plot for two of the predictors

p <- Predict(fit, name=c('age','cholesterol'))   # Make 2 plots
plotp(p)

p <- Predict(fit, age, sex)
plotp(p, rdata=llist(age,sex))
# rdata= allows rug plots (1-dimensional scatterplots)
# on each sex's curve, with sex-specific density of age
# If data were in a data frame, could have used that
p <- Predict(fit, age=seq(20,80,length=100), sex='male', fun=plogis)
# works if datadist not used
plotp(p, ylab='P')
# plot predicted probability in place of log odds

# Compute predictions for three predictors, with superpositioning or
# conditioning on sex, combined into one graph

p1 <- Predict(fit, age, sex)
p2 <- Predict(fit, cholesterol, sex)
p3 <- Predict(fit, blood.pressure, sex)
p <- rbind(age=p1, cholesterol=p2, blood.pressure=p3)
plotp(p, ncols=2, rdata=llist(age, cholesterol, sex))
}
