
infer (version 1.0.4)

shade_confidence_interval: Add information about confidence interval

Description

shade_confidence_interval() plots a confidence interval region on top of visualize() output. The output is a ggplot2 layer that can be added with +. The function has a shorter alias, shade_ci().

Learn more in vignette("infer").
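
As a minimal sketch of that usage (boot_dist here stands for a simulation-based distribution created with calculate(), and the endpoints are placeholder values rather than computed bounds; see the Examples for a full workflow):

# add the interval layer to an existing visualization with `+`
visualize(boot_dist) +
  shade_confidence_interval(endpoints = c(38, 42))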

Usage

shade_confidence_interval(
  endpoints,
  color = "mediumaquamarine",
  fill = "turquoise",
  ...
)

shade_ci(endpoints, color = "mediumaquamarine", fill = "turquoise", ...)

Value

If added to an existing infer visualization, a ggplot2 object displaying the supplied intervals on top of its corresponding distribution. Otherwise, an infer_layer list.
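
For instance (a sketch with placeholder endpoints), calling the function outside of a visualize() pipeline returns the layer itself rather than a plot:

# not added to a plot, so the result is an infer_layer list
ci_layer <- shade_confidence_interval(endpoints = c(38, 42))
class(ci_layer)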

Arguments

endpoints

The lower and upper bounds of the interval to be plotted. Likely, this will be the output of get_confidence_interval(). For calculate()-based workflows, this will be a 2-element vector or a 1 x 2 data frame containing the lower and upper values to be plotted. For fit()-based workflows, a (p + 1) x 3 data frame with columns term, lower_ci, and upper_ci, giving the upper and lower bounds for each regression term. For use in visualizations of assume() output, this must be the output of get_confidence_interval().
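
To illustrate the simplest accepted format (the bounds below are placeholder values; in practice they would come from get_confidence_interval()):

# a 2-element vector giving the lower and upper bounds directly;
# shade_ci() is the shorter alias for shade_confidence_interval()
ci_layer <- shade_ci(endpoints = c(38.1, 39.8))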

color

A character or hex string specifying the color of the endpoints, drawn as vertical lines on the plot.

fill

A character or hex string specifying the color used to shade the confidence interval. If NULL, no shading is applied.
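
For example, assuming boot_dist and ci as in the Examples below, both aesthetics can be customized, or the shading suppressed entirely:

# custom endpoint-line and shading colors
visualize(boot_dist) +
  shade_confidence_interval(ci, color = "red", fill = "pink")

# fill = NULL draws only the vertical lines at the endpoints
visualize(boot_dist) +
  shade_confidence_interval(ci, fill = NULL)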

...

Other arguments passed along to ggplot2 functions.

See Also

Other visualization functions: shade_p_value()

Examples

library(infer)

# find the point estimate---mean number of hours worked per week
point_estimate <- gss %>%
  specify(response = hours) %>%
  calculate(stat = "mean")
  
# ...and a bootstrap distribution
boot_dist <- gss %>%
  # ...we're interested in the number of hours worked per week
  specify(response = hours) %>%
  # generating data points
  generate(reps = 1000, type = "bootstrap") %>%
  # finding the distribution from the generated data
  calculate(stat = "mean")
  
# find a confidence interval around the point estimate
ci <- boot_dist %>%
  get_confidence_interval(point_estimate = point_estimate,
                          # at the 95% confidence level
                          level = .95,
                          # using the standard error method
                          type = "se")   
  
  
# and plot it!
boot_dist %>%
  visualize() +
  shade_confidence_interval(ci)
  
# or just plot the bounds
boot_dist %>%
  visualize() +
  shade_confidence_interval(ci, fill = NULL)
  
# you can shade confidence intervals on top of
# theoretical distributions, too---the theoretical
# distribution will be recentered and rescaled to
# align with the confidence interval
sampling_dist <- gss %>%
  specify(response = hours) %>%
  assume(distribution = "t") 
  
visualize(sampling_dist) +
  shade_confidence_interval(ci)

# \donttest{
# to visualize distributions of coefficients for multiple
# explanatory variables, use a `fit()`-based workflow

# fit 1000 linear models with the `hours` variable permuted
null_fits <- gss %>%
 specify(hours ~ age + college) %>%
 hypothesize(null = "independence") %>%
 generate(reps = 1000, type = "permute") %>%
 fit()
 
null_fits

# fit a linear model to the observed data
obs_fit <- gss %>%
  specify(hours ~ age + college) %>%
  fit()

obs_fit

# get confidence intervals for each term
conf_ints <- 
  get_confidence_interval(
    null_fits, 
    point_estimate = obs_fit, 
    level = .95
  )

# visualize distributions of coefficients 
# generated under the null
visualize(null_fits)

# add a confidence interval shading layer to juxtapose 
# the null fits with the observed fit for each term
visualize(null_fits) + 
  shade_confidence_interval(conf_ints)
# }

# more in-depth explanation of how to use the infer package
if (FALSE) {
vignette("infer")
}
