ci.mean: (Bootstrap) Confidence Intervals for Arithmetic Means and Medians

Description

The function ci.mean computes and plots confidence intervals for arithmetic means with known or unknown population standard deviation or population variance, and the function ci.median computes confidence intervals for medians, optionally by a grouping and/or split variable. Both functions also support six types of bootstrap confidence intervals (e.g., bias-corrected (BC) percentile bootstrap or bias-corrected and accelerated (BCa) bootstrap confidence intervals) and plot the bootstrap samples with histograms and density curves.
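
As a rough illustration of a two-sided confidence interval for the arithmetic mean with unknown population standard deviation, the following base R sketch computes the textbook t-based interval for mtcars$mpg. It is not the package's internal code, and the object names (x, se, crit, ci_t) are chosen here for illustration only.

```r
# Textbook t-based 95% CI for the arithmetic mean (unknown population SD).
# Base R only; a sketch, not the implementation used by ci.mean().
x    <- mtcars$mpg
n    <- length(x)
m    <- mean(x)
se   <- sd(x) / sqrt(n)                      # standard error of the mean
crit <- qt(1 - (1 - 0.95) / 2, df = n - 1)   # two-sided critical value
ci_t <- c(low = m - crit * se, upp = m + crit * se)
ci_t
```

The same limits are returned by t.test(x)$conf.int, which can serve as a quick cross-check.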

Usage

ci.mean(data, ..., sigma = NULL, sigma2 = NULL, adjust = FALSE,
        boot = c("none", "norm", "basic", "stud", "perc", "bc", "bca"),
        R = 1000, seed = NULL, sample = TRUE,
        alternative = c("two.sided", "less", "greater"),
        conf.level = 0.95, group = NULL, split = NULL, sort.var = FALSE,
        na.omit = FALSE, digits = 2, as.na = NULL,
        plot = c("none", "ci", "boot"), point.size = 2.5, point.shape = 19,
        errorbar.width = 0.3, dodge.width = 0.5, hist = TRUE,
        binwidth = NULL, bins = NULL, hist.alpha = 0.4, fill = "gray85", density = TRUE,
        density.col = "#0072B2", density.linewidth = 0.5, density.linetype = "solid",
        point = TRUE, point.col = "#CC79A7", point.linewidth = 0.6,
        point.linetype = "solid", ci = TRUE, ci.col = "black",
        ci.linewidth = 0.6, ci.linetype = "dashed", line = FALSE, intercept = 0,
        linetype = "solid", line.col = "gray65", xlab = NULL, ylab = NULL,
        xlim = NULL, ylim = NULL, xbreaks = ggplot2::waiver(),
        ybreaks = ggplot2::waiver(), axis.title.size = 11, axis.text.size = 10,
        strip.text.size = 11, title = NULL, subtitle = NULL, group.col = NULL,
        plot.margin = NA, legend.title = "",
        legend.position = c("right", "top", "left", "bottom", "none"),
        legend.box.margin = c(-10, 0, 0, 0), facet.ncol = NULL, facet.nrow = NULL,
        facet.scales = "free", filename = NULL, width = NA, height = NA,
        units = c("in", "cm", "mm", "px"), dpi = 600, write = NULL,
        append = TRUE, check = TRUE, output = TRUE)
ci.median(data, ..., boot = c("none", "norm", "basic", "stud", "perc", "bc", "bca"),
          R = 1000, seed = NULL, sample = TRUE,
          alternative = c("two.sided", "less", "greater"),
          conf.level = 0.95, group = NULL, split = NULL, sort.var = FALSE,
          na.omit = FALSE, digits = 2, as.na = NULL, plot = c("none", "ci", "boot"),
          point.size = 2.5, point.shape = 19, errorbar.width = 0.3, dodge.width = 0.5,
          hist = TRUE, binwidth = NULL, bins = NULL, hist.alpha = 0.4, fill = "gray85",
          density = TRUE, density.col = "#0072B2", density.linewidth = 0.5,
          density.linetype = "solid", point = TRUE, point.col = "#CC79A7",
          point.linewidth = 0.6, point.linetype = "solid", ci = TRUE, ci.col = "black",
          ci.linewidth = 0.6, ci.linetype = "dashed", line = FALSE, intercept = 0,
          linetype = "solid", line.col = "gray65", xlab = NULL, ylab = NULL,
          xlim = NULL, ylim = NULL, xbreaks = ggplot2::waiver(),
          ybreaks = ggplot2::waiver(), axis.title.size = 11, axis.text.size = 10,
          strip.text.size = 11, title = NULL, subtitle = NULL, group.col = NULL,
          plot.margin = NA,  legend.title = "",
          legend.position = c("right", "top", "left", "bottom", "none"),
          legend.box.margin = c(-10, 0, 0, 0), facet.ncol = NULL, facet.nrow = NULL,
          facet.scales = "free", filename = NULL, width = NA, height = NA,
          units = c("in", "cm", "mm", "px"), dpi = 600, write = NULL, append = TRUE,
          check = TRUE, output = TRUE)

Value

Returns an object of class misty.object, which is a list with the following entries:

call: function call
type: type of analysis
data: list with the input specified in data, group, and split
args: specification of function arguments
boot: data frame with bootstrap replicates of the arithmetic mean or median when bootstrapping was requested
plot: ggplot2 object for plotting the results and the data frame used for plotting
result: result table
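
The entries listed above can be inspected by storing the returned object. The sketch below assumes that the misty package is installed and is skipped otherwise; the object name res is chosen for illustration only.

```r
# Store the returned misty.object without printing the output on the console,
# then look at the documented list entries.
# Assumption: the misty package is installed; otherwise nothing is run.
if (requireNamespace("misty", quietly = TRUE)) {
  res <- misty::ci.mean(mtcars, output = FALSE)
  print(names(res))   # names of the list entries
  print(res$result)   # result table
}
```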

Arguments

data: a numeric vector or data frame with numeric variables, i.e., factors and character variables are excluded from data before conducting the analysis.
...: an expression indicating the variable names in data, e.g., ci.mean(x1, x2, data = dat). Note that the operators ., +, -, ~, :, ::, and ! can also be used to select variables, see 'Details' in the df.subset function.
sigma: a numeric vector indicating the population standard deviation when computing confidence intervals for the arithmetic mean with known standard deviation. Note that either the argument sigma or the argument sigma2 can be specified, and that only one value can be specified for the argument sigma even when multiple variables are specified in data.
sigma2: a numeric vector indicating the population variance when computing confidence intervals for the arithmetic mean with known variance. Note that either the argument sigma or the argument sigma2 can be specified, and that only one value can be specified for the argument sigma2 even when multiple variables are specified in data.
adjust: logical: if TRUE, difference-adjustment for the confidence intervals for the arithmetic mean (Baguley, 2012) is applied.
boot: a character string specifying the type of bootstrap confidence intervals (CI), i.e., "none" (default) for not conducting bootstrapping, "norm" for the bias-corrected normal approximation bootstrap CI, "basic" for the basic bootstrap CI, "stud" for the studentized bootstrap CI, "perc" for the percentile bootstrap CI, "bc" for the bias-corrected (BC) percentile bootstrap CI (without acceleration), and "bca" for the bias-corrected and accelerated (BCa) bootstrap CI, see 'Details' in the ci.cor function.
R: a numeric value indicating the number of bootstrap replicates (default is 1000).
seed: a numeric value specifying the seed of the pseudo-random number generator used in the bootstrap algorithm when conducting bootstrapping.
sample: logical: if TRUE (default), the univariate sample skewness and kurtosis are computed, while the population skewness and kurtosis are computed when sample = FALSE.
alternative: a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less".
conf.level: a numeric value between 0 and 1 indicating the confidence level of the interval.
group: either a character string indicating the variable name of the grouping variable in data, or a vector representing the grouping variable. The grouping variable is excluded from the data frame specified in data. Note that a grouping variable can only be used when computing confidence intervals with unknown population standard deviation and population variance.
split: either a character string indicating the variable name of the split variable in data, or a vector representing the split variable. The split variable is excluded from the data frame specified in data. Note that a split variable can only be used when computing confidence intervals with unknown population standard deviation and population variance.
sort.var: logical: if TRUE, output table is sorted by variables when specifying group.
na.omit: logical: if TRUE, incomplete cases are removed before conducting the analysis (i.e., listwise deletion) when specifying more than one outcome variable.
digits: an integer value indicating the number of decimal places to be used.
as.na: a numeric vector indicating user-defined missing values, i.e., these values are converted to NA before conducting the analysis. Note that the as.na() function is only applied to data, but not to group or split.
plot: a character string indicating the type of the plot to display, i.e., "none" (default) for not displaying any plots, "ci" for displaying confidence intervals for the arithmetic mean or median, and "boot" for displaying bootstrap samples with histograms and density curves when the argument boot is other than "none".
point.size: a numeric value indicating the size argument in the geom_point function for controlling the size of points when plotting confidence intervals (plot = "ci").
point.shape: a numeric value between 0 and 25 or a character string as plotting symbol indicating the shape argument in the geom_point function for controlling the symbols of points when plotting confidence intervals (plot = "ci").
errorbar.width: a numeric value indicating the width argument in the geom_errorbar function for controlling the width of the whiskers when plotting confidence intervals (plot = "ci").
dodge.width: a numeric value indicating the width argument controlling the width of the geom elements to be dodged when specifying a grouping variable using the argument group and plotting confidence intervals (plot = "ci").
hist: logical: if TRUE (default), histograms are drawn when plotting bootstrap samples (plot = "boot").
binwidth: a numeric value or a function for specifying the binwidth argument in the geom_histogram function for controlling the width of the bins when plotting bootstrap samples (plot = "boot").
bins: a numeric value for specifying the bins argument in the geom_histogram function for controlling the number of bins when plotting bootstrap samples (plot = "boot").
hist.alpha: a numeric value between 0 and 1 for specifying the alpha argument in the geom_histogram function for controlling the opacity of the bars when plotting bootstrap samples (plot = "boot").
fill: a character string specifying the fill argument in the geom_histogram function controlling the fill aesthetic when plotting bootstrap samples (plot = "boot"). Note that this argument applies only when no grouping variable is specified (group = NULL).
density: logical: if TRUE (default), density curves are drawn when plotting bootstrap samples (plot = "boot").
density.col: a character string specifying the color argument in the geom_density function controlling the color of the density curves when plotting bootstrap samples (plot = "boot"). Note that this argument applies only when no grouping variable is specified (group = NULL).
density.linewidth: a numeric value specifying the linewidth argument in the geom_density function controlling the line width of the density curves when plotting bootstrap samples (plot = "boot").
density.linetype: a numeric value or character string specifying the linetype argument in the geom_density function controlling the line type of the density curves when plotting bootstrap samples (plot = "boot").
point: logical: if TRUE (default), vertical lines representing the point estimate of the arithmetic mean or median are drawn when plotting bootstrap samples (plot = "boot").
point.col: a character string specifying the color argument in the geom_vline function for controlling the color of the vertical line displaying the arithmetic mean or median when plotting bootstrap samples (plot = "boot"). Note that this argument applies only when no grouping variable is specified (group = NULL).
point.linewidth: a numeric value specifying the linewidth argument in the geom_vline function for controlling the line width of the vertical line displaying the arithmetic mean or median when plotting bootstrap samples (plot = "boot").
point.linetype: a numeric value or character string specifying the linetype argument in the geom_vline function controlling the line type of the vertical line displaying the arithmetic mean or median when plotting bootstrap samples (plot = "boot").
ci: logical: if TRUE (default), vertical lines representing the bootstrap confidence intervals of the arithmetic mean or median are drawn when plotting bootstrap samples (plot = "boot").
ci.col: a character string specifying the color argument in the geom_vline function for controlling the color of the vertical line displaying bootstrap confidence intervals when plotting bootstrap samples (plot = "boot"). Note that this argument applies only when no grouping variable is specified (group = NULL).
ci.linewidth: a numeric value specifying the linewidth argument in the geom_vline function for controlling the line width of the vertical line displaying bootstrap confidence intervals when plotting bootstrap samples (plot = "boot").
ci.linetype: a numeric value or character string specifying the linetype argument in the geom_vline function controlling the line type of the vertical line displaying bootstrap confidence intervals when plotting bootstrap samples (plot = "boot").
line: logical: if TRUE, a horizontal line is drawn when plot = "ci" or a vertical line is drawn when plot = "boot".
intercept: a numeric value indicating the yintercept or xintercept argument in the geom_hline or geom_vline function controlling the position of the horizontal or vertical line when plot = "ci" and line = TRUE or when plot = "boot" and line = TRUE. By default, the horizontal or vertical line is drawn at 0.
linetype: a character string indicating the linetype argument in the geom_hline or geom_vline function controlling the line type of the horizontal or vertical line (default is linetype = "solid").
line.col: a character string indicating the color argument in the geom_hline or geom_vline function for controlling the color of the horizontal or vertical line.
xlab: a character string indicating the name argument in the scale_x_continuous function for labeling the x-axis. The default setting is xlab = NULL when plot = "ci" and xlab = "Arithmetic Mean" or xlab = "Median" when plot = "boot".
ylab: a character string indicating the name argument in the scale_y_continuous function for labeling the y-axis. The default setting is ylab = "Arithmetic Mean" or ylab = "Median" when plot = "ci" and ylab = "Probability Density, f(x)" when plot = "boot".
xlim: a numeric vector with two elements indicating the limits argument in the scale_x_continuous function for controlling the scale range of the x-axis.
ylim: a numeric vector with two elements indicating the limits argument in the scale_y_continuous function for controlling the scale range of the y-axis.
xbreaks: a numeric vector indicating the breaks argument in the scale_x_continuous function for controlling the x-axis breaks. The default setting is xbreaks = ggplot2::waiver(), i.e., the breaks are computed by ggplot2.
ybreaks: a numeric vector indicating the breaks argument in the scale_y_continuous function for controlling the y-axis breaks. The default setting is ybreaks = ggplot2::waiver(), i.e., the breaks are computed by ggplot2.
axis.title.size: a numeric value indicating the size argument in the element_text function for controlling the font size of the axis title, i.e., theme(axis.title = element_text(size = axis.title.size)).
axis.text.size: a numeric value indicating the size argument in the element_text function for controlling the font size of the axis text, i.e., theme(axis.text = element_text(size = axis.text.size)).
strip.text.size: a numeric value indicating the size argument in the element_text function for controlling the font size of the strip text, i.e., theme(strip.text = element_text(size = strip.text.size)).
title: a character string indicating the title argument in the labs function for the title of the plot.
subtitle: a character string indicating the subtitle argument in the labs function for the subtitle of the plot.
group.col: a character vector indicating the color argument in the scale_color_manual and scale_fill_manual functions when specifying a grouping variable using the argument group.
plot.margin: a numeric vector with four elements indicating the plot.margin argument in the theme function controlling the plot margins. The default setting is c(5.5, 5.5, 5.5, 5.5), but switches to c(5.5, 5.5, -2.5, 5.5) when specifying a grouping variable using the argument group.
legend.title: a character string indicating the color argument in the labs function for specifying the legend title when specifying a grouping variable using the argument group.
legend.position: a character string indicating the legend.position argument in the theme function for controlling the position of the legend when specifying a grouping variable using the argument group. By default, the legend is placed at the right of the plot.
legend.box.margin: a numeric vector with four elements indicating the legend.box.margin argument in the theme function for controlling the margins around the full legend area when specifying a grouping variable using the argument group.
facet.ncol: a numeric value indicating the ncol argument in the facet_wrap function for controlling the number of columns when specifying a split variable using the argument split.
facet.nrow: a numeric value indicating the nrow argument in the facet_wrap function for controlling the number of rows when specifying a split variable using the argument split.
facet.scales: a character string indicating the scales argument in the facet_wrap function for controlling the scales shared across facets, i.e., "fixed", "free_x", "free_y", or "free" (default) when specifying a split variable using the argument split.
filename: a character string indicating the filename argument including the file extension in the ggsave function. Note that one of ".eps", ".ps", ".tex", ".pdf" (default), ".jpeg", ".tiff", ".png", ".bmp", ".svg" or ".wmf" needs to be specified as file extension in the filename argument, and that plots can only be saved when plot = "ci" or plot = "boot".
width: a numeric value indicating the width argument (default is the size of the current graphics device) in the ggsave function.
height: a numeric value indicating the height argument (default is the size of the current graphics device) in the ggsave function.
units: a character string indicating the units argument (default is "in") in the ggsave function.
dpi: a numeric value indicating the dpi argument (default is 600) in the ggsave function.
write: a character string naming a file for writing the output into either a text file with file extension ".txt" (e.g., "Output.txt") or Excel file with file extension ".xlsx" (e.g., "Output.xlsx"). If the file name does not contain any file extension, an Excel file will be written.
append: logical: if TRUE (default), output will be appended to an existing text file with extension .txt specified in write; if FALSE, the existing text file will be overwritten.
check: logical: if TRUE (default), argument specification is checked.
output: logical: if TRUE (default), output is shown on the console.
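
To make the role of the arguments sigma and sigma2 concrete, the following base R sketch computes the textbook z-based interval for the arithmetic mean with a known population standard deviation. The value of 6 is taken from Example 1c and is an assumption, not an estimate; the code is an illustration rather than the package's internal implementation.

```r
# Textbook z-based 95% CI for the arithmetic mean with known population SD.
# Base R only; sigma = 6 is an assumed value as in Example 1c.
x     <- mtcars$mpg
n     <- length(x)
sigma <- 6
crit  <- qnorm(1 - (1 - 0.95) / 2)            # two-sided critical value
ci_z  <- mean(x) + c(-1, 1) * crit * sigma / sqrt(n)

# Specifying the population variance instead leads to the same interval,
# because sqrt(sigma2) recovers sigma.
sigma2 <- 36
ci_z2  <- mean(x) + c(-1, 1) * crit * sqrt(sigma2 / n)
ci_z
```

Because the critical value comes from the standard normal distribution rather than the t distribution, this interval is slightly narrower than a t-based interval computed with a sample standard deviation of similar size.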

Author

Takuya Yanagida takuya.yanagida@univie.ac.at

References

Baguley, T. S. (2012). Serious stats: A guide to advanced statistics for the behavioral sciences. Palgrave Macmillan.

Canty, A., & Ripley, B. (2024). boot: Bootstrap R (S-Plus) Functions. R package version 1.3-31.

Rasch, D., Kubinger, K. D., & Yanagida, T. (2011). Statistics in psychology - Using R and SPSS. John Wiley & Sons.

Examples

#----------------------------------------------------------------------------
# Confidence Interval (CI) for the Arithmetic Mean

# Example 1a: Two-Sided 95% CI
ci.mean(mtcars)

# Example 1b: Two-Sided 95% Difference-Adjusted CI
ci.mean(mtcars, adjust = TRUE)

# Example 1c: Two-Sided 95% CI with known population standard deviation
ci.mean(mtcars, mpg, sigma = 6)

# Alternative specification without using the '...' argument
ci.mean(mtcars$mpg, sigma = 6)

#----------------------------------------------------------------------------
# Confidence Interval (CI) for the Median

# Example 2a: Two-Sided 95% CI
ci.median(mtcars)

# Example 2b: One-Sided 99% CI
ci.median(mtcars, alternative = "less", conf.level = 0.99)

if (FALSE) {
#----------------------------------------------------------------------------
# Bootstrap Confidence Interval (CI)

# Example 3a: Bias-corrected (BC) percentile bootstrap CI
ci.mean(mtcars, boot = "bc")

# Example 3b: Bias-corrected and accelerated (BCa) bootstrap CI,
# 5000 bootstrap replications, set seed of the pseudo-random number generator
ci.mean(mtcars, boot = "bca", R = 5000, seed = 123)

#----------------------------------------------------------------------------
# Grouping and Split Variable

# Example 4a: Grouping variable
ci.mean(mtcars, mpg, cyl, disp, group = "vs")

# Alternative specification without using the '...' argument
ci.mean(mtcars[, c("mpg", "cyl", "disp")], group = mtcars$vs)

# Example 4b: Split variable
ci.mean(mtcars, mpg, cyl, disp, split = "am")

# Alternative specification
ci.mean(mtcars[, c("mpg", "cyl", "disp")], split = mtcars$am)

# Example 4c: Grouping and split variable
ci.mean(mtcars, mpg, cyl, disp, group = "vs", split = "am")

# Alternative specification
ci.mean(mtcars[, c("mpg", "cyl", "disp")], group = mtcars$vs, split = mtcars$am)

#----------------------------------------------------------------------------
# Write Output

# Example 5a: Text file
ci.mean(mtcars, write = "CI_Mean_Text.txt")

# Example 5b: Excel file
ci.mean(mtcars, write = "CI_Mean_Excel.xlsx")

#----------------------------------------------------------------------------
# Plot Confidence Intervals

# Example 6a: Two-Sided 95% CI
ci.mean(mtcars, disp, hp, plot = "ci")

# Example 6b: Grouping variable
ci.mean(mtcars, disp, hp, group = "vs", plot = "ci")

# Example 6c: Split variable
ci.mean(mtcars, disp, hp, split = "am", plot = "ci")

# Example 6d: Save plot as PDF file
ci.mean(mtcars, disp, hp, plot = "ci", filename = "CI_Mean.pdf",
        width = 9, height = 6)

# Example 6e: Save plot as PNG file
ci.mean(mtcars, disp, hp, plot = "ci", filename = "CI_Mean.png",
        width = 9, height = 6)

#----------------------------------------------------------------------------
# Example 7: Plot Bootstrap Samples

# Example 7a: Two-Sided 95% CI
ci.mean(mtcars, disp, hp, boot = "bc", plot = "boot")

# Example 7b: Grouping variable
ci.mean(mtcars, disp, hp, group = "vs", boot = "bc", plot = "boot")

# Example 7c: Split variable
ci.mean(mtcars, disp, hp, split = "am", boot = "bc", plot = "boot")

# Example 7d: Save plot as PDF file
ci.mean(mtcars, disp, hp, boot = "bc", plot = "boot", filename = "CI_Mean_Boot.pdf",
        width = 12, height = 7)

# Example 7e: Save plot as PNG file
ci.mean(mtcars, disp, hp, boot = "bc", plot = "boot", filename = "CI_Mean_Boot.png",
        width = 12, height = 7)
}
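
For readers who want to see the idea behind the bootstrap options, the following base R sketch draws R resamples with replacement and takes empirical quantiles of the resampled means, which corresponds conceptually to the percentile bootstrap (boot = "perc"). It is a simplified illustration and not the package's internal code; the object names boot_means and ci_perc are chosen for illustration only.

```r
# Plain percentile bootstrap CI for the arithmetic mean; base R only.
# A conceptual sketch, not the implementation used by ci.mean().
set.seed(123)                                 # reproducible resampling
x <- mtcars$mpg
R <- 1000                                     # number of bootstrap replicates

# Resample with replacement and compute the mean of each resample
boot_means <- replicate(R, mean(sample(x, replace = TRUE)))

# 2.5% and 97.5% quantiles of the replicates form the 95% percentile CI
ci_perc <- quantile(boot_means, probs = c(0.025, 0.975))
ci_perc

# Histogram of the bootstrap replicates with the point estimate marked
hist(boot_means, main = "Bootstrap replicates", xlab = "Arithmetic Mean")
abline(v = mean(x), lty = "dashed")
```

The bias-corrected variants ("bc" and "bca") adjust the quantile levels used in the last step, so they are not reproduced by this sketch.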
