misty (version 0.7.0)

ci.prop: (Bootstrap) Confidence Intervals for Proportions

Description

This function computes and plots confidence intervals for proportions, optionally by a grouping and/or split variable. The function also supports three types of bootstrap confidence intervals (e.g., bias-corrected (BC) percentile bootstrap or bias-corrected and accelerated (BCa) bootstrap confidence intervals) and plots the bootstrap samples with histograms and density curves.
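As a rough illustration of what a percentile and a bias-corrected (BC) percentile bootstrap interval for a proportion involve, the following base R sketch resamples a 0/1 variable directly. It is not the code used by ci.prop; the object names (p.boot, z0, ci.perc, ci.bc) are chosen for this illustration only, and ties between bootstrap replicates and the sample proportion are handled in the simplest possible way.

```r
# Minimal sketch (base R only): percentile and BC bootstrap CI for a proportion.
# Assumption: x is a complete 0/1 vector; this is not the ci.prop implementation.
x          <- mtcars$vs
R          <- 2000                 # number of bootstrap replicates
conf.level <- 0.95
alpha      <- 1 - conf.level

set.seed(123)                      # make the resampling reproducible
p.hat  <- mean(x)                  # sample proportion
p.boot <- replicate(R, mean(sample(x, replace = TRUE)))

# Percentile bootstrap CI: empirical quantiles of the bootstrap replicates
ci.perc <- quantile(p.boot, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)

# Bias-correction constant: normal quantile of the share of replicates
# falling below the sample proportion
z0 <- qnorm(mean(p.boot < p.hat))

# BC percentile CI: shift the quantile levels by 2 * z0 (no acceleration term)
z        <- qnorm(c(alpha / 2, 1 - alpha / 2))
probs.bc <- pnorm(2 * z0 + z)
ci.bc    <- quantile(p.boot, probs = probs.bc, names = FALSE)

ci.perc
ci.bc
```

The BCa interval additionally adjusts the quantile levels with an acceleration constant, which is omitted here to keep the sketch short.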

Usage

ci.prop(..., data = NULL, method = c("wald", "wilson"),
        boot = c("none", "perc", "bc", "bca"), R = 1000, seed = NULL,
        alternative = c("two.sided", "less", "greater"), conf.level = 0.95,
        group = NULL, split = NULL, sort.var = FALSE, na.omit = FALSE,
        digits = 3, as.na = NULL, plot = c("none", "ci", "boot"),
        point.size = 2.5, point.shape = 19, errorbar.width = 0.3, dodge.width = 0.5,
        hist = TRUE, binwidth = NULL, bins = NULL, alpha = 0.4, fill = "gray85",
        density = TRUE, density.col = "#0072B2", density.linewidth = 0.5,
        density.linetype = "solid", plot.point = TRUE, point.col = "#CC79A7",
        point.linewidth = 0.6, point.linetype = "solid", plot.ci = TRUE,
        ci.col = "black", ci.linewidth = 0.6, ci.linetype = "dashed", line = FALSE,
        intercept = 0.5, linetype = "solid", line.col = "gray65", xlab = NULL,
        ylab = NULL, xlim = NULL, ylim = NULL, xbreaks = ggplot2::waiver(),
        ybreaks = ggplot2::waiver(), axis.title.size = 11, axis.text.size = 10,
        strip.text.size = 11, title = NULL, subtitle = NULL, group.col = NULL,
        plot.margin = NA, legend.title = "",
        legend.position = c("right", "top", "left", "bottom", "none"),
        legend.box.margin = c(-10, 0, 0, 0), facet.ncol = NULL, facet.nrow = NULL,
        facet.scales = "free_y", saveplot = NULL, width = NA, height = NA,
        units = c("in", "cm", "mm", "px"), dpi = 600, write = NULL, append = TRUE,
        check = TRUE, output = TRUE)

Value

Returns an object of class misty.object, which is a list with the following entries:

call

function call

type

type of analysis

data

list with the input specified in ..., data, group, and split

args

specification of function arguments

boot

data frame with bootstrap replicates of the proportion when bootstrapping was requested

plot

ggplot2 object for plotting the results and the data frame used for plotting

result

result table

Arguments

...

a numeric vector, matrix or data frame with numeric variables with 0 and 1 values, i.e., factors and character variables are excluded from x before conducting the analysis. Alternatively, an expression indicating the variable names in data, e.g., ci.prop(x1, x2, x3, data = dat). Note that the operators ., +, -, ~, :, ::, and ! can also be used to select variables, see 'Details' in the df.subset function.

data

a data frame when specifying one or more variables in the argument .... Note that the argument is NULL when specifying a numeric vector, matrix or data frame for the argument ....

method

a character string specifying the method for computing the confidence interval, must be one of "wald" or "wilson" (default).

boot

a character string specifying the type of bootstrap confidence intervals (CI), i.e., "none" (default) for not conducting bootstrapping, "perc" for the percentile bootstrap CI, "bc" for the bias-corrected (BC) percentile bootstrap CI (without acceleration), and "bca" for the bias-corrected and accelerated (BCa) bootstrap CI, see 'Details' in the ci.cor function.

R

a numeric value indicating the number of bootstrap replicates (default is 1000).

seed

a numeric value specifying the seed of the pseudo-random number generator used in the bootstrap algorithm when conducting bootstrapping.

alternative

a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less".

conf.level

a numeric value between 0 and 1 indicating the confidence level of the interval.

group

either a character string indicating the variable name of the grouping variable in ... or data, or a vector representing the grouping variable.

split

either a character string indicating the variable name of the split variable in ... or data, or a vector representing the split variable.

sort.var

logical: if TRUE, output table is sorted by variables when specifying group.

na.omit

logical: if TRUE, incomplete cases are removed before conducting the analysis (i.e., listwise deletion) when specifying more than one outcome variable.

digits

an integer value indicating the number of decimal places to be used.

as.na

a numeric vector indicating user-defined missing values, i.e., these values are converted to NA before conducting the analysis. Note that the as.na() function is only applied to x, but not to group or split.

plot

a character string indicating the type of the plot to display, i.e., "none" (default) for not displaying any plots, "ci" for displaying confidence intervals for the proportion, and "boot" for displaying bootstrap samples with histograms and density curves when the argument boot is other than "none".

point.size

a numeric value indicating the size argument in the geom_point function for controlling the size of points when plotting confidence intervals (plot = "ci").

point.shape

a numeric value between 0 and 25 or a character string as plotting symbol indicating the shape argument in the geom_point function for controlling the symbols of points when plotting confidence intervals (plot = "ci").

errorbar.width

a numeric value indicating the width argument in the geom_errorbar function for controlling the width of the whiskers when plotting confidence intervals (plot = "ci").

dodge.width

a numeric value indicating the width argument controlling the width of the geom elements to be dodged when specifying a grouping variable using the argument group when plotting confidence intervals (plot = "ci").

hist

logical: if TRUE (default), histograms are drawn when plotting bootstrap samples (plot = "boot").

binwidth

a numeric value or a function for specifying the binwidth argument in the geom_histogram function for controlling the width of the bins when plotting bootstrap samples (plot = "boot").

bins

a numeric value for specifying the bins argument in the geom_histogram function for controlling the number of bins when plotting bootstrap samples (plot = "boot").

alpha

a numeric value between 0 and 1 for specifying the alpha argument in the geom_histogram function for controlling the opacity of the bars when plotting bootstrap samples (plot = "boot").

fill

a character string specifying the fill argument in the geom_histogram function controlling the fill aesthetic when plotting bootstrap samples (plot = "boot"). Note that this argument applies only when no grouping variable is specified (group = NULL).

density

logical: if TRUE (default), density curves are drawn when plotting bootstrap samples (plot = "boot").

density.col

a character string specifying the color argument in the geom_density function controlling the color of the density curves when plotting bootstrap samples (plot = "boot"). Note that this argument applies only when no grouping variable is specified (group = NULL).

density.linewidth

a numeric value specifying the linewidth argument in the geom_density function controlling the line width of the density curves when plotting bootstrap samples (plot = "boot").

density.linetype

a numeric value or character string specifying the linetype argument in the geom_density function controlling the line type of the density curves when plotting bootstrap samples (plot = "boot").

plot.point

logical: if TRUE (default), vertical lines representing the point estimate of the proportion are drawn when plotting bootstrap samples (plot = "boot").

point.col

a character string specifying the color argument in the geom_vline function for controlling the color of the vertical line displaying the proportion when plotting bootstrap samples (plot = "boot"). Note that this argument applies only when no grouping variable is specified (group = NULL).

point.linewidth

a numeric value specifying the linewidth argument in the geom_vline function for controlling the line width of the vertical line displaying proportions when plotting bootstrap samples (plot = "boot").

point.linetype

a numeric value or character string specifying the linetype argument in the geom_vline function controlling the line type of the vertical line displaying proportions when plotting bootstrap samples (plot = "boot").

plot.ci

logical: if TRUE (default), vertical lines representing the bootstrap confidence intervals of proportions are drawn when plotting bootstrap samples (plot = "boot").

ci.col

a character string specifying the color argument in the geom_vline function for controlling the color of the vertical line displaying bootstrap confidence intervals when plotting bootstrap samples (plot = "boot"). Note that this argument applies only when no grouping variable is specified (group = NULL).

ci.linewidth

a numeric value specifying the linewidth argument in the geom_vline function for controlling the line width of the vertical line displaying bootstrap confidence intervals when plotting bootstrap samples (plot = "boot").

ci.linetype

a numeric value or character string specifying the linetype argument in the geom_vline function controlling the line type of the vertical line displaying bootstrap confidence intervals when plotting bootstrap samples (plot = "boot").

line

logical: if TRUE, a horizontal line is drawn when plot = "ci" or a vertical line is drawn when plot = "boot".

intercept

a numeric value indicating the yintercept or xintercept argument in the geom_hline or geom_vline function controlling the position of the horizontal or vertical line when plot = "ci" and line = TRUE or when plot = "boot" and line = TRUE. By default, the horizontal or vertical line is drawn at 0.5.

linetype

a character string indicating the linetype argument in the geom_hline or geom_vline function controlling the line type of the horizontal or vertical line (default is linetype = "solid").

line.col

a character string indicating the color argument in the geom_hline or geom_vline function for controlling the color of the horizontal or vertical line.

xlab

a character string indicating the name argument in the scale_x_continuous function for labeling the x-axis. The default setting is xlab = NULL when plot = "ci" and xlab = "Proportion" when plot = "boot".

ylab

a character string indicating the name argument in the scale_y_continuous function for labeling the y-axis. The default setting is ylab = "Proportion" when plot = "ci" and ylab = "Probability Density, f(x)" when plot = "boot".

xlim

a numeric vector with two elements indicating the limits argument in the scale_x_continuous function for controlling the scale range of the x-axis. The default setting is xlim = NULL when plot = "ci" and xlim = c(0, 1) when plot = "boot".

ylim

a numeric vector with two elements indicating the limits argument in the scale_y_continuous function for controlling the scale range of the y-axis. The default setting is ylim = c(0, 1) when plot = "ci" and ylim = NULL when plot = "boot".

xbreaks

a numeric vector indicating the breaks argument in the scale_x_continuous function for controlling the x-axis breaks. The default setting is xbreaks = NULL when plot = "ci" and xbreaks = seq(-1, 1, by = 0.25) when plot = "boot".

ybreaks

a numeric vector indicating the breaks argument in the scale_y_continuous function for controlling the y-axis breaks. The default setting is ybreaks = seq(-1, 1, by = 0.25) when plot = "ci" and ybreaks = NULL when plot = "boot".

axis.title.size

a numeric value indicating the size argument in the element_text function for controlling the font size of the axis title, i.e., theme(axis.title = element_text(size = axis.title.size)).

axis.text.size

a numeric value indicating the size argument in the element_text function for controlling the font size of the axis text, i.e., theme(axis.text = element_text(size = axis.text.size)).

strip.text.size

a numeric value indicating the size argument in the element_text function for controlling the font size of the strip text, i.e., theme(strip.text = element_text(size = strip.text.size)).

title

a character string indicating the title argument in the labs function for the title of the plot.

subtitle

a character string indicating the subtitle argument in the labs function for the subtitle of the plot.

group.col

a character vector indicating the color argument in the scale_color_manual and scale_fill_manual functions when specifying a grouping variable using the argument group.

plot.margin

a numeric vector with four elements indicating the plot.margin argument in the theme function controlling the plot margins. The default setting is c(5.5, 5.5, 5.5, 5.5), but switches to c(5.5, 5.5, -2.5, 5.5) when specifying a grouping variable using the argument group.

legend.title

a character string indicating the color argument in the labs function for specifying the legend title when specifying a grouping variable using the argument group.

legend.position

a character string indicating the legend.position argument in the theme function for controlling the position of the legend when specifying a grouping variable using the argument group. By default, the legend is placed at the bottom of the plot.

legend.box.margin

a numeric vector with four elements indicating the legend.box.margin argument in the theme function for controlling the margins around the full legend area when specifying a grouping variable using the argument group.

facet.ncol

a numeric value indicating the ncol argument in the facet_wrap function for controlling the number of columns when specifying a split variable using the argument split.

facet.nrow

a numeric value indicating the nrow argument in the facet_wrap function for controlling the number of rows when specifying a split variable using the argument split.

facet.scales

a character string indicating the scales argument in the facet_wrap function for controlling the scales shared across facets, i.e., "fixed", "free_x", "free_y" (default), or "free" when specifying a split variable using the argument split.

saveplot

a character string indicating the filename argument including the file extension in the ggsave function. Note that one of ".eps", ".ps", ".tex", ".pdf" (default), ".jpeg", ".tiff", ".png", ".bmp", ".svg" or ".wmf" needs to be specified as file extension in the file argument. Note that plots can only be saved when plot = "ci" or plot = "boot".

width

a numeric value indicating the width argument (default is the size of the current graphics device) in the ggsave function.

height

a numeric value indicating the height argument (default is the size of the current graphics device) in the ggsave function.

units

a character string indicating the units argument (default is "in") in the ggsave function.

dpi

a numeric value indicating the dpi argument (default is 600) in the ggsave function.

write

a character string naming a file for writing the output into either a text file with file extension ".txt" (e.g., "Output.txt") or Excel file with file extension ".xlsx" (e.g., "Output.xlsx"). If the file name does not contain any file extension, an Excel file will be written.

append

logical: if TRUE (default), output will be appended to an existing text file with extension .txt specified in write; if FALSE, the existing text file will be overwritten.

check

logical: if TRUE (default), argument specification is checked.

output

logical: if TRUE (default), output is shown on the console.
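To make the idea behind the group argument concrete, the sketch below computes a Wilson interval separately for each level of a grouping variable using only base R. The helper wilson.ci is hypothetical and written for this illustration; it relies on prop.test with correct = FALSE, which returns the Wilson score interval, and it is not how ci.prop itself is implemented.

```r
# Minimal sketch (base R only): Wilson CI for a proportion within each group.
# wilson.ci() is a hypothetical helper, not part of the misty package.
wilson.ci <- function(x, conf.level = 0.95) {
  x  <- x[!is.na(x)]                       # drop missing values
  ci <- prop.test(sum(x), length(x), conf.level = conf.level,
                  correct = FALSE)$conf.int
  c(n = length(x), prop = mean(x), low = ci[1], upp = ci[2])
}

# Proportion of vs = 1 separately for am = 0 and am = 1
res <- do.call(rbind, lapply(split(mtcars$vs, mtcars$am), wilson.ci))
res
```

Each row of res corresponds to one level of the grouping variable, which mirrors the layout of a result table with one line per group.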

Author

Takuya Yanagida takuya.yanagida@univie.ac.at

Details

The Wald confidence interval, which is based on the normal approximation to the binomial distribution, is computed by specifying method = "wald", while the Wilson (1927) confidence interval (also known as the Wilson score interval) is requested by specifying method = "wilson". By default, the Wilson confidence interval is computed, which has been shown to be reliable both in small samples of n = 40 or less and in larger samples of n > 40 (Brown, Cai & DasGupta, 2001), whereas the Wald confidence interval is inadequate in small samples and when p is near 0 or 1 (Agresti & Coull, 1998).
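The difference between the two methods can be seen by computing both intervals by hand. The following base R sketch uses the textbook formulas for a two-sided 95% interval; the object names (wald, wilson, center, half) are chosen for this illustration, and the code is a sketch of the formulas rather than the ci.prop source.

```r
# Minimal sketch (base R only): Wald and Wilson CI for a single proportion.
x <- mtcars$vs                     # 0/1 variable
n <- length(x)
p <- mean(x)                       # sample proportion
z <- qnorm(1 - (1 - 0.95) / 2)     # critical value for a two-sided 95% CI

# Wald: symmetric around p, based on the normal approximation
wald <- p + c(-1, 1) * z * sqrt(p * (1 - p) / n)

# Wilson score interval: center is pulled toward 0.5
denom  <- 1 + z^2 / n
center <- (p + z^2 / (2 * n)) / denom
half   <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / denom
wilson <- center + c(-1, 1) * half

# Cross-check: prop.test() without continuity correction returns the Wilson interval
ref <- as.numeric(prop.test(sum(x), n, correct = FALSE)$conf.int)

rbind(wald = wald, wilson = wilson, prop.test = ref)
```

Because the Wilson interval is not centered on the sample proportion, it stays within [0, 1] even when p is close to the boundaries, whereas the Wald interval can extend beyond them.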

References

Agresti, A. & Coull, B.A. (1998). Approximate is better than "exact" for interval estimation of binomial proportions. American Statistician, 52, 119-126.

Brown, L. D., Cai, T. T., & DasGupta, A. (2001). Interval estimation for a binomial proportion. Statistical Science, 16, 101-133.

Canty, A., & Ripley, B. (2024). boot: Bootstrap R (S-Plus) Functions. R package version 1.3-31.

Rasch, D., Kubinger, K. D., & Yanagida, T. (2011). Statistics in psychology - Using R and SPSS. John Wiley & Sons.

Wilson, E. B. (1927). Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22, 209-212.

See Also

ci.prop.diff, ci.median, ci.cor, ci.var, ci.sd, descript

Examples

#----------------------------------------------------------------------------
# Confidence Interval (CI) for Proportions

# Example 1a: Two-Sided 95% CI
ci.prop(mtcars[, c("vs", "am")])

# Alternative specification
ci.prop(vs, am, data = mtcars)

# Example 1b: One-Sided 95% CI using Wald method
ci.prop(mtcars[, c("vs", "am")], method = "wald", alternative = "less")

# Alternative specification
ci.prop(vs, am, data = mtcars, method = "wald", alternative = "less")

if (FALSE) {
#----------------------------------------------------------------------------
# Bootstrap Confidence Interval (CI)

# Example 2a: Bias-corrected (BC) percentile bootstrap CI
ci.prop(mtcars[, c("vs", "am")], boot = "bc")

# Example 2b: Bias-corrected and accelerated (BCa) bootstrap CI,
# 5000 bootstrap replications, set seed of the pseudo-random number generator
ci.prop(mtcars[, c("vs", "am")], boot = "bca", R = 5000, seed = 123)

#----------------------------------------------------------------------------
# Grouping and Split Variable

# Example 3a: Grouping variable
ci.prop(vs, data = mtcars, group = "am")

# Alternative specification
ci.prop(mtcars[, "vs"], group = mtcars$am)

# Example 3b: Split variable
ci.prop(vs, data = mtcars, split = "am")

# Alternative specification
ci.prop(mtcars[, "vs"], split = mtcars$am)

# Example 3c: Grouping and split variable
ci.prop(vs, data = mtcars, group = "am", split = "cyl")

# Alternative specification
ci.prop(mtcars$vs,  group = mtcars$am, split = mtcars$cyl)

#----------------------------------------------------------------------------
# Write Output

# Example 4a: Text file
ci.prop(mtcars[, c("vs", "am")], write = "CI_Prop_Text.txt")

# Example 4b: Excel file
ci.prop(mtcars[, c("vs", "am")], write = "CI_Prop_Excel.xlsx")

#----------------------------------------------------------------------------
# Plot Confidence Intervals

# Example 5a: Two-Sided 95% CI
ci.prop(vs, am, data = mtcars, plot = "ci")

# Example 5b: Grouping variable
ci.prop(vs, am, data = mtcars, group = "am", plot = "ci")

# Example 5c: Split variable
ci.prop(vs, am, data = mtcars, split = "am", plot = "ci")

# Example 5d: Save plot as PDF file
ci.prop(vs, am, data = mtcars, plot = "ci", saveplot = "CI_Prop.pdf",
        width = 9, height = 6)

# Example 5e: Save plot as PNG file
ci.prop(vs, am, data = mtcars, plot = "ci", saveplot = "CI_Prop.png",
        width = 9, height = 6)

#----------------------------------------------------------------------------
# Plot Bootstrap Samples

# Example 6a: Two-Sided 95% CI
ci.prop(vs, am, data = mtcars, boot = "bc", plot = "boot")

# Example 6b: Grouping variable
ci.prop(vs, am, data = mtcars, group = "am", boot = "bc", plot = "boot")

# Example 6c: Split variable
ci.prop(vs, am, data = mtcars, split = "am", boot = "bc", plot = "boot")

# Example 6d: Save plot as PDF file
ci.prop(vs, am, data = mtcars, boot = "bc", plot = "boot",
        saveplot = "CI_Prop_Boot.pdf", width = 9, height = 6)

# Example 6e: Save plot as PNG file
ci.prop(vs, am, data = mtcars, boot = "bc", plot = "boot",
        saveplot = "CI_Prop_Boot.png", width = 9, height = 6)
}