dabest: Differences between Groups with Bootstrap Confidence Intervals

Description

dabest applies a summary function (func, default mean) to the groups listed in idx, which are factors or strings in the x column of .data. The first element of idx is the control group. For every subsequent element of idx, the difference func(group_n) - func(control) is computed. For each comparison, a bootstrap confidence interval is constructed for the difference, and bias correction and acceleration are applied to correct for any skew. dabest uses bootstrap resampling to compute non-parametric, assumption-free confidence intervals, which can be visualized as estimation plots with the specialized plot.dabest function.

Usage

dabest(.data, x, y, idx, paired = FALSE, id.column = NULL, ci = 95,
  reps = 5000, func = mean, seed = 12345)

Arguments

.data

A data.frame or tibble.

x, y

Columns in .data.

idx

A vector containing factors or strings found in the x column. These must be quoted (i.e. surrounded by quotation marks). The first element is the control group; a difference is computed between each of the other groups and this first group.

paired

boolean, default FALSE. If TRUE, the two groups are treated as paired samples. The control_group is treated as pre-intervention and the test_group as post-intervention.

id.column

default NULL. A column name indicating the identity of the datapoint if the data is paired. This must be supplied if paired is TRUE.

ci

float, default 95. The level of the confidence intervals produced, given as a percentage. The default ci = 95 produces 95% CIs.

reps

integer, default 5000. The number of bootstrap resamples that will be generated.

func

function, default mean. This function will be applied to control and test individually, and the difference will be saved as a single bootstrap resample. Any NaNs will be removed automatically with na.omit.

seed

integer, default 12345. This specifies the seed used to set the random number generator. Setting a seed ensures that the bootstrap confidence intervals for the same data will remain stable over separate runs/calls of this function. See set.seed for more details.
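The effect of a fixed seed can be seen with base R alone. The snippet below is a minimal sketch that does not call dabest; the variable names first and second are purely illustrative.

```r
# Resetting the seed before each draw makes the two resamples identical.
set.seed(12345)
first  <- sample(1:100, size = 5, replace = TRUE)

set.seed(12345)
second <- sample(1:100, size = 5, replace = TRUE)

identical(first, second)  # TRUE: same seed, same resample
```

The same principle is why repeated calls to dabest with the same data and seed are expected to give the same confidence limits.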

Value

A list with 7 elements: data, x, y, idx, id.column, result, and summary.

data, x, y, id.column, and idx are the same values supplied to dabest as noted above. x and y are quoted variables for tidy evaluation by plot. summary is a tibble with func applied to every group specified in idx. These will be used by plot() to generate the estimation plot.

result is a tibble with the following 15 columns:

control_group, test_group

The name of the control group and test group respectively.

control_size, test_size

The number of observations in the control group and test group respectively.

func

The func passed to bootdiff.

paired

Is the difference paired (TRUE) or not (FALSE)?

difference

The difference between the two groups; effectively func(test_group) - func(control_group).

variable

The variable whose difference is being computed, i.e. the column supplied to y.

ci

The ci passed to bootdiff.

bca_ci_low, bca_ci_high

The lower and upper limits of the Bias Corrected and Accelerated bootstrap confidence interval.

pct_ci_low, pct_ci_high

The lower and upper limits of the percentile bootstrap confidence interval.

bootstraps

The array of bootstrap resamples generated.

Details

Estimation statistics is a statistical framework that focuses on effect sizes and confidence intervals around them, rather than P values and associated dichotomous hypothesis testing.
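To make the resampling idea concrete, the following base R sketch computes an unpaired difference and a percentile bootstrap confidence interval for it. It is an illustration only and is not dabest's internal code: the helper name boot_diff is hypothetical, and the sketch omits the bias correction and acceleration step, so it corresponds to the pct_ci_low and pct_ci_high columns rather than the BCa limits.

```r
# Hypothetical helper (not part of dabest): percentile bootstrap CI for
# func(test) - func(control), resampling each group independently.
boot_diff <- function(control, test, func = mean, reps = 5000, ci = 95,
                      seed = 12345) {
  set.seed(seed)

  # Drop missing values, as the func argument description notes.
  control <- as.numeric(na.omit(control))
  test    <- as.numeric(na.omit(test))

  # Each bootstrap resample draws indices with replacement within a group
  # and stores one value of func(test) - func(control).
  boots <- replicate(reps, {
    t_star <- test[sample.int(length(test), replace = TRUE)]
    c_star <- control[sample.int(length(control), replace = TRUE)]
    func(t_star) - func(c_star)
  })

  # Percentile limits: for ci = 95, the 2.5th and 97.5th percentiles.
  alpha  <- (100 - ci) / 100
  limits <- quantile(boots, probs = c(alpha / 2, 1 - alpha / 2),
                     names = FALSE)

  list(difference  = func(test) - func(control),
       pct_ci_low  = limits[1],
       pct_ci_high = limits[2],
       bootstraps  = boots)
}

setosa     <- iris$Petal.Width[iris$Species == "setosa"]
versicolor <- iris$Petal.Width[iris$Species == "versicolor"]

res <- boot_diff(control = setosa, test = versicolor)
res$difference
c(res$pct_ci_low, res$pct_ci_high)
```

The result is reported as an effect size with an interval around it, rather than as a P value, which is the emphasis of estimation statistics.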

References

Bootstrap Confidence Intervals. DiCiccio, Thomas J., and Bradley Efron. Statistical Science: vol. 11, no. 3, 1996. pp. 189–228.

An Introduction to the Bootstrap. Efron, Bradley, and R. J. Tibshirani. 1994. CRC Press.

Examples

# NOT RUN {
# Performing unpaired (two independent groups) analysis.
unpaired_mean_diff <- dabest(iris, Species, Petal.Width,
                             idx = c("setosa", "versicolor"),
                             paired = FALSE)

# Display the results in a user-friendly format.
unpaired_mean_diff

# Produce an estimation plot.
plot(unpaired_mean_diff)


# Performing paired analysis.
# First, we munge the `iris` dataset so we can perform a within-subject
# comparison of sepal length vs. sepal width.

library(magrittr)  # provides the %>% pipe used below

new.iris     <- iris
new.iris$ID  <- seq_len(nrow(new.iris))  # one unique ID per row (flower)
setosa.only  <-
  new.iris %>%
  tidyr::gather(key = Metric, value = Value, -ID, -Species) %>%
  dplyr::filter(Species %in% c("setosa"))

paired_mean_diff          <- dabest(
                              setosa.only, Metric, Value,
                              idx = c("Sepal.Length", "Sepal.Width"),
                              paired = TRUE, id.column = ID
                              )


# Computing the median difference.
unpaired_median_diff      <- dabest(
                              iris, Species, Petal.Width,
                              idx = c("setosa", "versicolor", "virginica"),
                              paired = FALSE,
                              func = median
                              )


# Producing a 90% CI instead of 95%.
unpaired_mean_diff_90_ci  <- dabest(
                              iris, Species, Petal.Width,
                              idx = c("setosa", "versicolor", "virginica"),
                              paired = FALSE,
                              ci = 90
                              )



# Using pipes to munge your data and then passing to `dabest`.
# First, we generate some synthetic data.
set.seed(12345)
N         <- 70
ctrl      <- rnorm(N, mean = 50, sd = 20)   # named `ctrl` to avoid masking base::c
t1        <- rnorm(N, mean = 200, sd = 20)
t2        <- rnorm(N, mean = 100, sd = 70)
long.data <- tibble::tibble(Control = ctrl, Test1 = t1, Test2 = t2)

# Munge the data using `gather`, then pass it directly to `dabest`

meandiff <- long.data %>%
              tidyr::gather(key = Group, value = Measurement) %>%
              dabest(x = Group, y = Measurement,
                     idx = c("Control", "Test1", "Test2"),
                     paired = FALSE)



# }
