ci.cor: (Bootstrap) Confidence Intervals for Correlation Coefficients

Description

This function computes and plots (1) Fisher $z'$ confidence intervals for Pearson product-moment correlation coefficients (a) without non-normality adjustment, (1b) adjusted via sample joint moments method or (1c) adjusted via approximate distribution method (Bishara et al., 2018), (2) Spearman's rank-order correlation coefficients with (2a) Fieller et al. (1957) standard error, (2b) Bonett and Wright (2000) standard error, or (2c) rank-based inverse normal transformation, (3) Kendall's Tau-b, and (4) Kendall-Stuart's Tau-c correlation coefficients with Fieller et al. (1957) standard error, optionally by a grouping and/or split variable. The function also supports five types of bootstrap confidence intervals (e.g., bias-corrected (BC) percentile bootstrap or bias-corrected and accelerated (BCa) bootstrap confidence intervals) and plots the bootstrap samples with histograms and density curves. By default, the function computes Pearson product-moment correlation coefficients adjusted via approximate distribution method.

Usage

ci.cor(..., data = NULL,
       method = c("pearson", "spearman", "kendall-b", "kendall-c"),
       adjust = c("none", "joint", "approx"),
       se = c("fisher", "fieller", "bonett", "rin"),
       sample = TRUE, seed = NULL, maxtol = 1e-05, nudge = 0.001,
       boot = c("none", "norm", "basic", "perc", "bc", "bca"), R = 1000,
       fisher = TRUE, alternative = c("two.sided", "less", "greater"),
       conf.level = 0.95, group = NULL, split = NULL, na.omit = FALSE,
       digits = 2, as.na = NULL, plot = c("none", "ci", "boot"),
       point.size = 2.5, point.shape = 19, errorbar.width = 0.3, dodge.width = 0.5,
       hist = TRUE, binwidth = NULL, bins = NULL, alpha = 0.4, fill = "gray85",
       density = TRUE, density.col = "#0072B2", density.linewidth = 0.5,
       density.linetype = "solid", plot.point = TRUE, point.col = "#CC79A7",
       point.linewidth = 0.6, point.linetype = "solid", plot.ci = TRUE,
       ci.col = "black", ci.linewidth = 0.6, ci.linetype = "dashed", line = TRUE,
       intercept = 0, linetype = "solid", line.col = "gray65", xlab = NULL,
       ylab = NULL, xlim = NULL, ylim = NULL, xbreaks = ggplot2::waiver(),
       ybreaks = ggplot2::waiver(), axis.title.size = 11, axis.text.size = 10,
       strip.text.size = 11, title = NULL, subtitle = NULL, group.col = NULL,
       plot.margin = NA, legend.title = "",
       legend.position = c("right", "top", "left", "bottom", "none"),
       legend.box.margin = c(-10, 0, 0, 0), facet.ncol = NULL, facet.nrow = NULL,
       facet.scales = "free_y", saveplot = NULL, width = NA, height = NA,
       units = c("in", "cm", "mm", "px"), dpi = 600, write = NULL, append = TRUE,
       check = TRUE, output = TRUE)

Value

Returns an object of class misty.object, which is a list with following entries:

call: function call
type: type of analysis
data: list with the input specified in ..., data, group, and split
args: specification of function arguments
boot: data frame with bootstrap replicates of the correlation coefficient when bootstrapping was requested
plot: ggplot2 object for plotting the results and the data frame used for plotting
result: result table

Arguments

...: a matrix or data frame with numeric variables, i.e., factors and character variables are excluded from x before conducting the analysis. Alternatively, an expression indicating the variable names in data e.g., ci.cor(x1, x2, data = dat). Note that the operators ., +, -, ~, :, ::, and ! can also be used to select variables, see 'Details' in the df.subset function.
data: a data frame when specifying one or more variables in the argument .... Note that the argument is NULL when specifying a numeric vector, matrix or data frame for the argument ....
method: a character string indicating which correlation coefficient is to be computed, i.e., "pearson" for Pearson product-moment correlation coefficient (default), "spearman" for Spearman's rank-order correlation coefficient, "kendall-b" for Kendall's Tau-b correlation coefficient, "kendall-c" for Kendall-Stuart's Tau-c correlation coefficient. Note that confidence intervals are only computed given at least 4 pairs of observations.
adjust: a character string specifying the non-normality adjustment method, i.e., "none" for the Fisher $z'$ confidence interval for the Pearson product-moment correlation coefficient without non-normality adjustment, "joint" for the confidence interval with non-normality adjustment via sample joint moments, and "approx" (default) for the confidence interval with non-normality adjustment via approximate distribution by skewness and kurtosis. Note that this argument only applies to the Pearson product-moment correlation coefficient, i.e., method = "pearson"
se: a character string specifying the method for computing the standard error of the correlation coefficient, i.e., "fisher" for the Fisher $z'$ confidence interval, "fieller" (default) for the confidence interval for Spearman's rank-order correlation coefficient based on approximate standard error by Fieller et al. (1957), "bonett" for the confidence interval based on approximate standard error by Bonett and Wright (2000), and "rin" for the confidence interval for Spearman's rank-order correlation coefficient based on rank-based inverse normal (RIN) transformation. Note that this argument only applies to Spearman's rank-order correlation coefficient, i.e., method = "spearman".
sample: logical: if TRUE (default), the univariate sample skewness and kurtosis is used when applying the approximate distribution method and reported in the result table, while the population skewness and kurtosis is used when sample = FALSE.
seed: a numeric value specifying the seed of the pseudo-random number generator when generating a random set of starting parameter value when the parameters led to a sum of squares greater than the maximum tolerance after optimization when applying the approximate distribution method (adjust = approx) when computing the confidence interval for the Pearson product-moment correlation coefficient, or the seeds of the pseudo-random numbers used when conducting bootstrapping.
maxtol: a numeric value indicating the tolerance for total squared error when applying the approximate distribution method (adjust = approx).
nudge: a numeric value indicating the nudge proportion of their original values by which sample skewness, kurtosis, and r are nudged towards 0 when applying the approximate distribution method (adjust = approx). are only computed given at least 10 pairs of observations.
boot: a character string specifying the type of bootstrap confidence intervals (CI), i.e., "none" (default) for not conducting bootstrapping, "norm" for the bias-corrected normal approximation bootstrap CI, "basic" for the basic bootstrap CI, "perc", for the percentile bootstrap CI "bc" (default) for the bias-corrected (BC) percentile bootstrap CI (without acceleration), and "bca" for the bias-corrected and accelerated (BCa) bootstrap CI.
R: a numeric value indicating the number of bootstrap replicates (default is 1000).
fisher: logical: if TRUE (default), Fisher $z$ transformation is applied before computing the confidence intervals to reverse-transformed the limits of the interval using the inverse of the Fisher $z$ transformation. Note that this argument applies only when boot is "norm" or "basic".
alternative: a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less".
conf.level: a numeric value between 0 and 1 indicating the confidence level of the interval.
group: either a character string indicating the variable name of the grouping variable in ... or data, or a vector representing the grouping variable. The grouping variable is excluded from the matrix or data frame specified in ... or data.
split: either a character string indicating the variable name of the split variable in ... or data, or a vector representing the split variable. The split variable is excluded from the matrix or data frame specified in ... or data.
na.omit: logical: if TRUE, incomplete cases are removed before conducting the analysis (i.e., listwise deletion); if FALSE (default), pairwise deletion is used.
digits: an integer value indicating the number of decimal places to be used.
as.na: a numeric vector indicating user-defined missing values, i.e. these values are converted to NA before conducting the analysis.
plot: a character string indicating the type of the plot to display, i.e., "none" (default) for not displaying any plots, "ci" for displaying confidence intervals for the correlation coefficient, "boot" for displaying bootstrap samples with histograms and density curves when the argument "boot" is other than "none".
point.size: a numeric value indicating the size argument in the geom_point function for controlling the size of points when plotting confidence intervals (plot = "ci").
point.shape: a numeric value between 0 and 25 or a character string as plotting symbol indicating the shape argument in the geom_point function for controlling the symbols of points. when plotting confidence intervals (plot = "ci").
errorbar.width: a numeric value indicating the width argument in the geom_errorbar function for controlling the width of the whiskers in the geom_errorbar function when plotting confidence intervals (plot = "ci").
dodge.width: a numeric value indicating the width argument controlling the width of the geom elements to be dodged when specifying a grouping variable using the argument group when plotting confidence intervals (plot = "ci").
hist: logical: if TRUE (default), histograms are drawn when plotting bootstrap samples (plot = "boot").
binwidth: a numeric value or a function for specifying the binwidth argument in the geom_histogram function for controlling the width of the bins when plotting bootstrap samples (plot = "boot").
bins: a numeric value for specifying the bins argument in the geom_histogram function for controlling the number of bins when plotting bootstrap samples (plot = "boot").
alpha: a numeric value between 0 and 1 for specifying the alpha argument in the geom_histogram function for controlling the opacity of the bars when plotting bootstrap samples (plot = "boot").
fill: a character string specifying the fill argument in the geom_histogram function controlling the fill aesthetic when plotting bootstrap samples (plot = "boot"). Note that this argument applied only when no grouping variable was specified group = NULL.
density: logical: if TRUE (default), density curves are drawn when plotting bootstrap samples (plot = "boot").
density.col: a character string specifying the color argument in the geom_density function controlling the color of the density curves when plotting bootstrap samples (plot = "boot"). Note that this argument applied only when no grouping variable was specified group = NULL.
density.linewidth: a numeric value specifying the linewidth argument in the geom_density function controlling the line width of the density curves when plotting bootstrap samples (plot = "boot").
density.linetype: a numeric value or character string specifying the linetype argument in the geom_density function controlling the line type of the density curves when plotting bootstrap samples (plot = "boot").
plot.point: logical: if TRUE (default), vertical lines representing the point estimate of the correlation coefficients are drawn when plotting bootstrap samples (plot = "boot").
point.col: a character string specifying the color argument in the geom_vline function for controlling the color of the vertical line displaying the correlation coefficient when plotting bootstrap samples (plot = "boot"). Note that this argument applied only when no grouping variable was specified group = NULL.
point.linewidth: a numeric value specifying the linewdith argument in the geom_vline function for controlling the line width of the vertical line displaying the correlation coefficient when plotting bootstrap samples (plot = "boot").
point.linetype: a numeric value or character string specifying the linetype argument in the geom_vline function controlling the line type of the vertical line displaying the correlation coefficient when plotting bootstrap samples (plot = "boot").
plot.ci: logical: if TRUE (default), vertical lines representing the bootstrap confidence intervals of the correlation coefficient are drawn when plotting bootstrap samples (plot = "boot").
ci.col: character string specifying the color argument in the geom_vline function for controlling the color of the vertical line displaying bootstrap confidence intervals when plotting bootstrap samples (plot = "boot"). Note that this argument applied only when no grouping variable was specified group = NULL.
ci.linewidth: a numeric value specifying the linewdith argument in the geom_vline function for controlling the line width of the vertical line displaying bootstrap confidence intervals when plotting bootstrap samples (plot = "boot").
ci.linetype: a numeric value or character string specifying the linetype argument in the geom_vline function controlling the line type of the vertical line displaying bootstrap confidence intervals when plotting bootstrap samples (plot = "boot").
line: logical: if TRUE (default), a horizontal line is drawn when plot = "ci" or a vertical line is drawn when plot = "boot"
intercept: a numeric value indicating the yintercept or xintercept argument in the geom_hline or geom_vline function controlling the position of the horizontal or vertical line when plot = "ci" and line = TRUE or when plot = "boot" and line = TRUE. By default, the horizontal or vertical line is drawn at 0.
linetype: a character string indicating the linetype argument in the geom_hline or geom_vline function controlling the line type of the horizontal or vertical line (default is linetype = "dashed").
line.col: a character string indicating the color argument in the geom_hline or geom_vline function for controlling the color of the horizontal or vertical line.
xlab: a character string indicating the name argument in the scale_x_continuous function for labeling the x-axis. The default setting is xlab = NULL when plot = "ci" and xlab = "Correlation Coefficient" when plot = "boot".
ylab: a character string indicating the name argument in the scale_y_continuous function for labeling the y-axis. The default setting is ylab = "Correlation Coefficient" when plot = "ci" and ylab = "Probability Density, f(x)" when plot = "boot".
xlim: a numeric vector with two elements indicating the limits argument in the scale_x_continuous function for controlling the scale range of the x-axis. The default setting is xlim = NULL when plot = "ci" and xlim = c(-1, 1) when plot = "boot".
ylim: a numeric vector with two elements indicating the limits argument in the scale_y_continuous function for controlling the scale range of the y-axis. The default setting is ylim = c(-1, 1) when plot = "ci" and xlim = NULL when plot = "boot".
xbreaks: a numeric vector indicating the breaks argument in the scale_x_continuous function for controlling the x-axis breaks.
ybreaks: a numeric vector indicating the breaks argument in the scale_y_continuous function for controlling the y-axis breaks.
axis.title.size: a numeric value indicating the size argument in the element_text function for specifying the function controlling the font size of the axis title, i.e., theme(axis.title = element_text(size = axis.text.size)).
axis.text.size: a numeric value indicating the size argument in the element_text function for specifying the function controlling the font size of the axis text, i.e., theme(axis.text = element_text(size = axis.text.size)).
strip.text.size: a numeric value indicating the size argument in the element_text function for specifying the function controlling the font size of the strip text, i.e., theme(strip.text = element_text(size = strip.text.size)).
title: a character string indicating the title argument in the labs function for the subtitle of the plot.
subtitle: a character string indicating the subtite argument in the labs function for the subtitle of the plot.
group.col: a character vector indicating the color argument in the scale_color_manual and scale_fill_manual functions when specifying a grouping variable using the argument group.
plot.margin: a numeric vector with four elements indicating the plot.margin argument in the theme function controlling the plot margins . The default setting is c(5.5, 5.5, 5.5, 5.5), but switches to c(5.5, 5.5, -2.5, 5.5) when specifying a grouping variable using the argument group.
legend.title: a character string indicating the color argument in the labs function for specifying the legend title when specifying a grouping variable using the argument group.
legend.position: a character string indicating the legend.position in the theme argument for controlling the position of the legend function when specifying a grouping variable using the argument group. By default, the legend is placed at the bottom the plot.
legend.box.margin: a numeric vector with four elements indicating the legend.box.margin argument in the theme function for controlling the margins around the full legend area when specifying a grouping variable using the argument group.
facet.ncol: a numeric value indicating the ncol argument in the facet_wrap function for controlling the number of columns when specifying a split variable using the argument split.
facet.nrow: a numeric value indicating the nrow argument in the facet_wrap function for controlling the number of rows when specifying a split variable using the argument split.
facet.scales: a character string indicating the scales argument in the facet_wrap function for controlling the scales shared across facets, i.e., "fixed", "free_x", "free_y" (default), or "free" when specifying a split variable using the argument split.
saveplot: a character string indicating the filename argument including the file extension in the ggsave function. Note that one of ".eps", ".ps", ".tex", ".pdf" (default), ".jpeg", ".tiff", ".png", ".bmp", ".svg" or ".wmf" needs to be specified as file extension in the file argument. Note that plots can only be saved when plot = "ci" or plot = "boot".
width: a numeric value indicating the width argument (default is the size of the current graphics device) in the ggsave function.
height: a numeric value indicating the height argument (default is the size of the current graphics device) in the ggsave function.
units: a character string indicating the units argument (default is in) in the ggsave function.
dpi: a numeric value indicating the dpi argument (default is 600) in the ggsave function.
write: a character string naming a file for writing the output into either a text file with file extension ".txt" (e.g., "Output.txt") or Excel file with file extension ".xlsx" (e.g., "Output.xlsx"). If the file name does not contain any file extension, an Excel file will be written.
append: logical: if TRUE (default), output will be appended to an existing text file with extension .txt specified in write, if FALSE existing text file will be overwritten.
check: logical: if TRUE (default), argument specification is checked.
output: logical: if TRUE (default), output is shown on the console.

Author

Takuya Yanagida takuya.yanagida@univie.ac.at

Details

Pearson Product-Moment Correlation Coefficient

The Fisher $z'$ confidence interval method for the Pearson product-moment correlation coefficient is based on the assumption that $X$ and $Y$ have a bivariate normal distribution in the population. Non-normality resulting from either high kurtosis or high absolute skewness can distort the Fisher $z'$ confidence interval that produces a coverage rate that does not equal the one intended. The distortion is largest when population correlation is large and both variables $X$ and $Y$ were non-normal (Bishara et al., 2017). Note that increasing sample size improves coverage only when the population correlation is zero, while increasing the sample size worsens coverage with a non-zero population correlation (Bishara & Hittner, 2017). The ci.cor function computes the Fisher $z'$ confidence interval without non-normality adjustment (adjust = "none"), with non-normality adjustment via sample joint moments (adjust = "joint"), or with non-normality adjustment via approximate distribution (adjust = "approx"):

Fisher $z'$ confidence interval method uses the $r$-to$z'$ transformation for the correlation coefficient $r$:

$$z' = 0.5\cdot \ln\left(\frac{1 + r}{1 - r}\right)$$

The sampling distribution of $z$ is approximately normal with a standard error of approximately

$$\sigma_z' = \sqrt{\frac{1}{(n - 3)}}$$

The two-sided 95% confidence interval is defined as

$$z' \pm 1.96\cdot\sigma_{z'}$$

These confidence interval bounds are transformed back to the scale of $r$:

$$r = \frac{exp(2z') - 1}{exp(2z') + 1}$$

The resulting confidence interval of the correlation coefficient is an approximation and is only accurate when $X$ and $Y$ have a bivariate normal distribution in the population or when the population correlation is zero.
The Joint Moments Method multiplies the asymptotic variance of $z'$ by $\tau_f^2$ (Hawkins, 1989):

$$\tau_f^2 = \frac{(\mu_{40} + 2\mu_{22} + \mu_{04})\rho^2 - 4(\mu_{31} + \mu_{13})\rho + 4\mu_{22})}{4(1 - \rho^2)^2}$$

where $\mu_{jk}$ represents a population joint moment defined as

$$\mu_{jk} = E[X^jY^k]$$

where $X$ and $Y$ are assumed to be standardized ($\mu_{10} = \mu_{01} = 0$, $\mu_{20} = \mu_{02} = 1$). The standard error of $z$' can then be approximated as $\tilde{\sigma}_{z'}$:

$$\tilde{\sigma}_{z'} = \tau_f\sqrt{\frac{1}{n - 3}}$$

The corresponding sample moments, $m_{jk}$ can be used to estimate $\tau_f^2$:

$$\hat{\mu}_{jk} = m_{jk} = \frac{1}{n}\sum_{i=1}^{n}(x_i^jy_i^k)$$

However, the higher-order sample joint moments may be unstable estimators of their population counterparts unless the sample size is extremely large. Thus, this estimate of $\tau_f^2$ may be inaccurate, leading to inaccurate confidence intervals.
The Approximate Distribution Method estimates an approximate distribution that the sample appears to be drawn from to analytically solve for $\tau_f^2$ based on that distribution's parameters. The ci.cor function uses a third-order polynomial family allowing estimation of distribution parameters using marginal skewness and kurtosis that are estimated using the marginal sample skewness and kurtosis statistics (Bishara et al., 2018).

Bishara et al. (2018) conducted two Monte Carlo simulations that showed that the approximate distribution method was effective in dealing with violations of the bivariate normality assumption for a wide range of sample sizes, while the joint moments method was effective mainly when the sample size was extremely large, in the thousands. However, the third-order polynomial family used for the approximate distribution method cannot deal with absolute skewness above 4.4 or kurtosis above 43.4. Note that the approximate distribution method is accurate even when the bivariate normality assumption is satisfied, while the sample joint moments method sometimes fails to achieve the intended coverage even when the bivariate normality was satisfied.

Spearman's Rank-Order Correlation Coefficient

The confidence interval for Spearman's rank-order correlation coefficient is based on the Fisher's $z$ method (se = "fisher"), Fieller et al. (1957) approximate standard error (se = "fieller", default), Bonett and Wright (2000) approximate standard error (se = "bonett") or rank-based inverse normal (RIN) transformation (se = "rin") :

Fisher's $z$ Standard Error $$\sqrt{\frac{1}{(n - 3)}}$$
Fieller et al. (1957) Approximate Standard Error

$$\sqrt{\frac{1.06}{(n - 3)}}$$

Note that this approximation for the standard error is recommended for $n > 10$ and $|rs| < 0.8$.
Bonett and Wright (2000) Approximate Standard Error

$$\sqrt{\frac{1 + \frac{\hat{\theta}^2}{2}}{(n - 3)}}$$

where $\hat{\theta}$ is the point estimate of the Spearman's rank-order correlation coefficient. Note that this approximation for the standard error is recommended for $|\tau| \le 0.9$.
Rin Transformation involves three steps. First, the variable is converted to ranks. Second, the ranks are converted to a 0-to-1 scale using a linear function. Third, this distribution is transformed via the inverse of the normal cumulative distribution function (i.e., via probit transformation). The result is an approximately normal distribution regardless of the original shape of the data, so long as ties are infrequent and $n$ is not too small.

Kendall's Tau-b and Tau-c Correlation Coefficient

The confidence interval for Kendall's Tau-b and Tau-c correlation coefficient is based on the approximate standard error by Fieller et al. (1957):

$$\sigma_z' = \sqrt{\frac{0.437}{(n - 4)}}$$

Note that this approximation for the standard error is recommended for $n > 10$ and $|\tau| < 0.8$.

Bootstrap Confidence Intervals

The ci.cor function supports bootstrap confidence intervals (CI) for the correlation coefficient by changing the default setting boot = "none" to request one of five different types of bootstrap CI (see Efron & Tibshirani, 1993; Davidson & Hinkley, 1997):

"norm": The bias-corrected normal approximation bootstrap CI relies on the normal distribution based on the standard deviation of the bootstrap samples $\hat{\mathit{SE}}^*$. The function corrects for the bootstrap bias, i.e., difference between the bootstrap estimate $\hat{\theta}^*$ and the sample statistic $\hat{\theta}$ centering the interval at $2\hat{\theta} - \hat{\theta}^*$. The BC normal CI of intended coverage of $1 - 2(\alpha/2)$ is given by

$$Normal: (\hat{\theta}_{low}, \hat{\theta}_{upp} = \hat{\theta} - (\hat{\theta}^* - \hat{\theta}) + z^{\alpha/2} \cdot \hat{\mathit{SE}}^*, \hat{\theta} - (\hat{\theta}^* - \hat{\theta}) + z^{1 - \alpha/2} \cdot \hat{\mathit{SE}}^*$$

where $z^{\alpha/2}$ and $z^{1 - \alpha/2}$ denotes the $\alpha$ and the $1 - \alpha$ quantile from the standard normal distribution.
"basic": The basic bootstrap (aka reverse bootstrap percentile) CI is based on the distribution of $\hat{\delta} = \hat{\theta} - \theta$ which is approximated with the bootstrap distribution of $\hat{\delta}^* = \hat{\theta}^* - \hat{\theta}$.

$$Basic: (\hat{\theta}_{low}, \hat{\theta}_{upp} = \hat{\theta} - \hat{\delta}^{*1 - (\alpha/2)}, \hat{\theta} - \hat{\delta}^{*\alpha/2} = 2\hat{\theta} - \hat{\theta}^{*(1 - \alpha/2)} , 2\hat{\theta} - \hat{\theta}^{*(\alpha/2)})$$
"perc": The percentile bootstrap CI is computed by ordering the bootstrap estimates $\hat{\theta}^*_1, \ldots, \hat{\theta}^*_B$ to determine the $(100(\alpha)/2)$th and $(100(1 - \alpha)/2)$th empirical percentile with intended coverage of $1 - 2(\alpha/2)$:

$$Percentile: (\hat{\theta}_{low}, \hat{\theta}_{upp} = \hat{\theta}^{*(1 - \alpha/2)}, \hat{\theta}^{*(\alpha/2)})$$
"bc" (default): The bias-corrected (BC) percentile bootstrap CI corrects the percentile bootstrap CI for median bias of $\hat{\theta^*}$, i.e., the discrepancy between the median of $\hat{\theta}^*$ and $\hat{\theta}$ in normal units. The bias correction $\hat{z}_0$ is obtained from the proportion of bootstrap replications less than the sample estimate $\hat{\theta}$:

$$\hat{z}_0 = \Phi^{-1}\left(\frac{\#{\hat{\theta}^*_b < \hat{\theta}}}{B}\right)$$

where $\Phi^{-1}(.)$ represents the inverse function of the standard normal cumulative distribution function and $B$ is the number of bootstrap replications. The BC percentile CI of intended coverage of $1 - 2(\alpha/2)$ is given by

$$BC: (\hat{\theta}_{low}, \hat{\theta}_{upp} = \hat{\theta}^{*(\alpha_1)}, \hat{\theta}^{*(\alpha_2)})$$

where

$$\alpha_1 = \Phi(2\hat{z}_0 + z^{\alpha/2})$$

$$\alpha_2 = \Phi(2\hat{z}_0 + z^{1 - \alpha/2})$$

where $\Phi(.)$ represents the standard normal cumulative distribution function and $z^{\alpha/2}$ is the $100(\alpha/2)$ percentile of a standard normal distribution.
"bca": The bias-corrected and accelerated (BCa) bootstrap CI corrects the percentile bootstrap CI for median bias $\hat{z}_0$ and for acceleration or skewness $\hat{a}$, i.e., the rate of change of the standard error of $\hat{\theta}$ with respect to the true parameter value $\theta$ on a normalized scale. The standard normal approximation $\hat{\theta} \sim N(\theta, \mathit{SE}^2)$ assumes that the standard error of $\hat{\theta}$ is the same for all $\theta$. The acceleration constant $\hat{a}$ corrects for this unrealistic assumption and can be computed by using jackknife resampling:

$$\hat{a} = \frac{\sum_{i=1}^{n}(\hat{\theta}_{(.)} - \hat{\theta}_{(i)})^3}{6\{\sum_{i=1}^{n}(\theta_{(.)} - \hat{\theta}_{(i)})^2\}^{3/2}}$$

where $\hat{\theta}_{(i)}$ is the sample estimate with the $i$th observation deleted and $\hat{\theta}_{(.)} = \sum_{i=1}^{n}\frac{\hat{\theta}_{(i)}}{n}$. Note that the function uses infinitesimal jackknife instead of regular leave-one-out jackknife that down-weights each observation by an infinitesimal amount of $\frac{0.001}{n}$ instead of removing observations. The BCa percentile CI of intended coverage of $1 - 2(\alpha/2)$ is given by

$$BCa: (\hat{\theta}_{low}, \hat{\theta}_{upp} = \hat{\theta}^{*(\alpha_1)}, \hat{\theta}^{*(\alpha_2)})$$

where

$$\alpha_1 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + z^{\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z^{\alpha/2})}\right)$$

$$\alpha_2 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + z^{1 - \alpha/2}}{1 - \hat{a}(\hat{z}_0 + z^{1 - \alpha/2})}\right)$$

Note that Fisher transformation is applied before computing the confidence intervals to reverse-transform the limits of the interval using the inverse of the Fisher $z$ transformation (fisher = TRUE) when specifying "norm" or "basic" for the argument boot. In addition, interpolation on the normal quantile scale is applied for "basic", "perc", "bc", and "bca" when a non-integer order statistic is required (see equation 5.8 in Davison & Hinkley, 1997).

References

Baguley, T. (2024). CI for Spearman's rho. seriousstats. https://rpubs.com/seriousstats/616206

Bonett, D. G., & Wright, T. A. (2000). Sample size requirements for estimating Pearson, Kendall and Spearman correlations. Psychometrika, 65, 23-28. https://doi.org/10.1007/BF02294183

Bishara, A. J., & Hittner, J. B. (2012). Testing the significance of a correlation with nonnormal data: Comparison of Pearson, Spearman, transformation, and resampling approaches. Psychological Methods, 17(3), 399–417. https://doi.org/10.1037/a0028087

Bishara, A. J., & Hittner, J.B. (2017). Confidence intervals for correlations when data are not normal. Behavior Research Methods, 49, 294-309. https://doi.org/10.3758/s13428-016-0702-8

Bishara, A. J., Li, J., & Nash, T. (2018). Asymptotic confidence intervals for the Pearson correlation via skewness and kurtosis. British Journal of Mathematical and Statistical Psychology, 71(1), 167–185. https://doi.org/10.1111/bmsp.12113

Brown, M. B., & Benedetti, J. K. (1977). Sampling behavior of tests for correlation in two-way contingency tables. Journal of the American Statistical Association, 72 (358), 309-315. https://doi.org/10.1080/01621459.1977.10480995

Canty, A., & Ripley, B. (2024). boot: Bootstrap R (S-Plus) Functions. R package version 1.3-31.

Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their application. Cambridge University Press.

Efron, B., & Tibshirani, R. (1993). An introduction to the bootstrap. Chapman & Hall.

Fieller, E. C., Hartley, H. O., & Pearson, E. S. (1957). Tests for rank correlation coefficients: I. Biometrika, 44, 470-481. https://doi.org/10.2307/2332878

Fisher, R. A. (1921). On the “Probable Error” of a Coefficient of Correlation Deduced from a Small Sample. Metron, 1, 3-32.

Hawkins, D. L. (1989). Using U statistics to derive the asymptotic distribution of Fisher’s Z statistic. American Statistician, 43, 235–237. https://doi.org/10.2307/2685369

Hollander, M., Wolfe, D. A., & Chicken, E. (2015). Nonparametric statistical methods. Wiley.

Examples

Run this code

#----------------------------------------------------------------------------
# Pearson product-moment correlation coefficient

# Example 1a: Approximate distribution method
ci.cor(mtcars[, c("mpg", "drat", "qsec")])

# Alternative specification
ci.cor(mpg, drat, qsec, data = mtcars)

# Example 1b: Joint moments method
ci.cor(mtcars[, c("mpg", "drat", "qsec")], adjust = "joint")

#----------------------------------------------------------------------------
# Spearman's rank-order correlation coefficient

# Example 2a: Fieller et al. (1957) approximate standard error
ci.cor(mtcars[, c("mpg", "drat", "qsec")], method = "spearman")

# Example 2b: Bonett and Wright (2000) approximate standard error
ci.cor(mtcars[, c("mpg", "drat", "qsec")], method = "spearman", se = "bonett")

# Example 2c: Rank-based inverse normal (RIN) transformation
ci.cor(mtcars[, c("mpg", "drat", "qsec")], method = "spearman", se = "rin")

#----------------------------------------------------------------------------
# Kendall's Tau

# Example 3a:  Kendall's Tau-b
ci.cor(mtcars[, c("mpg", "drat", "qsec")], method = "kendall-b")

# Example 3b:  Kendall's Tau-c
ci.cor(mtcars[, c("mpg", "drat", "qsec")], method = "kendall-c")

if (FALSE) {
#----------------------------------------------------------------------------
# Bootstrap Confidence Interval (CI)

# Example 4a: Bias-corrected (BC) percentile bootstrap CI
ci.cor(mtcars[, c("mpg", "drat", "qsec")], boot = "bc")

# Example 4b: Bias-corrected and accelerated (BCa) bootstrap CI,
# 5000 bootstrap replications, set seed of the pseudo-random number generator
ci.cor(mtcars[, c("mpg", "drat", "qsec")], boot = "bca", R = 5000, seed = 123)

#----------------------------------------------------------------------------
# Grouping and Split Variable

# Example 5a: Grouping variable
ci.cor(mpg, drat, qsec, data = mtcars, group = "vs")

# Alternative specification
ci.cor(mtcars[, c("mpg", "drat", "qsec")], group = mtcars$vs)

# Example 5b: Split variable
ci.cor(mpg, drat, qsec, data = mtcars, split = "am")

# Alternative specification
ci.cor(mtcars[, c("mpg", "drat", "qsec")], split = mtcars$am)

# Example 5c: Grouping and split variable
ci.cor(mpg, drat, qsec, data = mtcars, group = "vs", split = "am")

# Alternative specification
ci.cor(mtcars[, c("mpg", "drat", "qsec")], group = mtcars$vs, split = mtcars$am)

#----------------------------------------------------------------------------
# Write Output

# Example 6a: Text file
ci.cor(mtcars[, c("mpg", "drat", "qsec")], write = "CI_Cor_Text.txt")

# Example 6b: Excel file
ci.cor(mtcars[, c("mpg", "drat", "qsec")], write = "CI_Cor_Excel.xlsx")

#----------------------------------------------------------------------------
# Plot Confidence Intervals

# Example 7a: Pearson product-moment correlation coefficient
ci.cor(mpg, drat, qsec, data = mtcars, plot = "ci")

# Example 7b: Grouping variable
ci.cor(mpg, drat, qsec, data = mtcars, group = "vs", plot = "ci")

# Example 7c: Split variable
ci.cor(mpg, drat, qsec, data = mtcars, split = "am", plot = "ci")

# Example 7d: Save plot as PDF file
ci.cor(mpg, drat, qsec, data = mtcars, plot = "ci", saveplot = "CI_Cor.pdf",
       width = 9, height = 6)

# Example 7e: Save plot as PNG file
ci.cor(mpg, drat, qsec, data = mtcars, plot = "ci", saveplot = "CI_Cor.png",
       width = 9, height = 6)

# Example 7f: Plot in line with the default setting

# Load ggplot2 package
library(ggplot2)

# Extract plot data
plotdata <- ci.cor(mtcars[, c("mpg", "drat", "qsec")], plot = "ci",
                   output = FALSE)$plot$plotdata

# Draw plot
ggplot(plotdata, aes(x, y)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
  geom_errorbar(aes(ymin = low, ymax = upp), width = 0.3) +
  geom_point(size = 2.5) +
  scale_x_discrete(name = "") +
  scale_y_continuous(name = "Correlation Coefficient", limits = c(-1, 1),
                     breaks = seq(-1, 1, by = 0.25)) +
  theme_bw() +
  labs(title = NULL, subtitle = NULL) +
  theme(plot.subtitle = element_text(hjust = 0.5),
        plot.title = element_text(hjust = 0.5),
        plot.margin = unit(c(5.5, 5.5, 5.5, 5.5), "pt"),
        axis.text = element_text(size = 10),
        axis.title = element_text(size = 10))

#----------------------------------------------------------------------------
# Plot Bootstrap Samples

# Example 8a: Pearson product-moment correlation coefficient
ci.cor(mpg, drat, qsec, data = mtcars, boot = "bc", plot = "boot")

# Example 8b: Grouping variable
ci.cor(mpg, drat, qsec, data = mtcars, group = "vs", boot = "bc", plot = "boot")

# Example 8c: Split variable
ci.cor(mpg, drat, qsec, data = mtcars, split = "am", boot = "bc", plot = "boot")

# Example 8d: Save plot as PDF file
ci.cor(mpg, drat, qsec, data = mtcars, boot = "bc", plot = "boot",
       saveplot = "CI_Cor_Boot.pdf", width = 14, height = 9)

# Example 8e: Save plot as PNG file
ci.cor(mpg, drat, qsec, data = mtcars, boot = "bc", plot = "boot",
       saveplot = "CI_Cor_Boot.png", width = 14, height = 9)

# Example 8f: Plot in line with the default setting

# Load ggplot2 package
library(ggplot2)

# Extract plot data
plotdata <- ci.cor(mtcars[, c("mpg", "drat", "qsec")], boot = "bc", plot = "boot",
                   output = FALSE)$plot$plotdata

ggplot(plotdata, aes(y)) +
  geom_histogram(aes(y = after_stat(density)), color = "black", alpha = 0.4, fill = "gray85") +
  geom_density(color = "#0072B2", linewidth = 0.5) +
  geom_vline(xintercept = 0, color = "gray70") +
  geom_vline(aes(xintercept = point), linewidth = 0.6) +
  geom_vline(aes(xintercept = low), linewidth = 0.6, linetype = "dashed") +
  geom_vline(aes(xintercept = upp), linewidth = 0.6, linetype = "dashed") +
  facet_wrap(~ x, scales = "free_y") +
  scale_x_continuous(name = "Correlation Coefficient", expand = c(0.02, 0),
                     limits =  c(-1, 1), breaks = seq(-1, 1, by = 0.25)) +
  scale_y_continuous(name = "Probability Density, f(x)",
                     expand = expansion(mult = c(0L, 0.05))) +
  theme_bw() +
  theme(plot.subtitle = element_text(hjust = 0.5),
        strip.text = element_text(size = 11),
        plot.title = element_text(hjust = 0.5),
        plot.margin = unit(c(5.5, 5.5, 5.5, 5.5), "pt"),
        axis.text = element_text(size = 10),
        axis.title = element_text(size = 11))
}

Run the code above in your browser using DataLab