boxcoxTransform: Apply a Box-Cox Power Transformation to a Set of Data


Apply a Box-Cox power transformation to a set of data to attempt to induce normality and homogeneity of variance.


boxcoxTransform(x, lambda, eps = .Machine$double.eps)



x: a numeric vector of positive numbers.


lambda: finite numeric scalar indicating what power to use for the Box-Cox transformation.


eps: finite, positive numeric scalar. When the absolute value of lambda is less than eps, lambda is treated as 0 for the Box-Cox transformation. The default value is eps=.Machine$double.eps.


Returns a numeric vector of transformed observations.


Two common assumptions for several standard parametric hypothesis tests are:

  1. The observations all come from a normal distribution.

  2. The observations all come from distributions with the same variance.

For example, the standard one-sample t-test assumes all the observations come from the same normal distribution, and the standard two-sample t-test assumes that all the observations come from normal distributions with the same variance, although the mean may differ between the two groups. For standard linear regression models, these assumptions can be stated as: the error terms all come from a normal distribution with mean 0 and a constant variance.

Often, especially with environmental data, the above assumptions do not hold because the original data are skewed and/or they follow a distribution that is not really shaped like a normal distribution. It is sometimes possible, however, to transform the original data so that the transformed observations in fact come from a normal distribution or close to a normal distribution. The transformation may also induce homogeneity of variance and, for the case of a linear regression model, a linear relationship between the response and predictor variable(s).

Sometimes, theoretical considerations indicate an appropriate transformation. For example, count data often follow a Poisson distribution, and it can be shown that taking the square root of observations from a Poisson distribution tends to make these data look more bell-shaped (Johnson et al., 1992, p.163; Johnson and Wichern, 2007, p.192; Zar, 2010, p.291). A common example in the environmental field is that chemical concentration data often appear to come from a lognormal distribution or some other positively-skewed distribution (e.g., gamma). In this case, taking the logarithm of the observations often appears to yield normally distributed data.
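As a small illustration of the Poisson case, the sketch below simulates counts in base R and compares variances before and after taking square roots. It demonstrates a related, well-known effect: the square root roughly stabilizes the variance of Poisson counts near 1/4, whatever the mean. The means, sample size, and seed are arbitrary choices made for this example.

```r
# Illustration only: simulate Poisson counts and compare the variance
# before and after a square-root transformation.
set.seed(1)                      # arbitrary seed, for reproducibility only
means <- c(5, 20, 80)            # hypothetical Poisson means

raw_var  <- sapply(means, function(m) var(rpois(10000, m)))
sqrt_var <- sapply(means, function(m) var(sqrt(rpois(10000, m))))

# Raw variances grow with the mean; square-root variances stay near 1/4.
print(round(rbind(mean = means, raw_var = raw_var, sqrt_var = sqrt_var), 3))
```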

Ideally, a data transformation is chosen based on knowledge of the process generating the data, as well as graphical tools such as quantile-quantile plots and histograms.

Box and Cox (1964) presented a formalized method for deciding on a data transformation. Given a random variable \(X\) from some distribution with only positive values, the Box-Cox family of power transformations is defined as:

$$Y = \begin{cases} \frac{X^\lambda - 1}{\lambda} & \lambda \ne 0 \\ \log(X) & \lambda = 0 \end{cases} \;\;\;\;\;\; (1)$$

where \(Y\) is assumed to come from a normal distribution. This transformation is continuous in \(\lambda\). Note that this transformation also preserves ordering; that is, if \(X_1 < X_2\) then \(Y_1 < Y_2\).
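The definition can be transcribed into a few lines of base R. The helper name bc_sketch is hypothetical and is not the package's implementation of boxcoxTransform; it only mirrors the formula, using log(x) when the absolute value of lambda falls below eps. The sketch also checks numerically that a very small lambda gives values close to the log transformation and that ordering is preserved.

```r
# Hypothetical helper, NOT the package's implementation: a direct
# transcription of the Box-Cox formula with the eps rule for lambda near 0.
bc_sketch <- function(x, lambda, eps = .Machine$double.eps) {
  stopifnot(is.numeric(x), all(x > 0), length(lambda) == 1, is.finite(lambda))
  if (abs(lambda) < eps) log(x) else (x^lambda - 1) / lambda
}

x <- c(0.5, 1, 2, 10)

y_half <- bc_sketch(x, lambda = 0.5)   # (sqrt(x) - 1) / 0.5
y_zero <- bc_sketch(x, lambda = 0)     # log(x)
y_tiny <- bc_sketch(x, lambda = 1e-6)  # very close to log(x)

# Ordering is preserved, including for a negative lambda.
stopifnot(!is.unsorted(y_half), !is.unsorted(bc_sketch(x, -1)))

# Largest gap between the tiny-lambda result and the log transformation.
print(max(abs(y_tiny - y_zero)))
```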

Box and Cox (1964) proposed choosing the appropriate value of \(\lambda\) based on maximizing a likelihood function. See the help file for boxcox for details.
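To make the likelihood idea concrete, here is a minimal sketch for the simplest case: independent positive observations with no covariates. Under that assumption the profile log-likelihood, up to an additive constant, is \(-\frac{n}{2}\log\hat{\sigma}^2(\lambda) + (\lambda - 1)\sum \log x_i\). The helper profile_loglik is hypothetical, and this is not necessarily how boxcox computes its result; the simulated data, seed, and search interval are arbitrary.

```r
# Sketch under stated assumptions: i.i.d. positive data, no covariates.
# profile_loglik() is a hypothetical helper, not a function from any package.
profile_loglik <- function(lambda, x) {
  n <- length(x)
  y <- if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
  sigma2_hat <- mean((y - mean(y))^2)          # MLE of the variance of Y
  # Normal log-likelihood (constants dropped) plus the Jacobian term.
  -n / 2 * log(sigma2_hat) + (lambda - 1) * sum(log(x))
}

set.seed(42)                                   # arbitrary seed
x <- rlnorm(200, meanlog = 1, sdlog = 0.8)     # log(x) is exactly normal

# Maximize over a hypothetical search interval for lambda.
fit <- optimize(profile_loglik, interval = c(-2, 2), x = x, maximum = TRUE)
lambda_hat <- fit$maximum
print(lambda_hat)                              # expected to be near 0
```

Because the simulated data are lognormal, the maximizing value should land close to 0, the log transformation.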

Note that for non-zero values of \(\lambda\), instead of using the formula of Box and Cox in Equation (1), you may simply use the power transformation: $$Y = X^\lambda \;\;\;\;\;\; (2)$$ since these two equations differ only by a scale difference and origin shift, and the essential character of the transformed distribution remains unchanged.
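A quick numerical check of this equivalence, using base R only: after centering and scaling, the values from Equation (1) and Equation (2) coincide. The simulated data, the seed, and the choice of lambda = 0.5 are arbitrary.

```r
set.seed(7)                            # arbitrary seed
x <- rlnorm(50)                        # hypothetical positive data
lambda <- 0.5

y_boxcox <- (x^lambda - 1) / lambda    # Equation (1), lambda != 0
y_power  <- x^lambda                   # Equation (2)

# After centering and scaling, the two sets of values coincide.
z_boxcox <- as.vector(scale(y_boxcox))
z_power  <- as.vector(scale(y_power))
print(all.equal(z_boxcox, z_power))
print(cor(y_boxcox, y_power))
```

One caveat: for negative \(\lambda\) the scale factor \(1/\lambda\) is negative, so Equation (2) reverses the ordering of the observations while Equation (1) preserves it.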

The value \(\lambda=1\) corresponds to no transformation. Values of \(\lambda\) less than 1 shrink large values of \(X\), and are therefore useful for transforming positively-skewed (right-skewed) data. Values of \(\lambda\) larger than 1 inflate large values of \(X\), and are therefore useful for transforming negatively-skewed (left-skewed) data (Helsel and Hirsch, 1992, pp.13-14; Johnson and Wichern, 2007, p.193). Commonly used values of \(\lambda\) include 0 (log transformation), 0.5 (square-root transformation), -1 (reciprocal), and -0.5 (reciprocal root).
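The effect of \(\lambda\) on skewness can be seen with a short base R sketch. The helpers skew and bc are hypothetical names introduced for this illustration, and the simulated lognormal sample and seed are arbitrary. For right-skewed data like these, the sample skewness should fall as \(\lambda\) decreases, pass near 0 around the log transformation, and turn negative for negative powers.

```r
# Hypothetical helper: moment-based sample skewness (base R only).
skew <- function(y) mean((y - mean(y))^3) / mean((y - mean(y))^2)^1.5

# Hypothetical helper: Box-Cox transformation for a single lambda.
bc <- function(x, lambda) if (lambda == 0) log(x) else (x^lambda - 1) / lambda

set.seed(123)                              # arbitrary seed
x <- rlnorm(1000, meanlog = 0, sdlog = 1)  # right-skewed, positive data

lambdas <- c(1, 0.5, 0, -0.5, -1)
skews <- sapply(lambdas, function(l) skew(bc(x, l)))
names(skews) <- paste0("lambda=", lambdas)
print(round(skews, 2))
```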

It is often recommended that, when dealing with several similar data sets, you find a common transformation that works reasonably well for all of them, rather than using slightly different transformations for each data set (Helsel and Hirsch, 1992, p.14; Shumway et al., 1989).


See Also

boxcox, Data Transformations, Goodness-of-Fit Tests.


Examples
  # Generate 30 observations from a lognormal distribution with 
  # mean=10 and cv=2, then look at some normal quantile-quantile 
  # plots for various transformations.
  # (Note: the call to set.seed simply allows you to reproduce this example.)

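  library(EnvStats)  # provides boxcoxTransform, rlnormAlt, and qqPlot

  set.seed(250)  # any fixed seed works; the value is arbitrary

  x <- rlnormAlt(30, mean = 10, cv = 2)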

  qqPlot(x, add.line = TRUE)

  qqPlot(boxcoxTransform(x, lambda = 0.5), add.line = TRUE) 

  qqPlot(boxcoxTransform(x, lambda = 0), add.line = TRUE) 

  # Clean up
  rm(x)

