cellWise (version 2.5.3)

transfo: Robustly fit the Box-Cox or Yeo-Johnson transformation

Description

This function uses reweighted maximum likelihood to robustly fit the Box-Cox or Yeo-Johnson transformation to each variable in a dataset. Note that this function first calls checkDataSet to ensure that the variables to be transformed are not too discrete.

Usage

transfo(X, type = "YJ", robust = TRUE,
        standardize = TRUE,
        quant = 0.99, nbsteps = 2, checkPars = list())

Value

A list with components:

  • lambdahats
    The estimated transformation parameter for each column of X.

  • Y
    A matrix in which each column is the transformed version of the corresponding column of X. The transformed version includes pre- and post-standardization if standardize=TRUE.

  • muhat
    The estimated location of each column of Y.

  • sigmahat
    The estimated scale of each column of Y.

  • weights
    The final weights from the reweighting.

  • ttypes
    The type of transform used in each column.

  • objective
    Value of the (reweighted) maximum likelihood objective function.

  • The values returned by checkDataSet, unless coreOnly is TRUE.

Arguments

X

A data matrix of dimensions n x d. Its columns are the variables to be transformed.

type

The type of transformation to be fit. Should be one of:

  • "BC": Box-Cox power transformation. Only works for strictly positive variables. If this type is given but a variable is not strictly positive, the function stops with a message about that variable.

  • "YJ": Yeo-Johnson power transformation. The data may have positive as well as negative values.

  • "bestObj": for strictly positive variables both BC and YJ are run, and the solution with the lowest objective is kept. On the other variables YJ is run.
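
As a rough illustration of the two families, the textbook definitions of the Box-Cox and Yeo-Johnson transformations can be sketched in base R. The function names boxcox and yeojohnson are hypothetical helpers for this sketch, not part of cellWise, and transfo itself additionally standardizes the data (see Details), so its output Y will generally differ from these raw values.

```r
# Textbook Box-Cox transform; only defined for strictly positive x.
# (Hypothetical helper, not a cellWise function.)
boxcox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (abs(lambda) < 1e-12) log(x) else (x^lambda - 1) / lambda
}

# Textbook Yeo-Johnson transform; defined for any real x.
# (Hypothetical helper, not a cellWise function.)
yeojohnson <- function(x, lambda) {
  y <- numeric(length(x))
  pos <- x >= 0
  # Nonnegative part: Box-Cox applied to x + 1.
  if (abs(lambda) < 1e-12) {
    y[pos] <- log1p(x[pos])
  } else {
    y[pos] <- ((x[pos] + 1)^lambda - 1) / lambda
  }
  # Negative part: mirrored Box-Cox with parameter 2 - lambda.
  if (abs(lambda - 2) < 1e-12) {
    y[!pos] <- -log1p(-x[!pos])
  } else {
    y[!pos] <- -((1 - x[!pos])^(2 - lambda) - 1) / (2 - lambda)
  }
  y
}

x <- c(-2, -0.5, 0, 0.5, 2)
yeojohnson(x, 1)          # lambda = 1 leaves the data unchanged
boxcox(c(0.5, 1, 2), 0)   # lambda = 0 is the natural logarithm
```

The sketch makes the restriction in the list above concrete: boxcox stops on nonpositive input, while yeojohnson accepts values of either sign.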

robust

If TRUE, the Reweighted Maximum Likelihood method is used, which first computes a robust initial estimate of the transformation parameter lambda. If FALSE, the classical ML method is used.

standardize

Whether to standardize the variables before and after the power transformation. See Details below.

quant

Quantile for determining the weights in the reweighting step (ignored when robust=FALSE).

nbsteps

Number of reweighting steps (ignored when robust=FALSE).
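
This page does not spell out how quant is turned into weights. One plausible reading, shown here purely as an assumption, is a hard cutoff: an observation gets weight 1 when its standardized value lies within the Gaussian quantile qnorm(quant), and weight 0 otherwise. The helper cutoff_weights below is hypothetical and is not a cellWise function.

```r
# Hypothetical helper: 0/1 weights from a Gaussian quantile cutoff.
# ASSUMPTION: transfo's actual weight function may differ from this sketch.
cutoff_weights <- function(y, mu, sigma, quant = 0.99) {
  cutoff <- qnorm(quant)                 # about 2.33 for quant = 0.99
  as.numeric(abs((y - mu) / sigma) <= cutoff)
}

set.seed(1)
y <- c(rnorm(100), 8, -9)                # 100 clean points plus two outliers
w <- cutoff_weights(y, mu = median(y), sigma = mad(y))
tail(w, 2)                               # weights of the two planted outliers
table(w)                                 # how many points are kept or dropped
```

Under this reading, nbsteps would control how many times the estimate and the weights are recomputed in turn.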

checkPars

Optional list of parameters used in the call to checkDataSet. The options are:

  • coreOnly
    If TRUE, skip the execution of checkDataSet. Defaults to FALSE.

  • numDiscrete
    A column that takes on numDiscrete or fewer values will be considered discrete and not retained in the cleaned data. Defaults to 5.

  • precScale
    Only consider columns whose scale is larger than precScale. Here scale is measured by the median absolute deviation. Defaults to 1e-12.

  • silent
    If TRUE, the function's progress messages are not printed. Defaults to FALSE.
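
To get a feel for which columns the numDiscrete and precScale options would affect, the two criteria described above can be mimicked in base R. The helper flag_columns is hypothetical and only mirrors the wording of this page; checkDataSet performs further checks, and its exact scale estimate may differ from R's default mad(), which includes the consistency constant 1.4826.

```r
# Hypothetical helper mirroring the numDiscrete and precScale descriptions.
# This is not the checkDataSet implementation.
flag_columns <- function(X, numDiscrete = 5, precScale = 1e-12) {
  X <- as.matrix(X)
  # Number of distinct non-missing values per column.
  n_distinct <- apply(X, 2, function(col) length(unique(col[!is.na(col)])))
  # Scale per column, measured by the median absolute deviation.
  col_scale <- apply(X, 2, mad, na.rm = TRUE)
  data.frame(
    discrete   = n_distinct <= numDiscrete,
    tiny_scale = col_scale <= precScale
  )
}

set.seed(42)
X <- cbind(cont = rnorm(50), binary = rep(0:1, 25), const = rep(3, 50))
flag_columns(X)
```

In an actual call, the options are passed as a list, for example checkPars = list(numDiscrete = 5, precScale = 1e-12, silent = TRUE).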

Author

J. Raymaekers and P.J. Rousseeuw

Details

If standardize = TRUE, each variable is standardized before and after the transformation. Before the transformation: for BC, the variable is divided by its median; for YJ with robust = TRUE, the median is subtracted and the result is divided by the mad (median absolute deviation); for YJ with robust = FALSE, the mean is subtracted and the result is divided by the standard deviation. After the transformation: if robust = FALSE, the classical mean and standard deviation are used; if robust = TRUE, the mean and standard deviation are calculated robustly on a subset of inliers.
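
The pre-transformation step described above can be sketched in base R as follows. The helper prestandardize is hypothetical, and using R's default mad() (with its consistency constant 1.4826) is an assumption; the package may scale the median absolute deviation differently.

```r
# Hypothetical helper: standardization applied BEFORE the power transformation,
# following the description in Details. Not a cellWise function.
prestandardize <- function(x, type = c("YJ", "BC"), robust = TRUE) {
  type <- match.arg(type)
  if (type == "BC") {
    x / median(x)                  # BC: divide by the median
  } else if (robust) {
    (x - median(x)) / mad(x)       # YJ, robust: median and mad
  } else {
    (x - mean(x)) / sd(x)          # YJ, classical: mean and standard deviation
  }
}

set.seed(7)
x <- exp(rnorm(200))
median(prestandardize(x, "BC"))                 # median becomes 1, up to rounding
median(prestandardize(x, "YJ"))                 # median becomes 0, up to rounding
sd(prestandardize(x, "YJ", robust = FALSE))     # standard deviation becomes 1
```

Dividing by the median keeps a strictly positive variable strictly positive, which is what the Box-Cox transformation requires.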

References

J. Raymaekers and P.J. Rousseeuw (2021). Transforming variables to central normality. Machine Learning. doi:10.1007/s10994-021-05960-5 (open access)

See Also

transfo_newdata, transfo_transformback

Examples

# find Box-Cox transformation parameter for lognormal data:
set.seed(123)
x <- exp(rnorm(1000))
transfo.out <- transfo(x, type = "BC")
# estimated parameter:
transfo.out$lambdahats
# value of the objective function:
transfo.out$objective
# the transformed variable:
transfo.out$Y
# the type of transformation used:
transfo.out$ttypes
# qqplot of the transformed variable:
qqnorm(transfo.out$Y); abline(0,1)

# For more examples, we refer to the vignette:
if (FALSE) {
vignette("transfo_examples")
}
