ICSOutlier (version 0.4-0)

comp.simu.test: Selection of Nonnormal Invariant Components Using Simulations

Description

Identifies invariant coordinates that are nonnormal using simulations under a standard multivariate normal model for a specific data setup and scatter combination.

Usage

comp.simu.test(object, m = 10000, type = "smallprop", level = 0.05, 
  adjust = TRUE, ncores = NULL, iseed = NULL, pkg = "ICSOutlier", 
  qtype = 7, ...)

Value

A list containing:

index

integer vector indicating the indices of the selected components.

test

string "simulation".

criterion

vector of the cut-off values for all the eigenvalues.

levels

vector of the levels used to derive the cut-offs for each component.

adjust

logical. TRUE if adjusted.

type

type used.

m

number of iterations m used in the simulations.

Arguments

object

object of class ics2 where both S1 and S2 are specified as functions. The sample size and the dimension of interest are also obtained from the object.

m

number of simulations. Since extreme quantiles are of interest, m should be large.

type

currently the only type option is "smallprop". See details.

level

the initial level used to make a decision. The cut-off values are the (1-level)th quantile of the eigenvalues obtained from simulations. See details.

adjust

logical. If TRUE, the quantile levels are adjusted. Default is TRUE. See details.

ncores

number of cores to be used. If NULL or 1, no parallel computing is used. Otherwise makeCluster with type = "PSOCK" is used.

iseed

If parallel computation is used, the seed passed on to clusterSetRNGStream. Default is NULL, which means no fixed seed is used.

pkg

When using parallel computing, a character vector listing all the packages which need to be loaded on the different cores via require. Must be at least "ICSOutlier" and must contain the packages needed to compute the scatter matrices.

qtype

specifies the quantile algorithm used in quantile.

...

further arguments passed on to the function quantile.

Author

Aurore Archimbaud and Klaus Nordhausen

Details

Based on simulations, it detects which of the components follow a univariate normal distribution and which do not. More precisely, it identifies the observed eigenvalues that are larger than those obtained from normally distributed data. m standard normal data sets are simulated using the same sample size, dimension, and scatters as specified in the ics2 object. The cut-off values are determined from a quantile of these simulated eigenvalues.
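The simulation idea can be sketched in base R. This is only an illustration, not the package's internal code: ics_eigenvalues() is a hypothetical helper that uses hand-rolled stand-ins for a covariance and fourth-moment scatter pair, and the values of n, p and m are arbitrary small numbers chosen so that the sketch runs quickly.

```r
# Hypothetical helper: eigenvalues of S1^{-1} S2, where S1 is the sample
# covariance and S2 a fourth-moment scatter (hand-rolled stand-ins).
ics_eigenvalues <- function(X) {
  n <- nrow(X)
  p <- ncol(X)
  S1 <- cov(X)
  Xc <- sweep(X, 2, colMeans(X))         # centred data
  d2 <- mahalanobis(X, colMeans(X), S1)  # squared Mahalanobis distances
  # Each centred row is weighted by its Mahalanobis distance
  S2 <- crossprod(sqrt(d2) * Xc) / (n * (p + 2))
  ev <- eigen(solve(S1) %*% S2, only.values = TRUE)$values
  sort(Re(ev), decreasing = TRUE)
}

set.seed(1)
n <- 200  # sample size (assumed for illustration)
p <- 4    # dimension (assumed for illustration)
m <- 100  # number of simulated data sets; far too small for real use

# One row of ordered eigenvalues per simulated standard normal data set
sim_ev <- t(replicate(m, ics_eigenvalues(matrix(rnorm(n * p), n, p))))

# Component-wise cut-offs: here the 0.95 quantile for every component
cutoffs <- apply(sim_ev, 2, quantile, probs = 0.95, type = 7)
cutoffs
```

Observed eigenvalues would then be compared against cutoffs component by component.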

As the eigenvalues of ICS, also known as generalized kurtosis values, are ordered, it is natural to perform the comparison in a specific order depending on the purpose. Currently the only available type is "smallprop": starting with the first component, the observed eigenvalues are successively compared to these cut-off values. The procedure stops when an eigenvalue is below the corresponding cut-off, that is, when a normal component is detected.
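A minimal sketch of this sequential rule, assuming the observed eigenvalues and the cut-offs are already available as numeric vectors of equal length. select_smallprop() is a hypothetical name, the input numbers are made up, and the handling of ties and of the case where every eigenvalue exceeds its cut-off is an assumption rather than something stated in this documentation.

```r
# Hypothetical helper: walk through the components in order and keep those
# whose eigenvalue exceeds its cut-off, stopping at the first one that does not.
select_smallprop <- function(eigenvalues, cutoffs) {
  exceeds <- eigenvalues > cutoffs
  first_stop <- match(FALSE, exceeds)  # position of the first "normal" component
  if (is.na(first_stop)) {
    return(seq_along(eigenvalues))     # assumption: every component is selected
  }
  seq_len(first_stop - 1L)             # integer(0) when the first one already fails
}

# Made-up values: the third eigenvalue falls below its cut-off
select_smallprop(c(3, 2, 1.1, 1), c(2.5, 1.8, 1.2, 1.1))
```

Note that later components are never inspected once the procedure has stopped, even if one of them would exceed its cut-off.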

If adjust = FALSE, all eigenvalues are compared to the same (1-level)th quantile. However, this often leads to too many selected components, so some multiple testing adjustment might be useful. The current default adjusts the quantile level for the jth component to 1-level/j.
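For illustration, the two sets of quantile levels can be computed directly. The value p = 6 is an assumed dimension, and the variable names are not taken from the package.

```r
level <- 0.05
p <- 6  # assumed number of components

# adjust = FALSE: the same level for every component
probs_unadjusted <- rep(1 - level, p)

# adjust = TRUE: level 1 - level/j for the jth component
probs_adjusted <- 1 - level / seq_len(p)

round(probs_adjusted, 4)
# 0.9500 0.9750 0.9833 0.9875 0.9900 0.9917
```

The adjusted levels move further into the tail as j grows, which is one reason a large m is needed to estimate the corresponding quantiles reliably.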

Note that, depending on the data size and scatters used, this can take a while, so it is more efficient to parallelize the computations. Note also that the function is seldom called directly by the user; it is usually called internally by ics.outlier.
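The role of ncores and iseed can be illustrated with the parallel package alone. This is a sketch of the general PSOCK pattern, not the function's internal code: run_sims() is a hypothetical helper, and the "simulation" is reduced to drawing a single normal value per task.

```r
library(parallel)

# Hypothetical helper: run n_tasks toy simulations on a PSOCK cluster whose
# random number streams are seeded with iseed.
run_sims <- function(n_tasks, ncores, iseed) {
  cl <- makeCluster(ncores, type = "PSOCK")
  on.exit(stopCluster(cl))                # always release the workers
  clusterSetRNGStream(cl, iseed = iseed)  # one reproducible stream per worker
  unlist(parLapply(cl, seq_len(n_tasks), function(i) rnorm(1)))
}

a <- run_sims(8, ncores = 2, iseed = 123)
b <- run_sims(8, ncores = 2, iseed = 123)
identical(a, b)
```

With the same iseed and the same number of workers, the two runs are expected to produce the same draws. A call to set.seed in the main session does not control the workers, which is why a separate seed argument is needed for parallel runs.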

References

Archimbaud, A., Nordhausen, K. and Ruiz-Gazen, A. (2018), ICS for multivariate outlier detection with application to quality control. Computational Statistics & Data Analysis, 128:184-199. ISSN 0167-9473. <https://doi.org/10.1016/j.csda.2018.06.011>.

See Also

ics2, comp.norm.test

Examples

# For a real analysis use larger values for m and more cores if available

library(ICSOutlier)  # comp.simu.test
library(ICS)         # ics2, MeanCov
library(mvtnorm)     # rmvnorm

set.seed(123)
Z <- rmvnorm(1000, rep(0, 6))
# Add 20 outliers on the first component
Z[1:20, 1] <- Z[1:20, 1] + 10
pairs(Z)
icsZ <- ics2(Z)
# For demo purpose only small m value, should select the first component
comp.simu.test(icsZ, m = 400, ncores = 1)

if (FALSE) {
# For using two cores
# For demo purpose only small m value, should select the first component
comp.simu.test(icsZ, m = 500, ncores = 2, iseed = 123)
  
# For using several cores and for using a scatter function from a different package
# Using the parallel package to detect automatically the number of cores
library(parallel)
# ICS with MCD estimates and the usual estimates
# Need to create a wrapper for the CovMcd function to return first the location estimate
# and the scatter estimate secondly.
library(rrcov)
myMCD <- function(x,...){
  mcd <- CovMcd(x,...)
  return(list(location = mcd@center, scatter = mcd@cov))
}
icsZmcd <- ics2(Z, S1 = myMCD, S2 = MeanCov, S1args = list(alpha = 0.75))
# For demo purpose only small m value, should select the first component
comp.simu.test(icsZmcd, m = 500, ncores = detectCores()-1, 
               pkg = c("ICSOutlier", "rrcov"), iseed = 123)
}

# Example with no outlier
Z0 <- rmvnorm(1000, rep(0, 6))
pairs(Z0)
icsZ0 <- ics2(Z0)
# Should select no component
comp.simu.test(icsZ0, m = 400, level = 0.01, ncores = 1)