randtest: Randomization test for PLS regression

Description

randtest carries out a randomization (permutation) test for a PLS regression model.

Usage

randtest(
  x,
  y,
  ncomp = 15,
  center = TRUE,
  scale = FALSE,
  nperm = 1000,
  sig.level = 0.05,
  silent = TRUE,
  exclcols = NULL,
  exclrows = NULL
)

Value

Returns an object of class randtest with the following fields:

nperm: number of permutations used for the test.
stat: statistic values calculated for each component.
alpha: alpha values calculated for each component.
statperm: matrix with statistic values for each permutation.
corrperm: matrix with correlations between predicted and reference y-values for each permutation.
ncomp.selected: suggested number of components.

Arguments

x: matrix with predictors.
y: vector or one-column matrix with response.
ncomp: maximum number of components to test.
center: logical, whether to center the predictors and response values.
scale: logical, whether to scale (standardize) the predictors and response values.
nperm: number of permutations.
sig.level: significance level.
silent: logical; if FALSE, test progress is shown.
exclcols: columns of x to be excluded from calculations (numbers, names or a logical vector).
exclrows: rows to be excluded from calculations (numbers, names or a logical vector).

Details

The class implements a method for selecting the optimal number of components in PLS1 regression based on the randomization test [1]. The basic idea is that for each component from 1 to ncomp a statistic T, the covariance between the t-scores (X-scores derived from the PLS model) and the reference y-values, is calculated. Repeating this for randomly permuted y-values gives a distribution of the statistic. A parameter alpha is then computed to show how often the statistic T calculated for permuted y-values is the same as or higher than the statistic calculated for the original, unpermuted data.
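The idea can be sketched in base R for the first component only. This is a minimal illustration under stated assumptions, not the package's implementation: the data are simulated, the helper `first_comp_stat` is hypothetical, and the first X-score is assumed to be t = Xw with w proportional to X'y on mean-centered data.

```r
# Minimal base-R sketch of the randomization idea (first PLS1 component only).
# Assumptions: simulated data; first X-score t = X w with w proportional to X'y;
# first_comp_stat is a hypothetical helper, not part of any package.
set.seed(42)

n <- 50
p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- 2 * X[, 1] - X[, 2] + rnorm(n, sd = 0.5)

# statistic T: covariance between the first X-score and the response
first_comp_stat <- function(X, y) {
  Xc <- scale(X, center = TRUE, scale = FALSE)  # mean-center predictors
  yc <- y - mean(y)                             # mean-center response
  w <- crossprod(Xc, yc)                        # weights, proportional to X'y
  w <- w / sqrt(sum(w^2))                       # normalize to unit length
  t_score <- Xc %*% w                           # first X-score
  drop(cov(t_score, yc))
}

nperm <- 500
stat <- first_comp_stat(X, y)

# repeat the calculation for randomly permuted response values
statperm <- replicate(nperm, first_comp_stat(X, sample(y)))

# alpha: share of permutations whose statistic is at least the observed one
alpha <- mean(statperm >= stat)
print(c(stat = stat, alpha = alpha))
```

Because the simulated response depends strongly on the first two predictors, the observed statistic should sit far in the upper tail of the permutation distribution.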

If a component is important, the covariance for unpermuted data should be larger than the covariance for permuted data, so alpha will be quite small (there is still a small chance of getting a similar covariance with permuted data). This makes alpha very similar to a p-value in a statistical test.
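Written as a formula, and assuming alpha is a plain proportion (the exact counting convention used by the implementation is not stated here), the value for component a could be expressed as follows, where T_a is the statistic for the original data, T_a^{(k)} is the statistic for the k-th permutation, and n_perm is the number of permutations:

$$
\alpha_a = \frac{\#\{\, k : T_a^{(k)} \ge T_a \,\}}{n_{\mathrm{perm}}}, \qquad k = 1, \dots, n_{\mathrm{perm}}
$$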

The randtest procedure calculates alpha for each component; the values can be inspected using the summary or plot functions. There are also several functions that show, for example, the distribution of the statistic and the critical value for each component.
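One plausible way to turn the alpha values into a suggested number of components is sketched below. The rule shown (largest component whose alpha does not exceed sig.level, or 0 if there is none) is an assumption made for illustration, and the alpha values are invented; neither is taken from the package.

```r
# Hypothetical alpha values for four components (invented for illustration)
alpha <- c(0.000, 0.004, 0.031, 0.420)
sig.level <- 0.05

# Assumed rule: the largest component number with alpha <= sig.level,
# falling back to 0 when no component is significant
significant <- which(alpha <= sig.level)
ncomp.selected <- if (length(significant) > 0) max(significant) else 0
print(ncomp.selected)
```

With these made-up values the first three components fall below the significance level, so the sketch suggests three components.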

References

[1] S. Wiklund et al. Journal of Chemometrics 21 (2007) 427-439.

Examples

### Examples of using the test

## Get the spectral data from the simdata set and apply the SNV transformation

data(simdata)

y = simdata$conc.c[, 3]
x = simdata$spectra.c
x = prep.snv(x)

## Run the test and show summary
## (normally use higher nperm values > 1000)
r = randtest(x, y, ncomp = 4, nperm = 200, silent = FALSE)
summary(r)

## Show plots

par(mfrow = c(3, 2))
plot(r)
plotHist(r, ncomp = 3)
plotHist(r, ncomp = 4)
plotCorr(r, 3)
plotCorr(r, 4)
par(mfrow = c(1, 1))

`print.randtest`	prints information about a `randtest` object.
`summary.randtest`	shows summary statistics for the test.
`plot.randtest`	shows a bar plot of the alpha values.
`plotHist.randtest`	shows a plot of the distribution of the statistic.
`plotCorr.randtest`	shows determination coefficient plot.