hotelling.stat: Calculate Hotelling's two sample T-squared test statistic

Description

Calculate Hotelling's T-squared test statistic for the difference in two multivariate means.

Usage

hotelling.stat(x, y, shrinkage = FALSE, var.equal = TRUE)

Arguments

x

an nx by p matrix containing the data points from sample 1, or a list containing elements mean, cov, and n, where mean is a mean vector of length p, cov is a variance-covariance matrix of dimension p by p, and n is the sample size

y

an ny by p matrix containing the data points from sample 2, or a list containing elements mean, cov, and n, where mean is a mean vector of length p, cov is a variance-covariance matrix of dimension p by p, and n is the sample size

shrinkage

set to TRUE if the covariance matrices are to be estimated using Schaefer and Strimmer's James-Stein shrinkage estimator. Note this only works when raw data is supplied, and will not work if summary statistics are supplied.

var.equal

set to TRUE if the covariance matrices are (assumed to be) equal
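
The summary-statistic form of the first two arguments can be built from raw data with base R. The sketch below uses simulated data, and summarise.sample is a hypothetical helper introduced only for illustration (it is not part of the package). The final call runs only if the Hotelling package is installed.

```r
# Simulated data: p = 3 variables, 12 and 15 observations (illustrative only)
set.seed(1)
p <- 3
x.raw <- matrix(rnorm(12 * p), ncol = p)
y.raw <- matrix(rnorm(15 * p, mean = 0.5), ncol = p)

# Hypothetical helper: reduce a data matrix to the list form described above,
# with elements mean (length p), cov (p by p), and n (sample size)
summarise.sample <- function(dat) {
  list(mean = colMeans(dat), cov = cov(dat), n = nrow(dat))
}

x.summ <- summarise.sample(x.raw)
y.summ <- summarise.sample(y.raw)

# Pass the summaries in place of the raw matrices (shrinkage must stay FALSE,
# since the shrinkage estimator needs raw data)
if (requireNamespace("Hotelling", quietly = TRUE)) {
  print(Hotelling::hotelling.stat(x.summ, y.summ))
}
```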

Value

A list containing the following components:

statistic

Hotelling's (unscaled) T-squared statistic

m

The scaling factor. Either multiply the test statistic by it, or divide the critical F value by it

df

a vector of length 2 containing the numerator and denominator degrees of freedom

nx

The sample size of sample 1

ny

The sample size of sample 2

p

The number of variables to be used in the comparison
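
The two uses of the scaling factor can be illustrated with a hand computation in base R. This is a minimal sketch of the textbook equal-covariance case on simulated data, not the package's own code; it assumes (the page does not state the formulas) that the returned scaling factor and degrees of freedom correspond to the standard values m = (nx + ny - p - 1) / (p (nx + ny - 2)) and df = (p, nx + ny - p - 1).

```r
# Simulated data (illustrative only)
set.seed(42)
p <- 2; nx <- 10; ny <- 12
x <- matrix(rnorm(nx * p), ncol = p)
y <- matrix(rnorm(ny * p), ncol = p)

# Unscaled T-squared with a pooled covariance matrix
d <- colMeans(x) - colMeans(y)
S.pooled <- ((nx - 1) * cov(x) + (ny - 1) * cov(y)) / (nx + ny - 2)
T2 <- as.numeric((nx * ny / (nx + ny)) * t(d) %*% solve(S.pooled) %*% d)

# Textbook scaling factor and F degrees of freedom for the pooled case
m <- (nx + ny - p - 1) / (p * (nx + ny - 2))
df <- c(p, nx + ny - p - 1)

# Option 1: scale the statistic and compare it with the critical F value
f.stat <- m * T2
f.crit <- qf(0.95, df[1], df[2])
reject.scaled.stat <- f.stat > f.crit

# Option 2: leave the statistic unscaled and divide the critical value by m
reject.scaled.crit <- T2 > f.crit / m

# p-value from the scaled statistic
p.value <- pf(f.stat, df[1], df[2], lower.tail = FALSE)
```

Both options lead to the same decision; they differ only in which side of the comparison carries the scaling.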

Details

The sample size requirement is nx + ny - 1 > p. The procedure will stop if this is not met and the shrinkage estimator is not being used. The shrinkage estimator has not been rigorously tested for this application (small p, smaller n).
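
One way to see where this requirement comes from in the equal-covariance case: the pooled covariance matrix has rank at most nx + ny - 2, so it cannot be inverted once p exceeds that. The sketch below uses simulated data and base R only; it illustrates the condition and is not the check the package itself performs.

```r
# Simulated data with too few observations for p = 5 variables
set.seed(7)
p <- 5; nx <- 3; ny <- 3          # nx + ny - 1 = 5, which is not greater than p
x <- matrix(rnorm(nx * p), ncol = p)
y <- matrix(rnorm(ny * p), ncol = p)

# The requirement stated above
size.ok <- (nx + ny - 1) > p

# Pooled covariance: each sample contributes rank at most n - 1,
# so the total rank is at most nx + ny - 2 = 4 < p
S.pooled <- ((nx - 1) * cov(x) + (ny - 1) * cov(y)) / (nx + ny - 2)
S.rank <- qr(S.pooled)$rank

cat("requirement met:", size.ok, "\n")
cat("rank of pooled covariance:", S.rank, "of", p, "\n")
```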

References

Hotelling, H. (1931). "The generalization of Student's ratio." Annals of Mathematical Statistics 2 (3): 360-378.

Schaefer, J., and K. Strimmer (2005). "A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics." Statist. Appl. Genet. Mol. Biol. 4: 32.

Opgen-Rhein, R., and K. Strimmer (2007). "Accurate ranking of differentially expressed genes by a distribution-free shrinkage approach." Statist. Appl. Genet. Mol. Biol. 6: 9.

Nel, D. G., and C. A. Van der Merwe (1986). "A solution to the multivariate Behrens-Fisher problem." Comm. Statist. Theor. Meth. A15 (12): 3719-3736.

Examples

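library(Hotelling)

# Load the example data, drop the first column, and split the remaining
# variables into one data frame per group (gp)
data(container.df)
split.data = split(container.df[,-1], container.df$gp)
x = split.data[[1]]
y = split.data[[2]]

# Default: sample covariance estimates, equal covariance matrices assumed
hotelling.stat(x, y)

# Same comparison with the shrinkage covariance estimator
hotelling.stat(x, y, shrinkage = TRUE)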