Dagnelie's goodness-of-fit test of multivariate normality is applicable to multivariate data. Mahalanobis generalized distances are computed between each object and the multivariate centroid of all objects. Dagnelie's approach rests on the premise that, for multinormal data, these generalized distances should be normally distributed. The function computes a Shapiro-Wilk test of normality of the Mahalanobis distances; this is our improvement of Dagnelie's method.
The null hypothesis (H0) is that the data are multinormal, a situation where the Mahalanobis distances should be normally distributed. In that case, the test should not reject H0, subject to type I error at the selected significance level.
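The procedure described above can be sketched in a few lines. This is only an illustration written in Python with NumPy and SciPy, not the function's actual code: the name dagnelie_sketch is hypothetical, the covariance matrix is assumed to use the n-1 denominator, and numpy.linalg.pinv stands in for whatever inversion routine the function applies.

```python
import numpy as np
from scipy import stats


def dagnelie_sketch(data):
    """Shapiro-Wilk test on Mahalanobis distances to the centroid.

    Hypothetical helper for illustration; assumes rows are objects and
    columns are variables, and an (n - 1)-denominator covariance matrix.
    """
    X = np.asarray(data, dtype=float)
    centered = X - X.mean(axis=0)               # deviations from the centroid
    S = np.atleast_2d(np.cov(X, rowvar=False))  # p x p covariance matrix
    S_inv = np.linalg.pinv(S)                   # equals the inverse when S is full rank
    # Squared distance of object i: centered[i] @ S_inv @ centered[i]
    d2 = np.einsum("ij,jk,ik->i", centered, S_inv, centered)
    d = np.sqrt(np.maximum(d2, 0.0))            # clip tiny negative rounding errors
    W, p_value = stats.shapiro(d)               # normality test of the distances
    return d, W, p_value


# Usage: 20 objects, 4 variables drawn from a multinormal distribution.
rng = np.random.default_rng(1)
distances, W, p_value = dagnelie_sketch(rng.normal(size=(20, 4)))
reject_h0 = p_value < 0.05                      # decision at the 5% level
```

H0 is rejected when the p-value falls below the chosen significance level; for multinormal data this should happen only at about the type I error rate.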
Numerical simulations by D. Borcard showed that the test has correct levels of type I error for values of n between 3p and 8p, where n is the number of objects and p is the number of variables in the data matrix (simulations with 1 <= p <= 100). Outside that range of n values, the test was too liberal, meaning that it rejected the null hypothesis of normality too often. For p = 2, the simulations showed the test to be valid for 6 <= n <= 13 and too liberal outside that range. If H0 is not rejected in a situation where the test is too liberal, the result is nevertheless trustworthy, because a liberal test errs on the side of rejection.
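The sample-size guidance from these simulations can be encoded as a small check. The sketch below is in Python; the function name sample_size_status and its return labels are hypothetical and are not part of the function documented here. It simply restates the ranges given in the text.

```python
def sample_size_status(n, p):
    """Classify (n, p) against the simulation results quoted in the text.

    Hypothetical helper: returns "too few objects", "valid" or "liberal".
    """
    if p < 2:
        raise ValueError("the test is not meant for univariate data")
    if n <= p + 1:
        # Mahalanobis distances cannot be computed meaningfully
        return "too few objects"
    # p = 2 has its own reported range; otherwise use 3p <= n <= 8p
    lower, upper = (6, 13) if p == 2 else (3 * p, 8 * p)
    return "valid" if lower <= n <= upper else "liberal"


# Usage: 20 objects and 4 variables lie inside 12 <= n <= 32.
status = sample_size_status(20, 4)
```

When the status is "liberal", a rejection of H0 should be read with caution, whereas a non-rejection can still be trusted.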
Calculation of the Mahalanobis distances requires that n > p+1 (more precisely, n > rank+1). With fewer objects, all points are at equal Mahalanobis distances from the centroid in the resulting space, which has min(rank, n-1) dimensions. For data matrices that happen to be collinear, the function uses the generalized inverse function ginv for inversion.
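The equal-distance property can be checked numerically. The sketch below is an illustration in Python, not the function's code: numpy.linalg.pinv (a Moore-Penrose pseudoinverse) is used as a stand-in for ginv, the helper name mahalanobis_to_centroid and the tolerance 1e-10 are assumptions, and the covariance matrix is assumed to use the n-1 denominator.

```python
import numpy as np


def mahalanobis_to_centroid(data, rcond=1e-10):
    """Mahalanobis distances to the centroid via a pseudoinverse.

    Hypothetical helper; rcond is an assumed cutoff for treating small
    singular values of the covariance matrix as zero.
    """
    X = np.asarray(data, dtype=float)
    centered = X - X.mean(axis=0)
    S = np.atleast_2d(np.cov(X, rowvar=False))
    S_pinv = np.linalg.pinv(S, rcond)           # handles singular covariance
    d2 = np.einsum("ij,jk,ik->i", centered, S_pinv, centered)
    return np.sqrt(np.maximum(d2, 0.0))


rng = np.random.default_rng(0)

# Too few objects: n = 4, p = 5, so n <= p + 1 and the covariance is singular.
d_few = mahalanobis_to_centroid(rng.normal(size=(4, 5)))

# Enough objects: n = 20, p = 5.
d_many = mahalanobis_to_centroid(rng.normal(size=(20, 5)))

print(d_few)    # all entries should be (numerically) identical
print(d_many)   # entries differ from object to object
```

With identical distances there is no variation left for a normality test to assess, which is why the condition on n matters.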
This test is not meant to be used with univariate data; in simulations, the type I error rate exceeded the 5% significance level for all values of n. Function shapiro.test should be used in that situation.