Performs multivariate normality tests (Mardia, Royston, Henze-Zirkler, Doornik-Hansen, and E-statistic) together with graphical approaches. It also implements multivariate outlier detection, assesses univariate normality of the marginal distributions through plots and tests, and performs multivariate Box-Cox transformation.
mvn(
data,
subset = NULL,
mvnTest = "hz",
covariance = TRUE,
tol = 1e-25,
alpha = 0.5,
scale = FALSE,
desc = TRUE,
transform = "none",
R = 1000,
univariateTest = "AD",
univariatePlot = "none",
multivariatePlot = "none",
multivariateOutlierMethod = "none",
bc = FALSE,
bcType = "rounded",
showOutliers = FALSE,
showNewData = FALSE
)
data
a numeric matrix or data frame.
subset
define a variable name if subset analysis is required.
mvnTest
select one of the MVN tests. Type "mardia" for Mardia's test, "hz" for Henze-Zirkler's test, "royston" for Royston's test, "dh" for Doornik-Hansen's test and "energy" for the E-statistic. Default is Henze-Zirkler's test ("hz"). See details for further information.
covariance
this option works for "mardia" and "royston". If TRUE, the covariance matrix is normalized by n; if FALSE, it is normalized by n-1.
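The two normalizations differ only by the factor (n-1)/n. A minimal base R sketch, not taken from the package source, that shows the relationship on the built-in iris data (the variable names are illustrative):

```r
# Illustrative only: the two covariance normalizations on a numeric matrix.
X <- as.matrix(iris[, 1:4])
n <- nrow(X)

S_unbiased <- cov(X)                    # divides by n - 1 (as with covariance = FALSE)
S_ml       <- S_unbiased * (n - 1) / n  # divides by n     (as with covariance = TRUE)

# Cross-check the n-normalized version directly from the centered data.
Xc <- scale(X, center = TRUE, scale = FALSE)
S_direct <- crossprod(Xc) / n
all.equal(S_ml, S_direct, check.attributes = FALSE)
```

For large n the difference is negligible; it matters mostly for small samples.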
tol
a numeric tolerance value which is used for inversion of the covariance matrix (default = 1e-25).
alpha
a numeric parameter controlling the size of the subsets over which the determinant is minimized. Allowed values are between 0.5 and 1; the default is 0.5.
scale
if TRUE, scales the columns of data.
desc
a logical argument. If TRUE, calculates descriptive statistics.
transform
select a transformation method to transform the univariate marginals: logarithm ("log"), square root ("sqrt") or square ("square"). Default is "none".
R
number of bootstrap replicates for the Energy test; default is 1000.
univariateTest
select one of the univariate normality tests: Shapiro-Wilk ("SW"), Cramer-von Mises ("CVM"), Lilliefors ("Lillie"), Shapiro-Francia ("SF") or Anderson-Darling ("AD"). Default is Anderson-Darling ("AD"). Do not apply the Shapiro-Wilk test if the dataset includes more than 5000 cases or fewer than 3 cases.
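The sample-size restriction can be checked before calling the test. The sketch below uses a hypothetical helper, `safe_shapiro()`, which is not part of MVN; it relies only on `stats::shapiro.test()` from base R.

```r
# Hypothetical helper (not part of MVN): run Shapiro-Wilk only when the
# number of non-missing observations is between 3 and 5000.
safe_shapiro <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  if (n < 3 || n > 5000) {
    return(NULL)  # caller should choose another test, e.g. Anderson-Darling
  }
  shapiro.test(x)
}

res <- safe_shapiro(iris$Sepal.Length)
res$p.value

safe_shapiro(c(1.2, 3.4))  # NULL: too few observations
```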
univariatePlot
select one of the univariate normality plots: Q-Q plot ("qq"), histogram ("histogram"), box plot ("box") or scatter ("scatter").
multivariatePlot
"qq" for chi-square Q-Q plot, "persp" for perspective plot, "contour" for contour plot.
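multivariateOutlierMethod
select the multivariate outlier detection method: "quan" for the quantile method based on Mahalanobis distance, or "adj" for the adjusted quantile method based on Mahalanobis distance. With the default shown in the usage above ("none"), no outlier detection is performed.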
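The idea behind a quantile rule on Mahalanobis distances can be sketched in base R. This is a simplified illustration, not the package's implementation: it assumes classical mean and covariance estimates and a 0.975 chi-square cutoff, both chosen here for illustration only.

```r
# Simplified sketch: flag observations whose squared Mahalanobis distance
# exceeds a chi-square quantile. Classical estimates and the 0.975 level
# are assumptions of this example.
X <- as.matrix(iris[iris$Species == "setosa", 1:4])
p <- ncol(X)

d2     <- mahalanobis(X, center = colMeans(X), cov = cov(X))  # squared distances
cutoff <- qchisq(0.975, df = p)

outlier  <- d2 > cutoff
which(outlier)                          # row positions of flagged observations
new_data <- X[!outlier, , drop = FALSE] # data with flagged rows removed
```

Classical estimates are themselves sensitive to outliers, which is why robust estimates are usually preferred in practice for this purpose.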
bc
if TRUE, applies the Box-Cox power transformation.
bcType
select the "optimal" or "rounded" type of Box-Cox power transformation; only applicable if bc = TRUE. Default is "rounded".
showOutliers
if TRUE, prints multivariate outliers.
showNewData
if TRUE, prints new data without outliers.
multivariateNormality
corresponding multivariate normality test statistics and p-value.
univariateNormality
corresponding univariate normality test statistics and p-value.
Descriptives
Descriptive statistics.
multivariateOutliers
multivariate outliers.
newData
new data without multivariate outliers.
multivariate normality plots, Q-Q, perspective or contour.
chi-square Q-Q plot for multivariate outliers.
univariate normality plots, Q-Q plot, histogram, box plot, scatter.
If mvnTest = "mardia", it calculates Mardia's multivariate skewness and kurtosis coefficients as well as their corresponding statistical significance.
It can also calculate a corrected version of the skewness coefficient for small sample sizes (n < 20).
For multivariate normality, the p-values of both the skewness and the kurtosis statistics should be greater than 0.05.
If the sample size is less than 20, p.value.small should be used as the significance value of skewness instead of p.value.skew.
If there are missing values in the data, a listwise deletion will be applied and a complete-case analysis will be performed.
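The quantities described above can be sketched in base R. The function below, `mardia_sketch()`, is a hypothetical helper written for illustration and is not the package's code; it follows the standard definitions of Mardia's coefficients, applies listwise deletion first, and uses the covariance matrix normalized by n.

```r
# Hypothetical illustration of Mardia's coefficients (not the MVN source).
mardia_sketch <- function(X) {
  X <- as.matrix(X)
  X <- X[complete.cases(X), , drop = FALSE]   # listwise deletion
  n <- nrow(X)
  p <- ncol(X)

  Xc <- scale(X, center = TRUE, scale = FALSE)
  S  <- crossprod(Xc) / n                     # covariance normalized by n
  G  <- Xc %*% solve(S) %*% t(Xc)             # g_ij = (x_i - m)' S^-1 (x_j - m)

  b1 <- sum(G^3) / n^2                        # multivariate skewness
  b2 <- sum(diag(G)^2) / n                    # multivariate kurtosis

  df        <- p * (p + 1) * (p + 2) / 6
  skew_stat <- n * b1 / 6                     # approx. chi-square with df
  # Small-sample correction factor for the skewness statistic
  k         <- ((p + 1) * (n + 1) * (n + 3)) / (n * ((n + 1) * (p + 1) - 6))
  kurt_z    <- (b2 - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)  # approx. N(0, 1)

  list(
    skewness      = b1,
    kurtosis      = b2,
    p_skew        = pchisq(skew_stat, df, lower.tail = FALSE),
    p_skew_small  = pchisq(k * skew_stat, df, lower.tail = FALSE),
    p_kurt        = 2 * pnorm(abs(kurt_z), lower.tail = FALSE)
  )
}

res_mardia <- mardia_sketch(iris[iris$Species == "setosa", 1:4])
str(res_mardia)
```

Under multivariate normality the expected kurtosis is p(p + 2), so values of `kurtosis` far from that figure point to heavy or light tails.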
If mvnTest = "hz", it calculates the Henze-Zirkler multivariate normality test. The Henze-Zirkler test is based on a non-negative functional distance that measures the distance between two distribution functions. If the data are multivariate normal, the test statistic HZ is approximately lognormally distributed. The procedure calculates the mean, variance and smoothness parameter; the mean and variance are then lognormalized and the p-value is estimated.
If there are missing values in the data, a listwise deletion will be applied and a complete-case analysis will be performed.
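The lognormalization step amounts to moment matching: given a mean and variance for HZ under the null hypothesis, find the lognormal distribution with those moments and take its upper tail at the observed statistic. A minimal sketch, assuming the mean `mu` and variance `sigma2` have already been computed; the function names and the numeric inputs are hypothetical.

```r
# Hypothetical helpers: match a lognormal distribution to a given mean and
# variance, then compute an upper-tail p-value for an observed statistic.
lognormal_params <- function(mu, sigma2) {
  list(
    meanlog = log(mu^2 / sqrt(sigma2 + mu^2)),
    sdlog   = sqrt(log(1 + sigma2 / mu^2))
  )
}

hz_pvalue_sketch <- function(hz, mu, sigma2) {
  pars <- lognormal_params(mu, sigma2)
  plnorm(hz, meanlog = pars$meanlog, sdlog = pars$sdlog, lower.tail = FALSE)
}

# Made-up inputs, for illustration only
hz_pvalue_sketch(hz = 0.9, mu = 0.8, sigma2 = 0.01)
```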
If mvnTest = "royston", it calculates Royston's multivariate normality test. The Shapiro-Wilk W statistic is computed first, since it feeds Royston's H test for multivariate normality. However, if the kurtosis of the data is greater than 3, the Shapiro-Francia test is used (leptokurtic samples); otherwise the Shapiro-Wilk test is used (platykurtic samples).
If there are missing values in the data, a listwise deletion will be applied and a complete-case analysis will be performed. Do not apply Royston's test if the dataset includes more than 5000 cases or fewer than 3 cases, since it depends on the Shapiro-Wilk test.
If mvnTest = "dh", it calculates the Doornik-Hansen multivariate normality test. The code is adapted from the asbio package (Aho, 2017).
If mvnTest = "energy", it calculates the Energy multivariate normality test. The code is adapted from the energy package (Rizzo and Szekely, 2017).
Korkmaz S, Goksuluk D, Zararsiz G. MVN: An R Package for Assessing Multivariate Normality. The R Journal. 2014 6(2):151-162. URL https://journal.r-project.org/archive/2014-2/korkmaz-goksuluk-zararsiz.pdf
Mardia, K. V. (1970), Measures of multivariate skewness and kurtosis with applications. Biometrika, 57(3):519-530.
Mardia, K. V. (1974), Applications of some measures of multivariate skewness and kurtosis for testing normality and robustness studies. Sankhyā A, 36:115-128.
Henze, N. and Zirkler, B. (1990), A Class of Invariant Consistent Tests for Multivariate Normality. Commun. Statist.-Theor. Meth., 19(10):3595-3618.
Henze, N. and Wagner, Th. (1997), A New Approach to the BHEP tests for multivariate normality. Journal of Multivariate Analysis, 62:1-23.
Royston, J.P. (1982). An Extension of Shapiro and Wilk's W Test for Normality to Large Samples. Applied Statistics, 31(2):115-124.
Royston, J.P. (1983). Some Techniques for Assessing Multivariate Normality Based on the Shapiro-Wilk W. Applied Statistics, 32(2):121-133.
Royston, J.P. (1992). Approximating the Shapiro-Wilk W-Test for non-normality. Statistics and Computing, 2:117-119.
Royston, J.P. (1995). Remark AS R94: A remark on Algorithm AS 181: The W test for normality. Applied Statistics, 44:547-551.
Shapiro, S. and Wilk, M. (1965). An analysis of variance test for normality. Biometrika, 52:591-611.
Doornik, J.A. and Hansen, H. (2008). An Omnibus test for univariate and multivariate normality. Oxford Bulletin of Economics and Statistics 70, 927-939.
G. J. Szekely and M. L. Rizzo (2013). Energy statistics: A class of statistics based on distances, Journal of Statistical Planning and Inference, http://dx.doi.org/10.1016/j.jspi.2013.03.018
M. L. Rizzo and G. J. Szekely (2016). Energy Distance, WIRES Computational Statistics, Wiley, Volume 8 Issue 1, 27-38. Available online Dec., 2015, http://dx.doi.org/10.1002/wics.1375.
G. J. Szekely and M. L. Rizzo (2017). The Energy of Data. The Annual Review of Statistics and Its Application 4:447-79. 10.1146/annurev-statistics-060116-054026
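# NOT RUN {
library(MVN)

# Henze-Zirkler test per species, with univariate Anderson-Darling tests,
# histograms, a chi-square Q-Q plot and adjusted-quantile outlier detection
result <- mvn(data = iris[-4], subset = "Species", mvnTest = "hz",
              univariateTest = "AD", univariatePlot = "histogram",
              multivariatePlot = "qq", multivariateOutlierMethod = "adj",
              showOutliers = TRUE, showNewData = TRUE)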
#### Multivariate Normality Result
result$multivariateNormality
### Univariate Normality Result
result$univariateNormality
### Descriptives
result$Descriptives
### Multivariate Outliers
result$multivariateOutliers
### New data without multivariate outliers
result$newData
# Note that this function also creates univariate histograms,
# multivariate Q-Q plots for multivariate normality assessment
# and multivariate outlier detection.
# }