fdANOVA (version 0.1.2)

fanova.tests: Tests for FANOVA Problem

Description

Performs the testing procedures for the one-way analysis of variance for (univariate) functional data (FANOVA). See Section 2.1 of the vignette file (vignette("fdANOVA", package = "fdANOVA")) for details of the tests.

We consider \(l\) groups of independent random functions \(X_{ij}(t)\), \(i=1,\dots,l,\) \(j=1,\dots,n_i\), defined over a closed and bounded interval \(I=[a,b]\). Let \(n=n_1+\dots+n_l\). These groups may differ in mean functions, i.e., we assume that \(X_{ij}(t)\), \(j=1,\dots,n_i\), are stochastic processes with mean function \(\mu_i(t)\), \(t\in I\), and covariance function \(\gamma(s, t)\), \(s,t\in I\), for \(i=1,\dots,l\). We are interested in testing the following null hypothesis $$ H_0:\mu_1(t)=\dots=\mu_l(t),\ t\in I. $$ The alternative is the negation of the null hypothesis. We assume that each functional observation is observed on a common grid of \(\mathcal{T}\) design time points equally spaced in \(I\) (see Section 3.1 of the vignette file, vignette("fdANOVA", package = "fdANOVA")).
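The data layout described above can be illustrated with a minimal base R sketch. The group sizes, mean functions and noise level are hypothetical and chosen only for illustration; independent noise at each grid point is a simplification, not the covariance structure assumed by any particular test.

```r
set.seed(1)

n.grid <- 50                                # number of design time points
t.grid <- seq(0, 1, length.out = n.grid)    # common, equally spaced grid on I = [0, 1]
n.i <- c(10, 12, 8)                         # hypothetical group sizes n_1, n_2, n_3
group.label <- rep(seq_along(n.i), times = n.i)

# hypothetical mean functions mu_1, mu_2, mu_3 (the third one differs)
mu <- list(function(t) sin(2 * pi * t),
           function(t) sin(2 * pi * t),
           function(t) sin(2 * pi * t) + 0.5 * t)

# each column is one discretized function; rows correspond to design time points
x <- sapply(group.label, function(g) mu[[g]](t.grid) + rnorm(n.grid, sd = 0.2))

# pointwise sample mean function of each group (one column per group)
group.means <- sapply(sort(unique(group.label)),
                      function(g) rowMeans(x[, group.label == g, drop = FALSE]))

dim(x)
dim(group.means)
```

A matrix and label vector of this shape are what the arguments x and group.label below expect.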

Usage

fanova.tests(x = NULL, group.label, test = "ALL",
             params = NULL,
             parallel = FALSE, nslaves = NULL)

# more detailed usage of params:
# params = list(paramFP = list(int, B.FP = 1000,
#                              basis = c("Fourier", "b-spline", "own"),
#                              own.basis, own.cross.prod.mat,
#                              criterion = c("BIC", "eBIC", "AIC", "AICc", "NO"),
#                              commonK = c("mode", "min", "max", "mean"),
#                              minK = NULL, maxK = NULL, norder = 4, gamma.eBIC = 0.5),
#               paramCH = 10000,
#               paramCS = 10000,
#               paramL2b = 10000,
#               paramFb = 10000,
#               paramFmaxb = 10000,
#               paramTRP = list(k = 30, projection = c("GAUSS", "BM"),
#                               permutation = FALSE, B.TRP = 10000,
#                               independent.projection.tests = TRUE))

Arguments

x

a \(\mathcal{T}\times n\) matrix of data, in which each column is a discretized version of a function and rows correspond to design time points. Its default value is NULL, since if only the FP test is used, a basis representation of the data can be given instead of raw observations (see the list paramFP below). For any of the other testing procedures, the raw data are needed.

group.label

a vector containing group labels.

test

a character vector selecting the FANOVA tests to be performed. Its default value ("ALL") means that all testing procedures of Section 2.1 of the vignette file will be used. To run only some tests, set test to a subvector of the following vector of test labels: c("FP", "CH", "CS", "L2N", "L2B", "L2b", "FN", "FB", "Fb", "GPF", "Fmaxb", "TRP"), where
"FP" - permutation test based on basis function representation (Gorecki and Smaga, 2015);
"CH" and "CS" - L2-norm-based parametric bootstrap tests for homoscedastic and heteroscedastic samples, respectively (Cuevas et al., 2004);
"L2N" and "L2B" - L2-norm-based test with naive and bias-reduced method of estimation, respectively (Faraway, 1997; Zhang and Chen, 2007; Zhang, 2013);
"L2b" - L2-norm-based bootstrap test (Zhang, 2013);
"FN" and "FB" - F-type test with naive and bias-reduced method of estimation, respectively (Shen and Faraway, 2004; Zhang, 2011);
"Fb" - F-type bootstrap test (Zhang, 2013);
"GPF" - globalizing the pointwise F-test (Zhang and Liang, 2014);
"Fmaxb" - Fmax bootstrap test (Zhang et al., 2018);
"TRP" - tests based on random projections (Cuesta-Albertos and Febrero-Bande, 2010).
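As a small illustration of choosing a subvector of test labels, the following base R sketch checks a hypothetical selection against the labels listed above and notes which of the chosen tests need raw data in x. The helper names valid.tests, chosen and needs.raw.data are not part of the package.

```r
# labels of the available tests, as listed in this help page
valid.tests <- c("FP", "CH", "CS", "L2N", "L2B", "L2b",
                 "FN", "FB", "Fb", "GPF", "Fmaxb", "TRP")

# hypothetical selection of three tests
chosen <- c("FP", "GPF", "Fmaxb")

# every chosen label must be one of the valid labels
all.valid <- all(chosen %in% valid.tests)

# per the description of x, only the FP test can work from a basis representation;
# all other tests need the raw data matrix
needs.raw.data <- setdiff(chosen, "FP")

all.valid
needs.raw.data
```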

params

a list of additional parameters for the FP, CH, CS, L\(^2\)b, Fb, Fmaxb tests and the tests based on random projections. It may contain all or some of the elements paramFP, paramCH, paramCS, paramL2b, paramFb, paramFmaxb and paramTRP, which pass parameters to the respective tests and are described below. The default value of params means that these tests are performed with their default parameter values.

paramFP

a list containing the parameters for the FP test.

int

a vector of two elements representing the interval \(I=[a,b]\). When it is not specified, it is determined by the number of design time points.

B.FP

the number of permutation replicates for the FP test.

basis

a choice of basis of functions used in the basis function representation of the data.

own.basis

if basis = "own", a \(K\times n\) matrix with columns containing the coefficients of the basis function representation of the observations.

own.cross.prod.mat

if basis = "own", a \(K\times K\) cross product matrix corresponding to a basis used to obtain the matrix own.basis.

criterion

a choice of information criterion for selecting the optimum value of \(K\).

criterion = "NO" means that \(K\) is equal to the parameter maxK defined below. We have
$$\mathrm{BIC}(X_{ij})=\mathcal{T}\log(\mathbf{e}_{ij}^{\top}\mathbf{e}_{ij}/\mathcal{T})+K\log\mathcal{T},$$
$$\mathrm{eBIC}(X_{ij})=\mathcal{T}\log(\mathbf{e}_{ij}^{\top}\mathbf{e}_{ij}/\mathcal{T})+K[\log\mathcal{T}+2\gamma\log(K_{\max})],$$
$$\mathrm{AIC}(X_{ij})=\mathcal{T}\log(\mathbf{e}_{ij}^{\top}\mathbf{e}_{ij}/\mathcal{T})+2K$$
and
$$\mathrm{AICc}(X_{ij})=\mathrm{AIC}(X_{ij})+2K(K + 1)/(n-K-1),$$
where
$$\mathbf{e}_{ij}=(e_{ij1},\dots,e_{ij\mathcal{T}})^{\top},$$
$$e_{ijr}=X_{ij}(t_r)-\sum_{m=1}^K\hat{c}_{ijm}\varphi_m(t_r),$$
\(t_1,\dots,t_{\mathcal{T}}\) are the design time points, \(\gamma\in[0,1]\), \(K_{\max}\) is a maximum \(K\) considered and \(\log\) denotes the natural logarithm.
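The four criteria can be evaluated for a single curve with a few lines of base R. This is a minimal sketch, not the package's implementation: the curve, the hand-written Fourier-type basis (fourier.basis), the least squares fit via lm.fit and the value n = 30 plugged into the AICc correction are all assumptions made for illustration.

```r
set.seed(2)
n.grid <- 40
t.grid <- seq(0, 1, length.out = n.grid)
y <- sin(2 * pi * t.grid) + rnorm(n.grid, sd = 0.1)   # one hypothetical curve

# hypothetical Fourier-type basis: constant, then sin/cos pairs of rising frequency
fourier.basis <- function(t, K) {
  B <- matrix(1, length(t), K)
  if (K > 1) for (m in 2:K) {
    freq <- m %/% 2
    B[, m] <- if (m %% 2 == 0) sin(2 * pi * freq * t) else cos(2 * pi * freq * t)
  }
  B
}

# criteria for a given K, following the formulas above term by term
criteria <- function(y, t, K, K.max, n, gamma = 0.5) {
  e <- lm.fit(fourier.basis(t, K), y)$residuals      # residual vector e_ij
  n.t <- length(y)                                   # number of design time points
  fit.term <- n.t * log(sum(e^2) / n.t)
  aic <- fit.term + 2 * K
  c(BIC  = fit.term + K * log(n.t),
    eBIC = fit.term + K * (log(n.t) + 2 * gamma * log(K.max)),
    AIC  = aic,
    AICc = aic + 2 * K * (K + 1) / (n - K - 1))
}

K.grid <- seq(3, 9, by = 2)                          # odd K only, as for a Fourier basis
tab <- sapply(K.grid, function(K) criteria(y, t.grid, K, K.max = max(K.grid), n = 30))
colnames(tab) <- K.grid

# K minimizing each criterion for this curve
K.grid[apply(tab, 1, which.min)]
```

Because the eBIC adds a non-negative term to the BIC penalty, it never selects a larger penalty-adjusted fit than the BIC for the same K, which is visible row by row in tab.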

commonK

a choice of method for selecting the common value for all observations from the values of \(K\) corresponding to all processes.
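One plausible reading of the four commonK options is sketched below in base R. The vector K.per.curve and the helper common.K are hypothetical, and rounding the mean to the nearest integer is an assumption; the package may resolve ties or non-integer (or, for a Fourier basis, even) values differently.

```r
# hypothetical values of K selected separately for six curves
K.per.curve <- c(5, 7, 5, 9, 5, 7)

common.K <- function(K, method = c("mode", "min", "max", "mean")) {
  method <- match.arg(method)
  switch(method,
         mode = as.numeric(names(which.max(table(K)))),  # most frequent value
         min  = min(K),
         max  = max(K),
         mean = round(mean(K)))                          # rounding is an assumption
}

sapply(c("mode", "min", "max", "mean"), function(m) common.K(K.per.curve, m))
```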

minK

a minimum value of \(K\). When basis = "Fourier", it has to be an odd number. If minK = NULL, we take minK = 3. For basis = "b-spline", minK has to be greater than or equal to norder defined below. If minK = NULL or minK < norder, then we take minK = norder.

maxK

a maximum value of \(K\). When basis = "Fourier", it has to be an odd number. If maxK = NULL, we take maxK equal to the largest odd number smaller than the number of design time points. If maxK is greater than or equal to the number of design time points, maxK is taken as above. For basis = "b-spline", maxK has to be smaller than or equal to the number of design time points. If maxK = NULL or maxK is greater than the number of design time points, then we take maxK equal to the number of design time points.

norder

if basis = "b-spline", an integer specifying the order of b-splines.
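The default rules for minK and maxK stated above can be written down as a short base R helper. This is a sketch of the documented rules only; the function resolve.K.range is hypothetical, and it does not check that user-supplied values are odd for the Fourier basis.

```r
resolve.K.range <- function(basis = c("Fourier", "b-spline"), n.grid,
                            minK = NULL, maxK = NULL, norder = 4) {
  basis <- match.arg(basis)
  if (basis == "Fourier") {
    if (is.null(minK)) minK <- 3
    # largest odd number smaller than the number of design time points
    largest.odd <- if ((n.grid - 1) %% 2 == 1) n.grid - 1 else n.grid - 2
    if (is.null(maxK) || maxK >= n.grid) maxK <- largest.odd
  } else {
    if (is.null(minK) || minK < norder) minK <- norder
    if (is.null(maxK) || maxK > n.grid) maxK <- n.grid
  }
  c(minK = minK, maxK = maxK)
}

resolve.K.range("Fourier", n.grid = 20)
resolve.K.range("b-spline", n.grid = 20, minK = 2, maxK = 50)
```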

gamma.eBIC

a \(\gamma\in[0,1]\) parameter in the eBIC.

paramCH

a number of discretized artificial trajectories for generating Gaussian processes for the CH test.

paramCS

a number of discretized artificial trajectories for generating Gaussian processes for the CS test.

paramL2b

a number of bootstrap samples for the L\(^2\)b test.

paramFb

a number of bootstrap samples for the Fb test.

paramFmaxb

a number of bootstrap samples for the Fmaxb test.

paramTRP

a list containing the parameters of the tests based on random projections.

k

a vector of numbers of projections.

projection

a method of generating Gaussian processes in step 1 of the tests based on random projections presented in Section 2 of the vignette file. If projection = "GAUSS", the Gaussian white noise is generated as in the function anova.RPm from the R package fda.usc. In the second case, the Brownian motion is generated.
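The difference between the two kinds of projection directions can be sketched in base R as follows. This is an illustration under simple assumptions (a grid on [0, 1], Brownian motion built from cumulated Gaussian increments, a Riemann-sum inner product); it does not reproduce the exact generation or normalization used in anova.RPm or in this package.

```r
set.seed(3)
n.grid <- 50                                   # number of design time points
n <- 12                                        # number of curves
x <- matrix(rnorm(n.grid * n), n.grid, n)      # placeholder data, one curve per column

# Gaussian white noise direction: independent N(0, 1) values on the grid
white.noise <- rnorm(n.grid)

# Brownian motion direction: cumulated independent increments with variance 1/n.grid
brownian <- cumsum(rnorm(n.grid, sd = sqrt(1 / n.grid)))

# project every curve onto a direction (discretized inner product)
proj.gauss <- as.vector(crossprod(x, white.noise)) / n.grid
proj.bm    <- as.vector(crossprod(x, brownian)) / n.grid

length(proj.gauss)
length(proj.bm)
```

Each projection reduces a curve to a single number, so the n projected values can be analysed with univariate ANOVA-type tests.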

permutation

a logical indicating whether to compute p-values of the tests based on random projections by permutation method.

B.TRP

a number of permutation replicates for the tests based on random projections.

independent.projection.tests

a logical indicating whether to generate the random projections independently or dependently for different elements of vector k. In the first case, the random projections for each element of vector k are generated separately, while in the second one, they are generated as chained subsets, e.g., for k = c(5, 10), the first 5 projections are a subset of the last 10. The second way of generating random projections is faster than the first one.
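The two ways of generating projections for a vector k can be contrasted with a short base R sketch. The helper project and the placeholder data are hypothetical; only the shapes (a list of \(n\times\) k[i] matrices versus a single \(n\times \max\)(k) matrix) follow this help page.

```r
set.seed(4)
n.grid <- 50
n <- 12
k <- c(5, 10)
x <- matrix(rnorm(n.grid * n), n.grid, n)      # placeholder data, one curve per column

# project all curves onto n.dir fresh Gaussian white noise directions
project <- function(x, n.dir) {
  dirs <- matrix(rnorm(nrow(x) * n.dir), nrow(x), n.dir)
  crossprod(x, dirs) / nrow(x)                 # n x n.dir matrix of projections
}

# independent generation: a separate set of projections for each element of k
indep <- lapply(k, function(ki) project(x, ki))

# chained generation: generate max(k) projections once and reuse leading columns
chained <- project(x, max(k))
chained.k1 <- chained[, seq_len(k[1]), drop = FALSE]   # the 5 projections used for k = 5

sapply(indep, dim)
dim(chained)
```

The chained variant draws max(k) directions in total instead of sum(k), which is why it is the faster of the two.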

parallel

a logical indicating whether to use parallelization.

nslaves

if parallel = TRUE, the number of slaves. Its default value means that it will be equal to the number of logical processors of the computer used.

Value

A list with class "fanovatests" containing the following components (|k| denotes the length of vector k):

FP

a list containing value of test statistic statFP, p-value pvalueFP and used parameters for the FP test. The chosen optimal length of basis expansion K is also given there.

CH

a list containing value of test statistic statCH, p-value pvalueCH and used parameter paramCH for the CH test.

CS

a list containing value of test statistic statCS, p-value pvalueCS and used parameter paramCS for the CS test.

L2N

a list containing value of test statistic statL2, p-value pvalueL2N and values of estimators betaL2N and dL2N used in approximation of null distribution of test statistic for the L\(^2\)N test.

L2B

a list containing value of test statistic statL2, p-value pvalueL2B and values of estimators betaL2B and dL2B used in approximation of null distribution of test statistic for the L\(^2\)B test.

L2b

a list containing value of test statistic statL2, p-value pvalueL2b and used parameter paramL2b for the L\(^2\)b test.

FN

a list containing value of test statistic statF, p-value pvalueFN and values of estimators d1FN and d2FN used in approximation of null distribution of test statistic for the FN test.

FB

a list containing value of test statistic statF, p-value pvalueFB and values of estimators d1FB and d2FB used in approximation of null distribution of test statistic for the FB test.

Fb

a list containing value of test statistic statF, p-value pvalueFb and used parameter paramFb for the Fb test.

GPF

a list containing value of test statistic statGPF, p-value pvalueGPF and values of estimators betaGPF and dGPF used in approximation of null distribution of test statistic for the GPF test.

Fmaxb

a list containing value of test statistic statFmax, p-value pvalueFmaxb and used parameter paramFmaxb for the Fmaxb test.

TRP

a list containing the following elements: vectors pvalues.anova, pvalues.ATS, pvalues.WTPS of length |k| containing p-values for tests based on random projections and for numbers of projections given in k; if independent.projection.tests = TRUE, a list data.projections of length |k|, whose \(i\)th element is an \(n\times\) k[i] matrix with columns being projections of the data; when independent.projection.tests = FALSE, an \(n\times \max\)(k) matrix data.projections with columns being projections of the data; used parameters for the tests based on random projections.

and the values of other used parameters: data = x, group.label, etc.

Details

To perform step 3 of the projection procedure given in Section 2.1 of the vignette file, we use five tests: the standard (paramTRP$permutation = FALSE) and permutation (paramTRP$permutation = TRUE) tests based on ANOVA F-test statistic and ANOVA-type statistic (ATS) proposed by Brunner et al. (1997), as well as the testing procedure based on Wald-type permutation statistic (WTPS) of Pauly et al. (2015).
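For a single projection, the standard and permutation versions of the ANOVA F-test mentioned above can be sketched in base R. The projected values z, the group sizes and the number of permutations are hypothetical; the ATS and WTPS statistics and the way p-values are combined across projections are not covered by this sketch.

```r
set.seed(5)
g <- factor(rep(1:3, each = 8))                 # group labels of 24 curves
z <- rnorm(length(g)) + 0.8 * (g == "3")        # hypothetical projected data

# ANOVA F statistic for projected data z and groups g
f.stat <- function(z, g) anova(lm(z ~ g))[["F value"]][1]

# standard test: p-value from the F distribution
F.obs <- f.stat(z, g)
p.standard <- anova(lm(z ~ g))[["Pr(>F)"]][1]

# permutation test: recompute the statistic after shuffling the observations
B <- 200
F.perm <- replicate(B, f.stat(sample(z), g))
p.perm <- mean(F.perm >= F.obs)

c(standard = p.standard, permutation = p.perm)
```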

References

Brunner E, Dette H, Munk A (1997). Box-Type Approximations in Nonparametric Factorial Designs. Journal of the American Statistical Association 92, 1494-1502.

Cuesta-Albertos JA, Febrero-Bande M (2010). A Simple Multiway ANOVA for Functional Data. Test 19, 537-557.

Cuevas A, Febrero M, Fraiman R (2004). An Anova Test for Functional Data. Computational Statistics & Data Analysis 47, 111-122.

Faraway J (1997). Regression Analysis for a Functional Response. Technometrics 39, 254-261.

Gorecki T, Smaga L (2015). A Comparison of Tests for the One-Way ANOVA Problem for Functional Data. Computational Statistics 30, 987-1010.

Gorecki T, Smaga L (2017). Multivariate Analysis of Variance for Functional Data. Journal of Applied Statistics 44, 2172-2189.

Pauly M, Brunner E, Konietschke F (2015). Asymptotic Permutation Tests in General Factorial Designs. Journal of the Royal Statistical Society Series B 77, 461-473.

Shen Q, Faraway J (2004). An F Test for Linear Models with Functional Responses. Statistica Sinica 14, 1239-1257.

Zhang JT (2011). Statistical Inferences for Linear Models with Functional Responses. Statistica Sinica 21, 1431-1451.

Zhang JT (2013). Analysis of Variance for Functional Data. Chapman & Hall, London.

Zhang JT, Chen JW (2007). Statistical Inferences for Functional Data. The Annals of Statistics 35, 1052-1079.

Zhang JT, Cheng MY, Wu HT, Zhou B (2018). A New Test for Functional One-way ANOVA with Applications to Ischemic Heart Screening. Computational Statistics and Data Analysis https://doi.org/10.1016/j.csda.2018.05.004

Zhang JT, Liang X (2014). One-Way ANOVA for Functional Data via Globalizing the Pointwise F-Test. Scandinavian Journal of Statistics 41, 51-71.

See Also

fmanova.ptbfr, fmanova.trp, plotFANOVA, plot.fanovatests

Examples

# NOT RUN {
# Some of the examples may take some time to run.

# gait data (the first feature)
library(fda)
gait.data.frame <- as.data.frame(gait)
x.gait <- as.matrix(gait.data.frame[, 1:39])

# vector of group labels
group.label.gait <- rep(1:3, each = 13)
# all FANOVA tests with default parameters
set.seed(123)
(fanova1 <- fanova.tests(x = x.gait, group.label = group.label.gait))
summary(fanova1)
# data projections generated in the test based on random projections
fanova1$TRP$data.projections

# only three tests with non-default parameters
set.seed(123)
fanova2 <- fanova.tests(x.gait, group.label.gait,
                        test = c("FP", "GPF", "Fmaxb"),
                        params = list(paramFP = list(int = c(0.025, 0.975),
                                                     B.FP = 1000, basis = "b-spline",
                                                     criterion = "eBIC",
                                                     commonK = "mean",
                                                     minK = 5, maxK = 20,
                                                     norder = 4, gamma.eBIC = 0.7),
                                      paramFmaxb = 1000))
summary(fanova2)

# the FP test with predefined basis function representation
library(fda)
fbasis <- create.bspline.basis(rangeval = c(0.025, 0.975), 19, norder = 4)
own.basis <- Data2fd(seq(0.025, 0.975, length.out = 20), x.gait, fbasis)$coefs
own.cross.prod.mat <- inprod(fbasis, fbasis)
set.seed(123)
fanova3 <- fanova.tests(group.label = group.label.gait, test = "FP",
                        params = list(paramFP = list(B.FP = 1000, basis = "own",
                                                     own.basis = own.basis,
                                                     own.cross.prod.mat = own.cross.prod.mat)))
summary(fanova3)

# the tests based on random projections with the Gaussian white noise generated for projections
set.seed(123)
fanova4 <- fanova.tests(x.gait, group.label.gait, test = "TRP",
                        parallel = TRUE, nslaves = 2,
                        params = list(paramTRP = list(k = c(10, 20, 30), B.TRP = 1000)))
summary(fanova4)
set.seed(123)
fanova5 <- fanova.tests(x.gait, group.label.gait, test = "TRP",
                        parallel = TRUE, nslaves = 2,
                        params = list(paramTRP = list(k = c(10, 20, 30),
                                                      permutation = TRUE, B.TRP = 1000)))
summary(fanova5)

# the tests based on random projections with the Brownian motion generated for projections
set.seed(123)
fanova6 <- fanova.tests(x.gait, group.label.gait, test = "TRP",
                        parallel = TRUE, nslaves = 2,
                        params = list(paramTRP = list(k = c(10, 20, 30), projection = "BM",
                                                      B.TRP = 1000)))
summary(fanova6)
set.seed(123)
fanova7 <- fanova.tests(x.gait, group.label.gait, test = "TRP",
                        parallel = TRUE, nslaves = 2,
                        params = list(paramTRP = list(k = c(10, 20, 30), projection = "BM",
                                                      permutation = TRUE, B.TRP = 1000)))
summary(fanova7)
# }
