FarmTest (version 1.0.3)

farm.test: Main function performing factor-adjusted robust test for means

Description

This function conducts a robust statistical test for the means of multivariate data, after adjusting for known or unknown latent factors using the methods in Fan et al. (2017) and Zhou et al. (2017). It uses Huber's loss function (Huber (1964)) to robustly estimate data parameters.

Usage

farm.test(X, H0 = NULL, fx = NULL, Kx = NULL, Y = NULL, fy = NULL,
  Ky = NULL, alternative = c("two.sided", "lesser", "greater"),
  alpha = NULL, robust = TRUE, cv = TRUE, tau = 2, verbose = FALSE,
  ...)

Arguments

X

an n x p data matrix with each row being a sample. The hypothesis is tested for the mean of each column of X.

H0

an optional p x 1 vector of the true values of the means (or of the differences in means if you are performing a two-sample test). The default is the zero vector.

fx

an optional factor matrix with each column being a factor for X. Same number of rows as X.

Kx

an optional number of factors to be estimated for X. Must satisfy Kx >= 0. If not supplied, it is estimated internally.

Y

an optional data matrix that must have the same number of columns as X. If supplied, the test is for equality of the means of each column of X and Y.

fy

an optional factor matrix with each column being a factor for Y. Same number of rows as Y. Only used for a two sample test.

Ky

an optional number of factors to be estimated for Y. If not supplied, it is estimated internally.

alternative

an optional character string specifying the alternative hypothesis. Must be one of "two.sided" (default), "greater" or "lesser". You can specify just the initial letter.

alpha

an optional level for controlling the false discovery rate (in decimals). Default is 0.05. Must be in \((0,1)\).

robust

a boolean, specifying whether or not to use robust estimators for mean and variance. Default is TRUE.

cv

a boolean, specifying whether or not to run cross-validation for the tuning parameter. Default is TRUE. Only used if robust is TRUE.

tau

a positive multiplier for the tuning parameter of the Huber loss function. Default is 2. Only used if robust is TRUE and cv is FALSE. See details.

verbose

a boolean specifying whether to print runtime updates to the console. Default is FALSE.

...

Arguments passed to the farm.FDR function.

Value

An object with S3 class farm.test containing:

means

estimated means

stderr

estimated standard errors

pvalue

unadjusted p values

rejected

the indices of rejected hypotheses, along with their corresponding p values, and adjusted p values, ordered from most significant to least significant

alldata

all the indices of the tested hypotheses, along with their corresponding p values, adjusted p values, and a column with 1 if declared significant and 0 if not

loadings

estimated factor loadings

nfactors

the number of (estimated) factors

significant

the number of means that are found significant

...

further arguments passed to methods. For a complete list, use the function names on the output object

Details

alternative = "greater" is the alternative that X has a larger mean than Y.

Some of the underlying factors may be known while further unobserved confounding factors are suspected. Suppose we have data \(X = \mu + Bf + Cg + u\), where \(f\) is observed and \(g\) is unobserved. In the first step, the user passes the data \(\{X,f\}\) into the main function. From the output, construct the residuals \(Xres = X - Bf\). In the second step, pass \(Xres\) into the main function without any factors. The output of this second step is the final answer to the testing problem.
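The residual construction in this two-step procedure can be sketched in base R. This is only an illustration under stated assumptions: the data are simulated, and the loadings \(B\) are estimated here by ordinary least squares with lm as a stand-in for the loadings taken from the first-step output. The names B_hat and Xres are introduced for this sketch. The final call to farm.test is left commented out because it requires the FarmTest package.

```r
set.seed(100)
n <- 50
p <- 20
f <- matrix(rnorm(n * 2), nrow = n)         # observed factors
g <- matrix(rnorm(n * 1), nrow = n)         # unobserved factor
B <- matrix(runif(p * 2, -2, 2), nrow = p)  # loadings on f
C <- matrix(runif(p * 1, -2, 2), nrow = p)  # loadings on g
mu <- c(rep(2, 3), rep(0, p - 3))
u <- matrix(rnorm(n * p), nrow = n)
X <- rep(1, n) %*% t(mu) + f %*% t(B) + g %*% t(C) + u

# Stand-in for step 1 (assumption): estimate B by column-wise least squares
fit <- lm(X ~ f)
B_hat <- t(coef(fit)[-1, , drop = FALSE])   # p x 2, intercept row dropped

# Residuals Xres = X - B f; the mean mu is kept, only the f part is removed
Xres <- X - f %*% t(B_hat)

# Step 2: pass the residuals to the main function without any factors
# output <- farm.test(Xres, cv = FALSE)
```

Only the part of X explained by the observed factors is subtracted, so the column means that the second step tests are still present in Xres.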

For a two-sample test, the output values means, stderr, n, nfactors, loadings are all lists containing two items, pertaining to X and Y and indicated by the prefixes X. and Y. respectively.

The number of rows and the number of columns of the data matrix must each be at least 4 in order to be able to calculate latent factors.

For details about multiple comparison correction, see farm.FDR.
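As a generic illustration of how a level alpha turns unadjusted p values into a list of rejected hypotheses, the sketch below uses the Benjamini-Hochberg adjustment from base R's p.adjust. This is an assumption made for illustration: farm.FDR may use a different procedure, and the p values here are simulated rather than produced by farm.test.

```r
set.seed(1)
# 95 null p values plus 5 very small ones (simulated, for illustration only)
pvals <- c(runif(95), runif(5, 0, 1e-4))
alpha <- 0.05

# Benjamini-Hochberg adjusted p values
adj <- p.adjust(pvals, method = "BH")

# Indices declared significant at FDR level alpha
rejected <- which(adj <= alpha)

# Table ordered from most significant to least significant
res <- data.frame(index = rejected,
                  pvalue = pvals[rejected],
                  adjusted = adj[rejected])
res <- res[order(res$pvalue), ]
res
```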

The tuning parameter is tau * sigma * (optimal rate), where sigma is the standard deviation of the data and the optimal rate is the optimal rate for the tuning parameter. For details, see Fan et al. (2017).
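For reference, the Huber loss with threshold \(c > 0\) is \(\ell_c(u) = u^2/2\) for \(|u| \le c\) and \(c|u| - c^2/2\) otherwise. The base R sketch below estimates a univariate mean under that loss by iteratively reweighted averaging. It is an illustration only, not the package's implementation: huber_mean is a hypothetical helper, and the threshold is passed in directly rather than computed as tau * sigma * (optimal rate).

```r
# Huber M-estimator of a mean via iteratively reweighted averaging.
# `threshold` is supplied by the caller (hypothetical helper, not part of FarmTest).
huber_mean <- function(x, threshold, tol = 1e-8, max_iter = 200) {
  mu <- median(x)                          # robust starting value
  for (i in seq_len(max_iter)) {
    r <- x - mu
    # weight 1 inside the threshold, threshold/|r| outside it
    w <- ifelse(abs(r) <= threshold, 1, threshold / abs(r))
    mu_new <- sum(w * x) / sum(w)
    converged <- abs(mu_new - mu) < tol
    mu <- mu_new
    if (converged) break
  }
  mu
}

set.seed(1)
x <- c(rnorm(50), 100)                     # one gross outlier
mean(x)                                    # pulled toward the outlier
huber_mean(x, threshold = 1.5)             # threshold chosen arbitrarily here
```

A very large threshold gives every observation weight 1 and recovers the sample mean, while a smaller threshold downweights observations far from the current estimate.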

References

Huber, P.J. (1964). "Robust Estimation of a Location Parameter." The Annals of Mathematical Statistics, 35, 73–101.

Fan, J., Ke, Y., Sun, Q. and Zhou, W-X. (2017). "FARM-Test: Factor-Adjusted Robust Multiple Testing with False Discovery Control", https://arxiv.org/abs/1711.05386.

Zhou, W-X., Bose, K., Fan, J. and Liu, H. (2017). "A New Perspective on Robust M-Estimation: Finite Sample Theory and Applications to Dependence-Adjusted Multiple Testing," Annals of Statistics, to appear, https://arxiv.org/abs/1711.05381.

See Also

farm.FDR, print.farm.test

Examples

# NOT RUN {
set.seed(100)
p = 100
n = 50
epsilon = matrix(rnorm( p*n, 0,1), nrow = n)
B = matrix(runif(p*3,-2,2), nrow=p)
fx = matrix(rnorm(3*n, 0,1), nrow = n)
mu = rep(0, p)
mu[1:5] = 2
X = rep(1,n)%*%t(mu)+fx%*%t(B)+ epsilon
output = farm.test(X, cv=FALSE)#robust, no cross-validation
output

#other robustification options
output = farm.test(X, robust = FALSE, verbose=FALSE) #non-robust
output = farm.test(X, tau = 3, cv=FALSE, verbose=FALSE) #robust, no cross-validation, specified tau
#output = farm.test(X) #robust, cross-validation, longer running

#two sample test
n2 = 25
epsilon = matrix(rnorm( p*n2, 0,1), nrow = n2)
B = matrix(rnorm(p*3,0,1), nrow=p)
fy = matrix(rnorm(3*n2, 0,1), nrow = n2)
Y = fy%*%t(B)+ epsilon
output = farm.test(X=X,Y=Y, robust=FALSE) #non-robust
output = farm.test(X=X,Y=Y,Kx=0, cv = FALSE) #robust, no cross-validation, zero factors for X
names(output$means)

# }