farm.test: Main function performing factor-adjusted robust test for means

Description

This function conducts a robust statistical test for the means of multivariate data after adjusting for known or unknown latent factors. It uses Huber's loss function (Huber (1964)) to estimate the data parameters robustly.
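To make the role of Huber's loss concrete, here is a minimal base-R sketch of a Huber M-estimator for a single location parameter, computed by iteratively reweighted averaging. The helper names huber_loss and huber_mean and the fixed robustification parameter tau = 1.345 are illustrative assumptions. This page does not describe how farm.test chooses its own tuning parameter or solver.

```r
# Huber's loss: quadratic for |r| <= tau, linear beyond, so large residuals
# (outliers) have a bounded influence on the estimate.
huber_loss <- function(r, tau) {
  ifelse(abs(r) <= tau, r^2 / 2, tau * abs(r) - tau^2 / 2)
}

# Minimise sum(huber_loss(x - mu, tau)) over mu by iteratively reweighted
# means: each point gets weight psi(r) / r = min(1, tau / |r|).
huber_mean <- function(x, tau, tol = 1e-10, max_iter = 500) {
  mu <- median(x)                                   # robust starting value
  for (i in seq_len(max_iter)) {
    r <- x - mu
    w <- pmin(1, tau / pmax(abs(r), .Machine$double.eps))
    mu_new <- sum(w * x) / sum(w)
    if (abs(mu_new - mu) < tol) return(mu_new)
    mu <- mu_new
  }
  mu
}

set.seed(1)
x <- c(rnorm(95, mean = 2), rep(50, 5))  # 5% gross outliers at 50
mean(x)                                  # pulled well above 2 by the outliers
huber_mean(x, tau = 1.345)               # stays close to 2
```

Because the loss grows only linearly beyond tau, each outlier exerts a bounded pull on the estimate, which is why the Huber estimate stays near 2 while the sample mean does not.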

Usage

farm.test(X, H0 = NULL, fx = NULL, Kx = NULL, Y = NULL, fy = NULL,
  Ky = NULL, alternative = c("two.sided", "less", "greater"),
  alpha = NULL, verbose = TRUE, ...)

Arguments

X

an n x p data matrix with each row being a sample. A hypothesis is tested for the mean of each column of X.

H0

an optional p x 1 vector of the means under the null hypothesis (or of the differences in means, for a two-sample test). The default is the zero vector.

fx

an optional factor matrix with each column being a factor for X. It must have the same number of rows as X.

Kx

an optional number of factors to be estimated for X. If not supplied, it is estimated internally.

Y

an optional data matrix with the same number of columns as X. If supplied, a two-sample test of the equality of the means of each column of X and Y is performed.

fy

an optional factor matrix with each column being a factor for Y. It must have the same number of rows as Y. Used only for a two-sample test.

Ky

an optional number of factors to be estimated for Y. If not supplied, it is estimated internally.

alternative

an optional character string specifying the alternative hypothesis; must be one of "two.sided" (default), "greater" or "less". Only the initial letter needs to be given.

alpha

an optional level at which to control the false discovery rate, given as a decimal. The default is 0.05. Must lie in \((0,1)\).

verbose

a logical indicating whether to print a summary of the run to the console. The default is TRUE.

…

Arguments passed to the farm.FDR function.

Value

A list with the following items:

means

the vector of estimated means

stderr

the p x 1 vector of estimated standard errors

pvalue

the p x 1 vector of unadjusted p values

rejected

the indices of the rejected hypotheses, along with their corresponding p values and adjusted p values, ordered from most significant to least significant

alldata

the indices of all tested hypotheses, along with their corresponding p values, adjusted p values, and a column equal to 1 if the hypothesis is declared significant and 0 if not

loadings

estimated factor loadings

nfactors

if needed, the number of estimated factors
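The rejected and alldata items pair each hypothesis with its raw and adjusted p values. As a generic illustration of that layout only, the sketch below uses the Benjamini-Hochberg adjustment from base R's p.adjust. The p values and column names are made up, and farm.FDR may use a different or modified procedure; see its own documentation.

```r
# Hypothetical unadjusted p values for 6 hypotheses
pvalue <- c(0.001, 0.20, 0.004, 0.03, 0.75, 0.012)
alpha  <- 0.05

# Benjamini-Hochberg adjusted p values (illustrative choice of procedure)
adjusted <- p.adjust(pvalue, method = "BH")

# Table in the spirit of `alldata`: index, p value, adjusted p value, 0/1 flag
alldata <- data.frame(index       = seq_along(pvalue),
                      pvalue      = pvalue,
                      adjusted    = adjusted,
                      significant = as.integer(adjusted <= alpha))

# Table in the spirit of `rejected`: significant rows, most significant first
rejected <- alldata[alldata$significant == 1, c("index", "pvalue", "adjusted")]
rejected <- rejected[order(rejected$pvalue), ]
rejected
```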

Details

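For a two-sample test, alternative = "greater" is the alternative that X has a larger mean than Y; for a one-sample test, it is the alternative that the mean of X is larger than H0.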
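The following sketch shows how the three alternatives translate into p values, assuming a test statistic z = (estimate - H0) / stderr that is approximately standard normal under the null. Whether farm.test uses exactly this form is an assumption, and pvalue_from_z is a hypothetical helper. In the two-sample case the estimate is taken as the mean of X minus the mean of Y, so "greater" places the rejection region in the upper tail.

```r
# Hypothetical helper: p value for an approximately N(0, 1) statistic z
pvalue_from_z <- function(z, alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)   # also accepts "t", "l", "g"
  switch(alternative,
         two.sided = 2 * pnorm(-abs(z)),
         less      = pnorm(z),                       # mean smaller than H0
         greater   = pnorm(z, lower.tail = FALSE))   # mean larger than H0
}

# Two-sample case: estimate = (robust) mean of X minus that of Y
est_diff <- 0.8
H0 <- 0
stderr <- 0.3
z <- (est_diff - H0) / stderr
pvalue_from_z(z, "greater")   # small: X appears to have the larger mean
pvalue_from_z(z, "less")      # large
pvalue_from_z(z, "g")         # the initial letter is enough
```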

If some of the underlying factors are known but additional confounding factors are suspected to be unobserved, proceed in two steps. Suppose the data follow \(X = \mu + Bf + Cg + u\), where \(f\) is observed and \(g\) is unobserved. In the first step, pass the data \(\{X, f\}\) to the main function (that is, supply \(f\) as fx). From the output, construct the residuals \(X_{res} = X - \hat{B}f\), where \(\hat{B}\) is the matrix of estimated loadings. Then pass \(X_{res}\) to the main function without any factors. The output of this second step is the final answer to the testing problem.
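The two-step procedure can be sketched in base R. To keep the example self-contained, the loadings in the first step are estimated by ordinary least squares via lm.fit rather than by farm.test, so Bhat below is only a stand-in for the estimated loadings in the farm.test output. The simulated data and variable names are hypothetical.

```r
set.seed(2)
n <- 50; p <- 8; K <- 1
f  <- matrix(rnorm(n * K), nrow = n)        # observed factor(s), n x K
B  <- matrix(rnorm(p * K), nrow = p)        # true loadings, p x K
mu <- c(rep(1, 3), rep(0, p - 3))
X  <- rep(1, n) %*% t(mu) + f %*% t(B) + matrix(rnorm(n * p), nrow = n)

# Step 1 stand-in: regress each column of X on f (with intercept) by OLS.
# farm.test itself uses robust estimates; OLS only keeps this self-contained.
fit  <- lm.fit(cbind(1, f), X)
Bhat <- t(fit$coefficients[-1, , drop = FALSE])   # p x K estimated loadings

# Residuals X_res = X - Bhat f, written with samples in rows
Xres <- X - f %*% t(Bhat)
# Step 2 would then call farm.test(Xres) with no factors supplied.
```

Since \(Bf\) is written per sample, the matrix form with samples in rows is X - f %*% t(Bhat). The intercept stays in Xres, so the column means tested in the second step are preserved.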

The data matrix must have at least 4 rows and 4 columns for the latent factors to be estimated.
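This page does not say how the number of latent factors is estimated when Kx or Ky is not supplied. Purely as an illustration of one common approach, the sketch below implements an eigenvalue-ratio rule on the sample covariance. The function name estimate_nfactors and the choice kmax = floor(min(n, p) / 2) are assumptions, not documented FarmTest behaviour.

```r
# Eigenvalue-ratio rule (illustrative): K_hat = argmax_{k <= kmax}
# lambda_k / lambda_{k+1}, where lambda_1 >= lambda_2 >= ... are the
# eigenvalues of the sample covariance of X.
estimate_nfactors <- function(X, kmax = floor(min(dim(X)) / 2)) {
  lambda <- eigen(cov(X), symmetric = TRUE, only.values = TRUE)$values
  ratios <- lambda[1:kmax] / lambda[2:(kmax + 1)]
  which.max(ratios)
}

set.seed(3)
n <- 100; p <- 50; K <- 2
factors  <- matrix(rnorm(n * K), nrow = n)          # latent factors, n x K
loadings <- matrix(rnorm(p * K, sd = 3), nrow = p)  # strong loadings, p x K
X <- factors %*% t(loadings) + matrix(rnorm(n * p), nrow = n)
estimate_nfactors(X)
```

With loadings this strong, the gap between the second and third eigenvalues should dominate the other ratios, so the rule is expected to recover K = 2.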

For details about multiple comparison correction, see farm.FDR.

References

Huber, P.J. (1964). "Robust Estimation of a Location Parameter." The Annals of Mathematical Statistics, 35, 73–101.

Examples

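library(FarmTest)  # package providing farm.test

set.seed(100)
p <- 20  # number of variables (columns)
n <- 10  # number of samples (rows)

# Simulate X = 1 mu' + fx B' + epsilon with a single latent factor
epsilon <- matrix(rnorm(p * n, 0, 1), nrow = n)
B <- matrix(rnorm(p, 0, 1), nrow = p)   # p x 1 factor loadings
fx <- matrix(rnorm(n, 0, 1), nrow = n)  # n x 1 factor
mu <- rep(0, p)
mu[1:5] <- 2                            # first 5 means are non-zero
X <- rep(1, n) %*% t(mu) + fx %*% t(B) + epsilon

# Two-sided test; the number of factors is estimated internally
output1 <- farm.test(X)
# One-sided test (means greater than 0) with FDR controlled at 0.01
output <- farm.test(X, alpha = 0.01, alternative = "greater")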
