This function conducts a robust statistical test for the means of multivariate data, after adjusting for known or unknown latent factors. It uses Huber's loss function (Huber, 1964) to robustly estimate the data parameters.
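For reference, Huber's loss with threshold \(\tau\) is quadratic for residuals with \(|r| \le \tau\) and linear beyond that, which bounds the influence of outliers. The base R sketch below is an illustration only: the names `huber_loss` and `huber_mean` and the value `tau = 1.345` are assumptions made for this example, not part of the package, and the tuning used by farm.test may differ.

```r
# Huber's loss: quadratic near zero, linear in the tails.
huber_loss <- function(r, tau) {
  ifelse(abs(r) <= tau, 0.5 * r^2, tau * abs(r) - 0.5 * tau^2)
}

# Hypothetical helper: location estimate that minimizes the summed
# Huber loss over a candidate mean m (one-dimensional search).
huber_mean <- function(x, tau = 1.345) {
  optimize(function(m) sum(huber_loss(x - m, tau)), interval = range(x))$minimum
}

set.seed(1)
x <- c(rnorm(50), 50)   # 50 standard normal draws plus one gross outlier
mean(x)                 # pulled toward the outlier
huber_mean(x)           # stays near the bulk of the data
```

The sample mean moves with the single outlier, while the Huber estimate changes only slightly because the outlier's contribution grows linearly rather than quadratically.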
farm.test(X, H0 = NULL, fx = NULL, Kx = NULL, Y = NULL, fy = NULL,
Ky = NULL, alternative = c("two.sided", "less", "greater"),
alpha = NULL, verbose = TRUE, ...)
an n x p data matrix with each row being a sample. You wish to test a hypothesis for the mean of each column of X.
an optional p x 1 vector of the true values of the means (or of the differences in means if you are performing a two sample test). The default is the zero vector.
an optional factor matrix with each column being a factor for X. Must have the same number of rows as X.
an optional number of factors to be estimated for X. Otherwise estimated internally.
an optional data matrix that must have the same number of columns as X. Supply it to test the equality of the means of each column of X and Y.
an optional factor matrix with each column being a factor for Y. Must have the same number of rows as Y. Only used for a two sample test.
an optional number of factors to be estimated for Y. Otherwise estimated internally.
an optional character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.
an optional level for controlling the false discovery rate (in decimals). Default is 0.05. Must be in \((0,1)\).
a logical indicating whether to print a summary of the run to the console. Default is TRUE.
Arguments passed to the farm.FDR function.
A list with the following items
the vector of estimated means
the p x 1 vector of estimated standard errors
the p x 1 vector of unadjusted p values
the indices of rejected hypotheses, along with their corresponding p values, and adjusted p values, ordered from most significant to least significant
the indices of all tested hypotheses, along with their corresponding p values, adjusted p values, and a column with 1 if declared significant and 0 if not
estimated factor loadings
if needed, the number of estimated factors
alternative = "greater" is the alternative that X has a larger mean than Y.
If some of the underlying factors are known but it is suspected that there are more confounding factors that are unobserved: Suppose we have data \(X = \mu + Bf + Cg + u\), where \(f\) is observed and \(g\) is unobserved. In the first step, the user passes the data \(\{X,f\}\) into the main function. From the output, let us construct the residuals: \(Xres = X - Bf\). Now pass \(Xres\) into the main function, without any factors. The output in this step is the final answer to the testing problem.
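The residual step can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's own computation: the data are simulated, and the loadings \(B\) are estimated by ordinary least squares purely as a stand-in so that the sketch runs on its own. In the procedure described above, \(B\) would instead be taken from the output of the first call to the main function. The names `B_hat` and `Xres` are chosen for this example.

```r
set.seed(1)
n <- 50; p <- 20
f  <- matrix(rnorm(n), nrow = n)   # observed factor
g  <- matrix(rnorm(n), nrow = n)   # unobserved factor
B  <- matrix(rnorm(p), nrow = p)   # loadings on f
C  <- matrix(rnorm(p), nrow = p)   # loadings on g
mu <- c(rep(2, 5), rep(0, p - 5))
u  <- matrix(rnorm(n * p), nrow = n)
X  <- rep(1, n) %*% t(mu) + f %*% t(B) + g %*% t(C) + u

# Step 1 (stand-in): estimate B by least squares of each column of X on f.
# Assumption: in practice B comes from the output of farm.test(X, fx = f).
B_hat <- t(coef(lm(X ~ f))[-1, , drop = FALSE])   # p x 1

# Residuals Xres = X - B f, written row-wise as X - f B^T.
# The means mu are deliberately left in Xres, since they are what is tested.
Xres <- X - f %*% t(B_hat)

# Step 2: pass Xres to the main function without any factors,
# e.g. farm.test(Xres), so that g is handled as a latent factor.
```

After this step the observed factor is uncorrelated with every column of `Xres`, so only the unobserved factor remains to be estimated in the second call.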
The data matrix must have at least 4 rows and at least 4 columns for the latent factors to be calculated.
For details about multiple comparison correction, see farm.FDR.
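As a generic illustration of what controlling the false discovery rate at level alpha means, the sketch below applies the standard Benjamini-Hochberg adjustment from base R to made-up p values. It is not claimed to reproduce what farm.FDR computes; the p values, `alpha`, and variable names are assumptions for this example only.

```r
set.seed(2)
# Illustrative p values: 5 strong signals followed by 15 nulls
pvals <- c(runif(5, 0, 1e-4), runif(15))
alpha <- 0.05

# Benjamini-Hochberg adjusted p values
p_adj <- p.adjust(pvals, method = "BH")

# Hypotheses declared significant at FDR level alpha
rejected <- which(p_adj <= alpha)

# Rejected indices ordered from most to least significant
rejected[order(pvals[rejected])]
```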
Huber, P.J. (1964). "Robust Estimation of a Location Parameter." The Annals of Mathematical Statistics, 35, 73–101.
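# NOT RUN {
library(FarmTest)  # provides farm.test()
set.seed(100)
p = 20   # number of variables (columns)
n = 10   # number of samples (rows)
# Idiosyncratic noise, factor loadings, and a single latent factor
epsilon = matrix(rnorm(p*n, 0, 1), nrow = n)
B = matrix(rnorm(p, 0, 1), nrow = p)
fx = matrix(rnorm(n, 0, 1), nrow = n)
# True means: the first five are nonzero
mu = rep(0, p)
mu[1:5] = 2
X = rep(1, n) %*% t(mu) + fx %*% t(B) + epsilon
# Two-sided test with default settings; factors are estimated internally
output1 = farm.test(X)
# One-sided test at FDR level 0.01
output = farm.test(X, alpha = 0.01, alternative = "greater")
# }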