For any lm.rrpp object, a vector of coefficients can be tested against a hypothesized
vector of betas (specific population parameter values). The test follows the form of
(b - beta) in the numerator of a t-statistic, where beta can be 0 or any other value.
For this test, a vector (Beta) of length p is used for the p variables in the lm.rrpp
fit. If Beta is a vector of 0s, the test is essentially the same as the test performed
by coef.lm.rrpp. However, it is also possible to test null hypotheses for beta values
other than 0, sensu Cicuéndez et al. (2023).
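As a minimal sketch, a test of a non-zero null hypothesis might look like the following.
The Beta argument is described in this help page; the coefficient-number argument is
written here as coef.no, an assumed name that should be verified with args(betaTest).

library(RRPP)

# Simulated data with a true slope near 0.5
set.seed(1)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50, sd = 0.2)
rdf <- rrpp.data.frame(y = y, x = x)
fit <- lm.rrpp(y ~ x, data = rdf, print.progress = FALSE)

coef(fit)  # identify the coefficient number (row) for the slope

# H0: the slope equals 0.5, rather than 0
betaTest(fit, coef.no = 2, Beta = 0.5)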
This function can use as a test statistic either the square root of the inner-product of
a vector of coefficients (a distance, d) or a generalized inner-product based on the
inverse of the residual covariance matrix (a Mahalanobis distance, md). In most cases,
the two will yield similar (often identical) P-values. However, Mahalanobis distance
might be preferred for generalized least squares fits, which do not have consistent
residual covariance matrices for null (intercept-only) models across RRPP permutations;
the distances are thus standardized by the residual covariances. If high-dimensional
data are analyzed, a generalized inverse of the residual covariance matrix will be used,
because the covariance matrix is singular; results based on Mahalanobis distances are
less trustworthy in these cases.
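Conceptually, both statistics compare an estimated vector, b, to the hypothesized
vector, Beta. The following is a sketch of the calculations, not the package internals;
R denotes the residual covariance matrix, and a generalized inverse (MASS::ginv)
accommodates singular matrices.

# Euclidean distance, d
d.stat <- function(b, Beta) sqrt(sum((b - Beta)^2))

# Mahalanobis distance, md
md.stat <- function(b, Beta, R) {
  v <- b - Beta
  sqrt(drop(t(v) %*% MASS::ginv(R) %*% v))
}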
The coefficient number should be provided for a specific test. One can determine
coefficient numbers with, e.g., coef(fit). If no coefficient number is provided (NULL),
tests will be performed sequentially on all possible vectors of coefficients (all rows
of the coefficient matrix). If a null model is not specified, then for each vector of
coefficients, the corresponding parameter is dropped from the linear model design matrix
to make a null model.
This process is analogous in some ways to a leave-one-out cross-validation (LOOCV)
analysis, testing each coefficient against a model containing the parameters for all
other coefficients. For example, for a linear model fit, y ~ x1 + x2 + 1, where x1 and
x2 are single-parameter covariates, the analysis would first drop the intercept, then
x1, then x2, performing three sequential analyses (see the sketch below). This option
could require large amounts of computation time for large models, high-dimensional
data, many RRPP permutations, or any combination of these.
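The default null-model construction can be sketched as follows; df, x1, and x2 are
hypothetical objects, and the permutation details are handled internally by the function.

X <- model.matrix(~ x1 + x2, data = df)  # full design matrix: intercept, x1, x2

# For the test of coefficient j, the null design matrix omits column j
X.null.list <- lapply(seq_len(ncol(X)),
    function(j) X[, -j, drop = FALSE])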
The test results previously reported by coef.lm.rrpp can be recovered by specifying the
appropriate null model design matrix with X.null. One must be cognizant of the null
model used for each coefficient, based on the term it represents; the function
reveal.model.designs can help determine the terms to include in a null model.
Regardless, such tests now have to be performed iteratively, but they do not require
verbose results for the initial lm.rrpp fits.
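For example, to mimic a previous coef.lm.rrpp result for the x2 coefficient of a
y ~ x1 + x2 fit, one might supply the null model design matrix directly. This is a
sketch; coef.no is an assumed argument name, and the appropriate null model should be
confirmed with reveal.model.designs.

reveal.model.designs(fit)  # shows the null and full models used for each term

X0 <- model.matrix(~ x1, data = df)  # null design lacking x2
betaTest(fit, X.null = X0, coef.no = 3)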
Difference between coef.lm.rrpp test and betaTest
The test for coef.lm.rrpp uses the square root of the inner-product of a vector of
coefficients (d) as a test statistic and only tests the null hypothesis that the length
of the vector is 0. The significance of the test is based on the distributions of random
coefficient matrices produced across RRPP permutations. The null models for generating
RRPP distributions are consistent with those used for ANOVA, as specified in the lm.rrpp
fit by the choice of SS type. Therefore, the random coefficients are consistent with the
random estimates produced by RRPP for ANOVA.
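For comparison, that test is obtained from a model fit with, e.g.:

coef(fit, test = TRUE)  # d statistics and RRPP P-values for all coefficients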
The betaTest analysis allows different null hypotheses to be used (the vector length is
not necessarily 0) and, unless otherwise specified, uses a null model that lacks the
vector of parameters being tested and a full model that contains all vectors of
parameters. This is closest to a type III SS method of estimation, except that single
parameters are dropped from the model, rather than terms potentially comprising several
parameters. Additionally, betaTest calculates Mahalanobis distance, in addition to
Euclidean distance, for vectors of coefficients. This statistic is probably better for a
wider range of models (such as generalized least squares fits).
High-dimensional data
If data are high-dimensional (more variables than observations), or even just highly
multivariate, using Mahalanobis distance can require significant computation time and
will require a generalized inverse. One might first consider whether principal component
scores or other ordination scores could achieve the same goal (see ordinate). For
example, one could use the first few principal components as a surrogate for a
high-dimensional trait and test whether the surrogate trait differs from Beta, as
sketched below. This requires that the PC scores make sense with respect to the original
variables, but it is more computationally tractable.
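A sketch of this approach follows; Y and x are hypothetical objects, the hypothesized
Beta values are arbitrary, and coef.no is an assumed argument name. (The ordinate output
is assumed to store component scores in $x, as with prcomp.)

ord <- ordinate(Y)  # Y: an n x p matrix of high-dimensional traits
S <- ord$x[, 1:3]   # retain the first three component axes as a surrogate trait

rdf <- rrpp.data.frame(S = S, x = x)
fit.pc <- lm.rrpp(S ~ x, data = rdf, print.progress = FALSE)

# H0: the slope vector on the component axes equals Beta
betaTest(fit.pc, coef.no = 2, Beta = c(0.2, 0, 0))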
Generalized Least Squares
To the extent possible, tests for GLS-estimated coefficients should use Mahalanobis
distance. The reason is that the covariance matrix for the data (not to be confused with
the residual covariance matrix of a linear model) might not be consistent across RRPP
permutations. To ensure that random distances are comparable in scale, a generalized
(Mahalanobis) distance is safer. However, this can impose a computational burden for
high-dimensional data (see above).
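A GLS sketch, assuming Cov is a known n x n covariance matrix for the observations
(e.g., a phylogenetic covariance matrix) and coef.no is an assumed argument name:

fit.gls <- lm.rrpp(y ~ x, data = rdf, Cov = Cov, print.progress = FALSE)

# The Mahalanobis (md) statistic reported by betaTest is preferable here
betaTest(fit.gls, coef.no = 2, Beta = 0.5)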