For any lm.rrpp object, a vector of coefficients can be tested against a hypothesized
vector of betas (specific population parameter values). The test follows the form of
(b - beta) in the numerator of a t-statistic, where beta can be 0 or any other value.
For this test, a vector (Beta) of length p is used for the p variables in the lm.rrpp
fit. If Beta is a vector of 0s, the test is essentially the same as the test performed
by coef.lm.rrpp. However, it is also possible to test null hypotheses for beta values
other than 0, sensu Cicuéndez et al. (2023).
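As a minimal sketch, a test of a non-zero null hypothesis might look like the following.
The Beta argument is described in this help page; the coefficient-number argument is
written here as coef.no, an assumed name that should be verified with args(betaTest).

library(RRPP)

# Simulated data with a true slope near 0.5
set.seed(1)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50, sd = 0.2)
rdf <- rrpp.data.frame(y = y, x = x)
fit <- lm.rrpp(y ~ x, data = rdf, print.progress = FALSE)

coef(fit)  # identify the coefficient number (row) for the slope

# H0: the slope equals 0.5, rather than 0
betaTest(fit, coef.no = 2, Beta = 0.5)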
This function can use as a test statistic either the square root of the inner-product of
a vector of coefficients (a distance, d) or a generalized inner-product based on the
inverse of the residual covariance matrix (a Mahalanobis distance, md). In most cases,
the two will yield similar (often identical) P-values. However, Mahalanobis distance
might be preferred for generalized least squares fits, which do not have consistent
residual covariance matrices for null (intercept-only) models across RRPP permutations;
the distances are thus standardized by the residual covariances. If high-dimensional
data are analyzed, a generalized inverse of the residual covariance matrix will be used,
because the covariance matrix is singular; results based on Mahalanobis distances are
less trustworthy in these cases.
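Conceptually, both statistics compare an estimated vector, b, to the hypothesized
vector, Beta. The following is a sketch of the calculations, not the package internals;
R denotes the residual covariance matrix, and a generalized inverse (MASS::ginv)
accommodates singular matrices.

# Euclidean distance, d
d.stat <- function(b, Beta) sqrt(sum((b - Beta)^2))

# Mahalanobis distance, md
md.stat <- function(b, Beta, R) {
  v <- b - Beta
  sqrt(drop(t(v) %*% MASS::ginv(R) %*% v))
}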
The coefficient number should be provided for a specific test. One can determine
coefficient numbers with, e.g., coef(fit). If no coefficient number is provided (NULL),
tests will be performed sequentially on all possible vectors of coefficients (all rows
of the coefficient matrix). If a null model is not specified, then for each vector of
coefficients, the corresponding parameter is dropped from the linear model design matrix
to make a null model.
This process is analogous in some ways to a leave-one-out cross-validation (LOOCV)
analysis, testing each coefficient against a model containing the parameters for all
other coefficients. For example, for a linear model fit, y ~ x1 + x2 + 1, where x1 and
x2 are single-parameter covariates, the analysis would first drop the intercept, then
x1, then x2, performing three sequential analyses (see the sketch below). This option
could require large amounts of computation time for large models, high-dimensional
data, many RRPP permutations, or any combination of these.
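The default null-model construction can be sketched as follows; df, x1, and x2 are
hypothetical objects, and the permutation details are handled internally by the function.

X <- model.matrix(~ x1 + x2, data = df)  # full design matrix: intercept, x1, x2

# For the test of coefficient j, the null design matrix omits column j
X.null.list <- lapply(seq_len(ncol(X)),
    function(j) X[, -j, drop = FALSE])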
The test results previously reported by coef.lm.rrpp can be recovered by specifying the
appropriate null model design matrix with X.null. One must be cognizant of the null
model used for each coefficient, based on the term it represents; the function
reveal.model.designs can help determine the terms to include in a null model.
Regardless, such tests now have to be performed iteratively, but they do not require
verbose results for the initial lm.rrpp fits.
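For example, to mimic a previous coef.lm.rrpp result for the x2 coefficient of a
y ~ x1 + x2 fit, one might supply the null model design matrix directly. This is a
sketch; coef.no is an assumed argument name, and the appropriate null model should be
confirmed with reveal.model.designs.

reveal.model.designs(fit)  # shows the null and full models used for each term

X0 <- model.matrix(~ x1, data = df)  # null design lacking x2
betaTest(fit, X.null = X0, coef.no = 3)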
Difference between coef.lm.rrpp test and betaTest
The test for coef.lm.rrpp uses the square root of the inner-product of a vector of
coefficients (d) as a test statistic and only tests the null hypothesis that the length
of the vector is 0. The significance of the test is based on the distributions of random
coefficient matrices produced across RRPP permutations. The null models for generating
RRPP distributions are consistent with those used for ANOVA, as specified in the lm.rrpp
fit by the choice of SS type. Therefore, the random coefficients are consistent with the
random estimates produced by RRPP for ANOVA.
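For comparison, that test is obtained from a model fit with, e.g.:

coef(fit, test = TRUE)  # d statistics and RRPP P-values for all coefficients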
The betaTest analysis allows different null hypotheses to be used (the vector length is
not necessarily 0) and, unless otherwise specified, uses a null model that lacks the
vector of parameters being tested and a full model that contains all vectors of
parameters. This is closest to a type III SS method of estimation, except that single
parameters are dropped from the model, rather than terms potentially comprising several
parameters. Additionally, betaTest calculates Mahalanobis distance, in addition to
Euclidean distance, for vectors of coefficients. This statistic is probably better for a
wider range of models (such as generalized least squares fits).
High-dimensional data
If data are high-dimensional (more variables than observations), or even just highly
multivariate, using Mahalanobis distance can require significant computation time and
will require a generalized inverse. One might first consider whether principal component
scores or other ordination scores could achieve the same goal (see ordinate). For
example, one could use the first few principal components as a surrogate for a
high-dimensional trait and test whether the surrogate trait differs from Beta, as
sketched below. This requires that the PC scores make sense with respect to the original
variables, but it is more computationally tractable.
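A sketch of this approach follows; Y and x are hypothetical objects, the hypothesized
Beta values are arbitrary, and coef.no is an assumed argument name. (The ordinate output
is assumed to store component scores in $x, as with prcomp.)

ord <- ordinate(Y)  # Y: an n x p matrix of high-dimensional traits
S <- ord$x[, 1:3]   # retain the first three component axes as a surrogate trait

rdf <- rrpp.data.frame(S = S, x = x)
fit.pc <- lm.rrpp(S ~ x, data = rdf, print.progress = FALSE)

# H0: the slope vector on the component axes equals Beta
betaTest(fit.pc, coef.no = 2, Beta = c(0.2, 0, 0))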
Generalized Least Squares
To the extent possible, tests for GLS-estimated coefficients should use Mahalanobis
distance. The reason is that the covariance matrix for the data (not to be confused with
the residual covariance matrix of a linear model) might not be consistent across RRPP
permutations. To ensure that random distances are comparable in scale, a generalized
(Mahalanobis) distance is safer. However, this can impose a computational burden for
high-dimensional data (see above).
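A GLS sketch, assuming Cov is a known n x n covariance matrix for the observations
(e.g., a phylogenetic covariance matrix) and coef.no is an assumed argument name:

fit.gls <- lm.rrpp(y ~ x, data = rdf, Cov = Cov, print.progress = FALSE)

# The Mahalanobis (md) statistic reported by betaTest is preferable here
betaTest(fit.gls, coef.no = 2, Beta = 0.5)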