Learn R Programming

RRPP (version 2.0.3)

model.comparison: Model Comparisons, in terms of the log-likelihood, covariance trace, or Z-score.

Description

Function calculates either log-likelihoods or traces of covariance matrices for comparison with respect to parameter penalties, or calculates Z-scores from RRPP, which can be profiled across a gradient (predictor).

Usage

model.comparison(
  ...,
  type = c("cov.trace", "logLik", "Z"),
  predictor = NULL,
  tol = NULL,
  pc.no = NULL,
  gls.null = FALSE
)

Value

An object of class model.comparison is a data frame with either log-likelihoods or covariance traces, plus parameter penalties. AIC scores might be include, if applicable

Arguments

...

Any number of lm.rrpp class objects for model fits to be compared.

type

An argument to choose between log-likelihood, covariance trace, or Z results. If Z is chosen, Z-scores are calculated, the same log-likelihoods are calculated as with the log-likelihood type, but also in every RRPP permutation, as describe for the initial mode fits, with choice of null model (below).

predictor

An optional vector that can be used to profile the results based on type across a range of numerical values described by the predictor. A spline will also be fit, which will reveal estimated values of the predictor that yield maximum and minimum values of model comparison metric.

tol

If type = logLik or Z, tol is a tolerance value between 0 and 1, indicating the magnitude below which components should be omitted (if standard deviations of components are less than the eigenvalue of the first component times the tolerance), for calculating the log-likelihood.

pc.no

If type = logLik or Z, an optional value to indicate the number of principal components (maximum rank) to use for calculating the log-likelihood.

gls.null

A logical value indicating whether GLS estimation should be used with the null (intercept) model, for calculating Z scores via RRPP of log-likelihoods. This should be FALSE if comparing different GLS estimations of covariance matrices. It should be TRUE if comparing different model fits with the same GLS-estimated covariance matrix.

Author

Michael Collyer

Details

The function calculates either log-likelihoods or traces of (residual) covariance matrices, plus parameter penalties, to assist in comparative model evaluation or selection. Because high-dimensional data often produce singular or ill-conditioned residual covariance matrices, this function does one of two things: 1) uses the trace of a covariance matrix rather than its determinant; or 2) provides a ridge-regularization (Warton, 2008) of the covariance matrix, only if it is determined that it is ill-conditioned. Regardless of implementation, covariance matrices are projected into a principal component (PC) space of appropriate dimensions.

The parameter penalty is based on that proposed by Bedrick and Tsai (1994), equal to 2(pk + p(p + 1)/2), where p is the appropriate dimension (not number of variables) of the covariance matrix. The parameter, k, is the rank of the model design matrix.

In the case that "logLik" is chosen for the argument, type, AIC scores are calculated. These scores may not perfectly match other packages or software that calculate AIC for multivariate data, if ridge regularization was used (and if other packages require p = the number of data variables). When choosing logLik as the type of comparison, it might be a good idea to adjust the tolerance or number of data principal components. The default (NULL) values will use all data dimensions to calculate log-likelihoods, which might cause problems if the number of variables exceeds the number of observations (producing singular residual covariance matrices). However, one should not reduce data dimensions haphazardly, as this can lead to poor estimates of log-likelihood. Furthermore, using the tolerance argument could result in different numbers of principal components used for each model to calculate log-likelihoods, which might be a concern for comparing models. If both tol and pc.no arguments are used, the solution will use the fewest PCs produced by either argument. Because the trace of a covariance matrix is not sensitive to matrix singularity, no PC adjustment is used for the cov.trace argument.

This function can also calculate Z-scores from RRPP on model log-likelihoods, which can be compared directly or profiled along a gradient (predictor). This might be useful for for comparing generalized least-squares (GLS) models, for example, along a gradient of a parameter used to scale the covariance matrix for GLS estimation. See Collyer et al. 2022 for an example of using RRPP on log-likelihoods with different covariance matrices.

Users can construct their own tables from the results but this function does not attempt to summarize results, as interpreting results requires some arbitrary decisions. The anova function explicitly tests multiple models and can be used for nested model comparisons.

Results can also be plotted using the generic plot function.

Caution: For models with GLS estimation, the number of parameters used to estimate the covariance matrix is not taken into consideration. A generalized information criterion is currently in development.

References

Bedrick, E.J., and C.L. Tsai. 1994. Model selection for multivariate regression in small samples. Biometrics, 226-231.

Warton, D.I., 2008. Penalized normal likelihood and ridge regularization of correlation and covariance matrices. Journal of the American Statistical Association. 103: 340-349.

Collyer, M.L., E.K. Baken, & D.C. Adams. A standardized effect size for evaluating and comparing the strength of phylogenetic signal. Methods in Ecology and Evolution. 13: 367–382.

Examples

Run this code
if (FALSE) {
data(Pupfish)
Pupfish$logSize <- log(Pupfish$CS)
fit1 <- lm.rrpp(coords ~ logSize, data = Pupfish, iter = 0, 
print.progress = FALSE)
fit2 <- lm.rrpp(coords ~ Pop, data = Pupfish, iter = 0, 
print.progress = FALSE)
fit3 <- lm.rrpp(coords ~ Sex, data = Pupfish, iter = 0, 
print.progress = FALSE)
fit4 <- lm.rrpp(coords ~ logSize + Sex, data = Pupfish, iter = 0, 
print.progress = FALSE)
fit5 <- lm.rrpp(coords ~ logSize + Pop, data = Pupfish, iter = 0, 
print.progress = FALSE)
fit6 <- lm.rrpp(coords ~ logSize + Sex * Pop, data = Pupfish, iter = 0, 
print.progress = FALSE)

modComp1 <- model.comparison(fit1, fit2, fit3, fit4, fit5, 
fit6, type = "cov.trace")
modComp2 <- model.comparison(fit1, fit2, fit3, fit4, fit5, 
fit6, type = "logLik", tol = 0.01)

summary(modComp1)
summary(modComp2)

par(mfcol = c(1,2))
plot(modComp1)
plot(modComp2)

# Comparing fits with covariance matrices
# an example for scaling a phylogenetic covariance matrix with
# the scaling parameter, lambda

data("PlethMorph")
Cov <- PlethMorph$PhyCov
lambda <- seq(0, 1, 0.1)

Cov1 <- scaleCov(Cov, scale. = lambda[1])
Cov2 <- scaleCov(Cov, scale. = lambda[2])
Cov3 <- scaleCov(Cov, scale. = lambda[3])
Cov4 <- scaleCov(Cov, scale. = lambda[4])
Cov5 <- scaleCov(Cov, scale. = lambda[5])
Cov6 <- scaleCov(Cov, scale. = lambda[6])
Cov7 <- scaleCov(Cov, scale. = lambda[7])
Cov8 <- scaleCov(Cov, scale. = lambda[8])
Cov9 <- scaleCov(Cov, scale. = lambda[9])
Cov10 <- scaleCov(Cov, scale. = lambda[10])
Cov11 <- scaleCov(Cov, scale. = lambda[11])


fit1 <- lm.rrpp(SVL ~ 1, data = PlethMorph, Cov = Cov1)
fit2 <- lm.rrpp(SVL ~ 1, data = PlethMorph, Cov = Cov2)
fit3 <- lm.rrpp(SVL ~ 1, data = PlethMorph, Cov = Cov3)
fit4 <- lm.rrpp(SVL ~ 1, data = PlethMorph, Cov = Cov4)
fit5 <- lm.rrpp(SVL ~ 1, data = PlethMorph, Cov = Cov5)
fit6 <- lm.rrpp(SVL ~ 1, data = PlethMorph, Cov = Cov6)
fit7 <- lm.rrpp(SVL ~ 1, data = PlethMorph, Cov = Cov7)
fit8 <- lm.rrpp(SVL ~ 1, data = PlethMorph, Cov = Cov8)
fit9 <- lm.rrpp(SVL ~ 1, data = PlethMorph, Cov = Cov9)
fit10 <- lm.rrpp(SVL ~ 1, data = PlethMorph, Cov = Cov10)
fit11 <- lm.rrpp(SVL ~ 1, data = PlethMorph, Cov = Cov11)

par(mfrow = c(1,1))

MC1 <- model.comparison(fit1, fit2, fit3, fit4, fit5, fit6,
fit7, fit8, fit9, fit10, fit11,
type = "logLik")
MC1
plot(MC1)

MC2 <- model.comparison(fit1, fit2, fit3, fit4, fit5, fit6,
fit7, fit8, fit9, fit10, fit11,
type = "logLik", predictor = lambda)
MC2
plot(MC2)


MC3 <- model.comparison(fit1, fit2, fit3, fit4, fit5, fit6,
fit7, fit8, fit9, fit10, fit11,
type = "Z", predictor = lambda)
MC3
plot(MC3)
}

Run the code above in your browser using DataLab