trim.constraints: Trimmed Constraint Matrices

Description

Deletes statistically nonsignificant regression coefficients via their constraint matrices, for future refitting.

Usage

trim.constraints(object, sig.level = 0.05, max.num = Inf,
                 intercepts = TRUE, ...)

Value

A list of possibly simpler constraint matrices that can be fed back into the model using the constraints argument (usually zero = NULL is needed to avoid a warning). Consequently, they are required to be of the "term"-type. After the model is refitted, applying summaryvglm should result in regression coefficients that are "all" statistically significant.

Arguments

object: Some VGAM object, especially having class vglmff-class. It has not yet been tested on non-"vglm" objects.
sig.level: Significance levels, with values in \([0, 1]\). Columns of constraint matrices whose p-values are larger than this argument are deleted. With terms that generate more than one column of the "lm" model matrix, all p-values must be greater than this argument for deletion. This argument is recycled to the total number of regression coefficients of object.
max.num: Numeric, positive and integer-valued. Maximum number of regression coefficients allowable for deletion. This allows one to limit the number of deleted coefficients. For example, if max.num = 1 then only the largest p-value is used for the deletion, provided it is larger than sig.level. The default is to delete all those coefficients whose p-values are greater than sig.level. With a finite value, this argument will probably not work properly when there are terms that generate more than one column of the LM model matrix. Having a value greater than unity might be unsuitable in the presence of multicollinearity because all correlated variables might be eliminated at once.
intercepts: Logical. Trim the intercept term? If FALSE then the constraint matrix for the "(Intercept)" term is left unchanged.
...: Currently unused; reserved for future use.
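
The interplay between sig.level and max.num described above can be illustrated with a small base-R sketch. This is not the VGAM implementation: the helper pick.deletions and the p-values are hypothetical, and every coefficient is assumed to come from a term that generates a single column of the model matrix.

```r
# Hypothetical helper (not part of VGAM): which coefficients would be
# flagged for deletion, given their p-values, sig.level and max.num?
pick.deletions <- function(pvalues, sig.level = 0.05, max.num = Inf) {
  sig.level <- rep_len(sig.level, length(pvalues))  # Recycle sig.level
  candidates <- which(pvalues > sig.level)          # Nonsignificant ones
  # Order the candidates so that the largest p-values come first
  candidates <- candidates[order(pvalues[candidates], decreasing = TRUE)]
  # Keep at most max.num of them
  n.keep <- min(max.num, length(candidates))
  sort(candidates[seq_len(n.keep)])
}

pvals <- c(a = 0.001, b = 0.40, c = 0.03, d = 0.75)  # Made-up p-values
pick.deletions(pvals)               # Flags b and d
pick.deletions(pvals, max.num = 1)  # Flags d only (the largest p-value)
```

With max.num = 1 only the single largest p-value is considered, which is the behaviour the argument description gives for that setting.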

Author

T. W. Yee

Warning

This function has not been tested thoroughly. One extreme is that a term is totally deleted because none of its regression coefficients are needed, and that situation has not yet been finalized. Ideally, object only contains terms where at least one regression coefficient has a p-value less than sig.level. For ordered factors and other situations, deleting certain columns may not make sense and may destroy interpretability.

As stated above, max.num may not work properly when there are terms that generate more than one column of the LM model matrix. However, this limitation may change in the future.

Details

This utility function is intended to simplify an existing vglm object having variables (terms) that affect unnecessary parameters. Suppose the explanatory variables in the formula include a simple numeric covariate called x2. This variable will affect every linear predictor if zero = NULL in the VGAM family function. This situation may correspond to the constraint matrices having unnecessary columns because their regression coefficients are statistically nonsignificant. This function attempts to delete those columns and return a possibly simplified list of constraint matrices that makes refitting a simpler model easy to do. P-values obtained from summaryvglm (with HDEtest = FALSE for increased speed) are compared to sig.level to test for statistical significance.
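
As a rough illustration of the column deletion described above, the following base-R sketch trims the constraint matrix of a covariate x2 by hand. It only mimics the idea and is not the code used by trim.constraints; the value M = 2 and the p-values are assumptions made for the example.

```r
M <- 2                      # Number of linear predictors (assumed)
cm.x2 <- diag(M)            # With zero = NULL, x2 affects both predictors
pvals.x2 <- c(0.002, 0.61)  # Made-up p-values, one per column of cm.x2
sig.level <- 0.05

# Keep only the columns whose coefficient is statistically significant
keep <- pvals.x2 <= sig.level
cm.x2.trimmed <- cm.x2[, keep, drop = FALSE]
cm.x2.trimmed               # x2 now affects the first linear predictor only
```

The drop = FALSE argument keeps the result a matrix even when a single column survives, which matters because constraint matrices must have M rows.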

For terms that generate more than one column of the "lm" model matrix, such as bs and poly, a column of the constraint matrix is deleted only if all of its regression coefficients are statistically nonsignificant. Incidentally, users should instead use sm.bs, sm.ns, sm.poly, etc., for smart and safe prediction.
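
The "all coefficients must be nonsignificant" rule for multi-column terms can be sketched in base R as follows. The layout of the p-values (one row per "lm" column of the term, one column per column of its constraint matrix) and the numbers themselves are assumptions chosen for illustration, not something extracted from a fitted VGAM object.

```r
# Rows: the 3 "lm" columns generated by a bs()-like term (made up).
# Columns: the 2 columns of the term's constraint matrix (M = 2 assumed).
pvals.bs <- matrix(c(0.01, 0.30, 0.70,   # Column 1: one significant coefficient
                     0.20, 0.45, 0.90),  # Column 2: none significant
                   nrow = 3)
sig.level <- 0.05

# A constraint-matrix column is deleted only if ALL of its p-values
# exceed sig.level
delete <- apply(pvals.bs > sig.level, 2, all)
cm.bs.trimmed <- diag(2)[, !delete, drop = FALSE]
delete          # FALSE TRUE
cm.bs.trimmed   # Only the first column survives
```

Here the first column is retained because one of its three coefficients is significant, even though the other two are not.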

One can think of this function as facilitating backward elimination for variable selection, especially if max.num = 1 and \(M=1\); however, usually more than one regression coefficient is deleted here by default.

Examples


if (FALSE) {  # Not run by default; needs the VGAM and VGAMdata packages
library("VGAM")
data("xs.nz", package = "VGAMdata")
fit1 <-
  vglm(cbind(worry, worrier) ~ bs(age) + sex + ethnicity + cat + dog,
       binom2.or(zero = NULL), data = xs.nz, trace = TRUE)
summary(fit1, HDEtest = FALSE)  # 'cat' is not significant at all
dim(constraints(fit1, matrix = TRUE))
(tclist1 <- trim.constraints(fit1))  # No 'cat'
fit2 <-  # Delete 'cat' manually from the formula:
  vglm(cbind(worry, worrier) ~ bs(age) + sex + ethnicity +       dog,
       binom2.or(zero = NULL), data = xs.nz,
       constraints = tclist1, trace = TRUE)
summary(fit2, HDEtest = FALSE)  # A simplified model
dim(constraints(fit2, matrix = TRUE))  # Fewer regression coefficients
}
