svyvif: Variance inflation factors (VIF) for generalized linear models fitted with complex survey data

Description

Compute VIFs for fixed-effects generalized linear regression models fitted with data collected from one- and two-stage complex survey designs.

Usage

svyvif(mobj, X, w, stvar=NULL, clvar=NULL)

Value

A list with two components:

Intercept adjusted: \(p \times 6\) data frame with columns:
No intercept: \(p \times 6\) data frame with columns:

Arguments

mobj: model object produced by svyglm. The following families of models are allowed: binomial, gaussian, poisson, quasibinomial, and quasipoisson. Other families allowed by svyglm will produce an error in svyvif.
X: \(n \times p\) matrix of real-valued covariates used in fitting the regression; \(n\) = number of observations, \(p\) = number of covariates in model, excluding the intercept. A column of 1's for an intercept should not be included. X should not contain columns for the strata and cluster identifiers (unless those variables are part of the model). No missing values are allowed.
w: \(n\)-vector of survey weights used in fitting the model. No missing values are allowed.
stvar: field in mobj that contains the stratum variable in the complex sample design; use stvar = NULL if there are no strata
clvar: field in mobj that contains the cluster variable in the complex sample design; use clvar = NULL if there are no clusters

Author

Richard Valliant

Details

svyvif computes variance inflation factors (VIFs) appropriate for linear models and some generalized linear models (GLMs) fitted from complex survey data (see Liao 2010 and Liao & Valliant 2012). A VIF measures the inflation of the variance of a slope estimate caused by nonorthogonality of the predictors, over and above what the variance would be with orthogonality (Theil 1971; Belsley, Kuh, and Welsch 1980). A VIF may also be thought of as the factor by which the variance of an estimated coefficient for a predictor x is inflated in a model that includes all x's, compared to a model that includes only the single x. An alternative comparison is a model that includes an intercept and the single x. Both of these VIFs are in the output.

The standard VIF equals \(1/(1 - R^2_k)\) where \(R_k\) is the multiple correlation of the \(k^{th}\) column of X regressed on the remaining columns. The complex sample value of the VIF for a linear model consists of the standard VIF multiplied by two adjustments denoted in the output as zeta and either varrho.m or varrho. The VIF for a GLM is similar (Liao 2010, chap. 5; Liao & Valliant 2024). There is no widely agreed-upon cutoff value for identifying high values of a VIF, although 10 is a common suggestion.
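The standard (non-survey) VIF described above can be illustrated with a short base R sketch. This is not the svyvif implementation and does not apply the zeta or varrho adjustments; std_vif is a hypothetical helper name, the data are simulated, and each auxiliary regression is assumed to include an intercept.

```r
# Standard VIFs: regress each column of X on the remaining columns
# (with an intercept) and return 1 / (1 - R^2_k).
# std_vif is a hypothetical helper, not a function in svydiags.
std_vif <- function(X) {
  X <- as.matrix(X)
  vifs <- sapply(seq_len(ncol(X)), function(k) {
    fit <- lm(X[, k] ~ X[, -k, drop = FALSE])
    1 / (1 - summary(fit)$r.squared)
  })
  names(vifs) <- colnames(X)
  vifs
}

# Simulated covariates: x2 is strongly correlated with x1, x3 is independent
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)
x3 <- rnorm(n)
X  <- cbind(x1 = x1, x2 = x2, x3 = x3)

v <- std_vif(X)
print(round(v, 2))

# Cross-check: with an intercept in each auxiliary regression, the VIFs
# equal the diagonal of the inverse of the correlation matrix of X
v_check <- diag(solve(cor(X)))
```

In this sketch the collinear pair x1 and x2 should show much larger VIFs than x3, whose value should be close to 1.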

References

Belsley, D.A., Kuh, E. and Welsch, R.E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: Wiley-Interscience.

Liao, D. (2010). Collinearity Diagnostics for Complex Survey Data. PhD thesis, University of Maryland. http://hdl.handle.net/1903/10881.

Liao, D., and Valliant, R. (2012). Variance inflation factors in the analysis of complex survey data. Survey Methodology, 38, 53-62.

Liao, D., and Valliant, R. (2024). Variance Inflation Factors in Generalized Linear Models with Extensions to Analysis of Survey Data. Submitted.

Theil, H. (1971). Principles of Econometrics. New York: John Wiley & Sons, Inc.

Lumley, T. (2010). Complex Surveys. New York: John Wiley & Sons.

Lumley, T. (2023). survey: analysis of complex survey samples. R package version 4.4.

Examples

require(survey)
require(svydiags)   # provides svyvif and the nhanes2007 data
data(nhanes2007)
X1 <- nhanes2007[order(nhanes2007$SDMVSTRA, nhanes2007$SDMVPSU),]
    # keep only cases with no missing values
X2 <- X1[complete.cases(X1), ]
nhanes.dsgn <- svydesign(ids = ~SDMVPSU,
                         strata = ~SDMVSTRA,
                         weights = ~WTDRD1, nest=TRUE, data=X2)
    # linear model
m1 <- svyglm(BMXWT ~ RIDAGEYR + as.factor(RIDRETH1) + DR1TKCAL
            + DR1TTFAT + DR1TMFAT, design=nhanes.dsgn)
summary(m1)
    # construct X matrix using model.matrix from stats package
X3 <- model.matrix(~ RIDAGEYR + as.factor(RIDRETH1) + DR1TKCAL + DR1TTFAT + DR1TMFAT,
        data = data.frame(X2))
    # remove col of 1's for intercept with X3[,-1]
svyvif(mobj=m1, X=X3[,-1], w = X2$WTDRD1, stvar=NULL, clvar=NULL)

    # Logistic model
X2$obese <- X2$BMXBMI >= 30
nhanes.dsgn <- svydesign(ids = ~SDMVPSU,
                         strata = ~SDMVSTRA,
                         weights = ~WTDRD1, nest=TRUE, data=X2)
m2 <- svyglm(obese ~ RIDAGEYR + as.factor(RIDRETH1) + DR1TKCAL
             + DR1TTFAT + DR1TMFAT, design=nhanes.dsgn, family="quasibinomial")
summary(m2)
svyvif(mobj=m2, X=X3[,-1], w = X2$WTDRD1, stvar = "SDMVSTRA", clvar = "SDMVPSU")