Calculates the variance inflation factor (VIF) for a set of variables and excludes highly correlated variables from the set through a stepwise procedure. This method can be used to deal with multicollinearity problems when fitting statistical models.
vif(x, size, ...)
vifcor(x, th = 0.9, keep = NULL, size, method = 'pearson', ...)
vifstep(x, th = 10, keep = NULL, size, method = 'pearson', ...)
an object of class VIF
x: numeric explanatory variables (predictors), defined as a raster object (RasterStack, RasterBrick, or SpatRaster), a matrix, or a data.frame.
th: a numeric value specifying the correlation threshold for vifcor and the VIF threshold for vifstep (see Details).
keep: a character vector with the names of variables that should not be excluded even if they are collinear, e.g., for ecological reasons.
size: when the data are large, a random sample of records (cells from a raster or rows from a data.frame) of the specified size is selected; default is 5000.
method: a character string (one of c("pearson","spearman","kendall")) specifying the method used to calculate pairwise correlations; default = "pearson".
...: not implemented.
Babak Naimi naimi.b@gmail.com
VIF can be used to detect collinearity (strong correlation between two or more predictor variables). Collinearity causes instability in parameter estimation in regression-type models. The VIF is based on the square of the multiple correlation coefficient resulting from regressing a predictor variable against all other predictor variables: VIF = 1 / (1 - R^2). If a variable has a strong linear relationship with at least one other variable, this correlation coefficient will be close to 1, and the VIF for that variable will be large. A VIF greater than 10 is commonly taken as a signal that the model has a collinearity problem. The vif function calculates this statistic for all variables in x. The vifcor and vifstep functions use two different strategies to exclude highly collinear variables through a stepwise procedure.
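The statistic described above can be illustrated with a minimal base-R sketch (not the usdm implementation). It assumes a numeric data.frame whose column names are valid R names, regresses each column on all the others with lm(), and converts R^2 into a VIF. The helper name vif_sketch and the simulated data are hypothetical. For comparison, the same values appear on the diagonal of the inverse of the predictors' correlation matrix.

```r
# Hypothetical helper: VIF for every column of a numeric data.frame,
# computed as 1 / (1 - R^2) from regressing each column on all the others
vif_sketch <- function(df) {
  df <- as.data.frame(df)
  sapply(names(df), function(v) {
    fit <- lm(reformulate(setdiff(names(df), v), response = v), data = df)
    r2 <- summary(fit)$r.squared
    1 / (1 - r2)
  })
}

# Simulated predictors: x3 is nearly a linear combination of x1 and x2
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.1)
d <- data.frame(x1, x2, x3)

vif_sketch(d)        # all three VIFs are well above 10 because of x3
diag(solve(cor(d)))  # same values via the inverse correlation matrix
```

The inverse-correlation route avoids fitting one regression per variable, which is why it is reused in the stepwise sketches; both routes assume an intercept in the regression.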
- vifcor first finds the pair of variables with the maximum linear correlation (greater than the threshold, th) and excludes the one with the greater VIF. The procedure is repeated until no pair of variables with a correlation coefficient greater than the threshold remains.
- vifstep calculates the VIF for all variables and excludes the one with the highest VIF (if it is greater than the threshold). The procedure is repeated until no variable with a VIF greater than th remains.
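The two strategies above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the usdm source: the helper names vif_of, vifstep_sketch and vifcor_sketch are hypothetical, VIFs are taken from the diagonal of the inverse correlation matrix, vifcor is assumed to compare absolute correlations and to recompute VIFs on the variables still in the set, and perfectly collinear inputs (a singular correlation matrix) are not handled.

```r
# Hypothetical helper: VIF of each column, taken from the diagonal of the
# inverse correlation matrix (equal to 1 / (1 - R^2) from lm() with intercept)
vif_of <- function(df) diag(solve(cor(df)))

# vifstep-style loop: while the largest VIF exceeds th, drop that variable
vifstep_sketch <- function(df, th = 10) {
  df <- as.data.frame(df)
  excluded <- character(0)
  while (ncol(df) > 1) {
    v <- vif_of(df)
    if (max(v) <= th) break
    worst <- names(which.max(v))
    excluded <- c(excluded, worst)
    df <- df[, setdiff(names(df), worst), drop = FALSE]
  }
  excluded
}

# vifcor-style loop: find the most correlated pair (absolute correlation);
# if it exceeds th, drop whichever of the two has the larger VIF
vifcor_sketch <- function(df, th = 0.9, method = "pearson") {
  df <- as.data.frame(df)
  excluded <- character(0)
  while (ncol(df) > 1) {
    cm <- abs(cor(df, method = method))
    diag(cm) <- 0
    if (max(cm) <= th) break
    idx <- which(cm == max(cm), arr.ind = TRUE)[1, ]
    pair <- colnames(cm)[idx]
    v <- vif_of(df)[pair]
    worst <- names(which.max(v))
    excluded <- c(excluded, worst)
    df <- df[, setdiff(names(df), worst), drop = FALSE]
  }
  excluded
}

# Simulated data: x1b is a noisy copy of x1; x2 and x3 are independent
set.seed(2)
n <- 300
x1 <- rnorm(n)
d <- data.frame(x1 = x1, x1b = x1 + rnorm(n, sd = 0.05),
                x2 = rnorm(n), x3 = rnorm(n))
vifstep_sketch(d, th = 10)  # drops one of x1 / x1b
vifcor_sketch(d, th = 0.9)  # also drops one of x1 / x1b
```

With this simulated data, both loops are expected to drop exactly one member of the nearly duplicated pair; which one depends on which has the slightly larger VIF. Note the difference in what each loop inspects: vifstep looks only at VIFs, while vifcor uses pairwise correlations to pick the pair and VIFs only to break the tie.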
Additional arguments:
method
default is "pearson"; specifies the correlation method (one of 'pearson', 'kendall', 'spearman').
size
a number (default = 5000) specifying the maximum number of observations used in the calculation of VIF. When the number of observations (cells in a raster or rows in a data.frame/matrix) is greater than size, a random sample of size observations is drawn to keep the calculation efficient.
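The subsampling step can be sketched as below. This is a hypothetical helper (sample_rows), not the package's code: it draws rows without replacement only when there are more rows than size, and it assumes complete rows, so any NA handling the package performs for raster cells is not reproduced here.

```r
# Hypothetical helper: if there are more rows than `size`, keep a random
# subset of `size` rows (without replacement); otherwise return df unchanged
sample_rows <- function(df, size = 5000) {
  df <- as.data.frame(df)
  if (nrow(df) > size) {
    df <- df[sample.int(nrow(df), size), , drop = FALSE]
  }
  df
}

set.seed(3)
big <- data.frame(x1 = rnorm(20000), x2 = rnorm(20000))
small <- sample_rows(big, size = 5000)
nrow(small)  # 5000
```

VIFs estimated on the sample will differ slightly from run to run; fixing the random seed makes results reproducible.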
keep
sometimes there is strong biological/ecological justification to keep some variables in the model even if the statistical calculations suggest otherwise. In that case, the keep argument can be used to pass the names of such variables (or numbers specifying which columns of a data.frame or which layers of a raster object should be kept) to the functions; the stepwise procedure then takes them into account when deciding which variables to exclude.
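One way keep could interact with the vifstep strategy is sketched below, as an assumption rather than the package's actual logic: kept variables stay in the set (so they still contribute to every VIF) but are never chosen for removal. The helper names vif_of and vifstep_keep_sketch are hypothetical, and only variable names (not column or layer numbers) are supported.

```r
# Hypothetical helper: VIF as the diagonal of the inverse correlation matrix
vif_of <- function(df) diag(solve(cor(df)))

# vifstep-style loop honouring `keep`: among the variables not listed in
# keep, remove the one with the highest VIF while that VIF exceeds th
vifstep_keep_sketch <- function(df, th = 10, keep = NULL) {
  df <- as.data.frame(df)
  excluded <- character(0)
  repeat {
    v <- vif_of(df)                          # computed with kept variables present
    candidates <- v[setdiff(names(v), keep)]
    if (length(candidates) == 0 || max(candidates) <= th) break
    worst <- names(which.max(candidates))
    excluded <- c(excluded, worst)
    df <- df[, setdiff(names(df), worst), drop = FALSE]
  }
  excluded
}

# x1b is a noisy copy of x1; keeping x1 forces x1b to be the one dropped
set.seed(4)
n <- 300
x1 <- rnorm(n)
d <- data.frame(x1 = x1, x1b = x1 + rnorm(n, sd = 0.05), x2 = rnorm(n))
vifstep_keep_sketch(d, th = 10, keep = "x1")  # "x1b"
```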
Chatterjee, S. and Hadi, A. S. 2006. Regression analysis by example. John Wiley and Sons.
Dormann, C. F. et al. 2012. Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography 35: 001-020.
--------------
If you use this method, please cite the following article, for which this package was developed:
Naimi, B., Hamm, N.A.S., Groen, T.A., Skidmore, A.K., and Toxopeus, A.G. 2014. Where is positional uncertainty a problem for species distribution modelling?, Ecography 37 (2): 191-203.
See also: exclude
if (FALSE) {
library(usdm)   # vif(), vifcor(), vifstep()
library(terra)  # rast()
file <- system.file("external/spain.tif", package="usdm")
r <- rast(file) # read a SpatRaster object with 10 raster layers covering Spain
r
vif(r) # calculate VIF for the variables in r
v1 <- vifcor(r, th=0.9) # identify collinear variables to exclude (correlation threshold)
v1
v2 <- vifstep(r, th=10) # identify collinear variables to exclude (VIF threshold)
v2
v3 <- vifstep(r, th=10, keep = c('Bio4','Bio10')) # as v2, but never exclude Bio4 or Bio10
v3
}