selectEV: Select parsimonious set of explanatory variables.

Description

selectEV selects the parsimonious set of explanatory variables (EVs) which best explains variation in a given response variable (RV). Each EV can be represented by 1 or more derived variables (see deriveVars and selectDVforEV). The function uses a process of forward selection based on comparison of nested models using inference tests. An EV is selected for inclusion when, during nested model comparison, it accounts for a significant amount of remaining variation, under the alpha value specified by the user. See Halvorsen et al. (2015) for a more detailed explanation of the forward selection procedure.

Usage

selectEV(
  dvdata,
  alpha = 0.01,
  retest = FALSE,
  interaction = FALSE,
  formula = NULL,
  test = "Chisq",
  algorithm = "maxent",
  write = FALSE,
  dir = NULL,
  quiet = FALSE
)

Value

List of 3:

dvdata: A list containing first the response variable, followed by data frames of DVs for each selected EV.
selection: A data frame showing the trail of forward selection of individual EVs (and of interaction terms, if interaction = TRUE).
selectedmodel: The selected model under the given alpha value.

Arguments

dvdata: List containing first the response variable, followed by data frames of selected derived variables for a given explanatory variable (e.g. the first item in the list returned by selectDVforEV).
alpha: Alpha-level used in the inference test (see test) that compares nested models. Default is 0.01.
retest: Logical. Test variables (or interaction terms) that do not meet the alpha criterion in a given round in subsequent rounds? Default is FALSE.
interaction: Logical. Allow interaction terms between pairs of EVs? Default is FALSE.
formula: A model formula (in the form y ~ x + ...) specifying a starting point for forward model selection. The independent terms in the formula will be included in the model regardless of explanatory power, and must be represented in dvdata, while the remaining explanatory variables in dvdata are candidates for selection. The first list item in dvdata is still taken as the response variable, regardless of formula. Default is NULL, meaning that forward selection starts with zero selected variables.
test: Character string matching either "Chisq" or "F" to determine which inference test is used in nested model comparison. The Chi-squared test is implemented by stats::anova, while the F-test is implemented as described in Halvorsen (2013, 2015). Default is "Chisq".
algorithm: Character string matching either "maxent" or "LR", which determines the type of model used during forward selection. Default is "maxent".
write: Logical. Write the trail of forward selection to .csv file? Default is FALSE.
dir: Directory for file writing if write = TRUE. Defaults to the working directory.
quiet: Logical. Suppress progress messages from EV-selection? Default is FALSE.
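
As a rough illustration of what alpha and test = "Chisq" control, the following base R sketch runs a single round of forward selection by nested model comparison on simulated data. It is not the code used by selectEV: the data, the variable names (x1, x2, RV) and the use of stats::glm are assumptions made for the example, and each candidate is represented by a single variable rather than a set of DVs.

```r
# One round of forward selection by nested model comparison (sketch only).
# Simulated presence/absence data; all names here are hypothetical.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$RV <- rbinom(n, 1, plogis(-0.5 + 1.5 * dat$x1))

alpha <- 0.01

# Starting point: intercept-only model (comparable to formula = NULL)
base_model <- glm(RV ~ 1, family = binomial, data = dat)

# Compare the base model against each model with one candidate added
candidates <- c("x1", "x2")
pvals <- sapply(candidates, function(v) {
  larger <- glm(reformulate(v, response = "RV"), family = binomial, data = dat)
  anova(base_model, larger, test = "Chisq")[2, "Pr(>Chi)"]
})

# Keep the candidate with the smallest p-value, if it passes alpha
best <- names(which.min(pvals))
selected <- if (pvals[best] < alpha) best else character(0)
```

In a full procedure this round would be repeated with the selected variable added to the base model, until no remaining candidate passes the alpha criterion.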

Details

The F-test available in selectEV is calculated using equation 59 in Halvorsen (2013).

When interaction = TRUE, the forward selection procedure selects a parsimonious group of individual EVs first, and then tests interactions between EVs included in the model afterwards. Therefore, interactions are only explored between terms which individually explain a significant amount of variation. When interaction = FALSE, interactions are not considered. Practically, interactions between EVs are represented by the products of all combinations of their component DVs (Halvorsen, 2013).
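
To make "the products of all combinations of their component DVs" concrete, here is a small base R sketch. The two data frames and their column names are hypothetical, and the code only illustrates the idea; it is not taken from the package.

```r
# Hypothetical DVs for two EVs, "a" (two DVs) and "b" (one DV)
dv_a <- data.frame(a_L = c(0.1, 0.5, 0.9), a_D2 = c(0.01, 0.25, 0.81))
dv_b <- data.frame(b_L = c(1, 2, 3))

# All pairwise combinations of one DV from each EV
pairs <- expand.grid(a = names(dv_a), b = names(dv_b),
                     stringsAsFactors = FALSE)

# One product column per combination (rows are observations)
products <- mapply(function(a, b) dv_a[[a]] * dv_b[[b]], pairs$a, pairs$b)
colnames(products) <- paste(pairs$a, pairs$b, sep = ":")
```

With two DVs for the first EV and one for the second, the interaction term is represented by 2 x 1 = 2 product columns.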

The maximum entropy algorithm ("maxent"), which is implemented in MIAmaxent as an infinitely-weighted logistic regression with presences added to the background, is conventionally used with presence-only occurrence data. In contrast, standard logistic regression (algorithm = "LR") is conventionally used with presence-absence occurrence data.

Explanatory variables should be uniquely named. Underscores ('_') and colons (':') are reserved to denote derived variables and interaction terms respectively, and selectEV will replace these, along with other special characters, with periods ('.').
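
Because of this replacement, it can be worth checking beforehand that EV names remain unique once special characters become periods. The sketch below uses a simple regular expression as a stand-in; the exact rule applied by selectEV is not stated here, so the pattern and the example names are assumptions.

```r
# Hypothetical EV names containing reserved or special characters
ev_names <- c("soil_ph", "temp:july", "slope (deg)")

# Assumed rule: anything that is not a letter, digit or period becomes "."
clean <- gsub("[^[:alnum:].]", ".", ev_names)
clean
# "soil.ph" "temp.july" "slope..deg."

# Names must still be unique after replacement
stopifnot(!anyDuplicated(clean))
```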

References

Halvorsen, R. (2013). A strict maximum likelihood explanation of MaxEnt, and some implications for distribution modelling. Sommerfeltia, 36, 1-132.

Halvorsen, R., Mazzoni, S., Bryn, A., & Bakkestuen, V. (2015). Opportunities for improved distribution modelling practice via a strict maximum likelihood interpretation of MaxEnt. Ecography, 38(2), 172-183.

Examples

if (FALSE) {
# From vignette:
grasslandEVselect <- selectEV(grasslandDVselect$dvdata, alpha = 0.001,
                              interaction = TRUE)
summary(grasslandDVselect$dvdata)
length(grasslandDVselect$dvdata[-1])
summary(grasslandEVselect$dvdata)
length(grasslandEVselect$dvdata[-1])
grasslandEVselect$selectedmodel$formula
}