selectDVforEV: Select parsimonious sets of derived variables.

Description

For each explanatory variable (EV), selectDVforEV selects the parsimonious set of derived variables (DV) which best explains variation in a given response variable. The function uses a process of forward selection based on comparison of nested models using inference tests. A DV is selected for inclusion when, during nested model comparison, it accounts for a significant amount of remaining variation, under the alpha value specified by the user. See Halvorsen et al. (2015) for a more detailed explanation of the forward selection procedure.

Usage

selectDVforEV(
  dvdata,
  alpha = 0.01,
  retest = FALSE,
  test = "Chisq",
  algorithm = "maxent",
  write = FALSE,
  dir = NULL,
  quiet = FALSE
)

Value

List of 2:

dvdata: A list containing first the response variable, followed by data frames of selected DVs for each EV. EVs with zero selected DVs are dropped. This item is recommended as input for dvdata in selectEV.
selection: A list of data frames, where each data frame shows the trail of forward selection of DVs for a given EV.

Arguments

dvdata: List containing first the response variable, followed by data frames of derived variables produced for each explanatory variable (e.g. the first item in the list returned by deriveVars).
alpha: Alpha-level used for inference testing in nested model comparison. Default is 0.01.
retest: Logical. Test variables that do not meet the alpha criterion in a given round in subsequent rounds? Default is FALSE.
test: Character string matching either "Chisq" or "F" to determine which inference test is used in nested model comparison. The Chi-squared test is implemented by stats::anova, while the F-test is implemented as described in Halvorsen (2013, 2015). Default is "Chisq".
algorithm: Character string matching either "maxent" or "LR" (standard logistic regression), which determines the type of model used during forward selection. Default is "maxent".
write: Logical. Write the trail of forward selection for each EV to .csv file? Default is FALSE.
dir: Directory for file writing if write = TRUE. Defaults to the working directory.
quiet: Logical. Suppress progress bar? Default is FALSE.

Details

The F-test available in selectDVforEV is calculated using equation 59 in Halvorsen (2013).
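For orientation, the general idea of forward selection by nested model comparison can be sketched in base R using the Chi-squared path through stats::anova. This is a minimal sketch on simulated data with ordinary logistic regression; it is not the package's internal code and not the F-test of Halvorsen (2013). The names d, dv1, dv2 and fit are hypothetical, and the rule of adding the candidate with the smallest p-value in each round is an assumption made for illustration.

```r
# Illustrative sketch only: simulated data, ordinary logistic regression.
set.seed(1)
n <- 500
d <- data.frame(dv1 = rnorm(n), dv2 = rnorm(n))     # hypothetical DVs
d$rv <- rbinom(n, 1, plogis(-0.5 + 1.2 * d$dv1))    # response depends on dv1 only
alpha <- 0.01

# Helper (hypothetical): fit a model containing the given DVs plus an intercept.
fit <- function(vars) {
  glm(as.formula(paste("rv ~", paste(c("1", vars), collapse = " + "))),
      family = binomial, data = d)
}

selected  <- character(0)
remaining <- c("dv1", "dv2")
while (length(remaining) > 0) {
  current <- fit(selected)
  # p-value of each candidate when added to the current (nested) model
  pvals <- sapply(remaining, function(v) {
    anova(current, fit(c(selected, v)), test = "Chisq")[2, "Pr(>Chi)"]
  })
  best <- names(which.min(pvals))
  if (pvals[[best]] >= alpha) break    # no candidate meets the alpha criterion
  selected  <- c(selected, best)
  remaining <- setdiff(remaining, best)
}
selected
```

In this sketch a candidate that fails the alpha criterion ends the selection; dv2 is unrelated to the response and would usually be left out.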

If using binary-type derived variables from deriveVars, be aware that a model including all of these DVs will be considered equal to the closest nested model, due to perfect multicollinearity (i.e. the dummy variable trap).
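The dummy variable trap can be reproduced in base R. The sketch below uses simulated data and hypothetical indicator columns (bin_low, bin_mid, bin_high), not output from deriveVars; it only shows why the model with all indicators fits no better than the one with one indicator left out.

```r
# Illustrative sketch only: three mutually exclusive binary indicators.
set.seed(1)
lev <- sample(c("low", "mid", "high"), 200, replace = TRUE)
d <- data.frame(
  rv       = rbinom(200, 1, 0.4),
  bin_low  = as.numeric(lev == "low"),
  bin_mid  = as.numeric(lev == "mid"),
  bin_high = as.numeric(lev == "high")
)

m_nested <- glm(rv ~ bin_low + bin_mid, family = binomial, data = d)
m_full   <- glm(rv ~ bin_low + bin_mid + bin_high, family = binomial, data = d)

# The indicators sum to 1 in every row, so the last one is perfectly
# collinear with the intercept and the others; its coefficient is NA.
coef(m_full)

# Both models have the same deviance, so the comparison finds no improvement.
same_fit <- isTRUE(all.equal(deviance(m_nested), deviance(m_full)))
same_fit
```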

The maximum entropy algorithm (algorithm = "maxent"), which is implemented in MIAmaxent as an infinitely-weighted logistic regression with presences added to the background, is conventionally used with presence-only occurrence data. In contrast, standard logistic regression (algorithm = "LR") is conventionally used with presence-absence occurrence data.

Explanatory variables should be uniquely named. Underscores ('_') and colons (':') are reserved to denote derived variables and interaction terms respectively, and selectDVforEV will replace these --- along with other special characters --- with periods ('.').
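To preview how names containing reserved or special characters might end up, base R's make.names with allow_ = FALSE gives a comparable result. That selectDVforEV performs the replacement in exactly this way is an assumption; the variable names below are hypothetical.

```r
# Illustrative sketch only: hypothetical EV names with reserved characters.
ev_names <- c("soil_type", "temp:precip", "pH (water)")

# allow_ = FALSE also converts underscores to periods.
clean <- make.names(ev_names, allow_ = FALSE)
clean

# Replacement can make distinct names collide, so check uniqueness afterwards.
anyDuplicated(clean) == 0
```

Renaming explanatory variables before deriving variables, so that they contain only letters, digits and periods, avoids any surprise from this substitution.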

References

Halvorsen, R. (2013). A strict maximum likelihood explanation of MaxEnt, and some implications for distribution modelling. Sommerfeltia, 36, 1-132.

Halvorsen, R., Mazzoni, S., Bryn, A., & Bakkestuen, V. (2015). Opportunities for improved distribution modelling practice via a strict maximum likelihood interpretation of MaxEnt. Ecography, 38(2), 172-183.

Examples

toydata_seldvs <- selectDVforEV(toydata_dvs$dvdata, alpha = 0.4)

if (FALSE) {
# From vignette:
grasslandDVselect <- selectDVforEV(grasslandDVs$dvdata, alpha = 0.001)
summary(grasslandDVs$dvdata)
sum(sapply(grasslandDVs$dvdata[-1], length))
summary(grasslandDVselect$dvdata)
sum(sapply(grasslandDVselect$dvdata[-1], length))
grasslandDVselect$selection$terdem
}
