selectEV selects the parsimonious set of explanatory variables (EVs) that
best explains variation in a given response variable (RV). Each EV can be
represented by one or more derived variables (DVs; see deriveVars and
selectDVforEV). The function uses forward selection based on comparison of
nested models using inference tests. An EV is selected for inclusion when,
during nested model comparison, it accounts for a significant amount of the
remaining variation under the alpha value specified by the user. See
Halvorsen et al. (2015) for a more detailed explanation of the forward
selection procedure.
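The forward selection idea described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the MIAmaxent implementation: it uses ordinary logistic regression (stats::glm) on simulated data, treats each candidate as a single column rather than a group of DVs, and adds the candidate with the smallest p-value in each round. The function name forward_select and the variables x1, x2, x3 are hypothetical.

```r
# Simulated data: y depends on x1 and x2, but not on x3 (all names hypothetical).
set.seed(42)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(1.5 * dat$x1 - 1.2 * dat$x2))

# Sketch of forward selection by nested model comparison (Chi-squared test).
# Assumption: in each round, the candidate with the smallest p-value is added.
forward_select <- function(data, response, candidates, alpha = 0.01) {
  selected <- character(0)
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    # Current model: intercept plus everything selected so far.
    base_fit <- glm(reformulate(c("1", selected), response = response),
                    family = binomial, data = data)
    # p-value for adding each remaining candidate to the current model.
    p_values <- vapply(remaining, function(v) {
      larger_fit <- glm(reformulate(c(selected, v), response = response),
                        family = binomial, data = data)
      anova(base_fit, larger_fit, test = "Chisq")[2, "Pr(>Chi)"]
    }, numeric(1))
    # Stop when no candidate explains a significant amount of remaining variation.
    if (min(p_values) >= alpha) break
    selected <- c(selected, names(which.min(p_values)))
  }
  selected
}

selected_evs <- forward_select(dat, "y", c("x1", "x2", "x3"), alpha = 0.01)
print(selected_evs)
```

Lowering alpha makes the stopping rule stricter, so fewer variables tend to enter the model.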
selectEV(
  dvdata,
  alpha = 0.01,
  retest = FALSE,
  interaction = FALSE,
  formula = NULL,
  test = "Chisq",
  algorithm = "maxent",
  write = FALSE,
  dir = NULL,
  quiet = FALSE
)
List of 3:
dvdata: A list containing first the response variable, followed by data frames of DVs for each selected EV.
selection: A data frame showing the trail of forward selection of individual EVs (and interaction terms if necessary).
selectedmodel: The selected model under the given alpha value.
dvdata: List containing first the response variable, followed by data frames of selected derived variables for a given explanatory variable (e.g. the first item in the list returned by selectDVforEV).
alpha: Alpha level used in the inference test that compares nested models (see test). Default is 0.01.
retest: Logical. Test variables (or interaction terms) that do not meet the alpha criterion in a given round in subsequent rounds? Default is FALSE.
interaction: Logical. Allow interaction terms between pairs of EVs? Default is FALSE.
formula: A model formula (in the form y ~ x + ...) specifying a starting point for forward model selection. The independent terms in the formula will be included in the model regardless of explanatory power, and must be represented in dvdata, while the remaining explanatory variables in dvdata are candidates for selection. The first list item in dvdata is still taken as the response variable, regardless of formula. Default is NULL, meaning that forward selection starts with zero selected variables.
test: Character string matching either "Chisq" or "F" to determine which inference test is used in nested model comparison. The Chi-squared test is implemented by stats::anova, while the F-test is implemented as described in Halvorsen (2013, 2015). Default is "Chisq".
algorithm: Character string matching either "maxent" or "LR", which determines the type of model used during forward selection. Default is "maxent".
write: Logical. Write the trail of forward selection to .csv file? Default is FALSE.
dir: Directory for file writing if write = TRUE. Defaults to the working directory.
quiet: Logical. Suppress progress messages from EV-selection? Default is FALSE.
The F-test available in selectEV is calculated using equation 59 in
Halvorsen (2013).
When interaction = TRUE, the forward selection procedure first selects a
parsimonious group of individual EVs, and then tests interactions between
the EVs included in the model. Interactions are therefore only explored
between terms which individually explain a significant amount of variation.
When interaction = FALSE, interactions are not considered. Practically,
interactions between EVs are represented by the products of all combinations
of their component DVs (Halvorsen, 2013).
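To make the "products of all combinations of component DVs" concrete, the following base R sketch builds interaction columns for two small, made-up sets of DVs. The helper interaction_dvs, the DV names, and the use of a colon in the product names are assumptions for illustration only; the package's internal representation may differ.

```r
# Hypothetical DVs for two EVs, "a" (two DVs) and "b" (one DV).
dv_a <- data.frame(a_L = c(0.1, 0.5, 0.9), a_D2 = c(0.8, 0.2, 0.4))
dv_b <- data.frame(b_L = c(1, 2, 3))

# Multiply every DV of the first EV with every DV of the second EV.
interaction_dvs <- function(dvs1, dvs2) {
  combos <- expand.grid(i = names(dvs1), j = names(dvs2),
                        stringsAsFactors = FALSE)
  products <- Map(function(i, j) dvs1[[i]] * dvs2[[j]], combos$i, combos$j)
  # Name each product after its two component DVs, joined by a colon.
  names(products) <- paste(combos$i, combos$j, sep = ":")
  data.frame(products, check.names = FALSE)
}

ab <- interaction_dvs(dv_a, dv_b)
print(ab)
```

An EV with p DVs interacting with an EV with q DVs therefore contributes p * q product columns, which is one reason interactions are only explored among EVs that were already selected.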
The maximum entropy algorithm ("maxent"), which is implemented in MIAmaxent as an infinitely-weighted logistic regression with presences added to the background, is conventionally used with presence-only occurrence data. In contrast, standard logistic regression (algorithm = "LR") is conventionally used with presence-absence occurrence data.
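The phrase "infinitely-weighted logistic regression with presences added to the background" can be illustrated with a rough base R sketch. It is not the package's internal code: the data are simulated, the variable names are hypothetical, and a large finite weight (1000) stands in for the "infinite" background weight.

```r
# Simulated sites with one covariate; presence is more likely at high x.
set.seed(1)
n <- 400
x <- rnorm(n)
pres <- rbinom(n, 1, plogis(-1 + 1.5 * x))

# Presence rows get y = 1. The background consists of ALL sites,
# presences included, each entered with y = 0.
presence_rows <- data.frame(y = 1, x = x[pres == 1])
background_rows <- data.frame(y = 0, x = x)
aug <- rbind(presence_rows, background_rows)

# Assumption: weight 1 for presences, a large weight for background rows.
big_weight <- 1000
w <- ifelse(aug$y == 1, 1, big_weight)

fit <- glm(y ~ x, family = binomial, data = aug, weights = w)
print(coef(fit))
```

The intercept of such a fit depends on the arbitrary weight and is not meaningful on its own; the slope is the quantity of interest in this sketch.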
Explanatory variables should be uniquely named. Underscores ('_') and colons
(':') are reserved to denote derived variables and interaction terms,
respectively, and selectEV will replace these, along with other special
characters, with periods ('.').
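Because reserved characters are replaced, two distinct EV names can collapse to the same name. A quick pre-check along the following lines may help. The helper sanitize_ev_names and its character class are assumptions: the exact set of characters that selectEV replaces is not listed here.

```r
# Hypothetical helper: replace anything other than letters, digits, and
# periods with a period (an approximation of the renaming described above).
sanitize_ev_names <- function(x) gsub("[^A-Za-z0-9.]", ".", x)

ev_names <- c("soil_type", "temp:mean", "slope (deg)", "soil.type")
clean_names <- sanitize_ev_names(ev_names)
print(clean_names)

# Flag collisions created by the renaming ("soil_type" vs "soil.type").
has_collision <- anyDuplicated(clean_names) > 0
print(has_collision)
```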
Halvorsen, R. (2013). A strict maximum likelihood explanation of MaxEnt, and some implications for distribution modelling. Sommerfeltia, 36, 1-132.
Halvorsen, R., Mazzoni, S., Bryn, A., & Bakkestuen, V. (2015). Opportunities for improved distribution modelling practice via a strict maximum likelihood interpretation of MaxEnt. Ecography, 38(2), 172-183.
if (FALSE) {
  # From vignette:
  grasslandEVselect <- selectEV(grasslandDVselect$dvdata, alpha = 0.001,
                                interaction = TRUE)
  # DVs available before EV selection
  summary(grasslandDVselect$dvdata)
  length(grasslandDVselect$dvdata[-1])
  # DVs of the EVs retained by selection
  summary(grasslandEVselect$dvdata)
  length(grasslandEVselect$dvdata[-1])
  grasslandEVselect$selectedmodel$formula
}