For each explanatory variable (EV), selectDVforEV selects the
parsimonious set of derived variables (DV) which best explains variation in a
given response variable. The function uses a process of forward selection
based on comparison of nested models using inference tests. A DV is selected
for inclusion when, during nested model comparison, it accounts for a
significant amount of remaining variation, under the alpha value specified by
the user. See Halvorsen et al. (2015) for a more detailed explanation of the
forward selection procedure.
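The forward selection idea described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the MIAmaxent implementation: the helper name forward_select, the simulated data, and the rule of adding the candidate with the lowest p-value each round are all hypothetical. It also retests every remaining candidate in each round, which is only loosely analogous to retest = TRUE.

```r
# Simulated data: only x1 truly affects the binary response y.
set.seed(1)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + 2 * d$x1))

# Hypothetical helper: forward selection by nested model comparison.
forward_select <- function(data, response, candidates, alpha = 0.01) {
  make_formula <- function(terms) {
    as.formula(paste(response, "~", paste(c("1", terms), collapse = " + ")))
  }
  selected <- character(0)
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    # Model containing only the variables selected so far.
    base <- glm(make_formula(selected), family = binomial, data = data)
    # Chi-squared test of each candidate against the nested base model.
    pvals <- sapply(remaining, function(v) {
      full <- glm(make_formula(c(selected, v)), family = binomial, data = data)
      anova(base, full, test = "Chisq")[2, "Pr(>Chi)"]
    })
    best <- names(which.min(pvals))
    # Stop when no candidate explains significant remaining variation.
    if (pvals[[best]] >= alpha) break
    selected <- c(selected, best)
  }
  selected
}

selected <- forward_select(d, "y", c("x1", "x2", "x3"), alpha = 0.01)
selected
```

Lowering alpha makes the procedure stricter, so fewer variables pass the nested model comparison.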
selectDVforEV(
  dvdata,
  alpha = 0.01,
  retest = FALSE,
  test = "Chisq",
  algorithm = "maxent",
  write = FALSE,
  dir = NULL,
  quiet = FALSE
)
List of 2:
dvdata: A list containing first the response variable, followed by data frames of selected DVs for each EV. EVs with zero selected DVs are dropped. This item is recommended as input for dvdata in selectEV.
selection: A list of data frames, where each data frame shows the trail of forward selection of DVs for a given EV.
dvdata: List containing first the response variable, followed by data frames of derived variables produced for each explanatory variable (e.g. the first item in the list returned by deriveVars).
alpha: Alpha-level used for inference testing in nested model comparison. Default is 0.01.
retest: Logical. Test variables that do not meet the alpha criterion in a given round in subsequent rounds? Default is FALSE.
test: Character string matching either "Chisq" or "F" to determine which inference test is used in nested model comparison. The Chi-squared test is implemented by stats::anova, while the F-test is implemented as described in Halvorsen (2013, 2015). Default is "Chisq".
algorithm: Character string matching either "maxent" or "LR", which determines the type of model used during forward selection. Default is "maxent".
write: Logical. Write the trail of forward selection for each EV to .csv file? Default is FALSE.
dir: Directory for file writing if write = TRUE. Defaults to the working directory.
quiet: Logical. Suppress progress bar? Default is FALSE.
The F-test available in selectDVforEV is calculated using equation 59 in Halvorsen (2013).
If using binary-type derived variables from deriveVars, be aware that a model including all of these DVs will be considered equal to the closest nested model, due to perfect multicollinearity (i.e. the dummy variable trap).
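The dummy variable trap can be seen with plain stats::glm. The sketch below uses simulated data and hypothetical indicator names (dv.a, dv.b, dv.c); it does not call deriveVars. It shows only that adding the last indicator of a complete set changes nothing about the fit.

```r
# Simulated three-level categorical EV and a binary response.
set.seed(42)
n <- 300
grp <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
y <- rbinom(n, 1, c(a = 0.2, b = 0.5, c = 0.7)[as.character(grp)])

# One binary indicator per level; together they always sum to 1.
d <- data.frame(
  y = y,
  dv.a = as.integer(grp == "a"),
  dv.b = as.integer(grp == "b"),
  dv.c = as.integer(grp == "c")
)

# Nested model with two indicators versus the model with all three.
nested <- glm(y ~ dv.a + dv.b, family = binomial, data = d)
full <- glm(y ~ dv.a + dv.b + dv.c, family = binomial, data = d)

coef(full)                            # dv.c cannot be estimated (NA)
c(deviance(nested), deviance(full))   # same residual deviance
c(df.residual(nested), df.residual(full))  # same residual df
```

Because the third indicator is a linear combination of the intercept and the other two, the full model explains no additional variation, so a nested model comparison has nothing to test.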
The maximum entropy algorithm ("maxent"), which is implemented in MIAmaxent as an infinitely-weighted logistic regression with presences added to the background, is conventionally used with presence-only occurrence data. In contrast, standard logistic regression (algorithm = "LR") is conventionally used with presence-absence occurrence data.
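As a rough illustration of what "infinitely-weighted logistic regression with presences added to the background" can look like, the sketch below fits a weighted stats::glm on simulated presence-only data. It is not the MIAmaxent source: the column names, the coding of background points as NA, and the finite weight of 100 standing in for an infinite weight are all assumptions made for this example.

```r
# Simulated presence-only data: 1 = presence, NA = background location.
set.seed(7)
n <- 1000
x <- runif(n)
presence <- rbinom(n, 1, plogis(-3 + 3 * x))
d <- data.frame(RV = ifelse(presence == 1, 1, NA), x = x)

# Add the presences to the background: every location becomes a
# background row (RV = 0), and presence rows are kept as well (RV = 1).
pres <- d[!is.na(d$RV), ]
bg <- d
bg$RV <- 0
padded <- rbind(pres, bg)

# Large finite weight on background rows, approximating infinite weight.
# The value 100 is an assumed, illustrative choice.
w <- ifelse(padded$RV == 1, 1, 100)

fit <- glm(RV ~ x, family = binomial, data = padded, weights = w)
coef(fit)
```

With presence-absence data there is no need for this padding and weighting, which corresponds to the ordinary logistic regression case described above.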
Explanatory variables should be uniquely named. Underscores ('_') and colons (':') are reserved to denote derived variables and interaction terms respectively, and selectDVforEV will replace these, along with other special characters, with periods ('.').
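The effect of this renaming can be previewed before calling the function. The snippet below is a sketch that assumes every character other than a letter, digit, or period is replaced; the exact rule used internally by selectDVforEV may differ, and the variable names are made up for illustration.

```r
# Hypothetical EV names containing reserved and special characters.
ev_names <- c("soil_type", "slope:aspect", "temp mean", "pH")

# Replace anything that is not alphanumeric or a period with a period.
clean_names <- gsub("[^[:alnum:].]", ".", ev_names)
clean_names

# Confirm that the cleaned names are still unique.
anyDuplicated(clean_names) == 0
```

Checking uniqueness after cleaning is worthwhile, since names such as "a_b" and "a:b" would collapse to the same string.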
Halvorsen, R. (2013). A strict maximum likelihood explanation of MaxEnt, and some implications for distribution modelling. Sommerfeltia, 36, 1-132.
Halvorsen, R., Mazzoni, S., Bryn, A., & Bakkestuen, V. (2015). Opportunities for improved distribution modelling practice via a strict maximum likelihood interpretation of MaxEnt. Ecography, 38(2), 172-183.
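# Select DVs for each EV in the toy data, using a lenient alpha
toydata_seldvs <- selectDVforEV(toydata_dvs$dvdata, alpha = 0.4)

if (FALSE) {
  # From vignette:
  grasslandDVselect <- selectDVforEV(grasslandDVs$dvdata, alpha = 0.001)

  # Number of DVs per EV, and in total, before selection
  summary(grasslandDVs$dvdata)
  sum(sapply(grasslandDVs$dvdata[-1], length))

  # Number of DVs per EV, and in total, after selection
  summary(grasslandDVselect$dvdata)
  sum(sapply(grasslandDVselect$dvdata[-1], length))

  # Trail of forward selection for the EV "terdem"
  grasslandDVselect$selection$terdem
}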