autoDistSamp: Automated classical distance analysis

Description

Perform automated likelihood, expansion, and series selection for a classic distance sampling analysis, then estimate abundance using the best-fitting likelihood, expansion, and series.

Usage

autoDistSamp(
  data,
  formula,
  likelihoods = c("halfnorm", "hazrate", "negexp"),
  w.lo = units::set_units(0, "m"),
  w.hi = NULL,
  expansions = 0:3,
  series = c("cosine"),
  x.scl = w.lo,
  g.x.scl = 1,
  warn = TRUE,
  outputUnits = NULL,
  area = NULL,
  propUnitSurveyed = 1,
  ci = 0.95,
  R = 500,
  plot.bs = FALSE,
  showProgress = TRUE,
  plot = TRUE,
  criterion = "AICc"
)

Value

An Rdistance 'abundance estimate' object, which is a list of class c("abund", "dfunc"), containing all the components of a "dfunc" object (see dfuncEstim), plus the following:

estimates: A tibble containing fitted coefficients in the distance function, density in the area(s) surveyed, abundance on the study area, the number of groups seen between w.lo and w.hi, the number of individuals seen between w.lo and w.hi, study area size, surveyed area, average group size, and average effective detection distance.
B: If confidence intervals were requested, a tibble containing all bootstrap values of coefficients, density, abundance, groups seen, individuals seen, study area size, surveyed area size, average group size, and average effective detection distance. The number of rows is always R, the requested number of bootstrap iterations. If an iteration fails, the corresponding row in B is NA (hence, use 'na.rm = TRUE' when computing summaries). Columns 1 through length(coef(dfunc)) contain bootstrap realizations of the distance function's coefficients.
ci: Confidence level of the confidence intervals
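
Because a failed bootstrap iteration leaves an all-NA row in B, summaries of B need na.rm = TRUE. The following is a minimal sketch that uses a small made-up data frame as a stand-in for the B component; the column names and values are hypothetical and are not output from autoDistSamp.

```r
# Hypothetical stand-in for the B component: one row per bootstrap
# iteration. Iteration 3 "failed", so its row is all NA.
B <- data.frame(
  density   = c(0.82, 0.91, NA, 0.77, 0.88),
  abundance = c(410, 455, NA, 385, 440)
)

# Without na.rm = TRUE, a single failed iteration makes the summary NA.
mean(B$density)

# With na.rm = TRUE, failed iterations are dropped before summarizing.
bsMean <- mean(B$density, na.rm = TRUE)
bsSd   <- sd(B$density, na.rm = TRUE)

# Number of bootstrap iterations that failed.
nFailed <- sum(is.na(B$density))
```

Counting the NA rows, as in the last line, is a quick way to judge how many of the R iterations actually contributed to a summary.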

Arguments

data: An RdistDf data frame. RdistDf data frames contain one line per transect and a list-based column. The list-based column contains a data frame with detection information. The detection information data frame on each row contains (at least) distances and group sizes of all targets detected on the transect. Function RdistDf creates RdistDf data frames from separate transect and detection data frames. is.RdistDf checks whether data frames are RdistDf's.
formula: A standard formula object (for example, dist ~ 1 or dist ~ covar1 + covar2). The left-hand side (before ~) is the name of the vector containing off-transect or radial detection distances. The right-hand side contains the names of covariate vectors to fit in the detection function, and potentially group sizes. Covariates can be either detection level or transect level and can appear in data or exist in the global working environment. Regular R scoping rules apply.
likelihoods: String vector specifying the likelihoods to fit. See 'likelihood' parameter of dfuncEstim.
w.lo: Lower or left-truncation limit of the distances in distance data. This is the minimum possible off-transect distance. Default is 0. If w.lo is greater than 0, it must be assigned measurement units using units(w.lo) <- "<units>" or w.lo <- units::set_units(w.lo, "<units>"). See examples in the help for set_units.
w.hi: Upper or right-truncation limit of the distances in dist. This is the maximum off-transect distance that could be observed. If unspecified (i.e., NULL), right-truncation is set to the maximum of the observed distances. If w.hi is specified, it must have associated measurement units. Assign measurement units using units(w.hi) <- "<units>" or w.hi <- units::set_units(w.hi, "<units>"). See examples in the help for set_units.
expansions: An integer vector specifying the numbers of expansion terms to fit. Depending on the series, each value can be 0 through 5. The default, 0:3, fits models with 0, 1, 2, and 3 expansion terms; a value of 0 equates to no expansion terms of any type. No expansion terms are allowed (i.e., expansions is forced to 0) if covariates are present in the detection function (i.e., right-hand side of formula includes something other than 1).
series: A string vector specifying the types of expansion to fit when expansions > 0. Valid values at present are 'simple', 'hermite', and 'cosine'.
x.scl: The x coordinate (a distance) at which the detection function will be scaled. x.scl can be a distance or the string "max". When x.scl is specified (i.e., not 0 or "max"), it must have measurement units assigned using either library(units); units(x.scl) <- "<units>" or x.scl <- units::set_units(x.scl, "<units>"). See units::valid_udunits() for valid symbolic units.
g.x.scl: Height of the distance function at coordinate x.scl. The distance function will be scaled so that g(x.scl) = g.x.scl. If g.x.scl is not a data frame, it must be a numeric value (vector of length 1) between 0 and 1.
warn: A logical scalar specifying whether to issue an R warning if the estimation did not converge or if one or more parameter estimates are at their boundaries. For estimation, warn should generally be left at its default value of TRUE. When computing bootstrap confidence intervals, setting warn = FALSE turns off annoying warnings when an iteration does not converge. Regardless of warn, after completion all messages about convergence and boundary conditions are printed by print.dfunc, print.abund, and plot.dfunc.
outputUnits: A string specifying the symbolic measurement units for results. Valid units are listed in units::valid_udunits(). The strings for common distance symbolic units are: "m" - meters, "ft" - feet, "cm" - centimeters, "mm" - millimeters, "mi" - miles, "nmile" - nautical miles ("nm" is nano meters), "in" - inches, "yd" - yards, "km" - kilometers, "fathom" - fathoms, "chains" - chains, and "furlong" - furlongs. If outputUnits is unspecified (NULL), output units will be the same as those on distances in data.
area: A scalar containing the total area of inference. Usually, this is study area size. If area is NULL (the default), area will be set to 1 square unit of the output units and density estimates will be produced. If area is not NULL, it must have measurement units assigned by the units package. The units on area must be convertible to squared output units. Units on area must be two-dimensional. For example, if output units are "foo", units on area must be convertible to "foo^2" by the units package. Units of "km^2", "cm^2", "ha", "m^2", "acre", "mi^2", and several others are acceptable.
propUnitSurveyed: A scalar or vector of real numbers between 0 and 1. The proportion of the default sampling unit that was surveyed. If both sides of line transects were observed, propUnitSurveyed = 1. If only a single side of line transects was observed, set propUnitSurveyed = 0.5. For point transects, this should be set to the proportion of each circle that was observed. Length must either be 1 or the total number of transects in data.
ci: A scalar indicating the confidence level of confidence intervals. Confidence intervals are computed using a bias-corrected bootstrap method. If ci is NULL or NA, confidence intervals are not computed.
R: The number of bootstrap iterations to conduct when ci is not NULL.
plot.bs: A logical scalar indicating whether to plot individual bootstrap iterations.
showProgress: A logical indicating whether to show a text-based progress bar during bootstrapping. Default is TRUE. It is handy to shut off the progress bar if running this within another function. Otherwise, it is handy to see progress of the bootstrap iterations.
plot: Logical scalar specifying whether to plot models during model selection. If TRUE, a histogram with fitted distance function is plotted for every model. The function pauses between each plot and prompts the user for whether they want to continue. To suppress user prompts, set plot = FALSE.
criterion: A string specifying the criterion to use when assessing model fit. The best fitting model, as defined by this routine, has the lowest value of this criterion. This must be one of "AICc" (the default), "AIC", or "BIC". See AIC.dfunc for formulas.
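
The criterion argument ranks candidate models by a penalized likelihood. As a rough sketch, the code below uses the textbook definitions of AIC, AICc, and BIC; AIC.dfunc remains the authority on what Rdistance actually computes. The helper functions, log-likelihoods, parameter counts, and sample size are all hypothetical values chosen for illustration.

```r
# Textbook information criteria (assumed here, see AIC.dfunc for the
# package's own formulas). logLik is the maximized log-likelihood,
# p the number of estimated parameters, n the number of observations.
aic  <- function(logLik, p)    -2 * logLik + 2 * p
aicc <- function(logLik, p, n) aic(logLik, p) + (2 * p * (p + 1)) / (n - p - 1)
bic  <- function(logLik, p, n) -2 * logLik + log(n) * p

# Hypothetical fits: these numbers are made up for illustration only.
fits <- data.frame(
  model  = c("halfnorm", "hazrate", "negexp"),
  logLik = c(-1630.2, -1629.8, -1641.5),
  p      = c(1, 2, 1)
)
n <- 350

fits$AICc <- aicc(fits$logLik, fits$p, n)

# The best-fitting model is the one with the lowest criterion value.
best <- fits$model[which.min(fits$AICc)]
```

In this made-up table the two-parameter model has the highest log-likelihood, but its extra parameter costs more than the improvement in fit, so the one-parameter "halfnorm" row has the lowest AICc.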

Details

During distance function selection, all combinations of likelihoods, series, and numbers of expansions are fitted. For example, if likelihoods has 3 elements, series has 2 elements, and expansions has 4 elements, this routine fits a total of 3 (likelihoods) * 2 (series) * 4 (expansions) = 24 models. Default parameters fit 3 (likelihoods) * 1 (series) * 4 (expansions) = 12 detection functions, i.e., all combinations of the "halfnorm", "hazrate", and "negexp" likelihoods, the "cosine" series, and 0 through 3 expansions. Other combinations are specified through values of likelihoods, series, and expansions.
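
The 3 * 2 * 4 example can be reproduced with base R's expand.grid. This sketch only counts the combinations described above; how autoDistSamp enumerates models internally is not shown here, and the object name candidates is an assumption made for illustration.

```r
# All combinations of the three model-selection arguments, using the
# 3 x 2 x 4 example from the text.
likelihoods <- c("halfnorm", "hazrate", "negexp")
series      <- c("cosine", "hermite")
expansions  <- 0:3

candidates <- expand.grid(
  likelihood = likelihoods,
  series     = series,
  expansions = expansions,
  stringsAsFactors = FALSE
)

nrow(candidates)  # 24
```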

Suppress all intermediate output using plot.bs=FALSE, showProgress=FALSE, and plot=FALSE.

The returned abundance estimate object contains an additional component, the fitting table (a list of models fitted and criterion values) in component $fitTable.

Examples

# Load example sparrow data (line transect survey type)
data(sparrowDf)

autoDistSamp(data = sparrowDf
           , formula = dist ~ groupsize(groupsize)
           , likelihoods = c("halfnorm","negexp")
           , expansions = 0
           , plot = FALSE
           , ci = NULL
           , area = units::set_units(1, "hectare")
)

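# Not run: with ci = 0.95, the default R = 500 bootstrap iterations are
# conducted, and the default plot = TRUE prompts between model plots.
if (FALSE) {
autoDistSamp(data = sparrowDf
    , formula = dist ~ 1 + groupsize(groupsize)
    , ci = 0.95
    , area = units::set_units(1, "hectare")
)
}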