abundEstim: Estimate abundance from distance-sampling data

Description

Estimate abundance (or density) given an estimated detection function and supplemental information on observed group sizes, transect lengths, area surveyed, etc. Also computes confidence intervals on abundance (or density) using the bias-corrected bootstrap method.

Usage

abundEstim(
  dfunc,
  detectionData,
  siteData,
  area = NULL,
  singleSided = FALSE,
  ci = 0.95,
  R = 500,
  lengthColumn = "length",
  plot.bs = FALSE,
  showProgress = TRUE,
  control = RdistanceControls()
)

Value

An 'abundance estimate' object, which is a list of class c("abund", "dfunc"), containing all the components of a "dfunc" object (see dfuncEstim), plus the following:

density: Estimated density on the sampled area with units. The effectively sampled area is 2*L*ESW (not 2*L*w.hi). Density has squared units of the requested output units. Convert density to other units with units::set_units(x$density, "<units>").
n.hat: Estimated abundance on the study area (if area > 1) or estimated density on the study area (if area = 1), without units.
n: The number of detections (not individuals, unless all group sizes = 1) on non-NA length transects used to compute density and abundance.
n.seen: The total number of individuals seen on transects with non-NA length. Sum of group sizes used to estimate density and abundance.
area: Total area of inference in squared output units.
surveyedUnits: The total length of sampled transect with units. This is the sum of the lengthColumn column of siteData.
avg.group.size: Average group size on transects with non-NA length.
rng.group.size: Minimum and maximum group sizes observed on non-NA length transects.
effDistance: A vector containing effective sample distance. If covariates are not included, length of this vector is 1 because effective sampling distance is constant over detections. If covariates are included, this vector has length equal to the number of detections (i.e., x$n). This vector was produced by a call to effectiveDistance() with newdata set to NULL.
n.hat.ci: A vector containing the lower and upper limits of the bias corrected bootstrap confidence interval for abundance.
density.ci: A vector containing the lower and upper limits of the bias corrected bootstrap confidence interval for density, with units.
effDistance.ci: A vector containing the lower and upper limits of the bias corrected bootstrap confidence interval for average effective sampling distance.
B: A data frame containing bootstrap values of coefficients, density, and effective distances. Number of rows is always R, the requested number of bootstrap iterations. If a particular iteration did not converge, the corresponding row in B is NA (hence, use 'na.rm = TRUE' when computing summaries). Columns 1 through length(coef(dfunc)) contain bootstrap realizations of the distance function's coefficients. The second to last column contains bootstrap values of density (with units). The last column of B contains bootstrap values of effective sampling distance or radius (with units). If the distance function contains covariates, the effective sampling distance column is the average effective distance over detections used during the associated bootstrap iteration.
nItersConverged: The number of bootstrap iterations that converged.
alpha: The (scalar) confidence level of the confidence interval for n.hat.

Arguments

dfunc

An estimated 'dfunc' object produced by dfuncEstim.

detectionData

A data frame containing detection distances (either perpendicular for line-transect or radial for point-transect designs), with one row per detected object or group. This data frame must contain at least the following information:

Detection Distances: A single column containing detection distances must be specified on the left-hand side of formula. As of Rdistance version 3.0.0, the detection distances must have measurement units attached. Attach measurement units to distances using the units package, i.e., library(units) followed by units()<-. For example, after library(units), either units(df$dist) <- "m" or units(df$dist) <- "ft" will work. Alternatively, df$dist <- units::set_units(df$dist, "m") also works.
Site IDs: The ID of the transect or point (i.e., the 'site') where each object or group was detected. The site ID column(s) (see arguments transectID and pointID) must specify the site (transect or point) so that this data frame can be merged with siteData.
In a later release, Rdistance will allow detection-level covariates. When that happens, detection-level covariates will appear in this data frame.

See example data set sparrowDetectionData. See also Input data frames below for information on when detectionData and siteData are required inputs.

siteData

A data.frame containing site (transect or point) IDs and any site level covariates to include in the detection function. Every unique surveyed site (transect or point) is represented on one row of this data set, whether or not targets were sighted at the site. See arguments transectID and pointID for an explanation of the way in which distance and site data frames are merged. See section Relationship between data frames (transect and point ID's) for additional details.

See Data frame requirements for situations in which detectionData only, detectionData and siteData, or neither are required.

area

A scalar containing the total area of inference. Commonly, this is study area size. If area is NULL (the default), area is set to 1 square unit of the output units, which produces abundance estimates equal to density estimates. If area is not NULL, it must have measurement units assigned by the units package. Units on area must be two-dimensional and convertible to squared output units. For example, if output units are "foo", units on area must be convertible to "foo^2" by the units package. Units of "km^2", "cm^2", "ha", "m^2", "acre", "mi^2", and many others are acceptable.

singleSided

Logical scalar. If only one side of the transect was observed, set singleSided = TRUE. If both sides of line-transects were observed, set singleSided = FALSE. Some surveys observe only one side of transect lines for a variety of logistical reasons. For example, some aerial line-transect surveys place observers on only one side of the aircraft. This parameter affects only line-transects. When singleSided = TRUE, surveyed area is halved and the density estimator's denominator (see Details) is $(ESW)(L)$, not $2(ESW)(L)$.

ci

A scalar indicating the confidence level of confidence intervals. Confidence intervals are computed using a bias corrected bootstrap method. If ci = NULL, confidence intervals are not computed.

R

The number of bootstrap iterations to conduct when ci is not NULL.

lengthColumn

Character string specifying the (single) column in siteData that contains transect lengths. This is ignored if pointSurvey = TRUE. This column must have measurement units.

plot.bs

A logical scalar indicating whether to plot individual bootstrap iterations.

showProgress

A logical indicating whether to show a text-based progress bar during bootstrapping. Default is TRUE. It is handy to shut off the progress bar if running this within another function. Otherwise, it is handy to see progress of the bootstrap iterations.

control

A list containing optimization control parameters such as the maximum number of iterations, tolerance, the optimizer to use, etc. See the RdistanceControls function for explanation of each value, the defaults, and the requirements for this list. See examples below for how to change controls.

Bootstrap Confidence Intervals

The bootstrap confidence interval for abundance assumes that the fundamental units of replication (lines or points, hereafter "sites") are independent. The bias corrected bootstrap method used here resamples the units of replication (sites), refits the distance function, and estimates abundance using the resampled counts and re-estimated distance function. The original data frames, detectionData and siteData, are needed here for bootstrapping because they contain the transect and detection information. If a double-observer data frame is included in dfunc, rows of the double-observer data frame are re-sampled each bootstrap iteration.

This routine does not re-select the distance model fitted to resampled data. The model in the input object is re-fitted every iteration.
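The site-level resampling described above can be sketched in base R. This is an illustration only, not the code Rdistance runs internally: the helper resampleSites(), the column names siteID, length, dist, and bsID, and the toy data are all assumptions made for the example.

```r
# Toy data; column names are hypothetical.
siteData <- data.frame(siteID = c("A", "B", "C", "D"),
                       length = c(500, 500, 400, 600))
detectionData <- data.frame(siteID = c("A", "A", "B", "D", "D", "D"),
                            dist   = c(5, 12, 30, 2, 18, 44))

# Hypothetical helper: draw sites with replacement, then carry along
# every detection made on each drawn site.
resampleSites <- function(siteData, detectionData) {
  idx <- sample(nrow(siteData), replace = TRUE)
  bsSites <- siteData[idx, , drop = FALSE]
  # A fresh ID per draw keeps repeated draws of one site distinct.
  bsSites$bsID <- seq_len(nrow(bsSites))
  # Inner join: a site drawn twice contributes its detections twice;
  # a drawn site without detections contributes none.
  bsDetections <- merge(bsSites[, c("siteID", "bsID")], detectionData,
                        by = "siteID")
  list(siteData = bsSites, detectionData = bsDetections)
}

set.seed(42)
bs <- resampleSites(siteData, detectionData)
nrow(bs$siteData)   # always equals the original number of sites
```

In a full bootstrap, each resampled pair of data frames would be used to refit the distance function and recompute abundance, and the loop would repeat R times.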

By default, R = 500 iterations are performed, after which the bias corrected confidence intervals are computed (Manly, 1997, section 3.4).
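For readers unfamiliar with the bias-corrected percentile method, the following base R sketch shows one common formulation. The function name bcInterval() is hypothetical and the code is an illustration under that formulation, not necessarily the exact computation performed inside abundEstim.

```r
# Hypothetical helper: bias-corrected percentile interval.
bcInterval <- function(bsValues, estimate, ci = 0.95) {
  bsValues <- bsValues[!is.na(bsValues)]       # drop non-convergent iterations
  p <- mean(bsValues > estimate)               # share of bootstrap values above the estimate
  z0 <- qnorm(1 - p)                           # bias-correction constant
  zAlpha <- qnorm(1 - (1 - ci) / 2)            # standard normal critical value
  probs <- pnorm(2 * z0 + c(-zAlpha, zAlpha))  # adjusted percentile levels
  quantile(bsValues, probs = probs, names = FALSE)
}

# When the estimate sits at the bootstrap median, z0 = 0 and the
# result matches the ordinary percentile interval.
bsValues <- c(1:1000, NA)
bcInterval(bsValues, estimate = 500.5)

# When the estimate sits below the bootstrap median, both limits shift down.
bcInterval(bsValues, estimate = 400)
```

Note that if every bootstrap value falls on one side of the estimate, z0 is infinite and the interval degenerates, which is one reason to inspect the bootstrap values before trusting the interval.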

During bootstrap iterations, the distance function can fail to converge on the resampled data. An iteration can fail to converge for two reasons: (1) no detections on the iteration, and (2) a bad configuration of distances on the iteration which pushes parameters to their bounds. When an iteration fails to produce a valid distance function, Rdistance simply skips the iteration, effectively ignoring these non-convergent iterations. If the proportion of non-convergent iterations is small (less than 20%), the resulting confidence interval on abundance is probably valid. If the proportion of non-convergent iterations is not small (exceeds 20%), a warning is issued. The print method (print.abund) is the routine that issues this warning. The warning can be turned off by setting maxBSFailPropForWarning in the print method to 1.0, or by modifying the code in RdistanceControls() to re-set the default threshold and storing the modified function in your .GlobalEnv. Additional iterations may be needed to achieve an adequate number of convergent iterations. Check the number of convergent iterations by counting non-NA rows in output data frame 'B'.
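As a sketch of the check suggested above, the following base R code counts convergent iterations in a mock stand-in for the 'B' component. The column names and values are invented for illustration, and the real 'B' carries measurement units on some columns, which this plain numeric mock omits.

```r
# Mock stand-in for fit$B with R = 6 iterations; rows 2 and 5 did not converge.
B <- data.frame(
  Sigma       = c(46.1, NA, 44.8, 47.3, NA, 45.5),
  density     = c(0.81, NA, 0.92, 0.77, NA, 0.85),
  effDistance = c(57.8, NA, 56.1, 59.3, NA, 57.0)
)

nConverged <- sum(complete.cases(B))      # rows with no NA
propFailed <- 1 - nConverged / nrow(B)    # proportion of failed iterations

propFailed > 0.2                          # TRUE here: above the 20% threshold
mean(B$density, na.rm = TRUE)             # summaries need na.rm = TRUE
```

If the proportion of failures is too high, one option is to rerun abundEstim with a larger R so that more iterations converge.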

Missing Transect Lengths

Line transects: The transect length column of siteData can contain missing values. NA length transects are equivalent to 0 [m] transects and do not count toward total surveyed units. NA length transects are handy if some off-transect distance observations should be included when estimating the distance function, but not when estimating abundance. To do this, include the "extra" distance observations in the detection data frame, with valid site IDs, but set the length of those site IDs to NA in the site data frame. Group sizes associated with NA length transects are dropped and not counted toward density or abundance. Among other things, this allows estimation of abundance on one study area using off-transect distance observations from another.
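The bookkeeping implied by NA transect lengths can be illustrated in base R. The data frames and column names below (siteID, length, dist, groupsize) are hypothetical, and the code mimics the accounting described above rather than reproducing Rdistance internals.

```r
# Site "X1" is an off-transect source of extra distances, so its length is NA.
siteData <- data.frame(siteID = c("T1", "T2", "T3", "X1"),
                       length = c(500, 500, 400, NA))
detectionData <- data.frame(siteID    = c("T1", "T1", "T2", "X1", "X1"),
                            dist      = c(5, 20, 33, 12, 41),
                            groupsize = c(1, 3, 2, 4, 1))

# NA lengths do not count toward total surveyed length.
surveyedUnits <- sum(siteData$length, na.rm = TRUE)

# Only detections on non-NA length transects count toward abundance.
validSites <- siteData$siteID[!is.na(siteData$length)]
onValid <- detectionData$siteID %in% validSites
n <- sum(onValid)                               # detections used for abundance
nSeen <- sum(detectionData$groupsize[onValid])  # individuals used for abundance

# All five distances remain available for fitting the distance function.
nrow(detectionData)
```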

Point transects: Point transects do not have length. The "length" of point transects is the number of points on the transect. Rdistance treats individual points as independent and bootstrap resamples them to estimate variance. To include distance observations from some points but not the number of targets seen, include a separate "length" column in the site data frame with NA for the "extra" points. Like NA length line transects, NA "length" point transects are dropped from the count of points and group sizes on these transects are dropped from the counts of targets. This allows users to estimate their distance function on one set of observations while inflating counts from another set of observations. A transect "length" column is not required for point transects. Values in the lengthColumn do not matter except for NA (e.g., a column of 1's mixed with NA's is acceptable).

Details

The abundance estimate for line-transect surveys (if no covariates are included in the detection function and both sides of the transect were observed) is $$N =\frac{n(A)}{2(ESW)(L)}$$ where n is total number of sighted individuals (i.e., sum(dfunc$detections$groupSizes)), A is the area of inference, L is the total length of surveyed transect (i.e., sum(siteData[,lengthColumn])), and ESW is effective strip width computed from the estimated distance function (i.e., ESW(dfunc)). If only one side of transects was observed, the "2" in the denominator is not present (or, replaced with a "1").
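A small worked example of the line-transect formula, in base R. The helper lineAbundance() and the input values are made up for illustration; all lengths are assumed to be in meters and area in square meters, with units handled by hand rather than by the units package.

```r
# Hypothetical helper implementing N = n*A / (sides * ESW * L).
lineAbundance <- function(n, A, ESW, L, singleSided = FALSE) {
  sides <- if (singleSided) 1 else 2
  n * A / (sides * ESW * L)
}

# 100 individuals seen, 1 km^2 study area, ESW = 50 m, 1000 m of transect.
lineAbundance(n = 100, A = 1e6, ESW = 50, L = 1000)                      # 1000
# Same survey with only one side observed: half the surveyed area.
lineAbundance(n = 100, A = 1e6, ESW = 50, L = 1000, singleSided = TRUE)  # 2000
```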

The abundance estimate for point transect surveys (if no covariates are included) is $$N =\frac{n(A)}{\pi(ESR^2)(P)}$$ where n is total number of sighted individuals, P is the total number of surveyed points, and ESR is effective search radius computed from the estimated distance function (i.e., ESR(dfunc)).
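The point-transect formula can be sketched the same way. Again, pointAbundance() and its inputs are hypothetical, with radius in meters and area in square meters.

```r
# Hypothetical helper implementing N = n*A / (pi * ESR^2 * P).
pointAbundance <- function(n, A, ESR, P) {
  n * A / (pi * ESR^2 * P)
}

# 50 individuals seen, 1 km^2 study area, ESR = 40 m, 20 points surveyed.
N1 <- pointAbundance(n = 50, A = 1e6, ESR = 40, P = 20)
# Doubling the effective search radius quadruples the surveyed area,
# so the estimate drops to one quarter.
N2 <- pointAbundance(n = 50, A = 1e6, ESR = 80, P = 20)
N1 / N2
```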

Setting plot.bs=FALSE and showProgress=FALSE suppresses all intermediate output.

References

Manly, B.F.J. (1997) Randomization, bootstrap, and Monte-Carlo methods in biology, London: Chapman and Hall.

Buckland, S.T., D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers, and L. Thomas. (2001) Introduction to distance sampling: estimating abundance of biological populations. Oxford University Press, Oxford, UK.

Examples

# Load example sparrow data (line transect survey type)
data(sparrowDetectionData)
data(sparrowSiteData)

# Fit half-normal detection function
dfunc <- dfuncEstim(formula=dist ~ groupsize(groupsize)
                    , detectionData=sparrowDetectionData
                    , likelihood="halfnorm"
                    , w.hi=units::set_units(100, "m")
                    )

# Estimate abundance given a detection function
# No variance on density or abundance estimated here 
# due to time constraints.  Set ci=0.95 (or another value)
# to estimate bootstrap variances on ESW, density, and abundance.

fit <- abundEstim(dfunc
                , detectionData = sparrowDetectionData
                , siteData = sparrowSiteData
                , area = units::set_units(4105, "km^2")
                , ci = NULL
                )