Fit a specific detection function to off-transect or off-point (radial) distances using maximum likelihood. Distance functions are fitted to individual distance observations, not histogram bin heights, despite plot methods that draw histogram bars.
dfuncEstim(
formula,
detectionData,
siteData,
likelihood = "halfnorm",
pointSurvey = FALSE,
w.lo = units::set_units(0, "m"),
w.hi = NULL,
expansions = 0,
series = "cosine",
x.scl = units::set_units(0, "m"),
g.x.scl = 1,
observer = "both",
warn = TRUE,
transectID = NULL,
pointID = "point",
outputUnits = NULL,
control = RdistanceControls()
)
An object of class 'dfunc'. Objects of class 'dfunc' are lists containing the following components:
The vector of estimated parameter values. Length of this vector for built-in likelihoods is one (for the function's parameter) plus the number of expansion terms plus one if the likelihood is either 'hazrate' or 'uniform' (hazrate and uniform have two parameters).
The variance-covariance matrix for coefficients of the distance function, estimated by the inverse of the Hessian of the fit evaluated at the estimates. There is no guarantee this matrix is positive-definite, so it should be viewed with caution. Error estimates derived from bootstrapping are generally more reliable.
The maximized value of the log likelihood (more specifically, the minimized value of the negative log likelihood).
The convergence code. This code is returned by optim. Values other than 0 indicate suspect convergence.
The name of the likelihood. This is the value of the argument likelihood.
Left-truncation value used during the fit.
Right-truncation value used during the fit.
A data frame of detections within the strip or circle used in the fit. Column 'dist' contains the observed distances. Column 'groupSize' contains group sizes associated with the values of 'dist'. Group sizes are only used in abundEstim. This data frame contains only distances between w.lo and w.hi. Another component of the returned object, model.frame, contains all observations in the input data, including those outside the strip.
Either NULL if no covariates are included in the detection function, or a model.matrix containing the covariates used in the fit. Factors in the model.matrix version have been expanded into 0-1 indicator variables based on R contrasts in effect at the time of the call. Only covariates associated with distances inside the strip or circle are included.
A model.frame object containing observed distances (the 'response'), covariates specified in the formula, and group sizes if they were specified. If specified, the name of the group size column is "offset(-variable-)", not "groupsize(-variable-)", because internally it is easier to treat group sizes as an offset in the model. This component is a proper model.frame and contains both 'terms' and 'contrasts' attributes.
A vector containing the transect ID column names in detectionData and siteData. Transect IDs can be a composite of two or more columns and hence this component can have length greater than 1.
The number of expansion terms used during estimation.
The type of expansion used during estimation.
The original call of this function.
The input or user requested distance at which the distance function is scaled.
The input value specifying the height of the distance function at a distance of call.x.scl.
The value of input parameter observer. The input observer parameter is only applicable when g.x.scl is a data frame.
The fitted object returned by optim. See documentation for optim.
The names of any factors in formula.
The input value of pointSurvey. This is TRUE if distances are radial from a point and FALSE if distances are perpendicular off-transect.
The formula specified for the detection function.
A list containing values of the 'control' parameters set by RdistanceControls.
The measurement units used for output. All distance measurements are converted to these units internally.
The actual distance at which the distance function is scaled to some value, i.e., this is the actual x at which g(x) = g.x.scl. Note that call.x.scl = x.scl unless call.x.scl == "max", in which case x.scl is the distance at which g() is maximized.
The actual height of the distance function at a distance of x.scl. Note that g.x.scl = call.g.x.scl unless call.g.x.scl is a multiple observer data frame, in which case g.x.scl is the actual height of the distance function at x.scl computed from the multiple observer data frame.
A standard formula object (e.g., dist ~ 1, dist ~ covar1 + covar2). The left-hand side (before ~) is the name of the vector containing distances (off-transect or radial). The right-hand side (after ~) contains the names of covariate vectors to fit in the detection function. Covariates can be either detection level and appear in detectionData or transect level and appear in siteData. Regular R scoping rules apply.
Group Sizes: Non-unity group sizes are specified using groupsize() in the formula. That is, when group sizes are not all 1, they must be entered as a column in detectionData and specified using groupsize() as part of formula. For example, d ~ habitat + groupsize(number) specifies that distances appear in variable d, one covariate named habitat is to be fitted, and column number contains the number of individuals associated with each detection. If group sizes are not specified, all group sizes are assumed to be 1.
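As a minimal sketch of the groupsize() syntax, the call below fits a half-normal function to the sparrow example data shipped with Rdistance. It assumes that sparrowDetectionData has a distance column named dist (with units attached) and a group size column named groupsize; check names(sparrowDetectionData) before relying on those names.

```r
library(Rdistance)
data(sparrowDetectionData)

# Assumed columns: 'dist' (distances with units) and 'groupsize'
# (number of individuals in each detected group).
fit <- dfuncEstim(
  formula = dist ~ groupsize(groupsize),
  detectionData = sparrowDetectionData,
  likelihood = "halfnorm"
)
fit
```

Group sizes do not change the fitted distance function; they are carried along for later use in abundEstim.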
A data frame containing detection distances (either perpendicular for line-transect or radial for point-transect designs), with one row per detected object or group. This data frame must contain at least the following information:
Detection Distances: A single column containing detection distances must be specified on the left-hand side of formula. As of Rdistance version 3.0.0, the detection distances must have measurement units attached. Attach measurement units to distances using library(units); units()<-. For example, library(units) followed by units(df$dist) <- "m" or units(df$dist) <- "ft" will work. Alternatively, df$dist <- units::set_units(df$dist, "m") also works.
Site IDs: The ID of the transect or point (i.e., the 'site') where each object or group was detected. The site ID column(s) (see arguments transectID and pointID) must specify the site (transect or point) so that this data frame can be merged with siteData.
In a later release, Rdistance will allow detection-level covariates. When that happens, detection-level covariates will appear in this data frame.
See example data set sparrowDetectionData.
See also Input data frames below for information on when detectionData and siteData are required inputs.
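The two ways of attaching units described above can be sketched on a small, made-up detection data frame. The data frame df and its values are hypothetical; only the units calls come from the text.

```r
library(units)

# Hypothetical detection data: one row per detected object or group
df <- data.frame(
  siteID = c("A1", "A1", "B2"),
  dist   = c(12.5, 40.1, 3.2)
)

# Option 1: replacement form
units(df$dist) <- "m"

# Option 2: functional form, shown on a second (hypothetical) column
df$distFt <- units::set_units(c(41.0, 131.6, 10.5), "ft")

class(df$dist)    # "units"
class(df$distFt)  # "units"
```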
A data.frame containing site (transect or point) IDs and any site level covariates to include in the detection function. Every unique surveyed site (transect or point) is represented on one row of this data set, whether or not targets were sighted at the site. See arguments transectID and pointID for an explanation of the way in which distance and site data frames are merged. See section Relationship between data frames (transect and point ID's) for additional details.
See Data frame requirements for situations in which detectionData only, detectionData and siteData, or neither are required.
String specifying the likelihood to fit. Built-in likelihoods at present are "uniform", "halfnorm", "hazrate", "negexp", and "Gamma". See the package vignettes for a way to use user-defined likelihoods.
A logical scalar specifying whether input data come from point-transect surveys (TRUE), or line-transect surveys (FALSE).
Lower or left-truncation limit of the distances in distance data. This is the minimum possible off-transect distance. Default is 0. If w.lo is greater than 0, it must be assigned measurement units using units(w.lo) <- "<units>" or w.lo <- units::set_units(w.lo, "<units>"). See examples in the help for set_units.
Upper or right-truncation limit of the distances in dist. This is the maximum off-transect distance that could be observed. If unspecified (i.e., NULL), right-truncation is set to the maximum of the observed distances. If w.hi is specified, it must have associated measurement units. Assign measurement units using units(w.hi) <- "<units>" or w.hi <- units::set_units(w.hi, "<units>"). See examples in the help for set_units.
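A short sketch of setting both truncation limits follows. The 100 m right-truncation value is illustrative only, and the example assumes the sparrow data set's distance column is named dist.

```r
library(Rdistance)
data(sparrowDetectionData)

# Truncation limits; both carry measurement units.
lo <- units::set_units(0, "m")
hi <- units::set_units(100, "m")   # illustrative cutoff, not a recommendation

fit <- dfuncEstim(
  formula = dist ~ 1,
  detectionData = sparrowDetectionData,
  w.lo = lo,
  w.hi = hi
)
fit
```

Detections farther than w.hi are dropped from the fit, so the choice of cutoff should be made before looking at the fitted curve.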
A scalar specifying the number of terms in series to compute. Depending on the series, this could be 0 through 5. The default of 0 equates to no expansion terms of any type. No expansion terms are allowed (i.e., expansions is forced to 0) if covariates are present in the detection function (i.e., right-hand side of formula includes something other than 1).
If expansions > 0, this string specifies the type of expansion to use. Valid values at present are 'simple', 'hermite', and 'cosine'.
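The sketch below requests two cosine expansion terms on an intercept-only model, since expansions are not allowed when covariates are present. The number of terms is arbitrary, the data column name dist is assumed, and convergence is not guaranteed for every data set.

```r
library(Rdistance)
data(sparrowDetectionData)

# Intercept-only formula: expansions are forced to 0 when covariates appear.
fit <- dfuncEstim(
  formula = dist ~ 1,
  detectionData = sparrowDetectionData,
  likelihood = "halfnorm",
  expansions = 2,        # arbitrary choice for illustration
  series = "cosine"
)
fit
```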
The x coordinate (a distance) at which to scale the sightability function to g.x.scl, or the string "max". When x.scl is specified (i.e., not 0 or "max"), it must have measurement units assigned using either library(units); units(x.scl) <- "<units>" or x.scl <- units::set_units(x.scl, "<units>"). See units::valid_udunits() for valid symbolic units. See Details for more on scaling the sightability function.
Height of the distance function at coordinate x. The distance function will be scaled so that g(x.scl) = g.x.scl. If g.x.scl is not a data frame, it must be a numeric value (vector of length 1) between 0 and 1. See Details.
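As a hedged illustration of scaling, the call below asks for a distance function whose height at distance 0 is 0.8 rather than 1. The value 0.8 is made up; in practice it would come from an independent estimate of detection on the line. The data column name dist is assumed.

```r
library(Rdistance)
data(sparrowDetectionData)

fit <- dfuncEstim(
  formula = dist ~ 1,
  detectionData = sparrowDetectionData,
  x.scl = units::set_units(0, "m"),  # scale at the transect line
  g.x.scl = 0.8                      # hypothetical height: g(0) = 0.8
)
fit
```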
A numeric scalar or text string specifying whether observer 1 or observer 2 or both were full-time observers. This parameter dictates which set of observations form the denominator of a double observer system. If, for example, observer 2 was a data recorder and part-time observer, or if observer 2 was the pilot, set observer = 1. If observer = 1, observations by observer 1 not seen by observer 2 are ignored. The estimate of detection in this case is the ratio of number of targets seen by both observers to the number seen by both plus the number seen by just observer 2. If observer = "both", the computation goes both directions.
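The ratio described for observer = 1 can be worked through with made-up counts. This is plain arithmetic on hypothetical numbers, not a call into Rdistance.

```r
# Hypothetical counts from a double observer survey
seenByBoth  <- 40  # targets detected by both observers
seenOnlyBy2 <- 10  # targets detected by observer 2 but missed by observer 1

# observer = 1: targets seen only by observer 1 are ignored, so the
# denominator is everything observer 2 saw.
gHat <- seenByBoth / (seenByBoth + seenOnlyBy2)
gHat  # 0.8
```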
A logical scalar specifying whether to issue an R warning if the estimation did not converge or if one or more parameter estimates are at their boundaries. For estimation, warn should generally be left at its default value of TRUE. When computing bootstrap confidence intervals, setting warn = FALSE turns off warnings when an iteration does not converge. Regardless of warn, after completion all messages about convergence and boundary conditions are printed by print.dfunc, print.abund, and plot.dfunc.
A character vector naming the transect ID column(s) in detectionData and siteData. If transects are not identified in columns named 'siteID' (the default for both data frames), you need to specify which column(s) uniquely identify transects. transectID can have length greater than 1, in which case unique transects are identified by the composite columns.
When point-transects are used, this is the ID of points on a transect. When pointSurvey = TRUE, the combination of transectID and pointID specify unique sampling sites. See Input data frames. If single points are surveyed, meaning surveyed points were not grouped into transects, each 'transect' consists of one point. In this case, set transectID equal to the point's ID and set pointID equal to 1 for all points.
A string giving the symbolic measurement units that results should be reported in. Any distance measurement unit in units::valid_udunits() will work. The strings for common distance symbolic units are: "m" for meters, "ft" for feet, "cm" for centimeters, "mm" for millimeters, "mi" for miles, "nmile" for nautical miles ("nm" is nanometers), "in" for inches, "yd" for yards, "km" for kilometers, "fathom" for fathoms, "chains" for chains, and "furlong" for furlongs. If outputUnits is unspecified (NULL), output units are the same as the distance measurement units in the input data.
A list containing optimization control parameters such as the maximum number of iterations, tolerance, the optimizer to use, etc. See the RdistanceControls function for explanation of each value, the defaults, and the requirements for this list. See examples below for how to change controls.
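One way to change a control value might look like the sketch below. The argument name maxIters is an assumption about the RdistanceControls interface (only requireUnits is named in this help page); confirm it with ?RdistanceControls before use.

```r
library(Rdistance)
data(sparrowDetectionData)

# Assumed argument name 'maxIters'; see ?RdistanceControls for the real list.
ctrl <- RdistanceControls(maxIters = 2000)

fit <- dfuncEstim(
  formula = dist ~ 1,
  detectionData = sparrowDetectionData,
  control = ctrl
)
fit
```

Building the list with RdistanceControls(), rather than by hand, keeps all other controls at their defaults.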
Rdistance accommodates two kinds of transects: continuous and point. On continuous transects detections can occur at any point along the route, and these are line-transects. On point transects detections can only occur at a series of stops (points), and these are point-transects. Transects are the basic sampling unit in both cases. Columns named in transectID are sufficient to specify unique line-transects. The combination of transectID and pointID specify unique sampling locations along point-transects. See Input data frames below for more detail.
To save space and to easily specify sites without detections, all site ID's, regardless of whether a detection occurred there, and site level covariates are stored in the siteData data frame. Detection distances and group sizes are measured at the detection level and are stored in the detectionData data frame.
The following explains conditions under which various combinations of the input data frames are required.
1. Detection data and site data both required: Both detectionData and siteData are required if site level covariates are specified on the right-hand side of formula. Detection level covariates are not currently allowed. Both detectionData and siteData data frames are required to estimate abundance later in abundEstim.
2. Detection data only required: detectionData only is required when covariates are not included in the distance function (i.e., the right-hand side of formula is "~1" or "~groupsize(groupSize)"). Note that dfuncEstim does not need to know transect IDs (or group sizes) in order to estimate a distance function; but, group sizes and transect IDs are stored and used to estimate abundance in function abundEstim. Both the detectionData and siteData data frames are required in abundEstim.
3. Neither detection data nor site data required: Neither detectionData nor siteData are required if all variables specified in formula are within the scope of dfuncEstim (e.g., in the global working environment) and abundance estimates are not required. Regular R scoping rules apply when the call to dfuncEstim is embedded in a function. This case will produce distance functions only. Abundance cannot later be estimated because transects and transect lengths cannot be specified outside of a data frame. If abundance will be estimated, use either case 1 or 2.
The input data frames, detectionData and siteData, must be merge-able on unique sites. For line-transects, site ID's specify transects or routes and are unique values of the transectID column in siteData. In this case, the following merge must work: merge(detectionData, siteData, by = transectID). For point-transects, site ID's specify individual points and are unique values of the combination paste(transectID, pointID). In this case, the following merge must work: merge(detectionData, siteData, by = c(transectID, pointID)).
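The two merge requirements can be checked in base R before calling dfuncEstim. The data frames below are small hypothetical examples; the column names siteID and point simply follow the defaults mentioned in this help page.

```r
# Hypothetical line-transect data
detectionData <- data.frame(
  siteID = c("T1", "T1", "T3"),
  dist   = c(5, 22, 14)
)
siteData <- data.frame(
  siteID = c("T1", "T2", "T3"),  # T2 was surveyed but had no detections
  length = c(500, 500, 750)
)
transectID <- "siteID"

# Line-transects: merge on the transect ID column(s)
lineMerge <- merge(detectionData, siteData, by = transectID)
nrow(lineMerge)  # 3, one row per detection

# Hypothetical point-transect data: sites are transect-by-point combinations
pointDetections <- data.frame(
  siteID = c("T1", "T1", "T2"),
  point  = c(1, 2, 1),
  dist   = c(8, 30, 12)
)
pointSites <- data.frame(
  siteID = c("T1", "T1", "T2", "T2"),
  point  = c(1, 2, 1, 2)
)
pointID <- "point"

pointMerge <- merge(pointDetections, pointSites, by = c(transectID, pointID))
nrow(pointMerge)  # 3
```

If either merge drops detections, some detection rows refer to sites that are missing from the site data frame.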
By default, transects are unique combinations of the common variables in the detectionData and siteData data frames if both data frames are specified (i.e., unique values of intersect(names(detectionData), names(siteData))). If siteData is not specified and transectID is not given, transects are assumed to be identified in a column named siteID in detectionData. Either way (i.e., either transectID = "siteID" or specified as something else), the column(s) containing transect ID's must be correct here if abundance is to be estimated later. Routine abundEstim requires transect ID's for bootstrapping because it resamples unique values of the composite transect ID column(s). abundEstim uses the value of transectID specified here; hence, users cannot change transect ID's between calls to dfuncEstim and abundEstim, and all transectID columns must be present in both data frames even though they may not be used until later.
An error occurs if both detectionData and siteData are specified but no common columns exist. Duplicate transectID values are not allowed in siteData but are allowed in detectionData because multiple detections can occur on a single transect or at a single site. If the same site is surveyed in multiple years, specify another level of transect ID; for example, transectID = c("year","transectID").
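A quick base R check shows why a second ID level is needed for repeat surveys. The site data frame below is hypothetical: the same two transects surveyed in two years.

```r
# Hypothetical site data: transects T1 and T2 surveyed in 2021 and 2022
siteData <- data.frame(
  year       = c(2021, 2021, 2022, 2022),
  transectID = c("T1", "T2", "T1", "T2"),
  length     = c(500, 750, 500, 750)
)

# A single-column ID repeats across years, which is not allowed in siteData
dupSingle <- anyDuplicated(siteData["transectID"]) > 0
dupSingle     # TRUE

# The composite ID c("year", "transectID") is unique
dupComposite <- anyDuplicated(siteData[c("year", "transectID")]) > 0
dupComposite  # FALSE
```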
As of Rdistance version 3.0.0, measurement units are required on all distances. This includes off-transect distances, radial distances, truncation distances (w.lo and w.hi), transect lengths, and study area size. In dfuncEstim, units are required on the following: detectionData$dist; w.lo (unless it is zero); w.hi (unless it is NULL); and x.scl. In abundEstim, units are required on siteData$length and area. All units are 1-dimensional except those on area, which are 2-dimensional. Requiring units ensures that internal calculations and results (e.g., ESW and abundance) are correct and that output units are clear.
Input distances can have variable units. For example, input distances can be specified in "m", w.hi in "in", and w.lo in "km". Internally, all distances are converted to the units specified by outputUnits (or the units of input distances if outputUnits is NULL), and all output is reported in units of outputUnits.
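The kind of conversion described above can be reproduced with the units package alone. This sketch only illustrates mixed-unit inputs being brought to a common unit; it is not the internal Rdistance code, and the values are made up.

```r
library(units)

# Hypothetical inputs in three different length units
d    <- set_units(c(10, 25), "m")
w.hi <- set_units(3000, "in")
w.lo <- set_units(0.001, "km")

# Convert everything to a common output unit (meters here)
w.hi.m <- set_units(w.hi, "m")   # 3000 in is about 76.2 m
w.lo.m <- set_units(w.lo, "m")   # 0.001 km is 1 m

# Comparisons are meaningful once units agree
d >= w.lo.m & d <= w.hi.m
```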
Measurement units can be assigned using units()<- after attaching the units package or with x <- units::set_units(x, "<units>"). See units::valid_udunits() for a list of valid symbolic units.
If measurements are truly unit-less, or measurement units are unknown, set RdistanceControls(requireUnits = FALSE). This suppresses all unit checks and conversions. Users are on their own to make sure inputs are scaled correctly and that output units are known.
Buckland, S.T., D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers, and L. Thomas. (2001) Introduction to distance sampling: estimating abundance of biological populations. Oxford University Press, Oxford, UK.
abundEstim, autoDistSamp.
Likelihood-specific help files (e.g., halfnorm.like).
See package vignettes for additional options.
# Load example sparrow data (line transect survey type)
data(sparrowDetectionData)
dfunc <- dfuncEstim(formula = dist ~ 1
, detectionData = sparrowDetectionData)
dfunc
plot(dfunc)