Estimates a smooth detection function for line-transect perpendicular distances or point-transect radial distances.
dfuncSmu(
formula,
detectionData,
siteData,
bw = "SJ-dpi",
adjust = 1,
kernel = "gaussian",
pointSurvey = FALSE,
w.lo = units::set_units(0, "m"),
w.hi = NULL,
x.scl = "max",
g.x.scl = 1,
observer = "both",
warn = TRUE,
transectID = NULL,
pointID = "point",
outputUnits = NULL,
length = "length",
control = RdistanceControls()
)
An object of class 'dfunc'. Objects of class 'dfunc' are lists containing the following components:
A data frame containing the $x and $y components of the smooth. $x is a vector of 512 (the default for density) evenly spaced points between w.lo and w.hi.
The value of the negative log likelihood. Specifically, the sum of the negative log heights of the smooth at observed distances, after the smoothed function has been scaled to integrate to one.
Left-truncation value used during the fit.
Right-truncation value used during the fit.
The input vector of observed distances.
NULL. Covariates are not allowed in the smoothed distance function (yet).
The original call of this function.
The distance at which the distance function is scaled. This is the x at which g(x) = g.x.scl. Normally, call.x.scl = 0.
The value of the distance function at distance call.x.scl. Normally, call.g.x.scl = 1.
The value of input parameter observer.
The smoothed object returned by stats::density. All information returned by stats::density is preserved, and in particular the numeric value of the bandwidth used during the smooth is returned in fit$bw.
The input value of pointSurvey. This is TRUE if distances are radial from a point, and FALSE if distances are perpendicular off-transect.
The formula specified for the detection function.
A formula object (e.g., dist ~ 1). The left-hand side (before ~) is the name of the vector containing distances (perpendicular or radial). The right-hand side (after ~) must be the intercept-only model because Rdistance does not currently allow covariates in smoothed distance functions. If names in formula do not appear in detectionData, the normal scoping rules for model fitting routines (e.g., lm and glm) apply.
A data frame containing detection distances (either perpendicular for line-transect or radial for point-transect designs), with one row per detected object or group. This data frame must contain at least the following information:
Detection Distances: A single column containing detection distances must be specified on the left-hand side of formula.
Site IDs: The ID of the transect or point (i.e., the 'site') where each object or group was detected. The site ID column(s) (see arguments transectID and pointID) must specify the site (transect or point) so that this data frame can be merged with siteData.
Optionally, this data frame can contain the following variables:
Group Sizes: The number of individuals in the group
associated with each detection. If unspecified, Rdistance
assumes all detections are of single individuals (i.e.,
all group sizes are 1).
When Rdistance allows detection-level covariates in some version after 2.1.1, detection-level covariates will appear in this data frame.
See example data set sparrowDetectionData.
See also Input data frames below for information on when detectionData and siteData are required inputs.
A data.frame containing site (transect or point) IDs and any site level covariates to include in the detection function. Every unique surveyed site (transect or point) is represented on one row of this data set, whether or not targets were sighted at the site. See arguments transectID and pointID for an explanation of site and transect IDs.
If sites are transects, this data frame must also contain transect length. By default, transect length is assumed to be in column 'length' but can be specified using argument length.
The total number of sites surveyed is nrow(siteData). Duplicate site-level IDs are not allowed in siteData.
See Input data frames for when detectionData and siteData are required inputs.
Bandwidth of the smooth, which controls smoothness. Smoothing is done by stats::density, and bw is passed straight to its bw argument. bw can be numeric, in which case it is the standard deviation of the Gaussian smoothing kernel. Or, bw can be a character string specifying the bandwidth selection rule. Valid character string values of bw
are the following:
"nrd0" : Silverman's 'rule-of-thumb', equal to \(0.9 s n^{-0.2}\), where \(s\) is the minimum of the standard deviation of the distances and the interquartile range divided by 1.34, and \(n\) is the number of distances. See bw.nrd0.
"nrd" : The more common 'rule-of-thumb' variation given by Scott (1992). This rule uses 1.06 in place of the 0.9 in the "nrd0" bandwidth. See bw.nrd.
"bcv" : The biased cross-validation method. See bcv.
"ucv" : The unbiased cross-validation method. See ucv.
"SJ" or "SJ-ste" : The 'solve-the-equation' bandwidth of Sheather & Jones (1991). See bw.SJ or width.SJ.
"SJ-dpi" (default) : The 'direct-plug-in' bandwidth of Sheather & Jones (1991). See bw.SJ or width.SJ.
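The rule-based bandwidths listed above can be computed directly with the selectors in the stats package, which is a quick way to see how much they differ before fitting. The following is a minimal sketch using simulated half-normal distances (hypothetical data, not an Rdistance data set) and base R only; the variable names are illustrative.

```r
# Simulated perpendicular distances (hypothetical data, for illustration only)
set.seed(42)
d <- abs(rnorm(200, mean = 0, sd = 40))
n <- length(d)

# Silverman's rule of thumb, written out by hand and via stats::bw.nrd0
s <- min(sd(d), IQR(d) / 1.34)
bw_manual <- 0.9 * s * n^(-0.2)
bw_nrd0 <- bw.nrd0(d)

# Scott's variation replaces the factor 0.9 with 1.06
bw_nrd <- bw.nrd(d)

# Sheather-Jones selectors: direct plug-in and solve-the-equation
bw_sj_dpi <- bw.SJ(d, method = "dpi")
bw_sj_ste <- bw.SJ(d, method = "ste")

# Compare the candidate bandwidths side by side
round(c(nrd0 = bw_nrd0, nrd = bw_nrd, SJ.dpi = bw_sj_dpi, SJ.ste = bw_sj_ste), 2)
```

Any of these numeric values could be supplied as bw in place of the corresponding character string.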
Bandwidth adjustment for the amount of smoothing. Smoothing is done by density, and this parameter is passed straight to its adjust argument. In stats::density, the bandwidth used is actually adjust*bw, and inclusion of this parameter makes
it easier to specify values like 'half the default' bandwidth.
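To illustrate how adjust scales the bandwidth, the sketch below calls stats::density directly on simulated distances (hypothetical data). It shows only the behavior of density, which the text says receives adjust unchanged.

```r
# Simulated distances (hypothetical data, for illustration only)
set.seed(1)
d <- abs(rnorm(200, mean = 0, sd = 40))

# Same bandwidth rule, with and without a halving adjustment
fit_default <- density(d, bw = "SJ-dpi", adjust = 1)
fit_half <- density(d, bw = "SJ-dpi", adjust = 0.5)

# The bandwidth stored in the result is adjust * (rule-based bandwidth)
c(default = fit_default$bw, half = fit_half$bw)
```

Values of adjust below 1 give a rougher curve that follows the data more closely; values above 1 give a smoother curve.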
Character string specifying the smoothing kernel function. This parameter is passed unmodified to stats::density. Valid values are:
"gaussian" : Gaussian (normal) kernel, the default
"rectangular" : Uniform or flat kernel
"triangular" : Equilateral triangular kernel
"epanechnikov" : the Epanechnikov kernel
"biweight" : the biweight kernel
"cosine" : the S version of the cosine kernel
"optcosine" : the optimal cosine kernel which is the usual one reported in the literature
Values of kernel may be abbreviated to the first letter of each string. The numeric value of bw used in the smooth is stored in the $fit component of the returned object (i.e., in returned$fit$bw).
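As a small check of the abbreviation rule, the sketch below passes a full kernel name and its first letter to stats::density on simulated distances (hypothetical data) and compares the resulting curves.

```r
# Simulated distances (hypothetical data, for illustration only)
set.seed(7)
d <- abs(rnorm(200, mean = 0, sd = 40))

# Full kernel name versus its one-letter abbreviation
fit_full <- density(d, bw = "SJ-dpi", kernel = "epanechnikov")
fit_short <- density(d, bw = "SJ-dpi", kernel = "e")

# Both calls produce the same smoothed heights
identical(fit_full$y, fit_short$y)

# Numeric bandwidth actually used by the smooth
fit_full$bw
```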
A logical scalar specifying whether input data come from point-transect surveys (TRUE), or line-transect surveys (FALSE). Point surveys (TRUE) have not been implemented yet.
Lower or left-truncation limit of the distances in distance data. This is the minimum possible off-transect distance. Default is 0.
Upper or right-truncation limit of the distances in dist. This is the maximum off-transect distance that could be observed. If left unspecified (i.e., at the default of NULL), right-truncation is set to the maximum of the observed distances.
This parameter is passed to F.gx.estim. See F.gx.estim documentation for definition.
This parameter is passed to F.gx.estim. See F.gx.estim documentation for definition.
This parameter is passed to F.gx.estim. See F.gx.estim documentation for definition.
A logical scalar specifying whether to issue an R warning if the estimation did not converge or if one or more parameter estimates are at their boundaries. For estimation, warn should generally be left at its default value of TRUE. When computing bootstrap confidence intervals, setting warn = FALSE turns off annoying warnings when an iteration does not converge. Regardless of warn, messages about convergence and boundary conditions are printed by print.dfunc, print.abund, and plot.dfunc, so there should be little harm in setting warn = FALSE.
A character vector naming the transect ID column(s) in detectionData and siteData. Transects can be the basic sampling unit (when pointSurvey = FALSE) or contain multiple sampling units (e.g., when pointSurvey = TRUE). For line-transects, the transectID column(s) alone are sufficient to specify unique sample sites. For point-transects, the combination of transectID and pointID specifies unique sampling sites. See Input data frames.
When point-transects are used, this is the ID of points on a transect. When pointSurvey = TRUE, the combination of transectID and pointID specifies unique sampling sites. See Input data frames.
If single points are surveyed, meaning surveyed points were not grouped into transects, each 'transect' consists of one point. In this case, set transectID equal to the point's ID and set pointID equal to 1 for all points.
A string giving the symbolic measurement units in which results should be reported. Any distance measurement unit in units::valid_udunits() will work. The strings for common distance symbolic units are: "m" for meters, "ft" for feet, "cm" for centimeters, "mm" for millimeters, "mi" for miles, "nmile" for nautical miles ("nm" is nanometers), "in" for inches, "yd" for yards, "km" for kilometers, "fathom" for fathoms, "chains" for chains, and "furlong" for furlongs. If outputUnits is unspecified (NULL), output units are the same as the distance measurement units in data.
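The unit strings above are the ones understood by the units package. As a hedged illustration of what a conversion to the output units looks like, the sketch below converts hypothetical distances recorded in feet to meters with units::set_units. It assumes the units package is installed and does not show Rdistance's internal conversion code.

```r
# Hypothetical distances recorded in feet (illustration only)
d_ft <- units::set_units(c(10, 50, 120), "ft", mode = "standard")

# A unit string of the kind accepted by outputUnits
outputUnits <- "m"

# Convert to the requested units; the result prints with its unit label
d_m <- units::set_units(d_ft, outputUnits, mode = "standard")
d_m

# Strip units to get plain numbers when needed
as.numeric(d_m)
```

Using mode = "standard" lets the unit be supplied as a character variable rather than a literal symbol.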
Character string specifying the (single) column in siteData that contains transect length. This is ignored if pointSurvey = TRUE.
A list containing optimization control parameters such as the maximum number of iterations, tolerance, the optimizer to use, etc. See the RdistanceControls function for explanation of each value, the defaults, and the requirements for this list. See examples below for how to change controls.
To save space and to easily specify sites without detections, all site IDs, regardless of whether a detection occurred there, and site level covariates are stored in the siteData data frame. Detection distances and group sizes are measured at the detection level and are stored in the detectionData data frame.
The following explains conditions under which various combinations of the input data frames are required.
Detection data and site data both required:
Both detectionData and siteData are required if site level covariates are specified on the right-hand side of formula. Detection level covariates are not currently allowed.
Detection data only required:
The detectionData data frame alone can be specified if no covariates are included in the distance function (i.e., right-hand side of formula is "~1"). Note that this routine (dfuncSmu) does not need to know about sites where zero targets were detected, hence siteData can be missing when no covariates are involved.
Neither detection data nor site data required:
Neither detectionData nor siteData is required if all variables specified in formula are within the scope of this routine (e.g., in the global working environment). Scoping rules here work the same as for other modeling routines in R such as lm and glm. Like other modeling routines, it is possible to mix and match the location of variables in the model. Some variables can be in the .GlobalEnv while others are in either detectionData or siteData.
The input data frames, detectionData and siteData, must be merge-able on unique sites. For line-transects, site IDs (i.e., transect IDs) are unique values of the transectID column in siteData. In this case, the following merge must work:
merge(detectionData, siteData, by = transectID)
For point-transects, site IDs (i.e., point IDs) are unique values of the combination paste(transectID, pointID). In this case, the following merge must work:
merge(detectionData, siteData, by = c(transectID, pointID))
By default, transectID and pointID are NULL and the merge is done on all common columns. That is, when transectID is NULL, this routine assumes unique transects are specified by unique combinations of the common variables (i.e., unique values of intersect(names(detectionData), names(siteData))).
An error occurs if there are no common column names between detectionData and siteData.
Duplicate site IDs are not allowed in siteData. If the same site is surveyed in multiple years, specify another transect ID column (e.g., transectID = c("year","transectID")). Duplicate site IDs are allowed in detectionData.
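The merge and uniqueness requirements above can be checked before fitting. The following sketch builds two small hypothetical data frames (the column names and values are invented for illustration) and confirms that site IDs are unique in siteData and that the line-transect merge works.

```r
# Hypothetical detection-level data: one row per detection
detectionData <- data.frame(
  transectID = c("T1", "T1", "T3"),
  dist = c(12.5, 40.0, 7.2)
)

# Hypothetical site-level data: one row per surveyed transect,
# including transect T2 where nothing was detected
siteData <- data.frame(
  transectID = c("T1", "T2", "T3"),
  length = c(500, 500, 450)
)

# Name of the transect ID column(s), as it would be passed to transectID
transectID <- "transectID"

# Site IDs must be unique in siteData
stopifnot(!anyDuplicated(siteData[, transectID, drop = FALSE]))

# The merge required for line-transect data
merged <- merge(detectionData, siteData, by = transectID)
merged

# Transects that were surveyed but had no detections
setdiff(siteData$transectID, detectionData$transectID)
```

Duplicate transect IDs in detectionData (two detections on T1 here) are expected and do not interfere with the merge.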
To help explain the relationship between data frames, bear in mind that during bootstrap estimation of variance in abundEstim, unique transects (i.e., unique values of the transect ID column(s)), not detections or points, are resampled with replacement.
Distances are reflected about w.lo before being passed to density. Distances exactly equal to w.lo are not reflected. Reflection around w.lo greatly improves performance of the kernel methods near the w.lo boundary
where substantial non-zero probability of sighting typically exists.
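The effect of reflection can be seen with stats::density alone. The sketch below is an illustration of the general boundary-reflection technique on simulated distances (hypothetical data). It is not Rdistance's internal code, and the rescaling step is an assumption about one reasonable way to fold the reflected mass back onto the observed range.

```r
# Simulated distances (hypothetical data, for illustration only)
set.seed(3)
d <- abs(rnorm(300, mean = 0, sd = 40))
w.lo <- 0
w.hi <- max(d)

# Reflect distances about w.lo; values exactly at w.lo are not duplicated
reflected <- c(d, 2 * w.lo - d[d > w.lo])

# Use one common bandwidth so the two smooths are directly comparable
h <- bw.SJ(reflected, method = "dpi")
fit_refl <- density(reflected, bw = h, from = w.lo, to = w.hi, n = 512)
fit_plain <- density(d, bw = h, from = w.lo, to = w.hi, n = 512)

# Fold the reflected mass back onto [w.lo, w.hi]
y_refl <- fit_refl$y * length(reflected) / length(d)

# Without reflection the height at the boundary is badly underestimated
c(plain = fit_plain$y[1], reflected = y_refl[1])

# Scale so the curve equals 1 at w.lo, analogous to g(x.scl) = g.x.scl
g <- y_refl / y_refl[1]
```

Without reflection, roughly half of each kernel near w.lo spills below the boundary, which is why the unreflected height at w.lo is much lower.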
Buckland, S.T., D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers, and L. Thomas. (2001) Introduction to distance sampling: estimating abundance of biological populations. Oxford University Press, Oxford, UK.
Scott, D. W. (1992) Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley.
Sheather, S. J. and Jones, M. C. (1991) A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society series B, 53, 683-690.
Silverman, B. W. (1986) Density Estimation. London: Chapman and Hall.
abundEstim, autoDistSamp, dfuncEstim for the parametric version.
# Load Rdistance and the example sparrow data (line-transect survey)
library(Rdistance)
data(sparrowDetectionData)
data(sparrowSiteData)

# Fit smoothed and half-normal detection functions.
# Results are stored under new names so the dfuncSmu function is not masked.
smuFit <- dfuncSmu(dist ~ 1, sparrowDetectionData,
                   w.hi = units::set_units(150, "m"))
hnFit <- dfuncEstim(formula = dist ~ 1, sparrowDetectionData,
                    w.hi = units::set_units(150, "m"))

# Print and plot results
smuFit
hnFit
plot(smuFit, main = "", nbins = 50)

# Overlay the half-normal curve, scaled to equal 1 at distance 0
x <- seq(0, 150, length.out = 200)
y <- dnorm(x, 0, predict(hnFit)[1])
y <- y / y[1]
lines(x, y, col = "orange", lwd = 2)
legend("topright", legend = c("Smooth", "Halfnorm"),
       col = c("red", "orange"), lwd = 2)