MCgof: Monte Carlo Goodness-of-fit for SECR Models

Description

MCgof implements and extends the Monte Carlo resampling method of Choo et al. (2024) to emulate Bayesian posterior predictive checks (Gelman et al. 1996, Royle et al. 2014). Initial results suggest the approach is more informative than the deviance-based test proposed by Borchers and Efford (2008) and implemented in secr.test. However, the tests have limited power.

MCgof is under development. The structure of the output may change and bugs may be found. See Warning below for exclusions.

Usage

# S3 method for secr
MCgof(object, nsim = 100, statfn = NULL, testfn = NULL, seed = NULL, 
    ncores = 1, clustertype = c("PSOCK", "FORK"), usefxi = TRUE, 
    useMVN = TRUE, Ndist = NULL, quiet = FALSE, ...)
# S3 method for secrlist
MCgof(object, nsim = 100, statfn = NULL, testfn = NULL, seed = NULL, 
    ncores = 1, clustertype = c("PSOCK", "FORK"), usefxi = TRUE, 
    useMVN = TRUE, Ndist = NULL, quiet = FALSE, ...)

Value

Invisibly returns an object of class `MCgof' with components -

nsim: as input
statfn: as input or default
testfn: as input or default
all: list of outputs: for each statistic, a 3 x nsim matrix. Rows correspond to Tobs, Tsim, and a binary indicator for Tsim > Tobs
proctime: execution time in seconds

For secrlist input the value returned is a list of `MCgof' objects.

Arguments

object: secr fitted model or secrlist object
nsim: integer number of replicates
statfn: function to extract summary statistics from capture histories
testfn: function to compare observed and expected counts
seed: integer seed
ncores: integer for number of parallel cores
clustertype: character cluster type for parallel::makeCluster
usefxi: logical; if FALSE then AC are simulated de novo from the density process rather than using information on the detected individuals
useMVN: logical; if FALSE parameter values are fixed at the MLE rather than drawn from multivariate normal distribution
Ndist: character; distribution of number of unobserved AC (optional)
quiet: logical; if FALSE then a progress bar (ncores=1) and final timing are shown
...: other arguments (not used)

Warning

Not all models are covered and some are untested. These models are specifically excluded -

multi-session models
models with groups
conditional likelihood
polygon, transect, telemetry or signal detectors
non-binary behavioural responses

Notes

This implementation extends the work of Choo et al. (2024) in these respects -

detector types `multi' and `count' are allowed
the model may include variation among detectors
the model may include behavioural responses
2-class finite mixture and hybrid mixture models are both allowed

Author

Murray Efford and Yan Ru Choo

Details

At each replicate parameter values are sampled from the multivariate-normal sampling distribution of the fitted model. The putative location of each detected individual is drawn from the spatial distribution implied by its observations and the resampled parameters (see fxi); locations of undetected individuals are simulated from the complement of pdot(x) times D(x).
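The parameter resampling step can be sketched in base R as below. This is a minimal illustration and not the code used inside MCgof: the estimates `beta` and the variance-covariance matrix `V` are hypothetical values invented for the example, and the draw uses a Cholesky factor rather than any particular secr function.

```r
# Hypothetical estimates on the link scale (not taken from any fitted model)
beta <- c(D = 1.70, g0 = -1.00, sigma = 3.38)
# Hypothetical variance-covariance matrix of those estimates
V <- matrix(c( 0.0140,  0.0005, -0.0010,
               0.0005,  0.0270, -0.0040,
              -0.0010, -0.0040,  0.0020), nrow = 3, byrow = TRUE,
            dimnames = list(names(beta), names(beta)))

set.seed(123)
nsim <- 1000
# Draw nsim parameter vectors as beta + L z, where V = L L' and z ~ N(0, I)
L <- t(chol(V))                                   # lower-triangular factor of V
z <- matrix(rnorm(length(beta) * nsim), nrow = length(beta))
draws <- t(beta + L %*% z)                        # nsim x 3 matrix, one draw per row

round(colMeans(draws), 2)    # should be close to beta
round(cov(draws), 4)         # should be close to V
```

Setting useMVN = FALSE corresponds to skipping this step and using the estimates themselves at every replicate.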

New detections are simulated under the model for individuals at the simulated locations, along with the expected numbers. Detections form a capthist object, a 3-D array with dimensions for individual $i$, occasion $j$ and detector $k$*. Thus for each replicate and detected individual there are the original observations $y_{ijk}$, simulated observations $Y_{ijk}$, and expected counts $\mbox E (y_{ijk})$. Two discrepancy statistics are calculated for each replicate -- observed vs expected counts, and simulated vs expected counts -- and a record is kept of which of these discrepancy statistics is the larger (indicating poorer fit).

* Notation differs slightly from Choo et al. (2024), using $j$ for occasion and $k$ for detector to be consistent with usage in secr and elsewhere (e.g., Borchers and Fewster 2016).

The default discrepancy (testfn) is the Freeman-Tukey statistic as in Choo et al. (2024) and Royle et al. (2014) (see also Brooks, Catchpole and Morgan 2000). The statistic has this general form for $M$ counts $y_m$ with expected value $\mbox E(y_m)$: $$T = \sum_{m=1}^{M} \left(\sqrt {y_m} - \sqrt{\mbox E(y_m)}\right)^2.$$
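As a small worked example, the statistic can be computed in base R as follows. The function name `freeman_tukey` and the input vectors are hypothetical; the argument signature that MCgof expects of a user-supplied testfn is not described here, so this sketch should not be passed to MCgof as it stands.

```r
# Freeman-Tukey discrepancy between counts y and their expected values Ey
freeman_tukey <- function(y, Ey) {
    stopifnot(length(y) == length(Ey), all(y >= 0), all(Ey >= 0))
    sum((sqrt(y) - sqrt(Ey))^2)
}

y  <- c(0, 1, 4, 9)      # illustrative counts
Ey <- c(1, 1, 1, 4)      # illustrative expected values
freeman_tukey(y, Ey)     # (0-1)^2 + (1-1)^2 + (2-1)^2 + (3-2)^2 = 3
```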

The key output of MCgof is the proportion of replicates in which the simulated discrepancy exceeds the observed discrepancy. For perfect fit this will be about 0.5, and for poor fit it will approach zero.
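The proportion can be illustrated with a made-up matrix laid out like the per-statistic output described under Value (rows for Tobs, Tsim and the indicator Tsim > Tobs). The discrepancy values below are random numbers generated only for the example, and the row names are assumptions.

```r
set.seed(1)
nsim <- 100
# Made-up discrepancies, for illustration only
Tobs <- rchisq(nsim, df = 20)
Tsim <- rchisq(nsim, df = 20)

# 3 x nsim matrix: observed, simulated, and indicator of Tsim > Tobs
out <- rbind(Tobs = Tobs, Tsim = Tsim, p = as.numeric(Tsim > Tobs))

# Proportion of replicates in which the simulated discrepancy is the larger
mean(out["p", ])
```

Because Tobs and Tsim are drawn here from the same distribution, the proportion should fall near 0.5, the value expected under good fit.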

By default, tests are performed separately for three types of count: the numbers of detections of each individual (yi), at each detector (yk), and for each individual at each detector (yik). The default statfn extracts these counts from the margins of the observed and simulated capture histories.

$y_{ik} = \sum_j y_{ijk}$	individual x detector
$y_{i} = \sum_j \sum_k y_{ijk}$	individual
$y_{k} = \sum_i \sum_j y_{ijk}$	detector
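The three margins can be computed from a 3-D array with base R `apply`, as in the sketch below. The toy array stands in for a capthist object (individual x occasion x detector) and is filled with random counts; it is not the default statfn itself.

```r
set.seed(2)
# Toy array: 4 individuals x 5 occasions x 3 detectors of Poisson counts
y <- array(rpois(4 * 5 * 3, lambda = 0.3), dim = c(4, 5, 3))

yik <- apply(y, c(1, 3), sum)    # individual x detector (sum over occasions j)
yi  <- apply(y, 1, sum)          # individual (sum over j and k)
yk  <- apply(y, 3, sum)          # detector (sum over i and j)

# The margins are mutually consistent
stopifnot(all(rowSums(yik) == yi), all(colSums(yik) == yk))
```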

Parallel processing is offered using multiple cores (CPUs) through the package parallel when ncores > 1. This differs from the usual multithreading paradigm in secr and does not rely on the environment variable set by setNumThreads except that, if ncores = NULL, ncores will be set to the value from setNumThreads. The cluster type "FORK" is available only on Unix-like systems; it can require large amounts of memory, but is generally fast. A small value of ncores > 1 may be optimal, especially with cluster type "PSOCK".

`usefxi' and `useMVN' may be used to drop key elements of the Choo et al. (2024) approach - they are provided for demonstration only.

`Ndist' refers to the distribution of the number of unobserved AC, conditional on the expected number $q = D^*A - n$ where $D^*$ is the resampled density, $A$ the mask area, and $n$ the number of detected individuals. By default `Ndist' depends on the distribution component of the `details' argument of the fitted model ("poisson" for Poisson $n$, "fixed" for binomial $n$).
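One way to picture the two options is the sketch below. It is an assumption about the general idea, not the secr implementation: the helper `draw_unobserved` is hypothetical, as are the rounding of $q$ in the "fixed" case, the truncation of negative $q$ at zero, and the numerical values.

```r
# Hypothetical helper; not part of secr
draw_unobserved <- function(q, Ndist = c("poisson", "fixed")) {
    Ndist <- match.arg(Ndist)
    q <- max(q, 0)    # assumption: a negative expectation is treated as zero
    switch(Ndist,
           poisson = rpois(1, q),             # random number with mean q
           fixed   = as.integer(round(q)))    # assumption: q rounded to an integer
}

Dstar <- 5.5    # illustrative resampled density (animals per hectare)
A     <- 20     # illustrative mask area (hectares)
n     <- 76     # illustrative number of detected individuals
q     <- Dstar * A - n               # 110 - 76 = 34

set.seed(3)
draw_unobserved(q, "poisson")        # varies between replicates
draw_unobserved(q, "fixed")          # always 34
```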

The RNGkind of the random number generator is set internally for consistency across platforms.

References

Borchers, D. L. and Efford, M. G. (2008) Spatially explicit maximum likelihood methods for capture--recapture studies. Biometrics 64, 377--385.

Borchers, D. L. and Fewster, R. M. (2016) Spatial capture--recapture models. Statistical Science 31, 219--232.

Brooks, S. P., Catchpole, E. A. and Morgan, B. J. T. (2000) Bayesian animal survival estimation. Statistical Science 15, 357--376.

Choo, Y. R., Sutherland, C. and Johnston, A. (2024) A Monte Carlo resampling framework for implementing goodness-of-fit tests in spatial capture-recapture models. Methods in Ecology and Evolution DOI: 10.1111/2041-210X.14386.

Gelman, A., Meng, X.-L., and Stern, H. (1996) Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica 6, 733--807.

Royle, J. A., Chandler, R. B., Sollmann, R. and Gardner, B. (2014) Spatial capture--recapture. Academic Press.

Examples



# \donttest{
tmp <- MCgof(secrdemo.0)
summary(tmp)
par(mfrow = c(1,3), pty = 's')
plot(tmp)
# }
