artemis (version 1.1.1)

sim_eDNA_lm: Simulate eDNA data

Description

Simulate eDNA data

Usage

sim_eDNA_lm(
  formula,
  variable_list,
  betas,
  sigma_ln_eDNA,
  std_curve_alpha,
  std_curve_beta,
  n_sim = 1L,
  upper_Cq = 40,
  prob_zero = 0.08,
  X = expand.grid(variable_list),
  verbose = FALSE
)

sim_eDNA_lmer(
  formula,
  variable_list,
  betas,
  sigma_ln_eDNA,
  sigma_rand,
  std_curve_alpha,
  std_curve_beta,
  n_sim = 1L,
  upper_Cq = 40,
  prob_zero = 0.08,
  X = expand.grid(variable_list),
  verbose = FALSE
)

Arguments

formula

a model formula, e.g. y ~ x1 + x2. For sim_eDNA_lmer, random intercepts can also be provided, e.g. (1 | rep).

variable_list

a named list, with the levels that each variable can take. Please note that the variables listed in the formula, including the response variable, must be present in the variable_list or in the X design matrix. Extra variables, i.e. variables which do not occur in the formula, are ignored.

betas

numeric vector, the beta for each variable in the design matrix

sigma_ln_eDNA

numeric, the measurement error on ln[eDNA].

std_curve_alpha

the alpha value for the formula for converting between log(eDNA concentration) and CQ value

std_curve_beta

the beta value for the formula for converting between log(eDNA concentration) and CQ value

n_sim

integer, the number of cases to simulate

upper_Cq

numeric, the upper limit on CQ detection. Any value of log(concentration) which would result in a value greater than this limit is instead recorded as the limit.
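
As a rough illustration of how std_curve_alpha, std_curve_beta, and upper_Cq interact, the sketch below assumes a linear standard curve, Cq = std_curve_alpha + std_curve_beta * ln(concentration), and caps the result at upper_Cq. The helper ln_conc_to_Cq is hypothetical and is not part of artemis; the package's own implementation lives in compiled Stan code and may differ in detail.

```r
# Hypothetical helper (not an artemis function): convert ln(concentration)
# to Cq under an assumed linear standard curve, then cap at the upper limit.
ln_conc_to_Cq <- function(ln_conc, std_curve_alpha, std_curve_beta, upper_Cq = 40) {
  Cq <- std_curve_alpha + std_curve_beta * ln_conc
  pmin(Cq, upper_Cq)  # values beyond the detection limit are recorded as the limit
}

# The second concentration is low enough that its Cq would exceed 40,
# so it is recorded as 40.
Cq <- ln_conc_to_Cq(c(-10.6, -15), std_curve_alpha = 21.2, std_curve_beta = -1.5)
print(Cq)
```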

prob_zero

numeric, between 0 and 1. The probability of seeing a non-detection (i.e., a "zero") via the zero-inflated mechanism. Defaults to 0.08.

X

optional, a design matrix. By default, this is created from the variable_list using expand.grid(), which creates a balanced design matrix. However, the user can provide their own X as well, in which case the variable_list is ignored. This allows users to provide an unbalanced design matrix.
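
The difference between the default balanced design and a user-supplied unbalanced one can be sketched in base R. The variable names and the rule used to drop rows are illustrative assumptions only; the resulting data frame could then be supplied as X.

```r
# Balanced design: every combination of the listed levels.
variable_list <- list(distance = c(0, 15, 50),
                      volume = c(25, 50),
                      rep = 1:3)
X_balanced <- expand.grid(variable_list)

# Unbalanced design: suppose no 50 ml samples were taken at 50 m
# (an illustrative assumption), so those rows are dropped.
drop_rows <- X_balanced$distance == 50 & X_balanced$volume == 50
X_unbalanced <- X_balanced[!drop_rows, ]

nrow(X_balanced)    # 3 * 2 * 3 combinations
nrow(X_unbalanced)  # three fewer rows, one per rep
```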

verbose

logical, when TRUE output from rstan::sampling is written to the console.

sigma_rand

numeric vector, the standard deviation for each random effect. There must be one sigma per random effect specified.

Value

S4 object of class "eDNA_simulation_lm/lmer" with the following slots:

ln_conc matrix

the simulated log(concentration)

Cq_star matrix

the simulated CQ values, including the measurement error

formula

the formula for the simulation

variable_levels

named list, the variable levels used for the simulation

betas

numeric vector, the betas for the simulation

x

data.frame, the design matrix

std_curve_alpha numeric

the alpha for the std curve conversion

std_curve_beta numeric

the beta for the std curve conversion

upper_Cq

the upper limit for CQ

Diagnosing "unrealistic" simulations

Users will sometimes find that the simulated response (i.e. the Cq values) produced by this function does not resemble the data they would expect from a sampling experiment. This suggests a mismatch between the assumptions of the model and the data-generating process in the field. In these circumstances, we suggest:

  1. Check that the betas provided are the effect sizes of the predictors on log[eDNA concentration], and not on the Cq values.

  2. Check that the variable levels provided are representative of real-world circumstances. For example, a sample volume of 0 ml is not possible.

  3. Verify the values for the standard curve alpha and beta. These are specific to each calibration for the lab, so it is important that you use the same conversion between Cq values and log[eDNA concentration] as the comparison data.
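
One way to carry out these checks before simulating is to compute the expected Cq for each design row by hand. The sketch below is a base R approximation that assumes the linear predictor acts on ln[eDNA concentration] and that the standard curve is Cq = alpha + beta * ln[eDNA concentration]; it leaves out measurement error and zero inflation, and it is not the package's own code.

```r
# Design rows and effect sizes taken from the Examples section.
X <- expand.grid(distance = c(0, 15, 50), volume = c(25, 50))
betas <- c(intercept = -10.6, distance = -0.05, volume = 0.1)

# Expected ln(concentration) for each row: design matrix times betas.
mm <- model.matrix(~ distance + volume, data = X)
ln_conc <- drop(mm %*% betas)

# Assumed standard curve conversion, capped at the upper Cq limit.
std_curve_alpha <- 21.2
std_curve_beta <- -1.5
Cq <- pmin(std_curve_alpha + std_curve_beta * ln_conc, 40)

# Inspect whether the expected Cq values fall in a plausible range.
print(cbind(X, ln_conc = ln_conc, Cq = Cq))
```

If most expected values sit at or near the upper limit, the betas or the standard curve parameters are probably on the wrong scale.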

Details

These functions allow for computationally efficient simulation of Cq values from a hypothetical eDNA sampling experiment, given a set of effect sizes (betas) and the levels of each predictor variable (variable_list). The mechanism for this model is described in detail in the artemis "Getting Started" vignette.

The simulation functions call specialized functions that are written in Stan and compiled for speed. This also allows the simulation functions and the modeling functions to reflect the same process at the code level.

Examples

# NOT RUN {
## Includes extra variables
vars = list(Intercept = -10.6,
            distance = c(0, 15, 50),
            volume = c(25, 50),
            biomass = 100,
            alive = 1,
            tech_rep = 1:10,
            rep = 1:3, Cq = 1)

## Intercept only
ans = sim_eDNA_lm(Cq ~ 1, vars,
                  betas = c(intercept = -15),
                  sigma_ln_eDNA = 1e-5,
                  std_curve_alpha = 21.2, std_curve_beta = -1.5)

print(ans)

ans = sim_eDNA_lm(Cq ~ distance + volume, vars,
                  betas = c(intercept = -10.6, distance = -0.05, volume = 0.1),
                  sigma_ln_eDNA = 1, std_curve_alpha = 21.2, std_curve_beta = -1.5)
# }
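
The examples above only exercise sim_eDNA_lm. A corresponding call to sim_eDNA_lmer, following the usage and argument descriptions on this page, might look like the sketch below. The specific values (one random intercept on rep with sigma_rand = 0.1) are illustrative assumptions, and the call is wrapped in a check so that it only runs when artemis is installed.

```r
# Variable levels as in the examples above, without the unused extras.
vars <- list(distance = c(0, 15, 50),
             volume = c(25, 50),
             tech_rep = 1:10,
             rep = 1:3,
             Cq = 1)

if (requireNamespace("artemis", quietly = TRUE)) {
  # One random intercept, (1 | rep), so sigma_rand has length one.
  sim <- artemis::sim_eDNA_lmer(Cq ~ distance + volume + (1 | rep), vars,
                                betas = c(intercept = -10.6, distance = -0.05, volume = 0.1),
                                sigma_ln_eDNA = 1,
                                sigma_rand = 0.1,
                                std_curve_alpha = 21.2, std_curve_beta = -1.5)
  print(sim)

  # Slots listed under Value can be read with @, e.g. the simulated Cq values.
  dim(sim@Cq_star)
}
```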
