mpm_create: General Matrix Projection Model Creation

Description

Function mpm_create() is the core workhorse function that creates all flavors of MPM in lefko3. All other MPM creation functions act as wrappers for this function. As such, this function provides the most general and most detailed control over the MPM creation process.

Usage

mpm_create(
  historical = FALSE,
  stage = TRUE,
  age = FALSE,
  devries = FALSE,
  reduce = FALSE,
  simple = FALSE,
  err_check = FALSE,
  data = NULL,
  year = NULL,
  pop = NULL,
  patch = NULL,
  stageframe = NULL,
  supplement = NULL,
  overwrite = NULL,
  repmatrix = NULL,
  alive = NULL,
  obsst = NULL,
  size = NULL,
  sizeb = NULL,
  sizec = NULL,
  repst = NULL,
  matst = NULL,
  fec = NULL,
  stages = NULL,
  yearcol = NULL,
  popcol = NULL,
  patchcol = NULL,
  indivcol = NULL,
  agecol = NULL,
  censorcol = NULL,
  modelsuite = NULL,
  paramnames = NULL,
  inda = NULL,
  indb = NULL,
  indc = NULL,
  dev_terms = NULL,
  density = NA_real_,
  CDF = TRUE,
  random_inda = FALSE,
  random_indb = FALSE,
  random_indc = FALSE,
  negfec = FALSE,
  exp_tol = 700L,
  theta_tol = 100000000L,
  censor = FALSE,
  censorkeep = NULL,
  start_age = NA_integer_,
  last_age = NA_integer_,
  fecage_min = NA_integer_,
  fecage_max = NA_integer_,
  fectime = 2L,
  fecmod = 1,
  cont = TRUE,
  prebreeding = TRUE,
  stage_NRasRep = FALSE,
  sparse_output = FALSE
)

Value

An object of class lefkoMat. This is a list that holds the matrix projection model and all of its metadata. The structure has the following elements:

A: A list of full projection matrices in order of sorted patches and occasion times. All matrices output in R's matrix class, or in the dgCMatrix class from the Matrix package if sparse.
U: A list of survival transition matrices sorted as in A. All matrices output in R's matrix class, or in the dgCMatrix class from the Matrix package if sparse.
F: A list of fecundity matrices sorted as in A. All matrices output in R's matrix class, or in the dgCMatrix class from the Matrix package if sparse.
hstages: A data frame showing the pairing of ahistorical stages used to create historical stage pairs. Only used in historical MPMs.
agestages: A data frame showing age-stage pairs. Only used in age-by-stage MPMs.
ahstages: A data frame detailing the characteristics of associated ahistorical stages, in the form of a modified stageframe that includes status as an entry stage through reproduction. Used in all stage-based and age-by-stage MPMs.
labels: A data frame giving the population, patch, and year of each matrix in order.
dataqc: A vector showing the numbers of individuals and rows in the vertical dataset used as input.
matrixqc: A short vector describing the number of non-zero elements in U and F matrices, and the number of annual matrices.
modelqc: This is the qc portion of the modelsuite input.
prob_out: An optional element only added if err_check = TRUE. This is a list of vital rate probability matrices, with 7 columns in the order of survival, observation probability, reproduction probability, primary size transition probability, secondary size transition probability, tertiary size transition probability, and probability of juvenile transition to maturity.
allstages: An optional element only added if err_check = TRUE. This is a data frame giving the values used to determine each matrix element capable of being estimated.
data: An optional element only added if err_check = TRUE and a raw MPM is requested. This consists of the original dataset as edited by this function for indexing purposes.

Arguments

historical: A logical value indicating whether to build a historical MPM. Defaults to FALSE.
stage: A logical value indicating whether to build a stage-based MPM. If both stage = TRUE and age = TRUE, then will proceed to build an age-by-stage MPM. Defaults to TRUE.
age: A logical value indicating whether to build an age-based MPM. If both stage = TRUE and age = TRUE, then will proceed to build an age-by-stage MPM. Defaults to FALSE.
devries: A logical value indicating whether to use deVries format for historical MPMs. Defaults to FALSE, in which case historical MPMs are created in Ehrlen format.
reduce: A logical value denoting whether to remove ages, ahistorical stages, or historical stages associated exclusively with zero transitions. These are removed only if the respective row and column sums in ALL matrices estimated equal 0. Defaults to FALSE.
simple: A logical value indicating whether to produce A, U, and F matrices, or only the latter two. Defaults to FALSE, in which case all three are output.
err_check: A logical value indicating whether to append extra information used in matrix calculation within the output list. Defaults to FALSE.
data: A data frame of class hfvdata. Required for all MPMs, except for function-based MPMs in which modelsuite is set to a vrm_input object.
year: A variable corresponding to observation occasion, or a set of such values, given in values associated with the year term used in vital rate model development. Can also equal "all", in which case matrices will be estimated for all occasions. Defaults to "all".
pop: A variable designating which populations will have matrices estimated. Should be set to specific population names, or to "all" if all populations should have matrices estimated. Only used in raw MPMs.
patch: A variable designating which patches or subpopulations will have matrices estimated. Should be set to specific patch names, or to "all" if matrices should be estimated for all patches. Defaults to NULL, in which case patch designations are ignored.
stageframe: An object of class stageframe. These objects are generated by function sf_create(), and include information on the size, observation status, propagule status, reproduction status, immaturity status, maturity status, stage group, size bin widths, and other key characteristics of each ahistorical stage. Not needed for purely age-based MPMs.
supplement: An optional data frame of class lefkoSD that provides supplemental data that should be incorporated into the MPM. Three kinds of data may be integrated this way: transitions to be estimated via the use of proxy transitions, transition overwrites from the literature or supplemental studies, and transition multipliers for survival and fecundity. This data frame should be produced using the supplemental() function. Can be used in place of or in addition to an overwrite table (see overwrite below) and a reproduction matrix (see repmatrix below).
overwrite: An optional data frame developed with the overwrite() function describing transitions to be overwritten either with given values or with other estimated transitions. Note that this function supplements overwrite data provided in supplement.
repmatrix: An optional reproduction matrix. This matrix is composed mostly of 0s, with non-zero entries acting as element identifiers and multipliers for fecundity (with 1 equaling full fecundity). If left blank, and no supplement is provided, then all stages marked as reproductive produce offspring at 1x that of estimated fecundity, and that offspring production will yield the first stage noted as propagule or immature. May be the dimensions of either a historical or an ahistorical matrix. If the latter, then all stages will be used in occasion t-1 for each suggested ahistorical transition. Not used in purely age-based MPMs.
alive: A vector of names of binomial variables corresponding to status as alive (1) or dead (0) in occasions t+1, t, and t-1, respectively. Defaults to c("alive3", "alive2", "alive1") for historical MPMs, and c("alive3", "alive2") for ahistorical MPMs. Only needed for raw MPMs.
obsst: A vector of names of binomial variables corresponding to observation status in occasions t+1, t, and t-1, respectively. Defaults to c("obsstatus3", "obsstatus2", "obsstatus1") for historical MPMs, and c("obsstatus3", "obsstatus2") for ahistorical MPMs. Only needed for raw MPMs.
size: A vector of names of variables coding the primary size variable in occasions t+1, t, and t-1, respectively. Defaults to c("sizea3", "sizea2", "sizea1") for historical MPMs, and c("sizea3", "sizea2") for ahistorical MPMs. Only needed for raw, stage-based MPMs.
sizeb: A vector of names of variables coding the secondary size variable in occasions t+1, t, and t-1, respectively. Defaults to an empty set, assuming that secondary size is not used. Only needed for raw, stage-based MPMs.
sizec: A vector of names of variables coding the tertiary size variable in occasions t+1, t, and t-1, respectively. Defaults to an empty set, assuming that tertiary size is not used. Only needed for raw, stage-based MPMs.
repst: A vector of names of binomial variables corresponding to reproductive status in occasions t+1, t, and t-1, respectively. Defaults to c("repstatus3", "repstatus2", "repstatus1") for historical MPMs, and c("repstatus3", "repstatus2") for ahistorical MPMs. Only needed for raw MPMs.
matst: A vector of names of binomial variables corresponding to maturity status in occasions t+1, t, and t-1, respectively. Defaults to c("matstatus3", "matstatus2", "matstatus1") for historical MPMs, and c("matstatus3", "matstatus2") for ahistorical MPMs. Must be provided if building raw MPMs, and stages is not provided.
fec: A vector of names of variables coding for fecundity in occasions t+1, t, and t-1, respectively. Defaults to c("feca3", "feca2", "feca1") for historical MPMs, and c("feca3", "feca2") for ahistorical MPMs. Only needed for raw, stage-based MPMs.
stages: An optional vector denoting the names of the variables within the main vertical dataset coding for the stages of each individual in occasions t+1 and t, and t-1, if historical. The names of stages in these variables should match those used in the stageframe exactly. If left blank, then mpm_create() will attempt to infer stages by matching values of alive, obsst, size, sizeb, sizec, repst, and matst to characteristics noted in the associated stageframe. Only used in raw, stage-based MPMs.
yearcol: The variable name or column number corresponding to occasion t in the dataset. Defaults to "year2". Only needed for raw MPMs.
popcol: The variable name or column number corresponding to the identity of the population. Defaults to "popid" if a value is provided for pop; otherwise empty. Only needed for raw MPMs.
patchcol: The variable name or column number corresponding to patch in the dataset. Defaults to "patchid" if a value is provided for patch; otherwise empty. Only needed for raw MPMs.
indivcol: The variable name or column number coding individual identity. Only needed for raw MPMs.
agecol: The variable name or column number corresponding to age in occasion t. Defaults to "obsage". Only used in raw age-based and age-by-stage MPMs.
censorcol: The variable name or column number denoting the censor status. Only needed in raw MPMs, and only if censor = TRUE.
modelsuite: One of three kinds of lists. The first is a lefkoMod object holding the vital rate models and associated metadata. Alternatively, an object of class vrm_input may be provided. Finally, this argument may simply be a list of models used to parameterize the MPM. In the final scenario, data and paramnames must also be given, and all variable names must match across all objects. If entered, then a function-based MPM will be developed. Otherwise, a raw MPM will be developed. Only used in function-based MPMs.
paramnames: A data frame with three columns, the first describing all terms used in linear modeling, the second (must be called mainparams) giving the general model terms that will be used in matrix creation, and the third showing the equivalent terms used in modeling (must be named modelparams). Function create_pm() can be used to create a skeleton paramnames object, which can then be edited. Only required to build function-based MPMs if modelsuite is neither a lefkoMod object nor a vrm_input object.
inda: Can be a single value to use for individual covariate a in all matrices, a pair of values to use for times t and t-1 in historical matrices, or a vector of such values corresponding to each occasion in the dataset. Defaults to NULL. Only used in function-based MPMs.
indb: Can be a single value to use for individual covariate b in all matrices, a pair of values to use for times t and t-1 in historical matrices, or a vector of such values corresponding to each occasion in the dataset. Defaults to NULL. Only used in function-based MPMs.
indc: Can be a single value to use for individual covariate c in all matrices, a pair of values to use for times t and t-1 in historical matrices, or a vector of such values corresponding to each occasion in the dataset. Defaults to NULL. Only used in function-based MPMs.
dev_terms: A numeric vector of 2 elements in the case of a Leslie MPM, and of 14 elements in all other cases. Consists of scalar additions to the y-intercepts of vital rate linear models used to estimate vital rates in function-based MPMs. Defaults to 0 values for all vital rates.
density: A numeric value indicating the density value to use to propagate matrices. Only needed if density is an explanatory term used in one or more vital rate models. Defaults to NA. Only used in function-based MPMs.
CDF: A logical value indicating whether to use the cumulative distribution function to estimate size transition probabilities in function-based MPMs. Defaults to TRUE, and should only be changed to FALSE if approximate probabilities calculated via the midpoint method are preferred.
random_inda: A logical value denoting whether to treat individual covariate a as a random, categorical variable. Otherwise is treated as a fixed, numeric variable. Defaults to FALSE. Only used in function-based MPMs.
random_indb: A logical value denoting whether to treat individual covariate b as a random, categorical variable. Otherwise is treated as a fixed, numeric variable. Defaults to FALSE. Only used in function-based MPMs.
random_indc: A logical value denoting whether to treat individual covariate c as a random, categorical variable. Otherwise is treated as a fixed, numeric variable. Defaults to FALSE. Only used in function-based MPMs.
negfec: A logical value denoting whether fecundity values estimated to be negative should be reset to 0. Defaults to FALSE.
exp_tol: A numeric value used to indicate a maximum value to set exponents to in the core kernel to prevent numerical overflow. Defaults to 700. Only used in function-based MPMs.
theta_tol: A numeric value indicating the maximum value allowed for theta as used in the negative binomial probability density kernel. Defaults to 100000000, but can be reset to other values during error checking. Only used in function-based MPMs.
censor: If TRUE, then data will be removed according to the variable set in censorcol, such that only data with censor values equal to censorkeep will remain. Defaults to FALSE. Only used in raw MPMs.
censorkeep: The value of the censor variable denoting data elements to keep. Defaults to 0. Only used in raw MPMs.
start_age: The age from which to start the matrix. Defaults to NA, in which case age 1 is used if prebreeding = TRUE, and age 0 is used if prebreeding = FALSE. Only used in age-based MPMs.
last_age: The final age to use in the matrix. Defaults to NA, in which case the highest age in the dataset is used. Only used in age-based and age-by-stage MPMs.
fecage_min: The minimum age at which reproduction is possible. Defaults to NA, which is interpreted to mean that fecundity should be assessed starting in the minimum age observed in the dataset. Only used in age-based MPMs.
fecage_max: The maximum age at which reproduction is possible. Defaults to NA, which is interpreted to mean that fecundity should be assessed until the final observed age. Only used in age-based MPMs.
fectime: An integer indicating whether to estimate fecundity using the variable given for fec in time t (2) or time t+1 (3). Only used for purely age-based MPMs. Defaults to 2.
fecmod: A scalar multiplier for fecundity. Only used for purely age-based MPMs. Defaults to 1.0.
cont: A logical value designating whether to allow continued survival of individuals past the final age noted in age-based and age-by-stage MPMs, using the demographic characteristics of the final age. Defaults to TRUE.
prebreeding: A logical value indicating whether the life history model is a pre-breeding model. Defaults to TRUE.
stage_NRasRep: A logical value indicating whether to treat non-reproductive individuals as reproductive. Used only in raw, stage-based MPMs in cases where stage assignment must still be handled. Not used in function-based MPMs, and in stage-based MPMs in which a valid hfvdata class data frame with stages already assigned is provided.
sparse_output: A logical value indicating whether to output matrices in sparse format. Defaults to FALSE, in which case all matrices are output in standard matrix format.
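
The rule described for reduce (a stage is removed only if its row and column sums equal 0 in ALL estimated matrices) can be illustrated with a small base R sketch. This is not the package's internal code: the matrices A1 and A2 and the helper is_empty() are hypothetical, and the sketch assumes non-negative matrix elements.

```r
# Two hypothetical 3-stage projection matrices (illustration only)
A1 <- matrix(0, nrow = 3, ncol = 3)
A1[1, 1] <- 0.4                      # stages 2 and 3 are empty in A1

A2 <- matrix(0, nrow = 3, ncol = 3)
A2[1, 1] <- 0.3
A2[2, 1] <- 0.5
A2[1, 2] <- 1.2
A2[2, 2] <- 0.6                      # only stage 3 is empty in A2

mats <- list(A1, A2)

# A stage is "empty" in one matrix if both its row sum and column sum are 0
is_empty <- function(m) rowSums(m) == 0 & colSums(m) == 0

# Keep only stages that are empty in every matrix as candidates for removal
empty_all <- Reduce(`&`, lapply(mats, is_empty))

# Drop those stages from all matrices, keeping matrix structure
reduced <- lapply(mats, function(m) m[!empty_all, !empty_all, drop = FALSE])
```

Here stage 2 is retained because it has non-zero transitions in A2, even though it is empty in A1. Only stage 3 is removed from both matrices.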

General Notes

This function automatically determines whether to create a raw or function-based MPM given inputs supplied by the user.

If used, the reproduction matrix (field repmatrix) may be supplied as either historical or ahistorical. If provided as historical, then a historical MPM must be estimated.

If neither a supplement nor a reproduction matrix is used, and the MPM to create is stage-based, then fecundity will be assumed to occur from all reproductive stages to all propagule and immature stages.

Function-based MPM Notes

Users may at times wish to estimate MPMs using a dataset incorporating multiple patches or subpopulations, but without discriminating between those patches or subpopulations. Should the aim of analysis be a general MPM that does not distinguish these patches or subpopulations, the modelsearch() run should not include patch terms.

Input options including multiple variable names must be entered in the order of variables in occasion t+1, t, and t-1. Rearranging the order will lead to erroneous calculations, and may lead to fatal errors.

This function provides two different means of estimating the probability of size transition. The midpoint method (CDF = FALSE) estimates the probability by first evaluating the corresponding probability density function at the exact size at the midpoint of the size class, and then multiplying that value by the bin width of the size class. Doak et al. 2021 (Ecological Monographs) noted that this method can produce biased results, with total size transitions associated with a specific size not totaling to 1.0 and even specific size transition probabilities capable of being estimated at values greater than 1.0. The alternative and default method (CDF = TRUE) uses the cumulative distribution function to estimate the probability of size transition as the cumulative probability of size transition at the upper limit of the size class minus the cumulative probability of size transition at the lower limit of the size class. This latter method avoids this bias. Note, however, that both methods are exact and unbiased for negative binomial and Poisson distributions.
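
The difference between the two methods can be sketched in base R for a Gaussian size distribution. This is an illustration only, not the package's internal code: the bin midpoints, half-width, mean (mu), and standard deviation (sigma) are all assumed values chosen to make the bias visible.

```r
# Hypothetical size classes: midpoints with a common half-width
midpoints <- c(1, 2, 3, 4, 5)
halfwidth <- 0.5

# Assumed predicted mean and residual SD of size in occasion t+1
mu <- 3
sigma <- 0.3

# Midpoint method: density at the bin midpoint times the bin width
p_mid <- dnorm(midpoints, mean = mu, sd = sigma) * (2 * halfwidth)

# CDF method: cumulative probability at the upper bin limit
# minus cumulative probability at the lower bin limit
p_cdf <- pnorm(midpoints + halfwidth, mean = mu, sd = sigma) -
  pnorm(midpoints - halfwidth, mean = mu, sd = sigma)

# Compare the per-bin probabilities and their totals
print(round(cbind(midpoints, p_mid, p_cdf), 4))
print(c(total_mid = sum(p_mid), total_cdf = sum(p_cdf)))
```

With a standard deviation that is small relative to the bin width, the midpoint estimate for the central bin exceeds 1.0, while the CDF estimates remain valid probabilities whose total cannot exceed 1.0.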

Under the Gaussian and gamma size distributions, the number of estimated parameters may differ between the two CDF settings. Because the midpoint method has a tendency to incorporate upward bias in the estimation of size transition probabilities, it is more likely to yield non-zero values when the true probability is extremely close to 0. This will result in the summary.lefkoMat() function yielding higher numbers of estimated parameters under CDF = FALSE than under CDF = TRUE in some cases.

Examples

# \donttest{
# Lathyrus historical function-based MPM example
data(lathyrus)

sizevector <- c(0, 4.6, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8,
  9)
stagevector <- c("Sd", "Sdl", "Dorm", "Sz1nr", "Sz2nr", "Sz3nr", "Sz4nr",
  "Sz5nr", "Sz6nr", "Sz7nr", "Sz8nr", "Sz9nr", "Sz1r", "Sz2r", "Sz3r", 
  "Sz4r", "Sz5r", "Sz6r", "Sz7r", "Sz8r", "Sz9r")
repvector <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1)
obsvector <- c(0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
matvector <- c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
immvector <- c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
propvector <- c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  0)
indataset <- c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
binvec <- c(0, 4.6, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 
  0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5)

lathframeln <- sf_create(sizes = sizevector, stagenames = stagevector, 
  repstatus = repvector, obsstatus = obsvector, matstatus = matvector, 
  immstatus = immvector, indataset = indataset, binhalfwidth = binvec, 
  propstatus = propvector)

lathvertln <- verticalize3(lathyrus, noyears = 4, firstyear = 1988,
  patchidcol = "SUBPLOT", individcol = "GENET", blocksize = 9, 
  juvcol = "Seedling1988", sizeacol = "lnVol88", repstracol = "Intactseed88",
  fecacol = "Intactseed88", deadacol = "Dead1988", 
  nonobsacol = "Dormant1988", stageassign = lathframeln, stagesize = "sizea",
  censorcol = "Missing1988", censorkeep = NA, NAas0 = TRUE, censor = TRUE)

lathvertln$feca2 <- round(lathvertln$feca2)
lathvertln$feca1 <- round(lathvertln$feca1)
lathvertln$feca3 <- round(lathvertln$feca3)

lathmodelsln3 <- modelsearch(lathvertln, historical = TRUE, 
  approach = "mixed", suite = "main", 
  vitalrates = c("surv", "obs", "size", "repst", "fec"), juvestimate = "Sdl",
  bestfit = "AICc&k", sizedist = "gaussian", fecdist = "poisson", 
  indiv = "individ", patch = "patchid", year = "year2", year.as.random = TRUE,
  patch.as.random = TRUE, show.model.tables = TRUE, quiet = "partial")

lathsupp3 <- supplemental(stage3 = c("Sd", "Sd", "Sdl", "Sdl", "mat", "Sd", "Sdl"), 
  stage2 = c("Sd", "Sd", "Sd", "Sd", "Sdl", "rep", "rep"),
  stage1 = c("Sd", "rep", "Sd", "rep", "Sd", "mat", "mat"),
  eststage3 = c(NA, NA, NA, NA, "mat", NA, NA),
  eststage2 = c(NA, NA, NA, NA, "Sdl", NA, NA),
  eststage1 = c(NA, NA, NA, NA, "Sdl", NA, NA),
  givenrate = c(0.345, 0.345, 0.054, 0.054, NA, NA, NA),
  multiplier = c(NA, NA, NA, NA, NA, 0.345, 0.054),
  type = c(1, 1, 1, 1, 1, 3, 3), type_t12 = c(1, 2, 1, 2, 1, 1, 1),
  stageframe = lathframeln, historical = TRUE)

lathmat3ln <- mpm_create(historical = TRUE, year = "all", patch = "all",
  stageframe = lathframeln, modelsuite = lathmodelsln3, data = lathvertln,
  supplement = lathsupp3)
# }
