durationFreq: Models of Sampling and Extinction for Taxonomic Duration Datasets

Description

These functions construct likelihood models of the observed frequency of taxon durations, given either in discrete (make_durationFreqDisc) or continuous time (make_durationFreqCont). These models can then be constrained using functions available in this package and/or analyzed with commonly used optimizing functions.

Usage

make_durationFreqCont(timeData, groups = NULL, drop.extant = TRUE,
  threshold = 0.01, tol = 1e-04)

make_durationFreqDisc(timeList, groups = NULL, drop.extant = TRUE)

Arguments

timeData

Two-column matrix of per-taxon first and last occurrence given in continuous time, relative to the modern (i.e. older dates are also the 'larger' dates). Unsampled taxa (e.g. from a simulation of sampling in the fossil record, listed as NAs the supplie

groups

Either NULL (the default) or matrix with the number of rows equal to the number of taxa and the number of columns equal to the number of 'systems' of categories for taxa. Taxonomic membership in different groups is indicated by numeric values. For exam

drop.extant

Drops all extant taxa from a dataset, w

threshold

The smallest allowable duration (i.e. the measured difference in the first and last occurrence dates for a given taxon). Durations below this size will be treated as "one-hit" sampling events.

tol

Tolerance level for determining whether a taxon from a continuous-time analysis is extant or not. Taxa which occur at a date less than tol are treated as occurring at the modern day (i.e. being functionally identical as occurring at 0 time).

timeList

A 2 column matrix with the first and last occurrences of taxa given in relative time intervals (i.e. ordered from first to last). If a list of length two is given for timeData, such as would be expected if the output of binTimeData was directly input,

Value

A function of class "paleotreeFunc", which takes a vector equal to the number of parameters and returns the *negative* log likelihood (for use with optim and similar optimizing functions, which attempt to minimize support values). See the functions listed at modelMethods for manipulating and examining such functions and constrainParPaleo for constraining parameters. Parameters in the output functions are named 'q' for the instantaneous per-capita extinction rate, 'r' for the instantaneous per-capita sampling rate and 'R' for the per-interval taxonomic sampling probability. Groupings follow the parameter names, separated by periods; by default, the parameters will be placed in at least group 1 (of a grouping with a single group), such that make_durationFreqCont by default creates a function with parameters named 'q.1' and 'r.1', while make_durationFreqDisc creates a function with parameters named 'q.1' and 'R.1'. Note that the 'q' parameters estimated by make_durationFreqDisc is scaled to per lineage intervals and not to per lineage time-units. If intervals are the same length, this can be easily corrected by multiplying 1 by the interval length. It is unclear how to treat uneven intervals and I urge workers to consider multiple strategies. For translating these sampling probabilities and sampling rates, see SamplingConv.

Details

These functions effectively replace two older functions in paleotree, now removed, getSampRateCont and getSampProbDisc. The functions here do not offer the floating time interval options of their older siblings, but do allow for greater flexibility in defining constrains on parameter values. Differences in time intervals, or any other conceivable discrete differences in parameters, can be modeled using the generic groups argument in these functions. These functions use likelihood functions presented by Foote (1997). These analyses are ideally applied to data from single stratigraphic section but potentially are applicable to regional or global datasets, although the behavior of those datasets is less well understood. As with many functions in the paleotree library, absolute time is always decreasing, i.e. the present day is zero and older dates are 'larger'. On the contrary, relative time is in intervals with non-zero integers that increase sequentially beginning with 1, from earliest to oldest. For make_durationFreqDisc, the intervals in timeList should be non-overlapping sequential intervals of roughly equal length. These should be in relative time as described above, so the earliest interval should be 1 and the numbering should increase as the intervals go up with age. If both previous statements are true, then differences in interval numbers will represent the same rough difference in the absolute timing of those intervals. For example, a dataset where all taxa are listed from a set of sequential intervals of similar length, such as North American Mammal assemblage zones, microfossil faunal zones or graptolite biozones can be given as long as they are correctly numbered in sequential order in the input. As a counter example, a dataset which includes taxa resolved only to intervals as wide as the whole Jurassic and taxa resolved to biozones within the Jurassic should not be included in the same input. Drop taxa from less poorly resolved intervals from such datasets if you want to apply this function, as long as this retains a large enough sample of taxa listed from the sequential set of intervals. Please check that the optimizer function you select actually converges. The likelihood surface can be very flat in some cases, particularly for small datasets (

References

Foote, M. 1997 Estimating Taxonomic Durations and Preservation Probability. Paleobiology 23(3):278--300. Foote, M., and D. M. Raup. 1996 Fossil preservation and the stratigraphic ranges of taxa. Paleobiology 22(2):121--140.

Examples

Run this code

#let's simulate some taxon ranges from an imperfectly sampled fossil record
set.seed(444)
taxa <- simFossilTaxa(p=0.1,q=0.1,nruns=1,mintaxa=20,maxtaxa=30,maxtime=1000,maxExtant=0)
rangesCont <- sampleRanges(taxa,r=0.5)
#bin the ranges into discrete time intervals
rangesDisc <- binTimeData(rangesCont,int.length=1)
#note that we made interval lengths = 1:
 	# thus q (per int) = q (per time) for make_durationFreqDisc

#old ways of doing it
getSampRateCont(rangesCont)
getSampProbDisc(rangesDisc)

#new ways of doing it
    # we can constrain our functions
    # we can use parInit, parLower and parUpper to control parameter bounds

#as opposed to getSampRateCont, we can do:
likFun<-make_durationFreqCont(rangesCont)
optim(parInit(likFun),likFun,lower=parLower(likFun),upper=parUpper(likFun),
      method="L-BFGS-B",control=list(maxit=1000000))

#as opposed to getSampProbDisc, we can do:
likFun<-make_durationFreqDisc(rangesDisc)
optim(parInit(likFun),likFun,lower=parLower(likFun),upper=parUpper(likFun),
     method="L-BFGS-B",control=list(maxit=1000000))

#these give the same answers (as we'd expect them to!)

#with newer functions we can constrain our functions easily
    # what if we knew the extinction rate = 0.1 a priori?
likFun<-make_durationFreqCont(rangesCont)
likFun<-constrainParPaleo(likFun,q.1~0.1)
optim(parInit(likFun),likFun,lower=parLower(likFun),upper=parUpper(likFun),
	    method="L-BFGS-B",control=list(maxit=1000000))

#actually decreases our sampling rate estimate
   # gets further away from true generating value, r = 0.5 (geesh!)
   # but this *is* a small dataset...

Run the code above in your browser using DataLab