
BaSTA (version 2.0.0)

DataCheck: Error checking for BaSTA input data.

Description

A function to check the input data file for a Bayesian Survival Trajectory Analysis (BaSTA) for capture-mark-recapture (CMR) or census data.

Usage

DataCheck(object, dataType = "CMR", studyStart = NULL, studyEnd = NULL, silent = TRUE)

Value

1) CMR data:

newData

The original data frame (for consistency with previous versions of BaSTA).

type1

A vector of row numbers in the original data frame where there are deaths occurring before the study starts.

type2

A vector of row numbers in the original data frame where neither a birth nor a death year is recorded AND there are no observations.

type3

A vector of row numbers in the original data frame where there are births recorded after death.

type4

A vector of row numbers in the original data frame where there are observations (i.e. recaptures) after death.

type5

A vector of row numbers in the original data frame where there are observations (i.e. recaptures) before birth.

type6

A vector of row numbers in the original data frame where the entry of the recapture matrix for the year of birth is not zero.

summary

List with summary information, e.g., sample size, number of records with known birth, number of records with known death, etc.

stopExec

Logical that indicates whether the data are free of errors, i.e., TRUE = the data have no apparent errors, and FALSE = there is at least one error.

probDescr

Character vector explaining the six types of problems the DataCheck function looks for.

dataType

Type of dataset, i.e., “CMR”.

studyStart

Integer indicating the study start time.

studyEnd

Integer indicating the study end time.

2) census data:

n

Integer for the number of rows (i.e., records) in the dataset.

stopExec

Logical that indicates whether the data are free of errors, i.e., TRUE = the data have no apparent errors, and FALSE = there is at least one error.

nas

List organised by column indicating whether NAs were detected in a given column.

DateRan

Matrix of date ranges (as character strings) for each date column in the dataset.

probDescr

Character vector explaining the seven types of problems the DataCheck function looks for.

MinBBirth

Vector of indices of rows where “Min.Birth.Date” was larger than “Birth.Date”.

BirthMaxB

Vector of indices of rows where “Birth.Date” was larger than “Max.Birth.Date”.

MinBMaxB

Vector of indices of rows where “Min.Birth.Date” was larger than “Max.Birth.Date”.

BirthEntr

Vector of indices of rows where “Birth.Date” was larger than “Entry.Date”.

MinBEntr

Vector of indices of rows where “Min.Birth.Date” was larger than “Entry.Date”.

MaxBEntr

Vector of indices of rows where “Max.Birth.Date” was larger than “Entry.Date”.

EntrDep

Vector of indices of rows where “Entry.Date” was larger than “Depart.Date”.

DepartType

Vector of indices of rows where “Depart.Type” does not fall within the “C” (i.e., censored) or “D” (i.e., uncensored or death) categories.

idUnCens

Vector of indices of rows for uncensored (i.e., death) records.

nUnCens

Integer indicating the number of uncensored records.

idCens

Vector of indices of rows for censored records.

nCens

Integer indicating the number of censored records.

idNoBirth

Vector of indices of rows for records with uncertain birth date.

nNoBirth

Integer indicating the number of records with uncertain birth date.
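To illustrate what several of the census checks listed above test, here is a minimal base-R sketch. It is not BaSTA's implementation: `census_checks_sketch()` and the toy data are hypothetical, it only covers the date-ordering checks, “Depart.Type”, and the censored/uncensored counts, and it assumes all dates are stored as “%Y-%m-%d” strings.

```r
# Hypothetical sketch of some census consistency checks (NOT BaSTA's code).
# Assumes the date columns are character strings formatted as "%Y-%m-%d".
census_checks_sketch <- function(dat) {
  dateCols <- c("Birth.Date", "Min.Birth.Date", "Max.Birth.Date",
                "Entry.Date", "Depart.Date")
  # Convert each date column to class Date so they can be compared
  d <- lapply(dat[dateCols], as.Date, format = "%Y-%m-%d")
  list(
    MinBBirth  = which(d$Min.Birth.Date > d$Birth.Date),
    BirthMaxB  = which(d$Birth.Date > d$Max.Birth.Date),
    MinBMaxB   = which(d$Min.Birth.Date > d$Max.Birth.Date),
    BirthEntr  = which(d$Birth.Date > d$Entry.Date),
    MinBEntr   = which(d$Min.Birth.Date > d$Entry.Date),
    MaxBEntr   = which(d$Max.Birth.Date > d$Entry.Date),
    EntrDep    = which(d$Entry.Date > d$Depart.Date),
    # Departure types other than "C" (censored) or "D" (death)
    DepartType = which(!dat$Depart.Type %in% c("C", "D")),
    nCens      = sum(dat$Depart.Type == "C"),
    nUnCens    = sum(dat$Depart.Type == "D")
  )
}

# Toy data: row 1 is clean, row 2 has Min.Birth.Date after Birth.Date,
# row 3 departs before entry and has an invalid departure type
toy <- data.frame(
  Birth.Date     = c("2000-05-01", "2001-03-10", "2002-01-15"),
  Min.Birth.Date = c("2000-01-01", "2001-04-01", "2002-01-01"),
  Max.Birth.Date = c("2000-05-31", "2001-05-01", "2002-01-31"),
  Entry.Date     = c("2000-06-01", "2001-06-01", "2002-06-01"),
  Depart.Date    = c("2005-01-01", "2003-01-01", "2001-01-01"),
  Depart.Type    = c("D", "C", "X"),
  stringsAsFactors = FALSE
)
res <- census_checks_sketch(toy)
str(res)
```

Converting to `Date` before comparing matters: comparing the raw strings only happens to work for zero-padded “%Y-%m-%d” values, whereas `Date` objects compare correctly regardless.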

Arguments

object

A data.frame to be used as an input data file for BaSTA. Note: BaSTA can take two types of datasets, namely capture-mark-recapture (CMR) or census data.

dataType

A character string indicating if the data are capture-mark-recapture (CMR) or census. Options are “CMR” (default) or “census”.

studyStart

Only required for dataType = “CMR”; an integer indicating the first year of the study.

studyEnd

Only required for dataType = “CMR”; an integer indicating the last year of the study.

silent

Logical indicating whether the function should run silently; if FALSE, the results are printed to the console.
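For CMR data, studyStart and studyEnd fix the width of the recapture matrix. Assuming one recapture column per year, as the T-year observation window under Details implies, the window has T = studyEnd − studyStart + 1 columns, occupying columns 4 to T + 3. The helper names below are hypothetical and not part of BaSTA; the values 51 and 70 are those used in the Examples.

```r
# Hypothetical helpers (not part of BaSTA) for the CMR column layout:
# columns 1-3 hold ID, birth year and death year, followed by one
# recapture column per year from studyStart to studyEnd.
n_recap_cols <- function(studyStart, studyEnd) {
  studyEnd - studyStart + 1
}

recap_col_index <- function(year, studyStart) {
  # Year studyStart sits in column 4; each later year shifts one column right
  3 + (year - studyStart + 1)
}

nT <- n_recap_cols(studyStart = 51, studyEnd = 70)
nT                                    # 20 recapture columns (T)
recap_col_index(51, studyStart = 51)  # column 4
recap_col_index(70, studyStart = 51)  # column 23, i.e. T + 3
```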

Author

Fernando Colchero fernando_colchero@eva.mpg.de

Details

The function checks for inconsistencies in the dataset and reports them back. See the Value section for details on the types of errors detected by the function.

DATA SPECIFICATIONS:

1) CMR data: The input data object requires the following structure: the first column contains unique individual IDs, and the second and third columns contain birth and death years, respectively. Columns \(4, \dots, T+3\) represent the observation window (i.e., recapture matrix) of \(T\) years. These are optionally followed by columns for categorical and continuous covariates.

2) census data: The input data object requires at least five date columns, namely “Birth.Date”, “Min.Birth.Date”, “Max.Birth.Date”, “Entry.Date”, and “Depart.Date”. All dates need to be formatted as “%Y-%m-%d”. In addition, a “Depart.Type” column is required, with two types of departure: “C” for censored and “D” for dead.
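The six CMR problem types described under Value can be sketched in base R on a toy dataset laid out as in item 1 above. This is a hypothetical illustration, not BaSTA's code: `cmr_checks_sketch()` and the toy data are invented, it assumes 0 codes an unknown birth or death year, and it returns a plain `anyError` flag rather than reproducing `stopExec`.

```r
# Hypothetical sketch, NOT BaSTA's implementation: flags rows matching the
# six CMR problem types. Assumed layout: column 1 = ID, column 2 = birth
# year, column 3 = death year (0 = unknown), columns 4..(T + 3) = recapture
# matrix for years studyStart..studyEnd.
cmr_checks_sketch <- function(dat, studyStart, studyEnd) {
  nT    <- studyEnd - studyStart + 1
  years <- studyStart:studyEnd
  Y     <- unname(as.matrix(dat[, 4:(nT + 3)]))
  birth <- dat[[2]]
  death <- dat[[3]]
  seen  <- rowSums(Y > 0) > 0

  # First and last year in which each individual was recaptured (NA if never)
  firstObs <- apply(Y, 1, function(r) if (any(r > 0)) min(years[r > 0]) else NA)
  lastObs  <- apply(Y, 1, function(r) if (any(r > 0)) max(years[r > 0]) else NA)

  # Recapture entry in the birth year, for births inside the study window
  rowsIn  <- which(birth >= studyStart & birth <= studyEnd)
  atBirth <- Y[cbind(rowsIn, birth[rowsIn] - studyStart + 1)]

  out <- list(
    type1 = which(death > 0 & death < studyStart),         # death before study start
    type2 = which(birth == 0 & death == 0 & !seen),        # no birth/death, no observations
    type3 = which(birth > 0 & death > 0 & birth > death),  # birth after death
    type4 = which(death > 0 & lastObs > death),            # observed after death
    type5 = which(birth > 0 & firstObs < birth),           # observed before birth
    type6 = rowsIn[atBirth != 0]                           # non-zero entry in birth year
  )
  out$anyError <- any(lengths(out) > 0)
  out
}

# Toy data: row 1 is clean, rows 2-7 each trigger one problem type (1-6)
toy <- data.frame(
  ID    = c("a", "b", "c", "d", "e", "f", "g"),
  Birth = c(2002, 0, 0, 2004, 0, 2003, 2002),
  Death = c(2004, 1999, 0, 2003, 2002, 0, 0),
  Y2001 = c(0, 0, 0, 0, 0, 1, 0),
  Y2002 = c(0, 0, 0, 0, 1, 0, 1),
  Y2003 = c(1, 0, 0, 0, 1, 0, 0),
  Y2004 = c(0, 0, 0, 0, 0, 0, 1),
  Y2005 = c(0, 0, 0, 0, 0, 1, 0),
  stringsAsFactors = FALSE
)
res <- cmr_checks_sketch(toy, studyStart = 2001, studyEnd = 2005)
str(res)
```

Type 5 uses a strict inequality, so a recapture in the birth year itself is reported only as type 6, which keeps the two categories from overlapping in this sketch.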

See Also

FixCMRdata to fix potential issues for capture-mark-recapture data.

Examples

library(BaSTA)

## CMR data:
## --------- #
## Load data:
data("bastaCMRdat", package = "BaSTA")

## Check data consistency:
checkedData  <- DataCheck(bastaCMRdat, dataType = "CMR", studyStart = 51, 
                          studyEnd = 70)

## census data:
## ------------ #
## Load data:
data("bastaCensDat", package = "BaSTA")

## Check data consistency:
checkedData  <- DataCheck(object = bastaCensDat, dataType = "census")

## Printed output:
## --------------- #
## Print DataCheck results:
print(checkedData)
