DataCheck: Error checking for BaSTA input data.

Description

A function to check the input data file for a Bayesian Survival Trajectory Analysis (BaSTA) for capture-mark-recapture (CMR) or census data.

Usage

DataCheck(object, dataType = "CMR", studyStart = NULL, studyEnd = NULL, silent = TRUE)

Value

1) CMR data:

newData: The original data frame (for consistency with previous versions of BaSTA).
type1: A vector of row numbers in the original data frame where there are deaths occurring before the study starts.
type2: A vector of row numbers in the original data frame where there are no birth or death records and no observations.
type3: A vector of row numbers in the original data frame where there are births recorded after death.
type4: A vector of row numbers in the original data frame where there are observations (i.e. recaptures) after death.
type5: A vector of row numbers in the original data frame where there are observations (i.e. recaptures) before birth.
type6: A vector of row numbers in the original data frame where the recapture-matrix entry for the year of birth is not zero.
summary: List with summary information, e.g., sample size, number of records with known birth, number of records with known death, etc.
stopExec: Logical indicating whether the data are free of errors, i.e., TRUE = the data have no apparent errors, and FALSE = there is at least one error.
probDescr: Character vector describing the six types of problems the DataCheck function looks for.
dataType: Type of dataset, i.e., “CMR”.
studyStart: Integer indicating the study start time.
studyEnd: Integer indicating the study end time.
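To make the six CMR error types concrete, the base R sketch below re-implements them on a small toy data frame. It is an illustration only, not BaSTA's own code: the function name checkCMRSketch and the toy data are hypothetical, and the sketch assumes that unknown birth and death years are coded as 0 and that the recapture columns cover the years studyStart to studyEnd in order.

```r
# Hypothetical sketch of the six CMR checks (not BaSTA's implementation).
# Assumes: column 1 = ID, column 2 = birth year, column 3 = death year,
# unknown years coded as 0, then one recapture column per study year.
checkCMRSketch <- function(df, studyStart, studyEnd) {
  years <- studyStart:studyEnd
  birth <- df[[2]]
  death <- df[[3]]
  Y <- as.matrix(df[, 3 + seq_along(years), drop = FALSE])
  rows <- seq_len(nrow(Y))
  # Any detection strictly after a known death year
  obsAfterDeath <- sapply(rows, function(i)
    death[i] > 0 && any(Y[i, years > death[i]] > 0))
  # Any detection strictly before a known birth year
  obsBeforeBirth <- sapply(rows, function(i)
    birth[i] > 0 && any(Y[i, years < birth[i]] > 0))
  # Non-zero recapture entry in the birth year (if it lies in the window)
  birthNotZero <- sapply(rows, function(i) {
    j <- match(birth[i], years)
    !is.na(j) && Y[i, j] != 0
  })
  list(
    type1 = which(death > 0 & death < studyStart),
    type2 = which(birth == 0 & death == 0 & rowSums(Y) == 0),
    type3 = which(birth > 0 & death > 0 & birth > death),
    type4 = which(obsAfterDeath),
    type5 = which(obsBeforeBirth),
    type6 = which(birthNotZero)
  )
}

# Toy data for study years 51 to 54; rows 2 to 7 each carry one error type
toy <- data.frame(
  ID    = 1:7,
  Birth = c(51, 0, 0, 53, 0, 53, 52),
  Death = c(0, 49, 0, 52, 52, 0, 0),
  Y51   = c(0, 0, 0, 0, 1, 1, 0),
  Y52   = c(1, 0, 0, 0, 0, 0, 1),
  Y53   = c(1, 0, 0, 0, 0, 0, 1),
  Y54   = c(0, 0, 0, 0, 1, 1, 0)
)
res <- checkCMRSketch(toy, studyStart = 51, studyEnd = 54)
# res$type1 is 2, res$type2 is 3, ..., res$type6 is 7; row 1 is clean
```

Each check is a simple vectorised or per-row comparison, which is why the real function can report row numbers for every problem type rather than stopping at the first error.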

2) census data:

n: Integer for the number of rows (i.e., records) in the dataset.
stopExec: Logical indicating whether the data are free of errors, i.e., TRUE = the data have no apparent errors, and FALSE = there is at least one error.
nas: List, organised by column, indicating whether NAs were detected in each column.
DateRan: Matrix of date ranges (as character strings) for each date column in the dataset.
probDescr: Character vector describing the seven types of problems the DataCheck function looks for.
MinBBirth: Vector of indices of rows where “Min.Birth.Date” was larger than “Birth.Date”.
BirthMaxB: Vector of indices of rows where “Birth.Date” was larger than “Max.Birth.Date”.
MinBMaxB: Vector of indices of rows where “Min.Birth.Date” was larger than “Max.Birth.Date”.
BirthEntr: Vector of indices of rows where “Birth.Date” was larger than “Entry.Date”.
MinBEntr: Vector of indices of rows where “Min.Birth.Date” was larger than “Entry.Date”.
MaxBEntr: Vector of indices of rows where “Max.Birth.Date” was larger than “Entry.Date”.
EntrDep: Vector of indices of rows where “Entry.Date” was larger than “Depart.Date”.
DepartType: Vector of indices of rows where “Depart.Type” does not fall within the “C” (i.e., censored) or “D” (i.e., uncensored or death) categories.
idUnCens: Vector of indices of rows for uncensored (i.e., death) records.
nUnCens: Integer indicating the number of uncensored records.
idCens: Vector of indices of rows for censored records.
nCens: Integer indicating the number of censored records.
idNoBirth: Vector of indices of rows for records with uncertain birth date.
nNoBirth: Integer indicating the number of records with uncertain birth date.
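The census date-order checks amount to parsing the date columns and comparing them pairwise. The base R sketch below illustrates this on three toy records; it is not BaSTA's code. The name checkCensusSketch is hypothetical, the sketch covers only MinBBirth through DepartType plus idUnCens and idCens, and it assumes the dates are character strings in the “%Y-%m-%d” format.

```r
# Hypothetical sketch of the census consistency checks (not BaSTA's code).
checkCensusSketch <- function(df) {
  dateCols <- c("Birth.Date", "Min.Birth.Date", "Max.Birth.Date",
                "Entry.Date", "Depart.Date")
  # Parse every date column, assuming the "%Y-%m-%d" format
  d <- lapply(df[dateCols],
              function(x) as.Date(as.character(x), format = "%Y-%m-%d"))
  list(
    MinBBirth  = which(d$Min.Birth.Date > d$Birth.Date),
    BirthMaxB  = which(d$Birth.Date > d$Max.Birth.Date),
    MinBMaxB   = which(d$Min.Birth.Date > d$Max.Birth.Date),
    BirthEntr  = which(d$Birth.Date > d$Entry.Date),
    MinBEntr   = which(d$Min.Birth.Date > d$Entry.Date),
    MaxBEntr   = which(d$Max.Birth.Date > d$Entry.Date),
    EntrDep    = which(d$Entry.Date > d$Depart.Date),
    # Departure types other than "C" (censored) or "D" (death)
    DepartType = which(!(df$Depart.Type %in% c("C", "D"))),
    idUnCens   = which(df$Depart.Type == "D"),
    idCens     = which(df$Depart.Type == "C")
  )
}

# Row 1 is clean, row 2 has Min.Birth.Date after Birth.Date,
# row 3 enters after it departs and has an invalid Depart.Type
toy <- data.frame(
  Birth.Date     = c("2000-05-01", "2001-01-01", "2002-03-01"),
  Min.Birth.Date = c("2000-01-01", "2001-02-01", "2002-01-01"),
  Max.Birth.Date = c("2000-12-31", "2001-03-01", "2002-06-01"),
  Entry.Date     = c("2001-01-01", "2001-04-01", "2003-01-01"),
  Depart.Date    = c("2005-01-01", "2004-01-01", "2002-12-01"),
  Depart.Type    = c("C", "D", "X"),
  stringsAsFactors = FALSE
)
res <- checkCensusSketch(toy)
```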

Arguments

object: A data.frame to be used as an input data file for BaSTA. Note: BaSTA can take two types of datasets, namely capture-mark-recapture (CMR) or census data.
dataType: A character string indicating if the data are capture-mark-recapture (CMR) or census. Options are “CMR” (default) or “census”.
studyStart: An integer indicating the first year of the study; required only when dataType = “CMR”.
studyEnd: An integer indicating the last year of the study; required only when dataType = “CMR”.
silent: Logical; if TRUE (the default), the results are not printed to the console.

Author

Fernando Colchero fernando_colchero@eva.mpg.de

Details

The function checks for inconsistencies in the dataset and reports them back. See the Value section for details on the types of errors detected by the function.

DATA SPECIFICATIONS:

1) CMR data: The input data object requires the following structure: the first column should be a vector of unique individual IDs, and the second and third columns are the birth and death years, respectively. Columns \(4, \dots, T+3\) represent the observation window (i.e., recapture matrix) of \(T\) years. These may optionally be followed by columns for categorical and continuous covariates.
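As a minimal sketch of this layout, the base R helper below assembles a CMR data frame from an ID vector, birth and death years, and a recapture matrix with one column per study year. The helper makeCMRFrame is hypothetical and not part of BaSTA; the column names (Birth, Death, Y51, …) are illustrative only, since the specification above is positional, and coding an unknown year as 0 is an assumption.

```r
# Hypothetical helper (not part of BaSTA): ID, birth year, death year,
# then one recapture column per study year, starting at studyStart.
makeCMRFrame <- function(id, birth, death, Y, studyStart) {
  Y <- as.matrix(Y)
  stopifnot(length(id) == nrow(Y), length(birth) == nrow(Y),
            length(death) == nrow(Y))
  # Name recapture columns by study year, e.g. Y51, Y52, ...
  colnames(Y) <- paste0("Y", studyStart + seq_len(ncol(Y)) - 1)
  data.frame(ID = id, Birth = birth, Death = death, Y, check.names = FALSE)
}

# Two individuals observed over T = 4 study years (51 to 54);
# 0 is assumed to code an unknown birth or death year
Y <- matrix(c(0, 1, 1, 0,
              1, 1, 0, 0), nrow = 2, byrow = TRUE)
cmr <- makeCMRFrame(id = c("A", "B"), birth = c(51, 0), death = c(0, 53),
                    Y = Y, studyStart = 51)
# T = 4 recapture columns, so cmr has T + 3 = 7 columns
```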

2) census data: The input data object requires at least five date columns, namely “Birth.Date”, “Min.Birth.Date”, “Max.Birth.Date”, “Entry.Date”, and “Depart.Date”. All dates need to be formatted as “%Y-%m-%d”. In addition, a “Depart.Type” column is required, with two types of departure: “C” for censored and “D” for dead.
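Before calling DataCheck, it can help to confirm that the required columns exist and that the dates parse with the “%Y-%m-%d” format. The base R sketch below does this; validateCensusFormat is a hypothetical pre-check, not a BaSTA function, and it does not reproduce the date-order checks that DataCheck itself performs.

```r
# Hypothetical pre-check (not part of BaSTA): required columns and date format.
validateCensusFormat <- function(df) {
  dateCols <- c("Birth.Date", "Min.Birth.Date", "Max.Birth.Date",
                "Entry.Date", "Depart.Date")
  problems <- character(0)
  # Report any required column that is absent
  absent <- setdiff(c(dateCols, "Depart.Type"), names(df))
  if (length(absent) > 0) {
    problems <- c(problems, paste("missing column:", absent))
  }
  # A non-missing value that fails to parse is in the wrong format
  for (col in intersect(dateCols, names(df))) {
    parsed <- as.Date(as.character(df[[col]]), format = "%Y-%m-%d")
    bad <- which(is.na(parsed) & !is.na(df[[col]]))
    if (length(bad) > 0) {
      problems <- c(problems,
                    sprintf("%s: rows %s not in %%Y-%%m-%%d format",
                            col, paste(bad, collapse = ", ")))
    }
  }
  problems
}

# Toy data lacking Depart.Type, with one date in the wrong format
census <- data.frame(
  Birth.Date     = c("2000-05-01", "05/01/2001"),
  Min.Birth.Date = c("2000-01-01", "2001-01-01"),
  Max.Birth.Date = c("2000-12-31", "2001-12-31"),
  Entry.Date     = c("2001-01-01", "2002-01-01"),
  Depart.Date    = c("2005-01-01", "2004-01-01"),
  stringsAsFactors = FALSE
)
problems <- validateCensusFormat(census)
```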

Examples


## CMR data:
## --------- #
## Load data:
data("bastaCMRdat", package = "BaSTA")

## Check data consistency:
checkedData  <- DataCheck(bastaCMRdat, dataType = "CMR", studyStart = 51, 
                          studyEnd = 70)

## census data:
## ------------ #
## Load data:
data("bastaCensDat", package = "BaSTA")

## Check data consistency:
checkedData  <- DataCheck(object = bastaCensDat, dataType = "census")

## Printed output:
## --------------- #
## Print DataCheck results:
print(checkedData)
