A function to check the input data file for a Bayesian Survival Trajectory Analysis (BaSTA) for capture-mark-recapture (CMR) or census data.
DataCheck(object, dataType = "CMR", studyStart = NULL, studyEnd = NULL, silent = TRUE)
1) CMR data:
The original data frame (for consistency with previous versions of BaSTA).
A vector of row numbers in the original data frame where there are deaths occurring before the study starts.
A vector of row numbers in the original data frame where there are no birth/death AND no observations.
A vector of row numbers in the original data frame where there are births recorded after death.
A vector of row numbers in the original data frame where there are observations (i.e. recaptures) after death.
A vector of row numbers in the original data frame where there are observations (i.e. recaptures) before birth.
A vector of row numbers in the original data frame where the year of birth is not a zero in the recapture matrix.
List with summary information, e.g., sample size, number of records with known birth, number of records with known death, etc.
Logical that indicates whether the data are free of errors: TRUE = the data have no apparent errors, and FALSE = there is at least one error.
Character vector explaining the six types of problems the DataCheck function looks for.
Type of dataset, i.e., “CMR”.
Integer indicating the study start time.
Integer indicating the study end time.
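Several of the CMR checks listed above can be reproduced by hand on a small table, which may help when interpreting the row numbers that DataCheck reports. The sketch below uses base R only and is not BaSTA's implementation. The toy data frame `toy`, its column names, the result names, and the convention that a 0 in the birth or death column means "unknown" are all assumptions made for illustration.

```r
# Hypothetical CMR table: ID, birth year, death year, then one 0/1 column
# per study year. A 0 in birth/death is ASSUMED to mean "unknown".
studyStart <- 51
studyEnd   <- 54
years      <- studyStart:studyEnd

toy <- data.frame(
  id    = 1:4,
  birth = c(50, 53, 0, 52),
  death = c(54, 52, 49, 0),
  Y51   = c(1, 0, 0, 1),
  Y52   = c(1, 0, 0, 1),
  Y53   = c(0, 0, 0, 1),
  Y54   = c(0, 1, 0, 1)
)

# Recapture matrix and a same-shaped matrix holding the year of each column.
obs     <- as.matrix(toy[, 4:(3 + length(years))])
yearMat <- matrix(years, nrow = nrow(toy), ncol = length(years), byrow = TRUE)

# Deaths recorded before the study starts.
deathBeforeStart <- which(toy$death > 0 & toy$death < studyStart)

# Births recorded after death.
birthAfterDeath <- which(toy$birth > 0 & toy$death > 0 &
                           toy$birth > toy$death)

# Observations after death / before birth (row-wise comparison by year).
obsAfterDeath  <- which(toy$death > 0 &
                          rowSums(obs == 1 & yearMat > toy$death) > 0)
obsBeforeBirth <- which(toy$birth > 0 &
                          rowSums(obs == 1 & yearMat < toy$birth) > 0)

# Year of birth is not a zero in the recapture matrix.
birthYearNotZero <- which(rowSums(obs == 1 & yearMat == toy$birth) > 0)
```

In this toy table, individual 3 died before the study started, individual 2 has a birth year after its death year and a sighting after death, and individual 4 was seen before and during its recorded birth year.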
2) census data:
Integer for the number of rows (i.e., records) in the dataset.
Logical that indicates whether the data are free of errors: TRUE = the data have no apparent errors, and FALSE = there is at least one error.
List organised by column indicating whether NAs were detected in a given column.
Matrix of date ranges (as character strings) for each date column in the dataset.
Character vector explaining the seven types of problems the DataCheck function looks for.
Vector of indices of rows where “Min.Birth.Date” was later than “Birth.Date”.
Vector of indices of rows where “Birth.Date” was later than “Max.Birth.Date”.
Vector of indices of rows where “Min.Birth.Date” was later than “Max.Birth.Date”.
Vector of indices of rows where “Birth.Date” was later than “Entry.Date”.
Vector of indices of rows where “Min.Birth.Date” was later than “Entry.Date”.
Vector of indices of rows where “Max.Birth.Date” was later than “Entry.Date”.
Vector of indices of rows where “Entry.Date” was later than “Depart.Date”.
Vector of indices of rows where “Depart.Type” does not fall within the “C” (i.e., censored) or “D” (i.e., uncensored or death) categories.
Vector of indices of rows for uncensored (i.e., death) records.
Integer indicating the number of uncensored records.
Vector of indices of rows for censored records.
Integer indicating the number of censored records.
Vector of indices of rows for records with uncertain birth date.
Integer indicating the number of records with uncertain birth date.
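Most of the census checks above are pairwise comparisons between date columns, plus a test on the departure type. The following base R sketch illustrates a few of them on a hypothetical data frame `cens`; it is not BaSTA's code, the result names are invented for this example, and treating a record as having an uncertain birth date when its minimum and maximum birth dates differ is an assumption.

```r
# Hypothetical census table using the column names required by DataCheck.
cens <- data.frame(
  Birth.Date     = as.Date(c("2000-05-01", "2001-03-15", "1999-07-20")),
  Min.Birth.Date = as.Date(c("2000-05-01", "2001-06-01", "1999-01-01")),
  Max.Birth.Date = as.Date(c("2000-05-01", "2001-12-31", "1999-12-31")),
  Entry.Date     = as.Date(c("2000-05-01", "2002-01-01", "2005-01-01")),
  Depart.Date    = as.Date(c("2010-01-01", "2008-06-30", "2004-01-01")),
  Depart.Type    = c("D", "C", "X"),
  stringsAsFactors = FALSE
)

# Rows where the minimum birth date falls after the birth date.
minAfterBirth <- which(cens$Min.Birth.Date > cens$Birth.Date)

# Rows where entry falls after departure.
entryAfterDepart <- which(cens$Entry.Date > cens$Depart.Date)

# Rows whose departure type is neither "C" nor "D".
badDepartType <- which(!cens$Depart.Type %in% c("C", "D"))

# Indices and counts of uncensored (death) and censored records.
deathRows <- which(cens$Depart.Type == "D")
censRows  <- which(cens$Depart.Type == "C")
nDeath    <- length(deathRows)
nCens     <- length(censRows)

# ASSUMPTION: "uncertain birth date" means Min.Birth.Date != Max.Birth.Date.
uncertainBirth <- which(cens$Min.Birth.Date != cens$Max.Birth.Date)
```

Here row 2 has a minimum birth date after its birth date, and row 3 has both an entry date after its departure date and an unrecognised departure type.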
A data.frame to be used as an input data file for BaSTA. Note: BaSTA can take two types of datasets, namely capture-mark-recapture (CMR) or census data.
A character string indicating if the data are capture-mark-recapture (CMR) or census. Options are “CMR” (default) or “census”.
Only required for dataType = “CMR”; an integer indicating the first year of the study.
Only required for dataType = “CMR”; an integer indicating the last year of the study.
Logical to indicate whether the results should be printed to the console.
Fernando Colchero fernando_colchero@eva.mpg.de
The function checks for inconsistencies in the dataset and reports them back. See the Value section for details on the types of errors detected by the function.
DATA SPECIFICATIONS:
1) CMR data:
The input data object requires the following structure: the first column should be a vector of unique individual IDs, and the second and third columns are the birth and death years, respectively. Columns \(4, \dots, T+3\) represent the observation window (i.e., recapture matrix) of \(T\) years. This is followed (optionally) by columns for categorical and continuous covariates.
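As a minimal sketch of this layout, the base R code below assembles a CMR table for a hypothetical study of T = 5 years with one covariate and confirms the column positions. The object `cmr`, the column names, the study years, and the use of 0 for unknown birth or death years are illustrative assumptions, not something prescribed by this page.

```r
studyYears <- 2001:2005          # hypothetical observation window, T = 5
nT <- length(studyYears)

# Recapture matrix: one row per individual, one 0/1 column per study year.
Y <- matrix(c(0, 1, 1, 0, 0,
              1, 0, 1, 1, 0,
              0, 0, 1, 0, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, paste0("Y", studyYears)))

cmr <- data.frame(
  ID    = c("a01", "a02", "a03"),   # column 1: unique individual IDs
  birth = c(2001, 0, 2002),         # column 2: birth year (0 assumed unknown)
  death = c(0, 2005, 0),            # column 3: death year (0 assumed unknown)
  Y,                                # columns 4 to T + 3: recapture matrix
  sex   = c("f", "m", "f")          # optional covariate column(s)
)

# Basic layout checks before handing the table to DataCheck.
recapCols <- 4:(nT + 3)
stopifnot(!anyDuplicated(cmr$ID))
stopifnot(all(as.matrix(cmr[, recapCols]) %in% c(0, 1)))
```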
2) census data:
The input data object requires at least five date columns, namely “Birth.Date”, “Min.Birth.Date”, “Max.Birth.Date”, “Entry.Date”, and “Depart.Date”. All dates need to be formatted as “%Y-%m-%d”. In addition, a “Depart.Type” column is required with two types of departure: “C” for censored and “D” for dead.
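Because every date must follow “%Y-%m-%d”, it can be useful to confirm that character date columns parse cleanly before calling DataCheck. The helper below is a hypothetical base R sketch (the name `isIsoDate` is not part of BaSTA); it requires the literal YYYY-MM-DD shape and then checks that the string is a real calendar date.

```r
# Hypothetical helper: TRUE where a string is a valid date in %Y-%m-%d form.
isIsoDate <- function(x) {
  # Require four digits, two digits, two digits, separated by hyphens.
  shapeOK <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  # as.Date() returns NA (rather than an error) when a format is supplied
  # and the string does not match it or is not a real calendar date.
  parsed <- as.Date(x, format = "%Y-%m-%d")
  shapeOK & !is.na(parsed)
}

dates <- c("2001-03-15", "15/03/2001", "2001-02-30", "2001-3-5")
dateOK <- isIsoDate(dates)
```

Only the first string passes: the second uses a different layout, the third is not a real date, and the fourth lacks zero padding.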
FixCMRdata to fix potential issues for capture-mark-recapture data.
## CMR data:
## --------- #
## Load data:
data("bastaCMRdat", package = "BaSTA")
## Check data consistency:
checkedData <- DataCheck(bastaCMRdat, dataType = "CMR", studyStart = 51,
                         studyEnd = 70)
## census data:
## ------------ #
## Load data:
data("bastaCensDat", package = "BaSTA")
## Check data consistency:
checkedData <- DataCheck(object = bastaCensDat, dataType = "census")
## Printed output:
## --------------- #
## Print DataCheck results:
print(checkedData)