Performs supervised Naive Bayes Classification on verbal autopsy data.
nbc(train, test, known = TRUE)
train: Dataframe of verbal autopsy train data (see Data documentation).
Columns (in order): ID, Cause, Symptom-1 to Symptom-n..
ID (vectorof char): unique case identifiers
Cause (vectorof char): observed causes for each case
Symptom-n.. (vectorsof (1 OR 0)): 1 for presence, 0 for absence; any other value is treated as unknown
Unknown symptoms are imputed randomly from the distribution of 1s and 0s in each symptom column; if a column contains no 1s or 0s, it is removed
Example:
ID | Cause | S1 | S2 | S3 |
"a1" | "HIV" | 1 | 0 | 0 |
"b2" | "Stroke" | 0 | 0 | 1 |
test: Dataframe of verbal autopsy test data in the same format as train, except when causes are not known:
The 2nd column (Cause) can be omitted if known is FALSE
known: TRUE if the test causes are available in the 2nd column, FALSE if they are not known
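The input conventions above can be illustrated with a small sketch. It is not the package's internal code: the helper name impute_symptoms is hypothetical, the use of 9 and NA as "unknown" values is an assumption, and drawing replacements with sample() weighted by the observed share of 1s is only one plausible reading of "imputed randomly". The last lines show a test dataframe with the Cause column dropped, as permitted when known is FALSE.

```r
# Toy train data in the documented layout: ID, Cause, then symptom columns.
# 9 and NA stand in for "unknown" symptom values (an assumption for this sketch).
train <- data.frame(
  ID = c("a1", "b2", "c3"),
  Cause = c("HIV", "Stroke", "HIV"),
  S1 = c(1, 0, 9),
  S2 = c(9, 9, 9),
  S3 = c(0, NA, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: impute unknown values per symptom column and
# drop columns that contain no 1s or 0s at all.
impute_symptoms <- function(df, symptom_cols) {
  keep <- character(0)
  for (col in symptom_cols) {
    x <- df[[col]]
    known <- !is.na(x) & x %in% c(0, 1)
    if (!any(known)) next                 # no 1s or 0s: column is removed
    p1 <- mean(x[known] == 1)             # observed share of 1s
    n_unknown <- sum(!known)
    if (n_unknown > 0) {
      x[!known] <- sample(c(1, 0), n_unknown, replace = TRUE,
                          prob = c(p1, 1 - p1))
    }
    df[[col]] <- x
    keep <- c(keep, col)
  }
  df[, c(setdiff(names(df), symptom_cols), keep), drop = FALSE]
}

set.seed(1)
train_imputed <- impute_symptoms(train, c("S1", "S2", "S3"))

# Test data with unknown causes: the 2nd column (Cause) is omitted.
test_known <- data.frame(
  ID = c("d4", "e5"),
  Cause = c("Stroke", "HIV"),
  S1 = c(0, 1), S2 = c(1, 0), S3 = c(1, 1),
  stringsAsFactors = FALSE
)
test_unknown <- test_known[, -2, drop = FALSE]

# With nbc4va installed, the call might then look like:
# results <- nbc(train, test_unknown, known = FALSE)
```

In this toy data S2 has no 1s or 0s, so it is dropped; S1 and S3 keep their known values and only the unknown entries are filled in.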
out: The resulting nbc list object, containing:
$prob.causes (vectorof double): the probabilities for each test case prediction by case id
$pred.causes (vectorof char): the predictions for each test case by case id
Additional values:
* indicates that the value is only available if test causes are known
$train (dataframe): the input train data
$train.ids (vectorof char): the ids of the train data
$train.causes (vectorof char): the causes of the train data by case id
$train.samples (double): the number of input train samples
$test (dataframe): the input test data
$test.ids (vectorof char): the ids of the test data
$test.causes* (vectorof char): the causes of the test data by case id
$test.samples (double): the number of input test samples
$test.known (logical): whether the test causes are known
$symptoms (vectorof char): all unique symptoms in order
$causes (vectorof char): all possible unique causes of death
$causes.train (vectorof char): all unique causes of death in the train data
$causes.test* (vectorof char): all unique causes of death in the test data
$causes.pred (vectorof char): all unique causes of death in the predicted cases
$causes.obs* (vectorof char): all unique causes of death in the observed cases
$pred (dataframe): a table of predictions for each test case, sorted by probability
Columns (in order): CaseID, TrueCause, Prediction-1 to Prediction-n..
CaseID (vectorof char): case identifiers
TrueCause* (vectorof char): the observed causes of death
Prediction-n.. (vectorsof char): the predicted causes of death, where Prediction1 is the most probable cause, and Prediction-n is the least probable cause
Example:
CaseID | Prediction1 | Prediction2 |
"a1" | "HIV" | "Stroke" |
"b2" | "Stroke" | "HIV" |
$obs* (dataframe): a table of observed causes matching $pred for each test case
Columns (in order): CaseID, TrueCause
CaseID (vectorof char): case identifiers
TrueCause (vectorof char): the actual cause of death if applicable
Example:
CaseID | TrueCause |
"a1" | "HIV" |
"b2" | "Stroke" |
$obs.causes* (vectorof char): all observed causes of death by case id
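Because $obs is described as matching $pred case by case, the two tables could be compared directly when test causes are known. The sketch below uses hand-built toy tables in the documented layouts rather than real nbc output; the variable names are illustrative only.

```r
# Toy tables in the documented $pred and $obs layouts.
pred <- data.frame(
  CaseID = c("a1", "b2"),
  Prediction1 = c("HIV", "Stroke"),
  Prediction2 = c("Stroke", "HIV"),
  stringsAsFactors = FALSE
)
obs <- data.frame(
  CaseID = c("a1", "b2"),
  TrueCause = c("HIV", "Stroke"),
  stringsAsFactors = FALSE
)

# Align on CaseID instead of relying on row order.
idx <- match(obs$CaseID, pred$CaseID)

# Share of cases whose most probable prediction equals the observed cause.
top1_match <- pred$Prediction1[idx] == obs$TrueCause
agreement <- mean(top1_match)
```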
$prob (dataframe): a table of probabilities of each cause for each test case
Columns (in order): CaseID, Cause-1 to Cause-n..
CaseID (vectorof char): case identifiers
Cause-n.. (vectorsof double): probabilities for each cause of death
Example:
CaseID | HIV | Stroke |
"a1" | 0.5 | 0.5 |
"b2" | 0.3 | 0.7 |
Miasnikof P, Giannakeas V, Gomes M, Aleksandrowicz L, Shestopaloff AY, Alam D, Tollman S, Samarikhalaj, Jha P. Naive Bayes classifiers for verbal autopsies: comparison to physician-based classification for 21,000 child and adult deaths. BMC Medicine. 2015;13:286. doi:10.1186/s12916-015-0521-2.
Other main functions: plot.nbc(), print.nbc_summary(), summary.nbc()
# NOT RUN {
library(nbc4va)
data(nbc4vaData)
# Run naive bayes classifier on random train and test data
# Set "known" to indicate whether or not "test" causes are known
train <- nbc4vaData[1:50, ]
test <- nbc4vaData[51:100, ]
results <- nbc(train, test, known=TRUE)
# Obtain the probabilities and predictions
prob <- results$prob.causes
pred <- results$pred.causes
# }